An End-to-End OCR Text Re-organization Sequence Learning for Rich-Text Detail Image Comprehension

Liangcheng Li, Feiyu Gao, Jiajun Bu, Yongpan Wang, Zhi Yu, Qi Zheng
DOI: https://doi.org/10.1007/978-3-030-58595-2_6
2020-01-01
Abstract: Descriptions embedded in detail images help users learn more about commodities. With OCR technology, the description text can be detected and recognized as auxiliary information that removes comprehension barriers for visually impaired users. However, because the OCR text blocks lack a proper logical structure, it is difficult to comprehend the detail images accurately. To tackle this problem, we propose a novel end-to-end OCR text reorganizing model. Specifically, we build a Graph Neural Network with an attention map to encode the text blocks together with visual layout features. An attention-based sequence decoder inspired by the Pointer Network, combined with a Sinkhorn global optimization, then reorders the OCR text into a proper sequence. Experimental results show that our model outperforms the baselines, and a real-world experiment with blind users shows that it improves their comprehension.
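To make the Sinkhorn step mentioned in the abstract more concrete, here is a minimal sketch of plain Sinkhorn normalization in pure Python. It is not the authors' implementation: the score matrix, the function names `sinkhorn` and `greedy_order`, the `temperature` parameter, and the greedy read-out are all assumptions made for illustration. The idea it shows is general: alternately normalizing rows and columns turns a matrix of position-to-block scores into an approximately doubly stochastic "soft permutation", from which an ordering of text blocks can be read.

```python
import math


def sinkhorn(scores, n_iters=50, temperature=1.0):
    """Turn a square score matrix into an approximately doubly stochastic one."""
    n = len(scores)
    # Subtract the global maximum before exponentiating for numerical stability.
    top = max(max(row) for row in scores)
    m = [[math.exp((s - top) / temperature) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization: every row sums to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column normalization: every column sums to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def greedy_order(soft_perm):
    """Read an ordering off a soft permutation, one output position at a time."""
    used, order = set(), []
    for row in soft_perm:
        # Pick the highest-weight block that has not been placed yet.
        candidates = [j for j in range(len(row)) if j not in used]
        best = max(candidates, key=lambda j: row[j])
        used.add(best)
        order.append(best)
    return order


# Hypothetical scores: scores[i][j] is how strongly output position i
# prefers OCR text block j. In a real model these would come from the decoder.
scores = [
    [0.1, 2.0, 0.3],
    [1.5, 0.2, 0.4],
    [0.3, 0.1, 1.8],
]
soft_perm = sinkhorn(scores)
order = greedy_order(soft_perm)
```

With these illustrative scores, `order` is `[1, 0, 2]`: block 1 is read first, then block 0, then block 2. A lower `temperature` pushes the soft permutation closer to a hard one, at the cost of less smooth gradients during training.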