ConvPatchTrans: A script identification network with global and local semantics deeply integrated
Ke Yang, Jizheng Yi, Aibin Chen, Jiaqi Liu, Wenjie Chen, Ze Jin
DOI: https://doi.org/10.1016/j.engappai.2022.104916
IF: 8
2022-08-01
Engineering Applications of Artificial Intelligence
Abstract: Optical Character Recognition (OCR) systems read text from images. Script identification, which determines the language of the text in an image, is an important part of OCR technology and plays an indispensable role in the stability and accuracy of an OCR system. The main challenge in script identification is the interference caused by similarities between texts in different languages. In this paper, a two-branch network named ConvPatchTrans is designed to process global and local semantic features separately, focusing on the whole text and on each word in an image, respectively. ConvPatchTrans extracts features from different stages of the Visual Geometry Group network (VGGNet) as global and local semantics. For the global branch, a linear classifier is adopted. For the local branch, the text image data is converted to image sequence data, and a multi-layer convolution-enhanced Transformer (MCET) is proposed to deeply fuse the sequence. Finally, the global and local branches are combined by an adaptive weighted fusion method to produce the final result. To verify the effectiveness of the proposed method, comparative experiments are conducted on four public script identification datasets. The method achieves the highest values among currently published methods on the CVSI2015 and MLE2E datasets, 98.90% and 97.50%, respectively. Satisfactory results are also obtained on the other two datasets.
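The two steps named in the abstract, converting a text image to an image sequence and fusing the two branches with an adaptive weight, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract gives no formulas, so the helper names (`image_to_patch_sequence`, `fuse_branches`), the fixed-width column patches, and the sigmoid-weighted convex combination of softmax outputs are all hypothetical choices. In a trained network the scalar `alpha_raw` would be a learned parameter.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def image_to_patch_sequence(image, patch_width):
    """Split a 2D image (list of rows) into a left-to-right sequence of
    column patches. Hypothetical stand-in for the image-to-sequence step;
    trailing columns that do not fill a whole patch are dropped."""
    width = len(image[0])
    patches = []
    for start in range(0, width - patch_width + 1, patch_width):
        patches.append([row[start:start + patch_width] for row in image])
    return patches


def fuse_branches(global_logits, local_logits, alpha_raw):
    """Assumed form of adaptive weighted fusion: a convex combination of the
    two branches' class probabilities, with weight w = sigmoid(alpha_raw)."""
    w = 1.0 / (1.0 + math.exp(-alpha_raw))
    p_global = softmax(global_logits)
    p_local = softmax(local_logits)
    return [w * g + (1.0 - w) * l for g, l in zip(p_global, p_local)]


# Usage: a 2x6 toy "image" split into three 2x2 patches, then a fusion of
# two branch outputs over three hypothetical script classes.
toy_image = [[0, 1, 2, 3, 4, 5],
             [6, 7, 8, 9, 10, 11]]
sequence = image_to_patch_sequence(toy_image, patch_width=2)
fused = fuse_branches([2.0, 0.0, 0.0], [0.0, 2.0, 0.0], alpha_raw=0.0)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Because the weight is squashed by a sigmoid, the fused output stays a valid probability distribution for any value of `alpha_raw`, which is one reason this form is a common choice for combining two classifiers.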
automation & control systems; computer science, artificial intelligence; engineering, electrical & electronic; engineering, multidisciplinary