MLTS: A Multi-Language Scene Text Spotter
Yu Zhou, Shancheng Fang, Hongtao Xie, Zheng-Jun Zha, Yongdong Zhang
DOI: https://doi.org/10.1109/ICME.2019.00036
2019-01-01
Abstract: Scene text detection and recognition are popular research topics in computer vision because of their many applications, such as autonomous driving, assistance for blind people and text translation. However, most current methods can detect or recognize text in only one language, even though scene text images often contain text in several languages at once. To date, there is no effective model for multi-language text spotting. In this paper, an end-to-end method for multi-language scene text detection, recognition and script identification is proposed. The method is called MLTS, short for Multi-Language Scene Text Spotter. By designing a special backbone for text and combining two different kinds of attention, MLTS achieves state-of-the-art performance both on joint text localization and script identification in natural images and on cropped-word script identification: the precision, recall and F-measure are 0.7145, 0.6583 and 0.6852 respectively, while the corresponding values of the best existing methods are 0.5759, 0.6207 and 0.5974. Additionally, MLTS achieves comparable performance on ICDAR2013 and ICDAR2015, which demonstrates the effectiveness of the model.
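As a quick sanity check on the reported numbers, the sketch below recomputes the F-measure from the precision and recall quoted in the abstract. It assumes the F-measure is the standard harmonic mean of precision and recall (F1), which the abstract does not state explicitly; the function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract.
mlts = f_measure(0.7145, 0.6583)
best_existing = f_measure(0.5759, 0.6207)

print(f"MLTS F-measure:          {mlts:.4f}")
print(f"Best existing F-measure: {best_existing:.4f}")
print(f"Absolute gain:           {mlts - best_existing:.4f}")
```

Under this assumption the recomputed values agree with the reported 0.6852 and 0.5974 to within rounding of the last digit, since the inputs themselves are already rounded to four decimal places.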