End-to-End Multilingual Text Recognition Based on Byte Modeling

Jiajia Wu, Kun Zhao, Zhengyan Yang, Bing Yin, Cong Liu, Lirong Dai
DOI: https://doi.org/10.1007/978-3-031-46311-2_11
2023-01-01
Abstract: Multilingual text recognition is increasingly widely used in computer vision. In practical applications, however, modeling each language independently cannot make full use of the information shared between languages and consumes substantial hardware resources, which makes unified modeling of multiple languages necessary. A natural approach to unified multilingual modeling is to combine the modeling units (characters, subwords, or words) of all languages into one large vocabulary and then apply a sequence-to-sequence model. However, this vocabulary is often very large, which makes modeling difficult. In this paper, we propose a byte-based multilingual text recognition method that reduces the vocabulary size to only 256 and thereby effectively solves the problem of unified modeling. Experiments show that our method effectively utilizes the information between different languages and outperforms the independent-modeling baseline by a large margin.
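To illustrate why byte-level modeling units cap the vocabulary at 256, the following is a minimal sketch, not the authors' implementation. It assumes the text is encoded as UTF-8 (the abstract does not name the encoding), and the helper names `text_to_byte_ids` and `byte_ids_to_text` are hypothetical.

```python
# Minimal sketch: byte-level target units for multilingual text.
# Assumption: UTF-8 is the byte encoding; the abstract does not specify it.

VOCAB_SIZE = 256  # every byte value 0..255 is one modeling unit


def text_to_byte_ids(text: str) -> list[int]:
    """Map a string in any language to a sequence of byte IDs in [0, 255]."""
    return list(text.encode("utf-8"))


def byte_ids_to_text(ids: list[int]) -> str:
    """Map byte IDs back to text; invalid byte sequences become U+FFFD."""
    return bytes(ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # English, Chinese, and Russian share the same 256-entry vocabulary.
    for sample in ["hello", "你好", "привет"]:
        ids = text_to_byte_ids(sample)
        assert all(0 <= i < VOCAB_SIZE for i in ids)
        assert byte_ids_to_text(ids) == sample
        print(f"{sample!r}: {len(sample)} characters -> {len(ids)} bytes")
```

The trade-off in this sketch is sequence length: non-Latin scripts take two or three bytes per character in UTF-8, so the target sequences grow longer while the output layer stays fixed at 256 classes.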