A Decoupled Language Model Based on Contrastive Attention Mechanism for Scene Text Recognition

Jiao Dai, Xi Wang, Jizhong Han, Junwei Zhou
DOI: https://doi.org/10.1145/3594315.3594356
2023-03-17
Abstract: In recent years, attention-based approaches to scene text recognition have achieved remarkable success. However, most of these attention mechanisms are coupled with the encoder-decoder architecture, and most focus only on locating the most relevant image regions. In this paper, we present a language model with a contrastive attention mechanism that is decoupled from the standard encoder-decoder architecture. First, preliminary text recognition results are obtained from the encoder-decoder framework. Second, the language model performs two steps: it predicts the text and computes the attention weights of the text over the image, identifying not only the most relevant image regions but also the least relevant ones. Finally, a loss function drives the model to pay less attention to irrelevant regions and more attention to relevant ones. We evaluated the model on seven datasets and found that it performed well, particularly on the IC13, SVT, and SVTP datasets.
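The contrastive attention idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: scaled dot-product scoring, obtaining the "least relevant" weights by applying softmax to the negated scores, and the form of the loss. The names `contrastive_attention` and `contrastive_loss` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_attention(query, regions):
    """Return (relevant, irrelevant) attention weights over image regions.

    query:   feature vector for one predicted character (assumed input).
    regions: list of image-region feature vectors of the same length.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * r for q, r in zip(query, region)) / scale
              for region in regions]
    relevant = softmax(scores)                  # peaks on the best-matching region
    irrelevant = softmax([-s for s in scores])  # peaks on the worst-matching region
    return relevant, irrelevant


def contrastive_loss(p_correct_relevant, p_correct_irrelevant, eps=1e-9):
    """Assumed loss form, not the paper's.

    Rewards a high probability for the correct character when reading from
    relevant regions, and penalizes a high probability for it when reading
    from irrelevant regions.
    """
    return (-math.log(p_correct_relevant + eps)
            - math.log(1.0 - p_correct_irrelevant + eps))


# Toy example: three image regions, one character query.
query = [1.0, 0.0]
regions = [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0]]
relevant, irrelevant = contrastive_attention(query, regions)
```

In this toy example the relevant weights peak on the first region and the irrelevant weights peak on the last one. Under the assumed loss, a model that reads the correct character confidently from relevant regions and not from irrelevant ones receives a lower loss than one that cannot tell the two apart.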
Computer Science