Dual Relation Network for Scene Text Recognition

Ming Li, Bin Fu, Han Chen, Junjun He, Yu Qiao
DOI: https://doi.org/10.1109/tmm.2022.3171108
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Local visual features and long-range contextual features provide two complementary cues when humans read text in natural scenes. Existing scene text recognition methods mainly extract local features at a low level and then model long-range dependencies at a high level; this sequential pipeline may be sub-optimal for constructing a complete and effective representation. Long-range contextual relations matter not only in high-level features but also in low-level features, since they can help separate different characters based on the intervals between them and thus enhance the character features. To address this issue, we develop a dual relation module that extracts complementary features in parallel for scene text recognition. The module consists of a local visual branch and a long-range contextual branch. The local visual branch employs a topological-aware operation to model intra-character characteristics and extract discriminative features of different characters. Meanwhile, the long-range contextual branch uses a simple but effective strategy to incorporate inter-character relations into feature maps. Our dual relation module is a plug-and-play block that can be easily incorporated into modern deep architectures. Experimental results demonstrate that our methods achieve top performance on several standard benchmarks. Code and models will become publicly available in the future.