Abstract: Scene text recognition (STR) models have recently shown significant performance improvements. However, existing models still struggle to recognize challenging texts, such as those containing severely distorted characters or characters under perspective distortion. These challenging texts mainly cause two problems: (1) large intra-class variance and (2) small inter-class variance. An extremely distorted character may differ markedly in appearance from other characters of the same category, while the variance between characters of different classes remains relatively small. To address these issues, we propose a novel method that enriches character features to enhance the discriminability of characters. First, we propose the Character-Aware Constraint Encoder (CACE), which consists of multiple stacked blocks. CACE introduces a decay matrix in each block to explicitly guide the attention region of each token. By applying the decay matrix in every block, CACE enables tokens to perceive morphological information at the character level. Second, we introduce an Intra-Inter Consistency Loss (I^2CL) that accounts for intra-class compactness and inter-class separability in feature space. I^2CL improves the discriminative capability of features by learning a long-term memory unit for each character category. Trained with synthetic data, our model achieves state-of-the-art performance on common benchmarks (94.1% accuracy) and Union14M-Benchmark (61.6% accuracy). Code is available at https://github.com/bang123-box/CFE.
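
To make the idea of a decay matrix guiding each token's attention region more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how the decay matrix is built or how it is combined with attention, so everything here is an assumption made for illustration: the names `decay_matrix` and `decay_attention`, the parameter `gamma`, the use of Manhattan distance on a token grid, and the choice to rescale softmax weights and renormalize.

```python
import math


def decay_matrix(height, width, gamma=0.9):
    """Assumed form: D[i][j] = gamma ** (Manhattan distance between tokens i and j).

    Tokens are laid out row by row on a height x width grid, so nearby
    tokens get a weight close to 1 and distant tokens a weight close to 0.
    """
    coords = [(r, c) for r in range(height) for c in range(width)]
    return [
        [gamma ** (abs(r1 - r2) + abs(c1 - c2)) for (r2, c2) in coords]
        for (r1, c1) in coords
    ]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def decay_attention(q, k, v, decay):
    """Single-head attention whose weights are rescaled by `decay` (hypothetical).

    q, k, v are lists of token vectors; decay is an N x N matrix.
    """
    dim = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Scaled dot-product scores of token i against every key.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dim) for kj in k]
        weights = softmax(scores)
        # Down-weight distant tokens, then renormalize so weights sum to 1.
        weights = [w * d for w, d in zip(weights, decay[i])]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append(
            [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))]
        )
    return out


if __name__ == "__main__":
    D = decay_matrix(2, 3, gamma=0.5)
    q = [[0.0, 0.0] for _ in range(6)]   # identical queries: uniform raw attention
    v = [[float(i)] for i in range(6)]   # value of token i is simply i
    print(decay_attention(q, q, v, D)[0])
```

With `gamma=1.0` the decay matrix is all ones and the sketch reduces to ordinary attention; smaller values of `gamma` concentrate each token's attention on its spatial neighbors, which is one plausible way to read "explicitly guide the attention region for each token".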

Focus-Enhanced Scene Text Recognition with Deformable Convolutions

Sequential Deformation for Accurate Scene Text Detection

Enhancing Scene Text Recognition by Strengthening Attention Alignment

MASTER: Multi-Aspect Non-local Network for Scene Text Recognition

Text Font Correction and Alignment Method for Scene Text Recognition

Scene Text Recognition via Transformer

Scene text recognition based on two-stage attention and multi-branch feature fusion module

Deformable Mixed Domain Attention Network for Scene Text Recognition

A Feasible Framework for Arbitrary-Shaped Scene Text Recognition

An End-to-End Scene Text Detector with Dynamic Attention

Scene Text Recognition with Deeper Convolutional Neural Networks

Attention-based Feature Decomposition-Reconstruction Network for Scene Text Detection

A holistic representation guided attention network for scene text recognition

FDTA: Fully Convolutional Scene Text Detection with Text Attention

Scene Text Recognition Via Dual-path Network with Shape-driven Attention Alignment

Robust Scene Text Recognition Through Adaptive Image Enhancement

Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition

Text recognition in natural scenes based on deep learning

FACLSTM: ConvLSTM with Focused Attention for Scene Text Recognition

Deformable scene text detection using harmonic features and modified pixel aggregation network