SVTR-SRNet: A Deep Learning Model for Scene Text Recognition via SVTR Framework and Spatial Reduction Mechanism
Ming Zhao, Yalong Li, Chaolin Zhang, Quan Du, Shenglung Peng
DOI: https://doi.org/10.3390/electronics13234756
IF: 2.9
2024-12-03
Electronics
Abstract: Most deep learning models suffer from high computational complexity and insufficient feature extraction. To balance computational complexity against performance, this paper presents SVTR-SRNet, an enhanced SVTR-based scene text recognition model. First, we build a bottom-up skip-connection network that adds information transfer pathways between the top and bottom features and improves the accuracy of information extraction. Second, we modify the attention mechanism by introducing a new intermediate parameter, SR(Q) (Spatial Reduction (Q)), which strikes a compromise between representational power and computational efficiency. Compared with the conventional attention mechanism, the new mechanism retains the ability to model global context while improving efficiency. Finally, we develop an adaptive hybrid loss function that addresses the limited generalization capacity of a single loss function and makes the model more robust across a variety of challenging scenarios. The model outperforms existing standard models in recognition performance on both the English and Chinese datasets, which contain a large number of similar characters. Because the model is efficient and adapts well across languages, it is suited to a wide range of practical applications.
engineering, electrical & electronic; computer science, information systems; physics, applied
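
The abstract does not say how SR(Q) is computed, so the following is only a minimal sketch of the general idea of spatial reduction in attention, not the authors' method. It assumes the common variant in which keys and values are computed from a pooled (shorter) token sequence while queries keep full resolution. Every query can still attend over the whole input, but the score matrix shrinks from n x n to n x (n / r). The function name, the average pooling, and the `reduction` parameter are illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_reduction_attention(tokens, w_q, w_k, w_v, reduction=2):
    """Single-head attention with a pooled key/value sequence.

    tokens: array of shape (n, d); n must be divisible by `reduction`.
    Hypothetical helper for illustration, not the SR(Q) definition from the paper.
    """
    n, d = tokens.shape
    assert n % reduction == 0, "sequence length must be divisible by reduction"

    # Assumed reduction step: average every `reduction` consecutive tokens.
    reduced = tokens.reshape(n // reduction, reduction, d).mean(axis=1)

    q = tokens @ w_q    # (n, d): queries at full resolution
    k = reduced @ w_k   # (n / r, d): keys from the reduced sequence
    v = reduced @ w_v   # (n / r, d): values from the reduced sequence

    scores = q @ k.T / np.sqrt(k.shape[-1])   # (n, n / r) instead of (n, n)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
n, d = 8, 4
tokens = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out, weights = spatial_reduction_attention(tokens, w_q, w_k, w_v, reduction=2)
```

With `reduction=2`, the output keeps one vector per input token while each attention row covers half as many keys, which illustrates the tradeoff between representational power and computing efficiency that the abstract describes.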