Single Visual Model Based on Transformer For Digital Instrument Reading Recognition
Xiang Li, Changchang Zeng, Yong Yao, Sen Zhang, Haiding Zhang, Suixian Yang
DOI: https://doi.org/10.1088/1361-6501/ad9d64
IF: 2.398
2024-12-12
Measurement Science and Technology
Abstract: Digital instrument reading recognition (DIRR) technology is crucial for industrial digital transformation and the advancement of industrialisation. However, variations in character fonts, styles, spacing, and aspect ratios across digital instruments, together with the scarcity of data, pose significant challenges to current recognition technologies. To address these challenges, this study proposed a novel single visual model based on Transformer for digital instrument recognition (SVDIR). The SVDIR model primarily comprised a scaled cosine attention mechanism (SC-attention) and a local Transformer block. First, the SC-attention was designed to calculate the cosine similarity of two image patches. This rendered the attention calculation independent of the input amplitude and produced milder attention weights, alleviating overconcentration. Second, a local Transformer block module was proposed to extract the internal stroke features of characters and the dependencies between character components, yielding fine-grained characteristic features. In addition, a post-norm structure was introduced into the local Transformer block module to reduce the accumulation of activation values as the network deepens. Finally, experimental results on two digital instrument datasets demonstrated the effectiveness and superiority of the proposed model.
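The amplitude independence claimed for SC-attention can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the following is an assumption-laden illustration rather than the authors' implementation: the function names, the fixed temperature `tau`, and the use of plain Python lists as patch embeddings are all hypothetical choices made here.

```python
import math


def cosine_similarity(u, v, eps=1e-12):
    """Cosine of the angle between two vectors; eps guards against zero norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_cosine_attention(queries, keys, values, tau=0.1):
    """Attention whose logits are cosine similarities divided by tau.

    tau is an assumed fixed temperature (hypothetical; it could equally
    be a learned parameter). Returns (outputs, attention_weights).
    """
    dim = len(values[0])
    outputs, all_weights = [], []
    for q in queries:
        # Logits depend only on the angle between q and each key,
        # so they are bounded by 1 / tau regardless of vector magnitude.
        logits = [cosine_similarity(q, k) / tau for k in keys]
        weights = softmax(logits)
        # Weighted sum of the value vectors, one dimension at a time.
        out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy patch embeddings (made-up numbers, for illustration only).
patches = [[1.0, 0.0, 2.0], [0.5, 1.5, 0.0], [2.0, 2.0, 1.0]]
_, weights = scaled_cosine_attention(patches, patches, patches)

# Scaling the queries by 10 leaves the attention weights unchanged.
scaled_queries = [[10.0 * x for x in p] for p in patches]
_, weights_scaled = scaled_cosine_attention(scaled_queries, patches, patches)
```

Because each logit is bounded in magnitude by `1 / tau`, no single patch pair can dominate the softmax merely by having a large norm, which is one plausible reading of the "milder attention weights" described in the abstract.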
engineering, multidisciplinary; instruments & instrumentation