Fine-Grained Image Classification Model Based on Improved Transformer

Tian Zhansheng, Liu Libo
DOI: https://doi.org/10.3788/lop220453
2023-01-01
Laser & Optoelectronics Progress
Abstract: Fine-grained images exhibit subtle differences between different subclasses and large differences within the same subclass. Existing neural network models struggle with such images because of insufficient feature extraction ability, redundant feature representation, and weak inductive bias; therefore, an improved Transformer image classification model is proposed in this study. First, external attention replaces the self-attention in the original Transformer model, and the model's feature extraction ability is enhanced by capturing the correlation between samples. Second, a feature selection module is introduced to select discriminative features and eliminate redundant information, improving feature representation. Finally, a multivariate loss is added to strengthen the model's inductive bias, separating different subclasses and pulling together samples of the same subclass. Experimental results show that the proposed method reaches classification accuracies of 89.8%, 90.2%, and 94.7% on the three fine-grained image datasets CUB-200-2011, Stanford Dogs, and Stanford Cars, respectively, outperforming numerous mainstream fine-grained image classification approaches.
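The abstract does not spell out how external attention is computed. As a rough illustration only, the sketch below follows the commonly described formulation of external attention: two small learnable memory matrices shared across all samples, with a softmax over tokens followed by an L1 normalization over memory slots. It is written in plain Python to stay self-contained. The function names, the memory size, and the random initialization are assumptions made for this example and are not taken from the paper, whose variant may differ.

```python
import math
import random


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def external_attention(features, mem_key, mem_value, eps=1e-9):
    """Illustrative external attention (hypothetical helper, not the paper's code).

    features:  N x d token features of one image
    mem_key:   S x d external key memory, shared across all samples
    mem_value: S x d external value memory, shared across all samples
    Returns the N x d output and the N x S attention map.
    """
    # Similarity between every token and every memory slot: N x S.
    attn = matmul(features, transpose(mem_key))

    # Step 1 of the double normalization: softmax over the token axis,
    # applied separately to each memory slot (each column of attn).
    normalized_cols = []
    for col in transpose(attn):
        peak = max(col)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in col]
        total = sum(exps)
        normalized_cols.append([e / total for e in exps])
    attn = transpose(normalized_cols)

    # Step 2: L1-normalize each token's weights over the memory slots.
    attn = [[v / (eps + sum(row)) for v in row] for row in attn]

    # Weighted sum of the value memory: N x d.
    return matmul(attn, mem_value), attn


# Toy usage with assumed sizes: 4 tokens, 8 channels, 3 memory slots.
random.seed(0)
N, D, S = 4, 8, 3
features = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
mem_key = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(S)]
mem_value = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(S)]
out, attn = external_attention(features, mem_key, mem_value)
```

Because the two memories are shared by every input rather than derived from a single image, they can in principle encode correlations across samples, which is the property the abstract attributes to external attention. The cost is also linear in the number of tokens, since the memory size is fixed.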