A Light-Weight Model with Granularity Feature Representation for Fine-Grained Visual Classification

Qiumei Zheng, Tianqi Peng, Ding Huang, Fenghua Wang, Nengxiang Xu
DOI: https://doi.org/10.1504/ijcse.2024.138426
2024-01-01
International Journal of Computational Science and Engineering
Abstract: Fine-grained image recognition can provide more precise recognition for industrial production and applications. However, because convolutional neural networks (CNNs) have difficulty capturing comprehensive features and discriminative regions, their fine-grained recognition ability is largely limited. Aiming for a lightweight design, we combine the Transformer's strength in capturing global features with the technically mature CNN and propose MV-GFR, a lightweight model based on MobileViT. We also propose three lightweight modules to help the network capture more subtle differences. First, a training module provides the network with richer granularity information while preserving its global integrity. Second, a feature part mask module combines the diversity of the CNN with the saliency of the Transformer. Finally, a feature fusion module integrates features from different levels so that global and local features complement each other. Experiments on three commonly used datasets demonstrate the effectiveness of this scheme.