MSAPVT: a multi-scale attention pyramid vision transformer network for large-scale fruit recognition
Yao Rao, Chaofeng Li, Feiran Xu, Ya Guo
DOI: https://doi.org/10.1007/s11694-024-02874-3
IF: 3.4
2024-09-28
Journal of Food Measurement & Characterization
Abstract: Efficient and accurate fruit recognition is critical for applications such as automated fruit-picking systems, quality evaluation, and self-checkout services in supermarkets. Existing vision-based methods, which rely primarily on Convolutional Neural Networks (CNNs), often achieve high performance but are hindered by high computational complexity, which makes real-time deployment on edge devices challenging. Moreover, the diversity of and similarity among fruit varieties, along with imbalanced fruit datasets, pose significant obstacles to general-purpose deep learning algorithms. To address these challenges, we propose the Multi-Scale Attention Pyramid Vision Transformer (MSAPVT) alongside an enhanced version of the Fru92 dataset. MSAPVT introduces four improvements: attention enhancement, dimension adjustment, multi-scale feature aggregation, and loss function improvement. First, a Hybrid Attention Module (HAM) refines the multi-level features of the Pyramid Vision Transformer v2 (PVTv2). Second, a Dimension Adjustment Layer (DAL) increases the weight of the high-level features. Third, a multi-scale feature aggregation strategy fuses complementary features from different scales. Fourth, a KL-divergence loss is added to enhance the difference between multi-scale features. These changes enable MSAPVT to capture fine-grained details in fruit images and to generate highly discriminative representations while keeping model complexity relatively low. Our model achieves the best results on the Fru92 and Fru92s datasets, with Top-1 accuracy of 91.40% and 94.29% and Top-5 accuracy of 98.95% and 99.55%, respectively. Finally, we build an accessible and efficient fruit classification system based on MSAPVT for potential applications. The improved dataset is available at https://github.com/iamraoyao/MSAPVT-Inference-Demo.
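As a rough illustration of two quantities named in the abstract, the Python sketch below computes a KL divergence between two softmax distributions and a Top-k accuracy. It is not the authors' implementation: the abstract does not say how the KL term is formed or weighted, so the function names (`softmax`, `kl_divergence`, `top_k_accuracy`), the toy scores, and the idea of comparing class distributions from two scales are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps guards against log(0); it is an illustrative choice.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical logits from two feature scales for one image (3 classes).
coarse = softmax([2.0, 1.0, 0.1])
fine = softmax([0.5, 2.5, 0.3])
divergence = kl_divergence(coarse, fine)  # larger when the scales disagree

# Hypothetical per-class scores for three images and their true labels.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

Top-k accuracy can only stay the same or rise as k grows, which is why the reported Top-5 figures exceed the Top-1 figures on both datasets.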
food science & technology