Integrating Foreground–background Feature Distillation and Contrastive Feature Learning for Ultra-Fine-grained Visual Classification

Qiupu Chen, Lin Jiao, Fenmei Wang, Jianming Du, Haiyun Liu, Xue Wang, Rujing Wang
DOI: https://doi.org/10.1016/j.patcog.2024.110339
IF: 8
2024-01-01
Pattern Recognition
Abstract: In pattern recognition, ultra-fine-grained visual classification (ultra-FGVC) has emerged as a paramount challenge, focusing on sub-category distinction within fine-grained objects. The near-indistinguishable similarities among such objects, combined with the dearth of sample data, intensify this challenge. In response, our FDCLDA method is introduced, which integrates Foreground-background feature Distillation (FD) and Contrastive feature Learning (CL) with Dual Augmentation (DA). This method uses two different data augmentation techniques, standard and auxiliary augmentation, to enhance model performance and generalization ability. The FD module reduces superfluous features and augments the contrast between the principal entity and its backdrop, while the CL focuses on creating unique data imprints by reducing intra-class resemblances and enhancing inter-class disparities. Integrating this method with different architectures, such as ResNet-50, Vision Transformer, and Swin-Transformer (Swin-T), significantly improves these backbone networks, especially when used with Swin-T, leading to promising results on eight popular datasets for ultra-FGVC tasks.
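The abstract's description of contrastive feature learning (pulling same-class features together and pushing different-class features apart) can be illustrated with a generic supervised contrastive loss. The sketch below is not the authors' FDCLDA implementation: the function names, the temperature value, and the choice of this particular loss form are assumptions made for illustration, and the feature vectors stand in for embeddings of differently augmented views of the same images.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, hypothetical helper).

    For each anchor, every other sample with the same label is a positive
    and every sample with a different label is a negative. The loss is
    averaged over anchors that have at least one positive.
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity of the anchor to all other samples.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        # Average negative log-probability assigned to the positives.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Two toy classes, two "views" each (assumed stand-ins for augmented views).
labels = [0, 0, 1, 1]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
scattered = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

loss_clustered = supervised_contrastive_loss(clustered, labels)
loss_scattered = supervised_contrastive_loss(scattered, labels)
```

In this toy setup the loss is lower when same-class features sit close together than when they are spread across classes, which is the behaviour the abstract attributes to the CL component.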