Dual Guidance Enabled Fuzzy Inference for Enhanced Fine-Grained Recognition
Qiupu Chen, Feng He, Gang Wang, Xiao Bai, Long Cheng, Xin Ning
DOI: https://doi.org/10.1109/tfuzz.2024.3427654
IF: 12.253
2024-01-01
IEEE Transactions on Fuzzy Systems
Abstract: In the field of Fine-Grained Visual Recognition (FGVR), the ability to resolve minute and often subtle differences between highly similar object categories is paramount. The advent of Vision Transformers (ViTs) has marked a significant advancement in this domain, primarily due to their capacity to model the intricate interdependencies among object parts represented as image patches. However, their inherent single-scale processing hampers their effectiveness in FGVR tasks. Furthermore, the uncertainty inherent in FGVR tasks remains unresolved, which calls for methods that bolster the robustness of these models, particularly across varying scales of visual features. We introduce a new plugin module that can be seamlessly integrated into ViTs, called Dual Guidance Enabled Fuzzy Inference (DGEFI), which combines fuzzy inference with dual guidance mechanisms. Dual guidance comprises scale-aware guidance and probability guidance. The former strengthens the model's focus on salient scales, and the latter refines the distinction between similar categories by optimizing intra-class compactness and inter-class separability. Fuzzy inference enables the model to adaptively adjust the influence of distinct scales in the final decision-making phase, thereby enhancing the overall accuracy of recognition tasks. We demonstrate the versatility and efficacy of our DGEFI module by integrating it into several leading ViT backbones, including ViT, Swin, MViTv2, and EVA-02. Empirical results show exceptional performance gains: integrating DGEFI into EVA-02 reaches 93.6% accuracy on the CUB-200-2011 dataset and 94.5% on a second benchmark, improving over the state-of-the-art methods by 0.5% and 1.5%, respectively.
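The abstract describes fuzzy inference as adaptively weighting the contribution of each scale at decision time. The sketch below is only an illustration of that general idea and is not the DGEFI module itself: it assumes each scale yields a vector of class logits, scores each scale's certainty by the normalized entropy of its softmax output, maps that score through a Gaussian membership function, and uses the normalized memberships as fusion weights. The function names, the Gaussian membership, and the `sigma` parameter are all hypothetical choices made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(num_classes); lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def certainty_membership(u, sigma=0.5):
    """Gaussian membership in a hypothetical fuzzy set 'certain'.

    Equals 1 when the normalized entropy u is 0 and decays as u grows.
    """
    return math.exp(-(u * u) / (2.0 * sigma * sigma))


def fuse_scales(per_scale_logits, sigma=0.5):
    """Fuse per-scale class probabilities with membership-derived weights.

    Assumption: this weighting rule is illustrative, not the paper's rule.
    Returns (weights, fused_probs).
    """
    probs = [softmax(logits) for logits in per_scale_logits]
    memberships = [
        certainty_membership(normalized_entropy(p), sigma) for p in probs
    ]
    total = sum(memberships)
    weights = [m / total for m in memberships]
    num_classes = len(probs[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, probs))
        for c in range(num_classes)
    ]
    return weights, fused


if __name__ == "__main__":
    # Two hypothetical scales over three classes: the first is confident,
    # the second is nearly uniform and should receive a smaller weight.
    weights, fused = fuse_scales([[4.0, 0.0, 0.0], [0.5, 0.4, 0.3]])
    print("weights:", weights)
    print("fused:", fused)
```

In this toy setup a scale with a peaked prediction receives a larger weight than one with a near-uniform prediction, so uncertain scales influence the fused decision less. A learned rule base or different membership functions could replace the entropy heuristic without changing the overall structure.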