Abstract: Locating different part regions and extracting their distinguishing features has yielded significant improvements in fine-grained visual classification (FGVC). Attention mechanisms have become one of the mainstream approaches to feature extraction in computer vision, but they have limitations. They typically focus on the most discriminative regions and directly combine the features of those parts, neglecting less prominent yet still discriminative regions. They also may not fully exploit the intrinsic connections between higher-order and lower-order features to improve classification performance. Considering the potential relationships between different higher-order feature representations of the object image allows the integrated higher-order features to contribute more to the model's classification decisions. To this end, we propose a saliency feature suppression and cross-feature fusion network (SFSCF-Net) that explores interaction learning between different higher-order feature representations. SFSCF-Net comprises three components. (1) An object-level image generator (OIG): the intersection of the output feature maps of the last two convolutional blocks of the backbone network is used as an object mask, which is mapped onto the original image and used to crop an object-level image, effectively reducing the interference caused by complex backgrounds. (2) A saliency feature suppression module (SFSM): a feature extractor locates the most distinguishing part of the object image, and that part is masked by a two-dimensional suppression method, which improves the accuracy of feature suppression. (3) A cross-feature fusion method (CFM) based on inter-layer interaction: the output feature maps of different network layers are interactively integrated to obtain high-dimensional features, which are then channel-compressed to obtain an inter-layer interaction feature representation that enriches the semantic information of the output features. The proposed SFSCF-Net can be trained end-to-end and achieves state-of-the-art or competitive results on four FGVC benchmark datasets.
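The OIG and CFM steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`object_bbox`, `crop_object`, `cross_layer_fusion`), the mean-activation threshold, the assumption that both feature maps share the same spatial size, the channel-wise outer product used for "interaction", and the projection matrix `proj` standing in for channel compression are all assumptions made for illustration.

```python
import numpy as np


def object_bbox(feat_a, feat_b):
    """Bounding box (top, left, bottom, right) of the intersection mask.

    feat_a, feat_b: feature maps of shape (C, H, W) with the same H and W
    (an assumption of this sketch). Coordinates are in feature-map cells.
    """
    # Collapse channels into one activation map per block: (C, H, W) -> (H, W).
    act_a = feat_a.sum(axis=0)
    act_b = feat_b.sum(axis=0)
    # Assumed rule: a cell belongs to the object if its activation exceeds
    # the mean of its map; the object mask is the intersection of both maps.
    mask = (act_a > act_a.mean()) & (act_b > act_b.mean())
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        # No overlap: fall back to the whole map.
        return 0, 0, mask.shape[0], mask.shape[1]
    return int(rows.min()), int(cols.min()), int(rows.max()) + 1, int(cols.max()) + 1


def crop_object(image, bbox, feat_hw):
    """Map a feature-map bbox onto an (H_img, W_img, 3) image and crop it."""
    top, left, bottom, right = bbox
    sy = image.shape[0] / feat_hw[0]  # vertical scale, cells -> pixels
    sx = image.shape[1] / feat_hw[1]  # horizontal scale, cells -> pixels
    return image[int(top * sy):int(bottom * sy), int(left * sx):int(right * sx)]


def cross_layer_fusion(feat_a, feat_b, proj):
    """Inter-layer interaction followed by channel compression.

    feat_a: (C1, H, W), feat_b: (C2, H, W), proj: (D, C1 * C2).
    Returns a (D, H, W) feature map.
    """
    c1, h, w = feat_a.shape
    c2 = feat_b.shape[0]
    # Pairwise channel products at every location: (C1 * C2, H, W).
    inter = np.einsum("ihw,jhw->ijhw", feat_a, feat_b).reshape(c1 * c2, h, w)
    # Channel compression as a per-location linear projection,
    # which is what a 1x1 convolution computes.
    return np.einsum("dk,khw->dhw", proj, inter)
```

In a trained network `proj` would be a learned layer rather than a fixed matrix, and the threshold rule would follow whatever the paper specifies; the sketch only shows how the mask intersection, the crop, and the interaction-then-compression steps fit together.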

Multi-directional guidance network for fine-grained visual classification

Cross-layer Navigation Convolutional Neural Network for Fine-grained Visual Classification

Significant feature suppression and cross-feature fusion networks for fine-grained visual classification

Graph-in-graph Discriminative Feature Enhancement Network for Fine-Grained Visual Classification

Hierarchical Gate Network for Fine-Grained Visual Recognition

Multi-layer feature fusion and attention enhancement for fine-grained vehicle recognition research

Attention-based Multi-scale ViT Fine-grained Visual Classification

MGFN: A Multi-Granularity Fusion Convolutional Neural Network for Remote Sensing Scene Classification

On the Imaginary Wings: Text-Assisted Complex-Valued Fusion Network for Fine-Grained Visual Classification

Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization

Edge-Enhanced GCIFFNet: A Multiclass Semantic Segmentation Network Based on Edge Enhancement and Multiscale Attention Mechanism

Semantic Guidance Fusion Network for Cross-Modal Semantic Segmentation

Multifrequency Graph Convolutional Network With Cross-Modality Mutual Enhancement for Multisource Remote Sensing Data Classification

MVF-Net: A Multi-view Fusion Network for Event-based Object Classification

Multilayer Feature Fusion Network With Spatial Attention and Gated Mechanism for Remote Sensing Scene Classification

Grad-CAM guided channel-spatial attention module for fine-grained visual classification

Exploration of Class Center for Fine-Grained Visual Classification

Multi-feature fusion: Graph neural network and CNN combining for hyperspectral image classification

Feature Boosting, Suppression, and Diversification for Fine-Grained Visual Classification

Multi-Stream Multi-Class Fusion of Deep Networks for Video Classification

MLGN: A Multi-Label Guided Network for Improving Text Classification