Abstract: With the rapid development of deep learning, the performance of fine-grained image classification has seen unprecedented improvement. However, quickly and effectively focusing on the subtle discriminative details that distinguish sub-classes from one another remains challenging. In this paper, we propose a novel Multi-Scale Erasure and Confusion (MSEC) method to tackle this challenge. First, the input image is divided into several sub-regions, and a confidence function computes a confidence score for each sub-region. The sub-regions with lower confidence scores are erased by the Region Erasure Module (REM), and the erased image is confused once by the Multi-scale Region Confusion Module (Multi-scale RCM). Second, the sub-regions with higher confidence scores are divided and confused again by the Multi-scale RCM, generating an image with multi-scale information. Finally, features of the erased image and the "destructed" image are extracted by the backbone network, and the whole network is optimized with a multi-loss function to perform classification. Extensive experiments on three standard fine-grained benchmark datasets (Stanford Dogs, CUB-200-2011 and FGVC-Aircraft) show that MSEC improves the accuracy of fine-grained image classification.
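To make the erasure step in the abstract concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the confidence function, so `variance_confidence` below is a hypothetical stand-in (pixel variance), and the function names, the grid size `n`, the number of erased regions `erase_count`, and zero-filling as the erasure operation are all assumptions.

```python
import numpy as np

def split_grid(image, n):
    """Split an (H, W, C) image into an n x n grid of equal patches.

    Any remainder rows/columns (when H or W is not divisible by n) are ignored.
    """
    h, w = image.shape[0] // n, image.shape[1] // n
    return [image[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(n) for c in range(n)]

def variance_confidence(patch):
    # ASSUMPTION: placeholder score. The paper's confidence function is not
    # given in the abstract; pixel variance is used here only for illustration.
    return float(patch.var())

def erase_low_confidence(image, n=4, erase_count=4, confidence=variance_confidence):
    """Zero out the `erase_count` grid cells with the lowest confidence scores.

    Returns the erased copy and the (n, n) array of per-cell scores.
    """
    h, w = image.shape[0] // n, image.shape[1] // n
    scores = np.array([confidence(p) for p in split_grid(image, n)])
    erased = image.copy()  # leave the input untouched
    lowest = np.argsort(scores, kind="stable")[:erase_count]
    for idx in lowest:
        r, c = divmod(int(idx), n)
        erased[r * h:(r + 1) * h, c * w:(c + 1) * w] = 0  # assumed erasure: zero-fill
    return erased, scores.reshape(n, n)
```

In a real pipeline the score would presumably come from the network rather than from raw pixels; the `confidence` argument is kept pluggable for that reason.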

What problem does this paper attempt to address?

The paper addresses the challenges of fine-grained image classification by proposing a new method called Multi-Scale Erasure and Confusion (MSEC).

### Research Background and Problem

In fine-grained image classification, identifying the subtle differences between subcategories is the key challenge. The images show small inter-class differences and large intra-class differences, as with different breeds of dogs or species of birds. Current methods fall into two main categories: one requires manual annotation of key areas in the images, which is resource-intensive and difficult to scale; the other uses attention mechanisms to locate discriminative regions automatically, but this increases the computational load of the network.

### Overview of the MSEC Method

The MSEC method aims to address these issues through two key modules:

1. **Region Erasure Module (REM)**: The input image is evenly divided into multiple sub-regions, and each sub-region is scored by a confidence function. Sub-regions with lower scores are considered to contain redundant information and are erased. This helps the network better extract detailed features of the target object.
2. **Multi-scale Region Confusion Module (Multi-scale RCM)**: The erased image is randomly confused, disrupting the overall structure of the original image. In addition, high-scoring sub-regions are further divided and confused to generate images containing information at different scales. This multi-scale confusion helps the network focus on discriminative local details of the target object.

### Main Contributions

1. REM effectively removes redundant information from the image, retains information useful for classification, and enhances the network's ability to learn representative features of the target object.
2. Multi-scale RCM confuses images and high-scoring sub-regions at different scales, highlighting the detailed textures of the target object and further improving the network's ability to mine discriminative visual cues.
3. The MSEC network adds almost no extra parameters. Experimental results on three standard fine-grained datasets show that the method achieves competitive classification results compared with existing techniques.

### Conclusion

In summary, MSEC is a concise and effective solution for fine-grained image classification. It requires no additional part or object annotations and does not rely on attention models, thereby reducing computational cost while improving classification performance.
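The multi-scale confusion idea summarized above can be sketched in a few lines of NumPy. This is a rough illustration, not the paper's algorithm: uniform random permutation of grid cells is an assumed form of "confusion", and the function names, `top_k`, `sub_n`, and the order of operations (fine-scale shuffle inside the top-scoring cells first, then a coarse shuffle of all cells) are assumptions chosen for simplicity.

```python
import numpy as np

def shuffle_patches(image, n, rng):
    """Randomly permute the n x n grid cells of an (H, W, C) image.

    Remainder rows/columns (when H or W is not divisible by n) stay in place.
    """
    h, w = image.shape[0] // n, image.shape[1] // n
    out = image.copy()
    order = rng.permutation(n * n)  # order[dst] = index of the source cell
    for dst, src in enumerate(order):
        dr, dc = divmod(dst, n)
        sr, sc = divmod(int(src), n)
        out[dr * h:(dr + 1) * h, dc * w:(dc + 1) * w] = \
            image[sr * h:(sr + 1) * h, sc * w:(sc + 1) * w]
    return out

def multi_scale_confusion(image, scores, top_k=2, sub_n=2, rng=None):
    """Shuffle at two scales, guided by an (n, n) array of per-cell scores.

    ASSUMPTION: the `top_k` highest-scoring cells are shuffled internally on a
    finer sub_n x sub_n grid, then all n x n cells are shuffled once.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = scores.shape[0]
    h, w = image.shape[0] // n, image.shape[1] // n
    out = image.copy()
    top = np.argsort(scores.ravel(), kind="stable")[::-1][:top_k]
    for idx in top:
        r, c = divmod(int(idx), n)
        cell = out[r * h:(r + 1) * h, c * w:(c + 1) * w]
        # Fine-scale confusion inside a high-confidence cell.
        out[r * h:(r + 1) * h, c * w:(c + 1) * w] = shuffle_patches(cell, sub_n, rng)
    # Coarse-scale confusion over the whole image.
    return shuffle_patches(out, n, rng)
```

Both steps only move pixels around, so the output contains exactly the same pixel values as the input; what changes is the spatial layout at two granularities.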

MSEC: Multi-Scale Erasure and Confusion for fine-grained image classification
