Abstract: Fine-grained image recognition is a longstanding computer vision challenge that focuses on differentiating objects belonging to multiple subordinate categories within the same meta-category. Since images belonging to the same meta-category usually share similar visual appearances, mining discriminative visual cues is the key to distinguishing fine-grained categories. Although commonly used image-level data augmentation techniques have achieved great success in generic image classification problems, they are rarely applied in fine-grained scenarios, because they edit randomly chosen regions and are therefore prone to destroying the discriminative visual cues that reside in subtle regions. In this paper, we propose diversifying the training data at the feature level to alleviate this loss of discriminative regions. Specifically, we produce diversified augmented samples by translating image features along semantically meaningful directions. The semantic directions are estimated with a covariance prediction network, which predicts a sample-wise covariance matrix to adapt to the large intra-class variation inherent in fine-grained images. Furthermore, the covariance prediction network is jointly optimized with the classification network in a meta-learning manner to alleviate the degenerate solution problem. Experiments on four competitive fine-grained recognition benchmarks (CUB-200-2011, Stanford Cars, FGVC Aircrafts, NABirds) demonstrate that our method significantly improves the generalization performance of several popular classification networks (e.g., ResNets, DenseNets, EfficientNets, RegNets and ViT). Combined with a recently proposed method, our semantic data augmentation approach achieves state-of-the-art performance on the CUB-200-2011 dataset. Source code is available at https://github.com/LeapLabTHU/LearnableISDA.
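The feature-level augmentation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a diagonal sample-wise covariance, and the one-layer predictor `predict_variances`, its parameters `W` and `b`, and the arguments `strength` and `n_aug` are hypothetical names introduced only for illustration. The meta-learning step that jointly optimizes the predictor and the classifier is not shown.

```python
import numpy as np


def predict_variances(features, W, b):
    """Hypothetical stand-in for a covariance prediction network.

    Maps each feature vector to a positive per-dimension variance
    (a diagonal covariance) using a linear layer followed by softplus.
    """
    logits = np.asarray(features, dtype=float) @ W + b
    # softplus(x) = log(1 + exp(x)), computed stably; always positive
    return np.logaddexp(0.0, logits)


def augment_features(features, variances, strength=0.5, n_aug=4, rng=None):
    """Translate features along random directions drawn per sample.

    Each augmented feature is sampled from
    N(feature_i, strength * diag(variances_i)), so dimensions with a
    larger predicted variance are perturbed more strongly.
    Returns an array of shape (n_aug, N, D).
    """
    rng = np.random.default_rng() if rng is None else rng
    features = np.asarray(features, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if features.shape != variances.shape:
        raise ValueError("features and variances must have the same shape")
    if np.any(variances < 0):
        raise ValueError("variances must be non-negative")
    std = np.sqrt(strength * variances)                    # (N, D)
    noise = rng.standard_normal((n_aug,) + features.shape)  # (n_aug, N, D)
    return features[None, :, :] + noise * std


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((3, 5))      # 3 samples, 5-dim features
    W = 0.1 * rng.standard_normal((5, 5))    # assumed predictor weights
    b = np.zeros(5)
    var = predict_variances(feats, W, b)
    aug = augment_features(feats, var, strength=0.5, n_aug=4, rng=rng)
    print(aug.shape)
```

In this sketch the augmented features would be passed to the classifier head in place of, or alongside, the original features; because the perturbation is applied after the backbone, no image region is edited.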

Instance-Specific Semantic Augmentation for Long-Tailed Image Classification

SAFA: Sample-Adaptive Feature Augmentation for Long-Tailed Image Classification

ProAug: Prototype-Based Augmentation for Long-Tailed Image Classification

Feature Space Augmentation for Long-Tailed Data

ECS-SC: Long-tailed classification via data augmentation based on easily confused sample selection and combination

Label-Specific Feature Augmentation for Long-Tailed Multi-Label Text Classification

Data-Centric Long-Tailed Image Recognition

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition

Text-guided Fourier Augmentation for long-tailed recognition

DDFA: a displacement and diffusion-based feature augmentation method for imbalanced image recognition

SGIA: Enhancing Fine-Grained Visual Classification with Sequence Generative Image Augmentation

Class Activation Maps-based Feature Augmentation for long-tailed classification

CUDA: Curriculum of Data Augmentation for Long-Tailed Recognition

Improving Long-Tailed Classification from Instance Level

Semantic Data Augmentation for Long-tailed Facial Expression Recognition

SAU: A Dual-Branch Network to Enhance Long-Tailed Recognition via Generative Models

Global- and Local-aware Feature Augmentation with Semantic Orthogonality for Few-shot Image Classification

Fine-Grained Recognition With Learnable Semantic Data Augmentation

Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning

DualAug: Exploiting Additional Heavy Augmentation with OOD Data Rejection