Abstract: Fine-grained image recognition is a longstanding computer vision challenge that focuses on differentiating objects belonging to multiple subordinate categories within the same meta-category. Since images belonging to the same meta-category usually share similar visual appearances, mining discriminative visual cues is the key to distinguishing fine-grained categories. Although commonly used image-level data augmentation techniques have achieved great success in generic image classification, they are rarely applied in fine-grained scenarios, because they edit randomly selected image regions and are therefore prone to destroying the discriminative visual cues that reside in subtle regions. In this paper, we propose diversifying the training data at the feature level to alleviate this loss of discriminative regions. Specifically, we produce diversified augmented samples by translating image features along semantically meaningful directions. The semantic directions are estimated with a covariance prediction network, which predicts a sample-wise covariance matrix to adapt to the large intra-class variation inherent in fine-grained images. Furthermore, the covariance prediction network is jointly optimized with the classification network in a meta-learning manner to avoid degenerate solutions. Experiments on four competitive fine-grained recognition benchmarks (CUB-200-2011, Stanford Cars, FGVC-Aircraft, NABirds) demonstrate that our method significantly improves the generalization performance of several popular classification networks (e.g., ResNets, DenseNets, EfficientNets, RegNets, and ViT). Combined with a recently proposed method, our semantic data augmentation approach achieves state-of-the-art performance on the CUB-200-2011 dataset. Source code is available at https://github.com/LeapLabTHU/LearnableISDA.
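To make the idea of translating features along sample-specific directions concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a diagonal covariance per sample, a single linear layer with a softplus as a hypothetical stand-in for the covariance prediction network, and explicit Gaussian sampling with a hypothetical `strength` factor. The names `predict_diag_covariance`, `augment_features`, `weight`, `bias`, and `strength` are illustrative only, and the meta-learning optimization described in the abstract is omitted.

```python
import numpy as np


def predict_diag_covariance(features, weight, bias):
    """Hypothetical stand-in for a covariance prediction network.

    Maps each feature vector to a per-sample diagonal covariance.
    The softplus keeps every predicted variance strictly positive.
    """
    pre_activation = features @ weight + bias
    return np.logaddexp(0.0, pre_activation)  # numerically stable softplus


def augment_features(features, variances, strength, rng):
    """Translate each feature by noise drawn from N(0, strength * diag(variances)).

    Assumption: explicit sampling is used here purely for illustration.
    """
    noise = rng.standard_normal(features.shape)
    return features + np.sqrt(strength * variances) * noise


rng = np.random.default_rng(0)
batch, dim = 4, 8

# Stand-in for deep features produced by a classification backbone.
features = rng.standard_normal((batch, dim))

# Randomly initialized parameters of the hypothetical predictor.
weight = 0.1 * rng.standard_normal((dim, dim))
bias = np.zeros(dim)

variances = predict_diag_covariance(features, weight, bias)
augmented = augment_features(features, variances, strength=0.5, rng=rng)
```

Because the variances are predicted from each sample's own features, two samples of the same class can be perturbed along different directions, which is the property the abstract attributes to the sample-wise covariance matrix. Setting `strength` to zero leaves the features unchanged.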
