Abstract: Deep learning has revolutionized classification performance, but it demands sufficient labeled data for training. Although many techniques have been developed to combat overfitting when data are insufficient, training deep networks remains challenging, especially in the ill-posed extremely low data regime, where only a small set of labeled data is available and nothing else, not even unlabeled data. Such regimes arise in practical situations where not only data labeling but also data collection itself is expensive. We propose a deep adversarial data augmentation (DADA) technique to address the problem, in which we formulate data augmentation as the problem of training a class-conditional and supervised generative adversarial network (GAN). Specifically, a new discriminator loss is proposed to fit the goal of data augmentation, through which both real and augmented samples are enforced to contribute to, and be consistent in, finding the decision boundaries. Tailored training techniques are developed accordingly. To quantitatively validate its effectiveness, we first perform extensive simulations to show that DADA substantially outperforms both traditional data augmentation and a few GAN-based options. We then extend the experiments to three real-world small labeled datasets where existing data augmentation and/or transfer learning strategies are either less effective or infeasible. All results support the superior capability of DADA in enhancing the generalization ability of deep networks trained in practical extremely low data regimes. Source code is available at https://github.com/SchafferZhang/DADA.

DADA: Deep Adversarial Data Augmentation for Extremely Low Data Regime Classification
