Abstract: Multi-label text classification (MLTC) involves tagging a document with its most relevant subset of labels from a label set. In real applications, labels usually follow a long-tailed distribution, where most labels (called tail labels) contain only a small number of documents, which limits the performance of MLTC. To alleviate this low-resource problem, researchers have introduced a simple but effective strategy: data augmentation (DA). However, most existing DA approaches struggle in multi-label settings. The main reason is that the documents augmented for one label may inevitably influence the other co-occurring labels and further exacerbate the long-tailed problem. To mitigate this issue, we propose a new pair-level augmentation framework for MLTC, called Label-Specific Feature Augmentation (LSFA), which augments only positive feature-label pairs for the tail labels. LSFA contains two main parts: the first learns label-specific document representations in a high-level latent space, and the second augments tail-label features in that latent space by transferring the documents' second-order statistics (intra-class semantic variations) from head labels to tail labels. Finally, we design a new loss function for adjusting the classifiers based on the augmented datasets. The whole learning procedure can be trained effectively. Comprehensive experiments on benchmark datasets show that the proposed LSFA outperforms state-of-the-art counterparts.
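The abstract does not spell out the exact transfer rule, so the following is only a minimal sketch of the second part (tail-label feature augmentation), under stated assumptions: (1) each label already has label-specific latent features; (2) a tail label's covariance is approximated by a softmax-weighted mix of head-label covariances, with weights from the cosine similarity between label means; (3) new tail features are produced by perturbing real tail features with zero-mean Gaussian noise drawn with that covariance. The names `class_stats`, `transfer_covariance`, `augment_tail` and the `temperature` parameter are hypothetical and are not LSFA's published interface. The adjusted loss function is not shown.

```python
import numpy as np


def class_stats(features):
    """Mean vector and covariance matrix of one label's features (n x d)."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)  # second-order statistics: intra-class variation
    return mean, cov


def transfer_covariance(tail_mean, head_stats, temperature=1.0):
    """Hypothetical transfer rule: mix head covariances, weighted by a softmax
    over the cosine similarity between the tail mean and each head mean."""
    means = np.stack([m for m, _ in head_stats])  # (k, d)
    covs = np.stack([c for _, c in head_stats])   # (k, d, d)
    norms = np.linalg.norm(means, axis=1) * np.linalg.norm(tail_mean) + 1e-12
    sims = means @ tail_mean / norms               # (k,) cosine similarities
    logits = sims / temperature
    weights = np.exp(logits - logits.max())        # numerically stable softmax
    weights /= weights.sum()
    cov = np.tensordot(weights, covs, axes=1)      # weighted sum -> (d, d)
    return cov, weights


def augment_tail(tail_features, head_feature_sets, n_new, rng, temperature=1.0):
    """Generate n_new synthetic features for one tail label (positive pairs only)."""
    tail_mean = tail_features.mean(axis=0)
    head_stats = [class_stats(h) for h in head_feature_sets]
    cov, _ = transfer_covariance(tail_mean, head_stats, temperature)
    # Use real tail features as anchors and add Gaussian perturbations
    # whose covariance was borrowed from the head labels.
    anchors = tail_features[rng.integers(0, len(tail_features), size=n_new)]
    noise = rng.multivariate_normal(np.zeros(tail_features.shape[1]), cov, size=n_new)
    return anchors + noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    head_a = rng.normal(1.0, 1.0, size=(200, 4))   # toy head label with many samples
    head_b = rng.normal(-1.0, 0.5, size=(200, 4))  # another toy head label
    tail = rng.normal(1.0, 1.0, size=(3, 4))       # toy tail label with only 3 samples
    new_features = augment_tail(tail, [head_a, head_b], n_new=50, rng=rng)
    print(new_features.shape)  # -> (50, 4)
```

Each synthetic feature is paired only with the tail label it was generated for, which mirrors the pair-level idea of not creating new documents that would also carry co-occurring labels. Borrowing covariance from head labels is what makes the sketch usable for a tail label with only a handful of samples, whose own covariance estimate would be unreliable.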

For Better or For Worse? Learning Minimum Variance Features With Label Augmentation

Robust Classification by Coupling Data Mollification with Label Smoothing

What Are Effective Labels for Augmented Data? Improving Calibration and Robustness with AutoLabel

Label Smoothing and Logit Squeezing: A Replacement for Adversarial Training?

Data Augmentation For Label Enhancement

Fine-Grained AutoAugmentation for Multi-Label Classification

Enhance Via Decoupling - Improving Multi-Label Classifiers with Variational Feature Augmentation

Balancing Label Imbalance in Federated Environments Using Only Mixup and Artificially-Labeled Noise

On the Generalization Effects of Linear Transformations in Data Augmentation

Feature Augmentation for Adversarial Robustness

Boosting Model Resilience via Implicit Adversarial Data Augmentation

Augment on Manifold: Mixup Regularization with UMAP

On Mixup Regularization

Toward Robustness in Multi-label Classification: A Data Augmentation Strategy against Imbalance and Noise

Provable Benefit of Mixup for Finding Optimal Decision Boundaries

Advanced pseudo-labeling approach in mixing-based text data augmentation method

Label Augmentation for Neural Networks Robustness

Be Careful What You Smooth For: Label Smoothing Can Be a Privacy Shield but Also a Catalyst for Model Inversion Attacks

A Study on the Impact of Data Augmentation for Training Convolutional Neural Networks in the Presence of Noisy Labels

Understanding the Detrimental Class-level Effects of Data Augmentation

Label-Specific Feature Augmentation for Long-Tailed Multi-Label Text Classification