Abstract:Data augmentation is essential to achieve state-of-the-art performance in many deep learning applications. However, the most effective augmentation techniques become computationally prohibitive for even medium-sized datasets. To address this, we propose a rigorous technique to select subsets of data points that when augmented, closely capture the training dynamics of full data augmentation. We first show that data augmentation, modeled as additive perturbations, improves learning and generalization by relatively enlarging and perturbing the smaller singular values of the network Jacobian, while preserving its prominent directions. This prevents overfitting and enhances learning the harder to learn information. Then, we propose a framework to iteratively extract small subsets of training data that when augmented, closely capture the alignment of the fully augmented Jacobian with labels/residuals. We prove that stochastic gradient descent applied to the augmented subsets found by our approach has similar training dynamics to that of fully augmented data. Our experiments demonstrate that our method achieves 6.3x speedup on CIFAR10 and 2.2x speedup on SVHN, and outperforms the baselines by up to 10% across various subset sizes. Similarly, on TinyImageNet and ImageNet, our method beats the baselines by up to 8%, while achieving up to 3.3x speedup across various subset sizes. Finally, training on and augmenting 50% subsets using our method on a version of CIFAR10 corrupted with label noise even outperforms using the full dataset. Our code is available at: <a class="link-external link-https" href="https://github.com/tianyu139/data-efficient-augmentation" rel="external noopener nofollow">this https URL</a>

Differentiable Augmentation for Data-Efficient GAN Training

Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective

UGC: Unified GAN Compression for Efficient Image-to-Image Translation

Incremental Focal Loss GANs.

On Data Augmentation for GAN Training

Image Augmentations for GAN Training

DifAugGAN: A Practical Diffusion-style Data Augmentation for GAN-based Single Image Super-resolution

Data-Efficient Augmentation for Training Neural Networks

Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly

LatentAugment: Data Augmentation via Guided Manipulation of GAN's Latent Space

Improving GAN Training via Feature Space Shrinkage

Training Generative Adversarial Networks in One Stage

Exploring Guided Sampling of Conditional GANs

Data Augmentation Based on Generative Adversarial Network with Mixed Attention Mechanism

Performance Study of Image Data Augmentation by Generative Adversarial Networks

ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data

Diffusion-GAN: Training GANs with Diffusion

Learning Efficient GANs using Differentiable Masks and co-Attention Distillation.

Dist-GAN: An Improved GAN using Distance Constraints

Learning Efficient GANs Using Differentiable Masks and Co-Attention Distillation

GAN Augmentation: Augmenting Training Data using Generative Adversarial Networks