Abstract: Data augmentation is arguably the most important regularization technique commonly used to improve the generalization performance of machine learning models. It primarily involves applying appropriate data transformation operations to create new data samples with desired properties. Despite its effectiveness, the process is often challenging because of the time-consuming trial-and-error procedures needed to create and test different candidate augmentations and their hyperparameters manually. Automated data augmentation methods aim to automate this process. State-of-the-art approaches typically rely on automated machine learning (AutoML) principles. This work presents a comprehensive survey of AutoML-based data augmentation techniques. We discuss various approaches for accomplishing data augmentation with AutoML, including data manipulation, data integration and data synthesis techniques. We present an extensive discussion of techniques for realizing each of the major subtasks of the data augmentation process: search space design, hyperparameter optimization and model evaluation. Finally, we carry out an extensive comparison and analysis of the performance of automated data augmentation techniques and state-of-the-art methods based on classical augmentation approaches. The results show that AutoML methods for data augmentation currently outperform state-of-the-art techniques based on conventional approaches.

What problem does this paper attempt to address?

This paper addresses the limitations of traditional data augmentation methods in practice, in particular the difficulty of finding an optimal augmentation strategy. Specifically:

1. **Tedious manual work**: Traditional data augmentation requires a great deal of manual trial and error to create and test candidate augmentation schemes and their hyperparameters, which is both time-consuming and error-prone.
2. **Augmentation effect depends on the dataset**: The same augmentation can have very different effects on different tasks and datasets, so it is hard to find the best augmentation for a specific task.
3. **Limited generalization ability**: An augmentation method that suits one dataset may not transfer to other datasets, resulting in performance degradation.
4. **Quality problems of synthetic data**: Generative models such as generative adversarial networks (GANs) can produce synthetic data, but they are prone to overfitting when data is scarce, and the generated data may not meet the requirements of the target domain.

To address these problems, the paper focuses on data augmentation techniques based on automated machine learning (AutoML). These techniques optimize the data augmentation process automatically, with the aim of improving the generalization ability and performance of machine learning models. Specifically, the paper discusses the following:

- **Data manipulation, data integration and data synthesis**: It introduces how AutoML can be used to carry out each of these types of data augmentation.
- **Search space design, hyperparameter optimization and model evaluation**: It discusses in detail the methods and techniques used to implement each subtask.
- **Performance comparison analysis**: It conducts an extensive performance comparison between AutoML-based data augmentation methods and state-of-the-art methods based on classical augmentation; the results show that the AutoML methods currently outperform the conventional approaches.

In summary, the paper systematically summarizes and analyzes the application of AutoML to data augmentation, presenting it as a more efficient, automated way to overcome the limitations of existing methods.
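The three subtasks named above (search space design, hyperparameter optimization and model evaluation) can be illustrated with a minimal sketch. It is not a method from the paper: plain random search stands in for the optimizer, a nearest-centroid classifier on synthetic 2-D points stands in for the model, and every name here (`SEARCH_SPACE`, `MAGNITUDES`, `jitter`, `scale`, `random_search`, and so on) is a hypothetical choice made for illustration.

```python
import random

# --- Search space design (hypothetical): a policy is (operation, magnitude) ---
def jitter(point, magnitude, rng):
    """Add uniform noise in [-magnitude, magnitude] to every coordinate."""
    return tuple(x + rng.uniform(-magnitude, magnitude) for x in point)

def scale(point, magnitude, rng):
    """Multiply the point by a random factor in [1 - magnitude, 1 + magnitude]."""
    factor = 1.0 + rng.uniform(-magnitude, magnitude)
    return tuple(x * factor for x in point)

SEARCH_SPACE = {"jitter": jitter, "scale": scale}
MAGNITUDES = [0.0, 0.1, 0.3, 0.5]

def make_data(n, rng):
    """Toy dataset: two noisy 2-D blobs labelled 0 and 1."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = (-1.0, -1.0) if label == 0 else (1.0, 1.0)
        data.append((tuple(c + rng.gauss(0, 1.0) for c in center), label))
    return data

def augment(data, policy, rng, copies=2):
    """Return the original samples plus `copies` transformed versions of each."""
    op_name, magnitude = policy
    op = SEARCH_SPACE[op_name]
    out = list(data)
    for point, label in data:
        for _ in range(copies):
            out.append((op(point, magnitude, rng), label))
    return out

# --- Model evaluation: train a tiny model, score it on held-out data ---
def fit_centroids(data):
    """Nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for point, label in data:
        s = sums.setdefault(label, [0.0] * len(point))
        for i, x in enumerate(point):
            s[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: tuple(v / counts[lab] for v in s) for lab, s in sums.items()}

def accuracy(centroids, data):
    """Fraction of samples whose nearest centroid matches their label."""
    def dist2(point, lab):
        return sum((a - b) ** 2 for a, b in zip(point, centroids[lab]))
    correct = sum(min(centroids, key=lambda lab: dist2(p, lab)) == y for p, y in data)
    return correct / len(data)

# --- Hyperparameter optimization: random search over policies ---
def random_search(train, valid, trials=20, seed=0):
    rng = random.Random(seed)
    best_policy, best_score = None, -1.0
    for _ in range(trials):
        policy = (rng.choice(sorted(SEARCH_SPACE)), rng.choice(MAGNITUDES))
        model = fit_centroids(augment(train, policy, rng))
        score = accuracy(model, valid)
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy, best_score

data_rng = random.Random(42)
train, valid = make_data(20, data_rng), make_data(200, data_rng)
policy, score = random_search(train, valid)
print("best policy:", policy, "validation accuracy:", score)
```

The loop has the same shape regardless of the optimizer: sample a policy from the search space, train on the augmented data, score on held-out data, and keep the best policy. Replacing the random sampling with a more sample-efficient search strategy changes only the `random_search` function.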

Data augmentation with automated machine learning: approaches and performance comparison with classical data augmentation methods
