Abstract: Deep learning has been widely used for extracting value from big data. Like many other machine learning algorithms, deep learning requires a significant amount of training data. Experiments have shown that both the volume and the quality of training data can significantly affect the effectiveness of value extraction. In some cases, the volume of training data is too small to train a deep learning model effectively. In other cases, the quality of the training data is not high enough to achieve optimal performance. Many approaches have been proposed for augmenting training data to mitigate these deficiencies. However, whether the augmented data are “fit for purpose” for deep learning remains an open question, and a framework for comprehensively evaluating the effectiveness of augmented data for deep learning is still not available. In this article, we first discuss a data augmentation approach for deep learning. The approach has two components: the first removes noisy data from a dataset using machine learning based classification to improve its quality, and the second increases the volume of the dataset so that a deep learning model can be trained effectively. We then propose a data quality evaluation framework that assesses the augmented data in terms of fidelity, variety, and veracity. We demonstrate the effectiveness of the data augmentation approach and the data quality evaluation framework through a study of automated classification of biological cell images using deep learning. The experimental results clearly demonstrate the impact of the volume and quality of training data on the performance of deep learning, as well as the importance of data quality evaluation. The data augmentation approach and the data quality evaluation framework can be readily adapted to deep learning studies in other domains.
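
The two-component approach described in the abstract (filter noisy samples, then increase volume) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `filter_noisy` and `augment`, the `is_noisy` predicate (a stand-in for a trained noise classifier), and the choice of flips and 180-degree rotation as label-preserving transforms are all assumptions made for illustration.

```python
import numpy as np

def filter_noisy(images, labels, is_noisy):
    """Step 1: drop samples flagged by a noise classifier.

    `is_noisy` is a hypothetical callable standing in for a trained
    machine learning classifier; it returns True for a noisy image.
    """
    keep = np.array([not is_noisy(img) for img in images], dtype=bool)
    return images[keep], labels[keep]

def augment(images, labels):
    """Step 2: increase volume with label-preserving transforms.

    Assumes `images` has shape (N, H, W). The transforms used here
    (flips and a 180-degree rotation) are illustrative choices only.
    """
    variants = [
        images,                               # originals
        images[:, :, ::-1],                   # horizontal flip
        images[:, ::-1, :],                   # vertical flip
        np.rot90(images, k=2, axes=(1, 2)),   # 180-degree rotation
    ]
    # Labels repeat once per variant, in the same order as the images.
    return np.concatenate(variants), np.tile(labels, len(variants))

# Toy data: six random 8x8 "images", one of which is a blank frame.
rng = np.random.default_rng(0)
images = rng.random((6, 8, 8))
images[2] = 0.5                               # constant image, treated as noise
labels = np.array([0, 1, 0, 1, 0, 1])

# Stand-in noise rule (an assumption): near-zero pixel variance means noise.
clean_x, clean_y = filter_noisy(images, labels, lambda img: img.std() < 1e-3)
aug_x, aug_y = augment(clean_x, clean_y)
```

In this toy run, one of the six samples is removed and the remaining five are expanded fourfold. In practice the noise predicate would be a trained classifier, and the transforms would be chosen to preserve class semantics for the image domain at hand.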

A Case Study of the Augmentation and Evaluation of Training Data for Deep Learning
