Investigating the Effectiveness of Data Augmentation from Similarity and Diversity: an Empirical Study

Suorong Yang, Suhan Guo, Jian Zhao, Furao Shen
DOI: https://doi.org/10.1016/j.patcog.2023.110204
IF: 8
2023-01-01
Pattern Recognition
Abstract: Data augmentation has emerged as a widely adopted technique for improving the generalization capabilities of deep neural networks. However, evaluating the effectiveness of data augmentation methods solely through model training is computationally demanding and lacks interpretability. Moreover, the absence of quantitative standards hinders both our understanding of the mechanisms underlying data augmentation approaches and the development of novel techniques. To this end, we propose interpretable quantitative measures that decompose the effectiveness of data augmentation methods into two key dimensions: similarity and diversity. The proposed similarity measure describes the overall similarity between the original and augmented datasets, while the diversity measure quantifies the divergence in inherent complexity between the original and augmented datasets in terms of categories. Importantly, the proposed measures do not depend on model training, which keeps their calculation efficient. Through experiments on several benchmark datasets, including MNIST, CIFAR-10, CIFAR-100, and ImageNet, we demonstrate the efficacy of our measures in evaluating the effectiveness of various data augmentation methods. Furthermore, although the proposed measures are straightforward, they have the potential to guide the design and parameter tuning of data augmentation techniques and to enable validation of a data augmentation method's efficacy before embarking on large-scale model training.
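The abstract does not give the formulas for the two measures, so the following is only a minimal sketch of the general idea of scoring an augmentation without training a model. Both functions are hypothetical stand-ins and not the paper's definitions: `similarity_proxy` is assumed here to be the cosine similarity between the mean feature vectors of the two datasets, and `diversity_proxy` is assumed to be the average per-category change in spread around the class centroid. The toy feature vectors and labels are made up for illustration.

```python
import math
from collections import defaultdict


def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def per_class_spread(rows, labels):
    """Mean squared distance to the class centroid, for each class."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    spread = {}
    for label, members in groups.items():
        centroid = mean_vector(members)
        total = sum(
            sum((a - c) ** 2 for a, c in zip(row, centroid)) for row in members
        )
        spread[label] = total / len(members)
    return spread


def similarity_proxy(original, augmented):
    """Hypothetical proxy: cosine similarity of the dataset-level means."""
    return cosine_similarity(mean_vector(original), mean_vector(augmented))


def diversity_proxy(original, augmented, labels):
    """Hypothetical proxy: average per-class change in spread.

    Assumes augmented[i] is the augmented version of original[i],
    so both datasets share the same label list.
    """
    before = per_class_spread(original, labels)
    after = per_class_spread(augmented, labels)
    return sum(after[k] - before[k] for k in before) / len(before)


# Toy 2-D "features" for two categories (illustrative values only).
labels = [0, 0, 1, 1]
original = [[1.0, 0.0], [1.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
# An "augmentation" that pushes each point away from its class centroid.
stretched = [[1.0, -1.0], [1.0, 3.0], [3.0, 4.0], [7.0, 4.0]]

print("similarity:", similarity_proxy(original, stretched))
print("diversity:", diversity_proxy(original, stretched, labels))
```

In this toy case the stretched dataset keeps the same overall mean as the original while each category becomes more spread out, so the two proxies move independently. That independence is the property the two-dimensional decomposition relies on, whatever the exact measures are.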