FSDA: Frequency re-scaling in data augmentation for corruption-robust image classification

Ju-Hyeon Nam, Sang-Chul Lee
DOI: https://doi.org/10.2139/ssrn.4560038
IF: 8
2024-02-10
Pattern Recognition
Abstract: Modern convolutional neural networks (CNNs) are used in various applications, including computer vision, speech recognition, and robotics. However, practical deployment requires large-scale datasets, and real-world data contain various corruptions that degrade model performance owing to inconsistencies between the training and testing distributions. In this study, we propose Frequency re-Scaling Data Augmentation (FSDA) to improve the classification performance, robustness against corruption, and localizability of classifiers trained on various image classification datasets. Our method consists of two processes: a mask generation process (MGP) and a pattern re-scaling process (PSP). MGP clusters the frequency-domain spectra to produce groups of similar frequency patterns, and PSP then re-scales frequencies by learning rescaling parameters from these patterns. Because FSDA highlights structural features and the CNN classifies images by focusing on them, a CNN trained with the proposed method is more robust against corruption than one trained with other data augmentations (DAs). Our technique outperforms existing DAs on four public image classification datasets: CIFAR-10/100, STL-10, and ImageNet. In particular, our strategy increases the robustness of the classifier against different corruptions by an average of 5.04% over the baseline.
computer science, artificial intelligence; engineering, electrical & electronic
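
The two-stage idea in the abstract (group similar frequency components, then re-scale each group) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: quantile binning of the log-amplitude spectrum stands in for the clustering in MGP, random per-group factors stand in for the rescaling parameters that PSP learns, and the function name `frequency_rescale` and its parameters are hypothetical.

```python
import numpy as np

def frequency_rescale(image, num_clusters=4, scale_range=(0.5, 1.5), rng=None):
    """Illustrative frequency re-scaling of a 2-D grayscale image of shape (H, W).

    Assumptions (not from the paper): quantile bins replace clustering,
    and random scale factors replace learned rescaling parameters.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Move to the frequency domain and take the log-amplitude spectrum.
    spectrum = np.fft.fft2(image)
    log_amp = np.log1p(np.abs(spectrum))

    # Stand-in for mask generation: assign each frequency bin to one of
    # num_clusters groups according to its log-amplitude quantile.
    edges = np.quantile(log_amp, np.linspace(0, 1, num_clusters + 1)[1:-1])
    labels = np.digitize(log_amp, edges)  # integer labels in [0, num_clusters - 1]

    # Stand-in for pattern re-scaling: one random factor per group.
    scales = rng.uniform(*scale_range, size=num_clusters)
    rescaled = spectrum * scales[labels]

    # Back to the spatial domain; keep the real part, since the input is
    # real and the mask is symmetric up to floating-point ties.
    augmented = np.fft.ifft2(rescaled).real
    return augmented, labels

# Usage on a random 32x32 "image".
rng = np.random.default_rng(0)
img = rng.random((32, 32))
out, labels = frequency_rescale(img, num_clusters=4, rng=rng)
```

With `scale_range=(1.0, 1.0)` every group is left unscaled and the sketch returns the input image, which is a quick sanity check that the transform pair is wired correctly.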