Improving Model Generalization by Agreement of Learned Representations from Data Augmentation

Rowel Atienza
DOI: https://doi.org/10.48550/arXiv.2110.10536
2021-10-20
Abstract: Data augmentation reduces the generalization error by forcing a model to learn invariant representations given different transformations of the input image. In computer vision, on top of the standard image processing functions, data augmentation techniques based on regional dropout such as CutOut, MixUp, and CutMix and policy-based selection such as AutoAugment demonstrated state-of-the-art (SOTA) results. With an increasing number of data augmentation algorithms being proposed, the focus is always on optimizing the input-output mapping while not realizing that there might be an untapped value in the transformed images with the same label. We hypothesize that by forcing the representations of two transformations to agree, we can further reduce the model generalization error. We call our proposed method Agreement Maximization or simply AgMax. With this simple constraint applied during training, empirical results show that data augmentation algorithms can further improve the classification accuracy of ResNet50 on ImageNet by up to 1.5%, WideResNet40-2 on CIFAR10 by up to 0.7%, WideResNet40-2 on CIFAR100 by up to 1.6%, and LeNet5 on Speech Commands Dataset by up to 1.4%. Experimental results further show that unlike other regularization terms such as label smoothing, AgMax can take advantage of the data augmentation to consistently improve model generalization by a significant margin. On downstream tasks such as object detection and segmentation on PascalVOC and COCO, AgMax pre-trained models outperform other data augmentation methods by as much as 1.0 mAP (box) and 0.5 mAP (mask). Code is available at https://github.com/roatienza/agmax.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper asks how to improve a model's generalization further by making its learned representations consistent across different data transformations. The author proposes Agreement Maximization (AgMax), which adds a constraint during training: the representations of two differently transformed versions of an input image are forced to agree, which reduces the generalization error. Existing data augmentation techniques focus on optimizing the input-output mapping and overlook the untapped value in differently transformed images that share the same label. AgMax fills this gap by using the augmented data not only to fit that mapping but also to maximize agreement between the representations of the transformed inputs. According to the reported experiments, AgMax improves classification accuracy across several models and datasets: by up to 1.5% for ResNet50 on ImageNet, 0.7% for WideResNet40-2 on CIFAR10, 1.6% for WideResNet40-2 on CIFAR100, and 1.4% for LeNet5 on the Speech Commands Dataset. The gains are most pronounced under heavy data augmentation. AgMax pre-trained models also perform better on downstream tasks such as object detection and segmentation.
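
The idea of adding an agreement constraint on top of the usual supervised loss can be illustrated with a minimal sketch. The text above does not state the exact agreement measure used by AgMax, so the sketch assumes a symmetric KL divergence between the class probabilities predicted for two augmented views of the same image. The function names (`softmax`, `kl_divergence`, `agreement_loss`) and the `weight` parameter are hypothetical and chosen for illustration only; this is not the author's implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Standard supervised loss for a single example.
    return -math.log(probs[label] + 1e-12)

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions of equal length.
    eps = 1e-12
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def agreement_loss(logits_a, logits_b, label, weight=1.0):
    # logits_a and logits_b are model outputs for two different
    # augmentations of the same image, which share one label.
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    # Supervised term: both views must predict the shared label.
    supervised = 0.5 * (cross_entropy(p_a, label) + cross_entropy(p_b, label))
    # Agreement term (an assumption here): symmetric KL between the views.
    disagreement = 0.5 * (kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a))
    return supervised + weight * disagreement

# Identical predictions for both views add no agreement penalty.
same = agreement_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0], label=0)
# Views whose predictions differ are penalized by the agreement term.
diff = agreement_loss([2.0, 0.5, -1.0], [0.5, 2.0, -1.0], label=0)
```

Setting `weight=0.0` in this sketch recovers plain supervised training on two augmented views, which makes it easy to compare against the constrained version.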