TransformMix: Learning Transformation and Mixing Strategies from Data

Tsz-Him Cheung, Dit-Yan Yeung
2024-03-19
Abstract: Data augmentation improves the generalization power of deep learning models by synthesizing more training samples. Sample-mixing is a popular data augmentation approach that creates additional data by combining existing samples. Recent sample-mixing methods, like Mixup and Cutmix, adopt simple mixing operations to blend multiple inputs. Although such a heuristic approach shows certain performance gains in some computer vision tasks, it mixes the images blindly and does not adapt to different datasets automatically. A mixing strategy that is effective for a particular dataset does not often generalize well to other datasets. If not properly configured, the methods may create misleading mixed images, which jeopardize the effectiveness of sample-mixing augmentations. In this work, we propose an automated approach, TransformMix, to learn better transformation and mixing augmentation strategies from data. In particular, TransformMix applies learned transformations and mixing masks to create compelling mixed images that contain correct and important information for the target tasks. We demonstrate the effectiveness of TransformMix on multiple datasets in transfer learning, classification, object detection, and knowledge distillation settings. Experimental results show that our method achieves better performance as well as efficiency when compared with strong sample-mixing baselines.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses a weakness of existing sample-mixing augmentation methods such as Mixup and Cutmix: they ignore image content when mixing and generalize poorly across datasets, so they can produce misleading mixed images that hurt model performance. Specifically:

1. **Limitations of existing methods**:
   - **Blind mixing**: Existing methods (such as Mixup and Cutmix) do not consider image content when mixing, so important regions may be occluded or diluted.
   - **Poor generalization ability**: A mixing strategy that is effective on one dataset may perform poorly on others.
   - **Complex manual configuration**: The mixing strategy must be tuned by hand, which makes the methods harder to use.
2. **Proposed new method**:
   - **TransformMix**: The paper proposes an automated method that generates higher-quality mixed images by learning better transformation and mixing strategies from the data.
   - **Adaptive mixing**: TransformMix adjusts the mixing strategy to the content of the input images, avoiding the destruction of important regions.
   - **Generalization ability**: The learned mixing strategy can be transferred to new datasets, reducing the computational overhead on those datasets.
3. **Specific objectives**:
   - **Maximize the preservation of visual saliency**: The mixing strategy should preserve as much of the visual saliency of the input images as possible.
   - **Automated learning**: The mixing strategy should be learned automatically from the dataset, reducing human intervention.
   - **Improve model performance**: Higher-quality mixed images should improve model performance on tasks such as classification, object detection, and knowledge distillation.
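The "blind mixing" limitation above can be made concrete with a small NumPy sketch of the two baselines. This is not the TransformMix method, whose details are not given here. It only illustrates the standard Mixup and Cutmix operations and shows how a box placed without regard to content can erase an image's salient region. The function names, the toy saliency map, and the `saliency_retained` measure are assumptions made for illustration.

```python
import numpy as np


def mixup(x_a, x_b, lam):
    """Mixup: pixel-wise convex combination of two images."""
    return lam * x_a + (1.0 - lam) * x_b


def cutmix_mask(height, width, lam, rng):
    """Binary mask (1 = keep image A) with a random box of area ~(1 - lam) cut out."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = rng.integers(height), rng.integers(width)  # box centre, chosen blindly
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
    mask = np.ones((height, width), dtype=np.float32)
    mask[top:bottom, left:right] = 0.0
    return mask


def mix_with_mask(x_a, x_b, mask):
    """Blend two HxWxC images with an HxW mask (1 = take A, 0 = take B)."""
    m = mask[..., None]  # broadcast the mask over the channel axis
    return m * x_a + (1.0 - m) * x_b


def saliency_retained(saliency, mask):
    """Hypothetical measure: fraction of saliency mass that survives under `mask`."""
    total = saliency.sum()
    return float((saliency * mask).sum() / total) if total > 0 else 0.0


# Toy saliency map for image A (an assumption): the object sits in the centre.
h = w = 8
saliency_a = np.zeros((h, w), dtype=np.float32)
saliency_a[2:6, 2:6] = 1.0

# Two possible box placements that a content-agnostic sampler could produce.
bad_mask = np.ones((h, w), dtype=np.float32)
bad_mask[2:6, 2:6] = 0.0   # box pasted exactly over the object
good_mask = np.ones((h, w), dtype=np.float32)
good_mask[0:2, 0:2] = 0.0  # box pasted in an empty corner

print(saliency_retained(saliency_a, bad_mask))   # 0.0
print(saliency_retained(saliency_a, good_mask))  # 1.0
```

Both masks are equally valid under blind Cutmix, yet the first removes the object from image A while the label of the mixed sample still assigns it weight. A content-aware strategy would aim to select or learn masks that score closer to the second case.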
In summary, the paper proposes TransformMix to address two shortcomings of existing sample-mixing methods, namely their poor generalization across datasets and their disregard for image content, with the aim of improving model performance and efficiency.