Cross-Modal Data Augmentation for Tasks of Different Modalities
Dong Chen, Yueting Zhuang, Zijin Shen, Carl Yang, Guoming Wang, Siliang Tang, Yi Yang
DOI: https://doi.org/10.1109/tmm.2022.3228696
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Data augmentation has become one of the keys to alleviating the over-fitting of models on training data and improving their generalization on testing data. Most existing data augmentation methods focus on a single modality and fall short when multiple data modalities are involved. Some prior works interpolate with random coefficients in the latent space to generate new samples, which can work generically for any data modality. However, these works ignore the extra information conveyed by multimodal data. In fact, the extra information in one modality can provide semantic directions for generating more meaningful samples in another modality. This paper proposes Cross-modal Data Augmentation (CMDA), a simple yet effective data augmentation method that alleviates over-fitting and improves generalization performance. We evaluate CMDA on unsupervised and supervised tasks of different modalities, on which CMDA consistently and significantly outperforms baselines. For instance, CMDA improves the AUROC of the unsupervised anomaly detection baseline in the vision modality from 76.46%, 73.07%, and 64.36% to 83.25%, 76.22%, and 70.57% on three different datasets, respectively. In addition, extensive experiments demonstrate that CMDA is applicable to various neural network architectures. Furthermore, prior methods that interpolate in the latent space must be trained together with a downstream task to construct the latent space. In contrast, CMDA can work with or without downstream tasks, which makes it more broadly applicable. The source code is publicly available for non-commercial or research use at https://github.com/Anfeather/CMDA
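
For readers unfamiliar with the prior approach the abstract contrasts against, interpolation with a random coefficient in a latent space can be sketched as below. This is a minimal, generic illustration and not CMDA itself or code from the linked repository; the function name `interpolate_latents`, the use of plain Python lists as latent vectors, and the uniform sampling of the coefficient are all assumptions made for the example.

```python
import random


def interpolate_latents(z_a, z_b, rng=None):
    """Mix two latent vectors with a random coefficient (hypothetical helper).

    Returns the new latent vector and the coefficient that produced it.
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same length")
    rng = rng or random.Random()
    # Assumption: the coefficient is drawn uniformly from [0, 1).
    lam = rng.random()
    # Convex combination: each coordinate lies between the two inputs.
    z_new = [lam * a + (1.0 - lam) * b for a, b in zip(z_a, z_b)]
    return z_new, lam


if __name__ == "__main__":
    # Two toy 3-dimensional latent codes standing in for encoder outputs.
    z1 = [0.0, 2.0, -1.0]
    z2 = [1.0, 4.0, 1.0]
    z_mix, lam = interpolate_latents(z1, z2, random.Random(0))
    print("coefficient:", lam)
    print("new latent sample:", z_mix)
```

Because the coefficient is random, a sample produced this way has no particular semantic meaning; the abstract's argument is that information from a second modality can supply a direction that does.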