Probabilistic Interpolation with Mixup Data Augmentation for Text Classification

Rongkang Xu, Yongcheng Zhang, Kai Ren, Yu Huang, Xiaomei Wei
DOI: https://doi.org/10.1007/978-981-97-5672-8_35
2024-01-01
Abstract: Supervised deep learning models often face insufficient training data. Mixup, a data augmentation technique, addresses this shortage by interpolating between existing samples to generate new synthetic samples. However, most current Mixup methods use linear interpolation, which can only generate synthetic data within the linear range of the sample space and therefore restricts the diversity of the synthetic samples. To overcome this limitation, we introduce PTMix, a non-linear interpolation technique. PTMix interpolates each feature dimension using random probabilities, which enhances the data augmentation process. This approach expands the range of the synthetic sample space and increases sample diversity while preserving high fidelity to the original data. In extensive experiments on five public text classification datasets, PTMix achieves the highest average accuracy to date: 86.64% under full-resource conditions and 63.84% under low-resource conditions.
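The contrast between linear Mixup and interpolation applied per feature dimension can be illustrated with a minimal sketch. The abstract does not specify how PTMix samples its probabilities or mixes labels, so everything below is an assumption made for illustration: the function names, the Beta(alpha, alpha) sampling, and the use of the mean coefficient for the label are hypothetical and should not be read as the authors' implementation.

```python
import random

def linear_mixup(x_a, x_b, lam):
    """Standard Mixup: one scalar coefficient shared by every dimension,
    so the result always lies on the line segment between x_a and x_b."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]

def per_dimension_mixup(x_a, x_b, alpha=1.0, rng=None):
    """Hypothetical per-dimension variant (an assumption, not the paper's code):
    draw an independent coefficient in [0, 1] for each feature dimension,
    so the result can land anywhere in the box spanned by x_a and x_b."""
    rng = rng or random.Random()
    lams = [rng.betavariate(alpha, alpha) for _ in x_a]
    mixed = [l * a + (1.0 - l) * b for l, a, b in zip(lams, x_a, x_b)]
    return mixed, lams

def mix_labels(y_a, y_b, lam):
    """Interpolate two one-hot (or soft) label vectors with a scalar weight."""
    return [lam * p + (1.0 - lam) * q for p, q in zip(y_a, y_b)]

# Toy feature vectors standing in for two sentence embeddings.
rng = random.Random(0)
x_a = [1.0, 0.0, 2.0, -1.0]
x_b = [0.0, 1.0, -2.0, 3.0]

mixed, lams = per_dimension_mixup(x_a, x_b, alpha=1.0, rng=rng)

# Assumption: weight the labels by the mean of the per-dimension coefficients.
y_mixed = mix_labels([1.0, 0.0], [0.0, 1.0], sum(lams) / len(lams))
```

With a single shared coefficient, every synthetic point lies on the segment joining the two samples. With independent coefficients per dimension, each coordinate stays between the two original values, but the point as a whole is no longer confined to that segment, which is one way to read the abstract's claim of a wider synthetic sample space that remains close to the original data.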