An Introspective Data Augmentation Method for Training Math Word Problem Solvers

Jinghui Qin, Zhongzhan Huang, Ying Zeng, Quanshi Zhang, Liang Lin
DOI: https://doi.org/10.1109/taslp.2024.3408067
2024-01-01
Abstract: Though quite challenging, training a deep neural network to automatically solve Math Word Problems (MWPs) has attracted increasing attention because of its significance in investigating how a machine can understand and reason about complex problems like a human. However, existing high-quality MWP datasets are far too small to train a robust solver, since collecting them is very costly: they require professional knowledge that accords with educational standards, as well as a large amount of accessible data. This data bottleneck inspires us to consider cost-effective data augmentation methods that improve the utilization of existing data and enhance the performance of an MWP solver. Nevertheless, the traditional input-based data augmentation methods used for training natural image or language models are ill-suited to training MWP solvers, for two reasons. First, MWPs are concise yet comprehensive, so these methods are prone to making them more ambiguous. Second, the mathematical dependencies grounded in the problems must be maintained when a batch of augmented examples is generated during the data augmentation process. To address these issues, we propose a simple yet effective data augmentation method called the Introspective Data Augmentation Method (IDAM), which augments MWP examples in latent space during the training of the neural network instead of perturbing the input data. In particular, IDAM can apply different data augmentation operations to the latent feature representations of MWPs to produce new examples. Moreover, a new training objective is developed to constrain the consistency of mathematical dependencies between the original MWP and the produced ones. Extensive experiments conducted on standard benchmarks demonstrate the effectiveness of IDAM in generally improving the performance of existing MWP solvers without any elaborate model crafting.
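The general idea of augmenting in latent space with a consistency constraint can be illustrated by the following minimal sketch. It is not the authors' implementation: the abstract does not specify which augmentation operations or which objective IDAM uses, so the Gaussian-noise and dropout operations, the toy linear decoder, the KL-divergence consistency term, the weight `lam`, and every function name below are assumptions chosen purely for illustration.

```python
import math
import random


def gaussian_noise(h, rng, sigma=0.1):
    """Assumed augmentation: add zero-mean Gaussian noise to each latent dimension."""
    return [x + rng.gauss(0.0, sigma) for x in h]


def feature_dropout(h, rng, p=0.2):
    """Assumed augmentation: zero each dimension with probability p, rescale the rest."""
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in h]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(h, weights):
    """Toy stand-in for a solver's decoder: one weight row per output token."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in weights]
    return softmax(logits)


def cross_entropy(probs, target):
    return -math.log(probs[target])


def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def augmented_training_loss(h, weights, target, augment, rng, lam=1.0):
    """Task loss on the original and augmented latents plus a consistency term.

    The consistency term (hypothetical form) penalizes disagreement between
    the predictions made from the original and the augmented representation.
    """
    h_aug = augment(h, rng)
    p_orig = decode(h, weights)
    p_aug = decode(h_aug, weights)
    task = cross_entropy(p_orig, target) + cross_entropy(p_aug, target)
    consistency = kl_divergence(p_orig, p_aug)
    return task + lam * consistency


rng = random.Random(0)
h = [0.5, -1.0, 0.25, 2.0]  # hypothetical latent encoding of one MWP
weights = [[rng.uniform(-1.0, 1.0) for _ in h] for _ in range(3)]
loss = augmented_training_loss(h, weights, target=1, augment=gaussian_noise, rng=rng)
```

Because the perturbation is applied to the latent vector `h` rather than to the problem text, the wording and quantities of the original problem are never altered; the consistency term is one plausible way to keep the predictions for the augmented example tied to those for the original.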