A Framework for Math Word Problem Solving Based on Pre-training Models and Spatial Optimization Strategies.

Weijiang Fan, Jing Xiao, Yang Cao
DOI: https://doi.org/10.1007/978-981-99-2385-4_37
2022-01-01
Abstract: Automatic Math Word Problem (MWP) solving plays an important role in AI tutoring; it aims to generate the corresponding math expressions and answers for a given set of MWPs. To make MWP solving models more practical, we optimize two aspects. First, to address the weak linguistic representations of RNNs, which limit the accuracy of MWP solving models, we propose using Bidirectional Encoder Representations from Transformers (BERT) as the encoder and combining it with a Transformer decoder to form the model framework. This framework reaches 82.6% accuracy on the Math23K dataset, about 8% higher than GTS. Second, pre-trained models tend to be large, which hinders deployment on a web server. We therefore propose a knowledge distillation strategy that integrates the teacher model's evaluation. Through multi-layer distillation, the student model patiently learns from and imitates the teacher, so that the BERT-based model above is compressed into a shallow student model. The student achieves 76.3% accuracy on Math23K at only 0.61 times the size of the teacher model, and improves prediction speed by 1.71 times.
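The abstract does not give the distillation objective, so the following is only a minimal sketch of a generic multi-layer ("patient") distillation loss, not the authors' formulation. It assumes a hard-label cross-entropy term, a temperature-softened KL term against the teacher's output distribution, and a mean-squared error between selected student and teacher hidden layers. The names `distillation_loss`, `layer_map`, `alpha`, `beta`, and `temperature` are hypothetical, and the paper's "teacher model's evaluation" component is not modeled here.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over temperature-scaled logits.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mse(a, b):
    # Mean squared error between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      layer_map, gold_index,
                      temperature=2.0, alpha=0.5, beta=0.1):
    # Hard-label term: cross-entropy of the student against the gold token.
    hard = -math.log(softmax(student_logits)[gold_index])

    # Soft-label term: the student matches the teacher's softened distribution.
    # The T^2 factor is the usual rescaling used with softened targets.
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature)) * temperature ** 2

    # Multi-layer term: each student layer imitates one assigned teacher layer.
    # layer_map is an assumed mapping {student_layer_index: teacher_layer_index}.
    hidden = sum(mse(student_hidden[s], teacher_hidden[t])
                 for s, t in layer_map.items()) / len(layer_map)

    return (1 - alpha) * hard + alpha * soft + beta * hidden
```

With a two-layer student and a four-layer teacher, a mapping such as `{0: 1, 1: 3}` would have each student layer imitate every second teacher layer. When the student reproduces the teacher's logits and mapped hidden states exactly, only the hard-label term remains.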