MATHS: Multimodal Transformer-Based Human-Readable Solver

Yicheng Pan, Zhenrong Zhang, Jiefeng Ma, Pengfei Hu, Jun Du, Qing Wang, Jianshu Zhang, Dan Liu, Si Wei
DOI: https://doi.org/10.1109/icme57554.2024.10687434
2024-01-01
Abstract: Multimodal mathematical reasoning has attracted increasing attention in recent years. However, previous effective methods have not attempted to reason in natural language. In this paper, we introduce MATHS (MultimodAl Transformer-based Human-readable Solver), a model for visual arithmetic and geometry problems in multimodal mathematical reasoning tasks. Drawing inspiration from Multimodal Large Language Models (MLLMs), our approach generates problem-solving processes expressed in natural language in order to leverage the reasoning capabilities inherent in language models. To address the difficulty language models have with precise calculation, we propose a Math-Constrained Generation (MCG) method that imposes hard constraints on generated outputs. Extensive experiments demonstrate that our model excels at the visual arithmetic task and achieves results better than or comparable to those of existing methods on geometry problems. Code is available at https://github.com/ycpNotFound/MATHS.