What Makes Math Word Problems Challenging for LLMs?

KV Aditya Srivatsa, Ekaterina Kochmar
2024-04-01
Abstract: This paper investigates the question of what makes math word problems (MWPs) in English challenging for large language models (LLMs). We conduct an in-depth analysis of the key linguistic and mathematical characteristics of MWPs. In addition, we train feature-based classifiers to better understand the impact of each feature on the overall difficulty of MWPs for prominent LLMs and investigate whether this helps predict how well LLMs fare against specific categories of MWPs.
Computation and Language
What problem does this paper attempt to address?
The paper explores why math word problems (MWPs) are challenging for large language models (LLMs), and whether an analysis of key linguistic and mathematical features can predict if a specific LLM will solve a given problem correctly. The study has two specific objectives:

1. **Identify which features of an input MWP make it complex for LLMs**: Through an in-depth analysis of MWPs, identify the key factors that influence their difficulty.
2. **Predict LLM performance based on these features**: Use the extracted features to train classifiers that determine whether a specific LLM can correctly solve a particular MWP.

To achieve these goals, the authors experimented with several open-source LLMs, including Llama2, Mistral-7B, and MetaMath-13B, and used the GSM8K dataset for training and testing. The study found that problems with a high number and variety of mathematical operations, and those using uncommon numerical notations, are particularly difficult to solve. Additionally, problems with long sentences, low readability scores, or a need for external knowledge are often answered incorrectly. Based on these findings, future work will attempt to modify the problems to further explore the capabilities of LLMs in reasoning about and solving math word problems.
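To make the idea of feature extraction concrete, here is a minimal Python sketch of how a few difficulty features of the kind discussed above might be computed for a GSM8K-style problem. It relies on the calculator annotations (such as `<<48/2=24>>`) that GSM8K reference solutions contain. The function name `extract_features` and the three feature names are hypothetical choices for illustration, not the authors' actual feature set or code.

```python
import re

# GSM8K reference solutions embed calculator annotations like <<48/2=24>>.
# Capture the expression to the left of the "=" sign.
ANNOTATION = re.compile(r"<<([^=<>]+)=[^<>]*>>")
OPERATOR = re.compile(r"[+\-*/]")


def extract_features(question: str, solution: str) -> dict:
    """Compute a few illustrative (hypothetical) difficulty features.

    Note: a leading minus sign on a negative number would also be
    counted as an operator; this sketch does not handle that case.
    """
    expressions = ANNOTATION.findall(solution)
    operators = [op for expr in expressions for op in OPERATOR.findall(expr)]
    return {
        # Total number of arithmetic operations in the reference solution.
        "num_operations": len(operators),
        # Variety of operations (distinct operator symbols).
        "num_unique_operations": len(set(operators)),
        # Crude length proxy: whitespace-separated tokens in the question.
        "question_word_count": len(question.split()),
    }


# A made-up problem in GSM8K format, used only for illustration.
example_question = (
    "Tom has 3 boxes with 4 pens each and gives away 2 pens. "
    "How many pens are left?"
)
example_solution = (
    "Tom has 3*4 = <<3*4=12>>12 pens.\n"
    "He has 12-2 = <<12-2=10>>10 pens left.\n"
    "#### 10"
)
features = extract_features(example_question, example_solution)
```

Feature dictionaries like this one could then be converted to vectors and passed to any standard classifier, with the label being whether a given LLM answered the problem correctly.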