Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs

Siyuan Guo, Aniket Didolkar, Nan Rosemary Ke, Anirudh Goyal, Ferenc Huszár, Bernhard Schölkopf
2024-05-24
Abstract: We are beginning to see progress in language-model-assisted scientific discovery. Motivated by the use of LLMs as general scientific assistants, this paper assesses the domain knowledge of LLMs through their understanding of the different mathematical skills required to solve problems. In particular, we look not just at what the pre-trained model already knows, but at how it learns from information during in-context learning or instruction-tuning by exploiting the complex knowledge structure within mathematics. Motivated by the Neural Tangent Kernel (NTK), we propose NTKEval to assess changes in an LLM's probability distribution via training on different kinds of math data. Our systematic analysis finds evidence of domain understanding during in-context learning. By contrast, certain instruction-tuning leads to similar performance changes irrespective of training on different data, suggesting a lack of domain understanding across different skills.
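For readers new to the NTK motivation, the standard first-order linearization below shows why a kernel between inputs governs how training on one example shifts a model's output on another. This is the textbook NTK argument, not the paper's NTKEval estimator. It assumes a scalar output f_θ(x) (for instance, a log-probability of a correct answer), a differentiable loss ℓ, and a single gradient step of size η taken on a training example x'.

```latex
\begin{align}
\Delta\theta &= -\eta\,\nabla_\theta\,\ell\big(f_\theta(x')\big)
             = -\eta\,\ell'\big(f_\theta(x')\big)\,\nabla_\theta f_\theta(x') \\
f_{\theta+\Delta\theta}(x) &\approx f_\theta(x) + \nabla_\theta f_\theta(x)^{\top}\Delta\theta
             = f_\theta(x) - \eta\,\ell'\big(f_\theta(x')\big)\,\Theta(x, x') \\
\Theta(x, x') &= \nabla_\theta f_\theta(x)^{\top}\,\nabla_\theta f_\theta(x')
\end{align}
```

Under this approximation, the change in the output on a test input x is proportional to the kernel value Θ(x, x'). If the kernel is larger for question pairs that share a mathematical skill, then training on one skill should move the model most on test questions requiring that same skill.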
Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper evaluates whether large language models (LLMs) can go beyond pattern matching and genuinely understand the mathematical skills needed to solve mathematical problems. Specifically, the researchers analyze LLM performance across different mathematical skills to test whether the models can learn effectively from new information and apply what they learn to new problems. The paper focuses on three points:

1. **Evaluating the learning ability of LLMs**: Beyond examining the knowledge a pre-trained model already has, the paper studies how models acquire new knowledge through in-context learning or instruction-tuning.
2. **Distinguishing deep structure from surface structure**: The researchers design experiments to test whether LLMs solve problems from a deep understanding of mathematical skills or merely rely on surface cues in the problem statements.
3. **Proposing the NTKEval method**: Inspired by the Neural Tangent Kernel (NTK), the researchers propose NTKEval, which measures changes in an LLM's probability distribution after training on different kinds of math data. This lets them measure what the model has learned more efficiently.

Through these studies, the paper aims to understand how LLMs learn in mathematics and whether they show genuine understanding, rather than mere pattern matching, when handling complex tasks. The results help assess the capabilities of existing LLMs and offer a theoretical basis and practical guidance for improving these models in the future.
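The evaluation idea behind NTKEval asks whether training on one skill changes performance on that skill more than on other skills. The small sketch below illustrates that comparison. It is a simplified stand-in, not the paper's estimator, and it makes several assumptions. First, we already have, for each test question, the probability that a model answers correctly both before and after adapting it on data from one skill. Second, the input layout, the function names `skill_change_matrix` and `diagonal_gap`, and the skill names in the demo are all hypothetical. Third, the demo numbers are synthetic and chosen only to exercise the code.

```python
import numpy as np


def skill_change_matrix(p_base, p_trained, skills):
    """Tabulate the mean change in P(correct) for every (train skill, test skill) pair.

    Hypothetical input layout (not the paper's data format):
      p_base[test_skill]                 -> per-question P(correct) for the base model
      p_trained[train_skill][test_skill] -> same questions, model adapted on train_skill
    Returns M with M[i, j] = mean change on test skill j after training on skill i.
    """
    n = len(skills)
    M = np.zeros((n, n))
    for i, train_skill in enumerate(skills):
        for j, test_skill in enumerate(skills):
            before = np.asarray(p_base[test_skill], dtype=float)
            after = np.asarray(p_trained[train_skill][test_skill], dtype=float)
            M[i, j] = np.mean(after - before)
    return M


def diagonal_gap(M):
    """Mean on-diagonal change minus mean off-diagonal change.

    A clearly positive gap means training on a skill helps that skill's questions
    more than other skills' questions. A gap near zero means the change is roughly
    the same no matter which skill's data was used.
    """
    n = M.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)  # mask selecting i != j entries
    return float(np.mean(np.diag(M)) - np.mean(M[off_diagonal]))


if __name__ == "__main__":
    # Synthetic toy inputs, not results from the paper.
    skills = ["algebra", "geometry"]
    p_base = {"algebra": [0.2, 0.4], "geometry": [0.3, 0.5]}
    p_trained = {
        "algebra": {"algebra": [0.5, 0.7], "geometry": [0.3, 0.5]},
        "geometry": {"algebra": [0.2, 0.4], "geometry": [0.6, 0.8]},
    }
    M = skill_change_matrix(p_base, p_trained, skills)
    print(np.round(M, 3))
    print("diagonal gap:", round(diagonal_gap(M), 3))
```

In this reading, a skill-specific pattern (a large diagonal gap) would be consistent with the domain understanding reported for in-context learning. A near-zero gap would mirror the abstract's observation that certain instruction-tuning produces similar changes regardless of the training data.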