Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs

Siyuan Guo, Aniket Didolkar, Nan Rosemary Ke, Anirudh Goyal, Ferenc Huszár, Bernhard Schölkopf

2024-05-24

Abstract: We are beginning to see progress in language model assisted scientific discovery. Motivated by the use of LLMs as a general scientific assistant, this paper assesses the domain knowledge of LLMs through their understanding of different mathematical skills required to solve problems. In particular, we look at not just what the pre-trained model already knows, but how it learned to learn from information during in-context learning or instruction-tuning through exploiting the complex knowledge structure within mathematics. Motivated by the Neural Tangent Kernel (NTK), we propose NTKEval to assess changes in an LLM's probability distribution via training on different kinds of math data. Our systematic analysis finds evidence of domain understanding during in-context learning. By contrast, certain instruction-tuning leads to similar performance changes irrespective of training on different data, suggesting a lack of domain understanding across different skills.

Artificial Intelligence, Computation and Language, Machine Learning

What problem does this paper attempt to address?

This paper asks whether large language models (LLMs) go beyond pattern matching and actually understand the mathematical skills needed to solve problems. Specifically, it analyzes LLM performance across different mathematical skills to determine whether the models learn effectively from new information and apply what they learn to new problems. The paper focuses on three points:

1. **Evaluating the learning ability of LLMs**: The study examines not only what pre-trained models already know, but also how they learn from information through in-context learning or instruction-tuning.
2. **Distinguishing between deep structure and surface structure**: The experiments test whether LLMs solve problems based on the underlying mathematical skills or merely rely on surface cues in the problem statements.
3. **Proposing the NTKEval method**: Inspired by the Neural Tangent Kernel (NTK), NTKEval assesses changes in an LLM's probability distribution after training on different kinds of math data, which measures the learning effect more efficiently.

Through these studies, the paper aims to clarify how LLMs learn in the mathematical domain and whether they show understanding rather than pattern matching on complex tasks. The results help assess the capabilities of existing LLMs and can inform efforts to improve them.
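The comparison described above, measuring how a model's probabilities change after training on data for one skill, can be pictured with a small sketch. This is not the paper's NTKEval definition, which this summary does not spell out. The function names, the `margin` threshold, and all numbers below are hypothetical and chosen only for illustration. The idea is that if training on one skill raises the probability of correct answers mainly for that skill, the change is skill-specific; if every skill shifts by a similar amount, the change does not depend on what the training data was about.

```python
from statistics import mean


def mean_probability_shift(before, after):
    """Average change in the probability assigned to the correct answer, per skill.

    `before` and `after` map a skill name to a list of probabilities, one per
    test question, measured before and after training (hypothetical inputs).
    """
    shifts = {}
    for skill, base_probs in before.items():
        tuned_probs = after[skill]
        if len(base_probs) != len(tuned_probs):
            raise ValueError(f"mismatched question counts for skill {skill!r}")
        shifts[skill] = mean(t - b for b, t in zip(base_probs, tuned_probs))
    return shifts


def is_skill_specific(shifts, trained_skill, margin=0.05):
    """True when the trained skill gains at least `margin` more than every other skill.

    `margin` is an arbitrary illustrative threshold, not a value from the paper.
    """
    others = [v for k, v in shifts.items() if k != trained_skill]
    return all(shifts[trained_skill] - v >= margin for v in others)


# Hypothetical probabilities of the correct answer on two questions per skill.
before = {"addition": [0.40, 0.50], "geometry": [0.30, 0.20]}

# Case A: training on addition data mostly helps addition questions.
after_targeted = {"addition": [0.70, 0.80], "geometry": [0.30, 0.25]}

# Case B: every skill improves by about the same amount.
after_uniform = {"addition": [0.60, 0.70], "geometry": [0.50, 0.40]}

targeted = is_skill_specific(mean_probability_shift(before, after_targeted), "addition")
uniform = is_skill_specific(mean_probability_shift(before, after_uniform), "addition")
print(targeted, uniform)
```

In this toy setup, Case A mirrors the skill-dependent behavior the abstract associates with in-context learning, and Case B mirrors the similar performance changes regardless of training data that it reports for certain instruction-tuning.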

Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions

Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions

Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From A Psychological Perspective

Mathematical Language Models: A Survey

Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula

LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs

Three Questions Concerning the Use of Large Language Models to Facilitate Mathematics Learning

Automated Feedback in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses

Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit

AI-Assisted Generation of Difficult Math Questions

Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks

Large Language Models for Mathematical Reasoning: Progresses and Challenges