Abstract: Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly possess some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including the ability to name skills and procedures to apply given a task. We explore this primarily in the context of math reasoning, developing a prompt-guided interaction procedure to get a powerful LLM to assign sensible skill labels to math questions, followed by having it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper explores whether large language models (LLMs) possess metacognitive abilities, particularly when solving mathematical problems. Metacognitive knowledge refers to humans' intuitive understanding of their own thinking and reasoning processes. Although the current best LLMs have demonstrated some reasoning capabilities, whether they also possess metacognitive knowledge (such as the ability to name skills and procedures) has not been fully studied.

### Specific Questions

1. **Existence of Metacognitive Abilities**: Do LLMs possess metacognitive abilities, enabling them to identify and name the skills required to solve mathematical problems?
2. **Effectiveness of Skill Labels**: Are the skill labels generated by LLMs meaningful and effective for other LLMs in solving mathematical problems?
3. **Application of Skill Labels**: How can these skill labels be utilized to improve LLM performance in solving mathematical problems?

### Solutions

1. **Skill Label Generation**:
   - Use powerful LLMs (such as GPT-4) to assign detailed skill labels to mathematical problems.
   - Merge these fine-grained skill labels into coarser skill categories through semantic clustering.
2. **Experimental Validation**:
   - Use GPT-4 to generate skill labels on the GSM8K and MATH datasets and validate the effectiveness of these labels.
   - During the testing phase, provide LLMs with examples containing skill labels to guide problem-solving and improve accuracy.
3. **Cross-Model and Dataset Applicability**:
   - Apply skill labels generated from the GSM8K dataset to other mathematical problem datasets to verify their generality and effectiveness.
   - Test whether these skill labels can improve the performance of weaker LLMs (such as Mixtral) in solving mathematical problems.
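
The steps listed under Solutions can be illustrated with a minimal sketch. It is not the paper's implementation: the `LLM` callable type, the `toy_llm` stand-in, the skill names, the prompt wording, and the fallback to the first skill are all assumptions made for illustration. The sketch labels each training question with a coarse skill, groups the examples by skill, and then builds a few-shot prompt for a new question from examples that share its skill.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out
Example = Tuple[str, str]   # (question, worked solution)


def label_skill(llm: LLM, question: str, skills: List[str]) -> str:
    """Ask the model to pick one coarse skill for a question."""
    prompt = (
        "Pick the single skill from this list that the question needs most.\n"
        f"Skills: {', '.join(skills)}\n"
        f"Question: {question}\n"
        "Skill:"
    )
    answer = llm(prompt).strip()
    # Assumed fallback: use the first skill if the reply is not in the list.
    return answer if answer in skills else skills[0]


def build_repository(
    llm: LLM, train: List[Example], skills: List[str]
) -> Dict[str, List[Example]]:
    """Group training examples by the skill label the model assigns."""
    repo: Dict[str, List[Example]] = defaultdict(list)
    for question, solution in train:
        repo[label_skill(llm, question, skills)].append((question, solution))
    return dict(repo)


def build_prompt(
    llm: LLM,
    repo: Dict[str, List[Example]],
    skills: List[str],
    question: str,
    k: int = 2,
) -> str:
    """Build a few-shot prompt from up to k examples sharing the question's skill."""
    skill = label_skill(llm, question, skills)
    parts = [f"Skill: {skill}"]
    for q, s in repo.get(skill, [])[:k]:
        parts.append(f"Question: {q}\nSolution: {s}")
    parts.append(f"Question: {question}\nSolution:")
    return "\n\n".join(parts)


def toy_llm(prompt: str) -> str:
    """Stand-in for a real model: keyword match on the question text only."""
    question = prompt.split("Question:")[-1].lower()
    return "percentages" if "%" in question or "percent" in question else "arithmetic"


# Hypothetical skill list and training data, for illustration only.
SKILLS = ["arithmetic", "percentages"]
TRAIN = [
    ("What is 12 + 30?", "12 + 30 = 42."),
    ("What is 10% of 80?", "0.10 * 80 = 8."),
]

repo = build_repository(toy_llm, TRAIN, SKILLS)
prompt = build_prompt(toy_llm, repo, SKILLS, "What is 20% of 50?")
print(prompt)
```

In practice `toy_llm` would be replaced by a call to an actual model, and the skill list would come from the clustering step rather than being written by hand.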
### Experimental Results

- **Performance Improvement**: On the MATH dataset, the method using skill labels improved accuracy by 11.6% compared to the traditional Chain-of-Thought (CoT) method.
- **Cross-Model Applicability**: Skill labels are not only effective for GPT-4 but also significantly improve the performance of weaker LLMs (such as Mixtral).
- **Multi-Method Integration**: Skill labels can be combined with existing prompting methods (such as CoT, PAL, etc.) to further enhance the reasoning capabilities of LLMs.

### Conclusion

The paper experimentally demonstrates that LLMs do possess metacognitive abilities, capable of generating effective skill labels, and that these skill labels significantly improve performance in solving mathematical problems. Additionally, these skill labels exhibit good cross-model and dataset generality, providing new directions for future research.

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions

Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems

Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses

AI-Assisted Generation of Difficult Math Questions

LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems

Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions

Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From A Psychological Perspective

DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents

LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs

Can LLMs Compute with Reasons?

Investigating Symbolic Capabilities of Large Language Models

Are LLMs the Master of All Trades? : Exploring Domain-Agnostic Reasoning Skills of LLMs

From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems

An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning

Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering

Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs

MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time