Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Aniket Didolkar, Anirudh Goyal, Nan Rosemary Ke, Siyuan Guo, Michal Valko, Timothy Lillicrap, Danilo Rezende, Yoshua Bengio, Michael Mozer, Sanjeev Arora
2024-05-21
Abstract: Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly possess some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including the ability to name skills and procedures to apply given a task. We explore this primarily in the context of math reasoning, developing a prompt-guided interaction procedure to get a powerful LLM to assign sensible skill labels to math questions, followed by having it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans.
Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper explores whether large language models (LLMs) possess metacognitive abilities, particularly when solving mathematical problems. Metacognitive knowledge refers to humans' intuitive understanding of their own thinking and reasoning processes. Although today's best LLMs demonstrably have some reasoning capabilities, whether they also possess metacognitive knowledge (such as the ability to name the skills and procedures a task requires) has not been thoroughly studied.

### Specific Questions

1. **Existence of Metacognitive Abilities**: Do LLMs possess metacognitive abilities that let them identify and name the skills required to solve mathematical problems?
2. **Effectiveness of Skill Labels**: Are the skill labels generated by LLMs meaningful, and do they help other LLMs solve mathematical problems?
3. **Application of Skill Labels**: How can these skill labels be used to improve LLM performance on mathematical problems?

### Solutions

1. **Skill Label Generation**:
   - Use a powerful LLM (such as GPT-4) to assign fine-grained skill labels to mathematical problems.
   - Merge these fine-grained labels into coarser skill families by having the LLM perform semantic clustering.
2. **Experimental Validation**:
   - Use GPT-4 to generate skill labels for the GSM8K and MATH datasets and validate the effectiveness of these labels.
   - At test time, have the LLM identify the skill a question requires, then provide solved exemplar problems associated with that skill label to guide problem-solving and improve accuracy.
3. **Cross-Model and Cross-Dataset Applicability**:
   - Apply skill labels generated from the GSM8K dataset to other mathematical problem datasets to verify their generality and effectiveness.
   - Test whether these skill labels can improve the performance of weaker LLMs (such as Mixtral) on mathematical problems.
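The test-time use of skill labels described in the Solutions can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code: in the paper, both skill labeling and clustering are performed by prompting an LLM, so their outputs are simply hard-coded here. The names `Example`, `build_repository`, and `build_prompt`, the skill labels, and the prompt format are all assumptions. The skill for the test question is also passed in directly, whereas the described method has the LLM choose it from the list of skill labels.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    solution: str
    fine_skill: str  # fine-grained label, assumed to come from an LLM labeling step


def build_repository(examples, fine_to_coarse):
    """Group labeled training examples by their coarse skill family."""
    repo = defaultdict(list)
    for ex in examples:
        # Fall back to the fine label if the clustering step did not map it.
        coarse = fine_to_coarse.get(ex.fine_skill, ex.fine_skill)
        repo[coarse].append(ex)
    return dict(repo)


def build_prompt(test_question, skill, repo, k=2, seed=0):
    """Format a few-shot prompt from up to k exemplars sharing the given skill."""
    pool = repo.get(skill, [])
    rng = random.Random(seed)  # seeded so the sampled exemplars are reproducible
    shots = rng.sample(pool, min(k, len(pool)))
    parts = [f"Skill: {skill}"]
    for ex in shots:
        parts.append(f"Q: {ex.question}\nA: {ex.solution}")
    parts.append(f"Q: {test_question}\nA:")
    return "\n\n".join(parts)


# Hypothetical output of the clustering step: fine-grained label -> coarse family.
fine_to_coarse = {
    "percent_change": "percentages",
    "percent_of_total": "percentages",
    "unit_rate": "ratios_and_rates",
}
examples = [
    Example("A $50 shirt is discounted 20%. What is the new price?", "50 * 0.8 = 40", "percent_change"),
    Example("12 of 48 students walk. What percent walk?", "12 / 48 = 25%", "percent_of_total"),
    Example("A car goes 120 miles in 2 hours. What is its speed?", "120 / 2 = 60 mph", "unit_rate"),
]
repo = build_repository(examples, fine_to_coarse)
print(build_prompt("A $80 coat is 25% off. What does it cost?", "percentages", repo))
```

Selecting exemplars by skill, rather than using one fixed few-shot set for every question, is what ties the in-context examples to the skill predicted for the question; the resulting prompt would then be sent to whichever LLM is solving the problem.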
### Experimental Results

- **Performance Improvement**: On the MATH dataset, the skill-label method improved accuracy by 11.6% over standard Chain-of-Thought (CoT) prompting.
- **Cross-Model Applicability**: Skill labels are effective not only for GPT-4 but also significantly improve the performance of weaker LLMs (such as Mixtral).
- **Multi-Method Integration**: Skill labels can be combined with existing prompting methods (such as CoT and PAL) to further enhance the reasoning capabilities of LLMs.

### Conclusion

The paper provides experimental evidence that LLMs possess metacognitive abilities: they can generate effective skill labels, and these labels significantly improve performance on mathematical problems. The skill labels also generalize well across models and datasets, suggesting new directions for future research.