Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models

Wenting Tan, Dongxiao Chen, Jieting Xue, Zihao Wang, Taijie Chen
2024-10-11
Abstract: Large Language Models (LLMs) exhibit impressive performance across various domains but still struggle with arithmetic reasoning tasks. Recent work shows the effectiveness of prompt design methods in enhancing reasoning capabilities. However, these approaches overlook a crucial requirement: most arithmetic reasoning problems can only be solved with prior knowledge of specific concepts, theorems, and tricks. To address this issue, we propose a novel and effective Teaching-Inspired Integrated Framework, which emulates the instructional process of a teacher guiding students. This method equips LLMs with essential concepts, relevant theorems, and similar problems with analogous solution approaches, thereby enhancing their reasoning abilities. Additionally, we introduce two new Chinese datasets, MathMC and MathToF, both with detailed explanations and answers. Experiments on nine benchmarks demonstrate that our approach improves the reasoning accuracy of LLMs. With GPT-4 and our framework, we achieve new state-of-the-art performance on four math benchmarks (AddSub, SVAMP, Math23K and AQuA) with accuracies of 98.2% (+3.3%), 93.9% (+0.2%), 94.3% (+7.2%) and 81.1% (+1.2%). Our data and code are available at https://github.com/SallyTan13/Teaching-Inspired-Prompting.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper aims to address the weak performance of large language models (LLMs) on arithmetic reasoning tasks. Although LLMs perform well across natural language processing (NLP), they still struggle with tasks that require complex reasoning. The paper proposes a Teaching-Inspired Integrated Prompting Framework that improves the model's reasoning by supplying essential concepts, relevant theorems, and similar problems with analogous solution approaches, mimicking the way a teacher guides students. Additionally, the paper introduces two new Chinese datasets, MathMC and MathToF, both with detailed explanations and answers, to support further research on arithmetic reasoning. Experiments on nine benchmarks show that the method improves the reasoning accuracy of LLMs; with GPT-4, it achieves new state-of-the-art performance on four math benchmarks (AddSub, SVAMP, Math23K, and AQuA).
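
The idea of giving the model concepts, theorems, and similar solved problems before the question can be illustrated with a small prompt-assembly sketch. This is not the authors' implementation: the names `TeachingContext` and `build_prompt`, the section labels, and the sample content are all hypothetical, and how the background material is retrieved or how the prompt is sent to a model is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class TeachingContext:
    """Hypothetical container for the background a 'teacher' would supply."""
    concepts: list[str] = field(default_factory=list)
    theorems: list[str] = field(default_factory=list)
    # Each entry is a (problem, worked solution) pair.
    similar_problems: list[tuple[str, str]] = field(default_factory=list)


def build_prompt(question: str, ctx: TeachingContext) -> str:
    """Assemble one prompt string: background sections first, question last.

    Empty sections are skipped so the prompt contains only what was supplied.
    """
    parts = []
    if ctx.concepts:
        parts.append("Concepts:\n" + "\n".join(f"- {c}" for c in ctx.concepts))
    if ctx.theorems:
        parts.append("Theorems:\n" + "\n".join(f"- {t}" for t in ctx.theorems))
    if ctx.similar_problems:
        examples = "\n".join(
            f"Q: {problem}\nA: {solution}"
            for problem, solution in ctx.similar_problems
        )
        parts.append("Similar problems:\n" + examples)
    parts.append("Problem:\n" + question + "\nSolve step by step.")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Illustrative content only; not taken from the paper's datasets.
    ctx = TeachingContext(
        concepts=["Speed is distance divided by time."],
        similar_problems=[
            ("A car travels 120 km in 2 hours. What is its speed?",
             "120 / 2 = 60 km/h."),
        ],
    )
    print(build_prompt("A train travels 300 km in 4 hours. What is its speed?", ctx))
```

Keeping the background in a separate structure from the question makes it easy to compare a prompt with and without a given kind of material, which is the sort of comparison the benchmarks in the abstract imply.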