Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study

Xuefei Ning, Zifu Wang, Shiyao Li, Zinan Lin, Peiran Yao, Tianyu Fu, Matthew B. Blaschko, Guohao Dai, Huazhong Yang, Yu Wang

2024-10-30

Abstract: Teaching to improve student models (e.g., knowledge distillation) is an extensively studied methodology in LLMs. However, for humans, teaching improves not only students but also teachers, by fostering more rigorous and clear reasoning as well as knowledge building. We ask: Can LLMs also learn by teaching (LbT) for better reasoning? If the answer is yes, we can potentially unlock the possibility of continuously advancing the models without solely relying on human-produced data or stronger models. In this paper, we provide a preliminary exploration of this question. We show that LbT ideas can be incorporated into existing LLM training/prompting pipelines and bring improvements. Specifically, we design three methods, each mimicking one of the three levels of LbT: observing students' feedback, learning from the feedback, and learning iteratively, with the goals of improving answer accuracy without training or improving models' inherent capability with fine-tuning. We reveal some findings: (1) Teaching materials that make it easier for students to learn have clearer and more accurate logic when using in-context learning as the student's "learning" method; (2) Weak-to-strong generalization: LbT might help improve strong models by teaching weak models; (3) Diversity in students might help: teaching multiple students could be better than teaching one student or the teacher itself. We hope that our exploration can inspire future research on LbT and, more broadly, on adopting advanced techniques from education to improve LLMs. The code and website are at https://github.com/imagination-research/lbt and https://sites.google.com/view/llm-learning-by-teaching.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

This paper asks whether large language models (LLMs) can improve their own reasoning by teaching. Specifically, it explores whether an LLM can learn from the process of teaching other models (possibly weaker ones) and thereby improve both the quality of its answers and its inherent capability. If so, LLMs could keep improving without relying entirely on human-generated data or stronger models. The paper designs three methods (M1, M2, M3), one for each of three levels of learning by teaching (LbT): observing student feedback, learning from the feedback, and learning iteratively from the feedback. The methods target different goals: improving answer accuracy without training, or improving the model's inherent capability through fine-tuning. The main findings are: 1. Teaching materials that make it easier for students to learn tend to have clearer and more accurate logic, when in-context learning serves as the student's learning method. 2. Strong models may be improved by teaching weaker models. 3. Teaching a diverse group of students may help the teacher more than teaching a single student or teaching itself. These preliminary results suggest that, with suitable methods and teacher-student settings, LbT can improve the answer quality and inherent capability of LLMs, which points to a direction for future research.
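The first level, observing student feedback, can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on assumptions the text above does not state: the teacher has already produced several candidate (rationale, answer) pairs, each student is a callable that answers an exam question given one rationale as an in-context demonstration, and the final answer is simply the one attached to the highest-scoring rationale. The names `lbt_score`, `select_answer`, `Candidate`, `ExamItem`, and `Student` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types for illustration only.
Candidate = Tuple[str, str]           # (teacher rationale, teacher answer)
ExamItem = Tuple[str, str]            # (exam question, gold answer)
Student = Callable[[str, str], str]   # (rationale as demo, question) -> answer


def lbt_score(rationale: str,
              students: Sequence[Student],
              exam: Sequence[ExamItem]) -> float:
    """Fraction of (student, exam item) pairs answered correctly when the
    rationale is given to the student as an in-context demonstration."""
    total = len(students) * len(exam)
    if total == 0:
        return 0.0
    correct = sum(
        student(rationale, question) == gold
        for student in students
        for question, gold in exam
    )
    return correct / total


def select_answer(candidates: Sequence[Candidate],
                  students: Sequence[Student],
                  exam: Sequence[ExamItem]) -> str:
    """Return the answer of the rationale that students learn best from.
    Assumption: ties are broken by candidate order."""
    best_rationale, best_answer = max(
        candidates, key=lambda c: lbt_score(c[0], students, exam)
    )
    return best_answer
```

Passing several different student callables to `students` corresponds to the multi-student setting in finding 3. A real setup would replace each callable with a call to a student LLM.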

LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation

Democratizing Reasoning Ability: Tailored Learning from Large Language Model

Learning From Mistakes Makes LLM Better Reasoner

Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning

Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study

Enhancing Logical Reasoning in Large Language Models to Facilitate Legal Applications

Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses

Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Personalization

Teaching-Assistant-in-the-Loop: Improving Knowledge Distillation from Imperfect Teacher Models in Low-Budget Scenarios

Thinking LLMs: General Instruction Following with Thought Generation

At Which Training Stage Does Code Data Help LLMs Reasoning?

Large Language Models are In-context Teachers for Knowledge Reasoning

Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning

TPD: Enhancing Student Language Model Reasoning via Principle Discovery and Guidance

Meaningful Learning: Enhancing Abstract Reasoning in Large Language Models via Generic Fact Guidance

Can LLMs Reason in the Wild with Programs?

I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses

Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching

Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs