Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study

Xuefei Ning, Zifu Wang, Shiyao Li, Zinan Lin, Peiran Yao, Tianyu Fu, Matthew B. Blaschko, Guohao Dai, Huazhong Yang, Yu Wang
2024-10-30
Abstract: Teaching to improve student models (e.g., knowledge distillation) is an extensively studied methodology in LLMs. However, for humans, teaching improves not only students but also teachers, by fostering more rigorous and clear reasoning as well as knowledge building. We ask: Can LLMs also learn by teaching (LbT) for better reasoning? If the answer is yes, we can potentially unlock the possibility of continuously advancing the models without solely relying on human-produced data or stronger models. In this paper, we provide a preliminary exploration on this question. We show that LbT ideas can be incorporated into existing LLM training/prompting pipelines and bring improvements. Specifically, we design three methods, each mimicking one of the three levels of LbT: observing students' feedback, learning from the feedback, and learning iteratively, with the goals of improving answer accuracy without training or improving models' inherent capability with fine-tuning. We reveal some findings: (1) Teaching materials that make it easier for students to learn have clearer and more accurate logic when using in-context learning as the student's "learning" method; (2) Weak-to-strong generalization: LbT might help improve strong models by teaching weak models; (3) Diversity in students might help: teaching multiple students could be better than teaching one student or the teacher itself. We hope that our exploration can inspire future research on LbT and more broadly adopting the advanced techniques in education to improve LLMs. The code and website are at https://github.com/imagination-research/lbt and https://sites.google.com/view/llm-learning-by-teaching.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper asks whether large language models (LLMs) can improve their reasoning by teaching. Specifically, it explores whether an LLM can learn from the process of teaching other models (possibly weaker ones) and thereby improve its own performance, both in answer quality and in inherent capability. If so, LLMs could continue to progress without relying solely on human-generated data or stronger models. The paper designs three methods (M1, M2, M3), corresponding to three levels of learning by teaching (LbT): observing students' feedback, learning from the feedback, and learning iteratively from the feedback. Each method targets a different goal, either improving answer accuracy without training or improving the model's inherent capability through fine-tuning. The main findings are: 1. Teaching materials that make it easier for students to learn have clearer and more accurate logic, when in-context learning is the student's learning method. 2. Strong models might be improved by teaching weaker models (weak-to-strong generalization). 3. Teaching multiple, diverse students could be better than teaching a single student or the teacher itself. These preliminary results suggest that, with appropriate methods and teacher-student settings, LbT can help improve LLMs' answer quality and inherent capability, pointing to a direction for future research.
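To make the first level (observing students' feedback) more concrete, here is a minimal Python sketch of one way such feedback could be used to pick an answer without any training. It is an illustration under stated assumptions, not the authors' implementation: the `Candidate` structure, the `select_answer_by_student_feedback` function, and the `student_score` callback are all hypothetical names, and summing student scores per answer is just one plausible aggregation rule. The callback is assumed to return the student's accuracy on similar problems when a given teacher rationale is used as an in-context example.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical structure: a (rationale, final_answer) pair sampled from the teacher.
Candidate = Tuple[str, str]


def select_answer_by_student_feedback(
    candidates: List[Candidate],
    student_score: Callable[[str, str], float],
) -> Tuple[str, Dict[str, float]]:
    """Return the answer whose rationales taught the student best.

    student_score(rationale, answer) is an assumed callback that returns the
    student's accuracy (0.0 to 1.0) on similar problems when the rationale is
    supplied as an in-context example.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    totals: Dict[str, float] = defaultdict(float)
    for rationale, answer in candidates:
        # Accumulate the student's score for every rationale backing this answer.
        totals[answer] += student_score(rationale, answer)
    best = max(totals, key=totals.get)
    return best, dict(totals)


# Toy usage with a stubbed student: the scores below are made up for illustration.
toy_scores = {"r1": 0.9, "r2": 0.2, "r3": 0.3}
candidates = [("r1", "42"), ("r2", "41"), ("r3", "41")]
best, totals = select_answer_by_student_feedback(
    candidates, lambda rationale, answer: toy_scores[rationale]
)
```

In this toy run, a plain majority vote would choose "41" (two of three candidates), while the student-feedback score favors "42", because the single rationale behind it taught the stubbed student much better. In a real setup, `student_score` would wrap calls to one or more student models, which is where student diversity could enter.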