Abstract: A hallmark property of explainable AI models is the ability to teach other agents, communicating knowledge of how to perform a task. While Large Language Models perform complex reasoning by generating explanations for their predictions, it is unclear whether they also make good teachers for weaker agents. To address this, we consider a student-teacher framework between two LLM agents and study if, when, and how the teacher should intervene with natural language explanations to improve the student's performance. Since communication is expensive, we define a budget such that the teacher only communicates explanations for a fraction of the data, after which the student should perform well on its own. We decompose the teaching problem along four axes: (1) whether the teacher's test-time intervention improves student predictions, (2) when it is worth explaining a data point, (3) how the teacher should personalize explanations to better teach the student, and (4) whether teacher explanations also improve student performance on future unexplained data. We first show that teacher LLMs can indeed intervene on student reasoning to improve student performance. Next, inspired by the Theory of Mind abilities of effective teachers, we propose building two few-shot mental models of the student. The first model defines an Intervention Function that simulates the utility of an intervention, allowing the teacher to intervene when this utility is highest and improving student performance at lower budgets. The second model enables the teacher to personalize explanations for a particular student and outperform unpersonalized teachers. We also demonstrate that in multi-turn interactions, teacher explanations generalize: learning from explained data improves student performance on future unexplained data. Finally, we verify that misaligned teachers can lower student performance to random chance by intentionally misleading the student.
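The budgeted intervention idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, hypothetically, that the teacher already has a per-example utility estimate (here taken to be the simulated gain in the student's probability of being correct with an explanation versus without one), and it simply explains the highest-utility fraction of the data. The names `expected_utility` and `select_interventions` are illustrative only.

```python
from typing import List, Sequence


def expected_utility(p_correct_with: float, p_correct_without: float) -> float:
    """Hypothetical utility of explaining one data point.

    Assumption (not from the abstract): utility is the simulated gain in the
    student's probability of answering correctly when given an explanation.
    """
    return p_correct_with - p_correct_without


def select_interventions(utilities: Sequence[float], budget: float) -> List[int]:
    """Pick which data points the teacher explains under a communication budget.

    utilities: one utility estimate per data point (assumed to be given).
    budget: fraction of the data, in [0, 1], that the teacher may explain.
    Returns the chosen indices in ascending order.
    """
    if not 0.0 <= budget <= 1.0:
        raise ValueError("budget must be a fraction in [0, 1]")
    k = int(budget * len(utilities))  # number of explanations allowed (floored)
    # Rank data points from highest to lowest estimated utility.
    ranked = sorted(range(len(utilities)), key=lambda i: utilities[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy utilities for four data points; a 50% budget allows two explanations.
    toy_utilities = [0.1, 0.5, -0.2, 0.4]
    print(select_interventions(toy_utilities, budget=0.5))
```

In this sketch a negative utility means the simulated explanation is expected to hurt the student, so such points are ranked last and are explained only if the budget is large enough to reach them.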

Teaching Language Models to Self-Improve through Interactive Demonstrations

Recursive Introspection: Teaching Language Model Agents How to Self-Improve

LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations

Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching

Learning to Reason via Self-Iterative Process Feedback for Small Language Models

TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise

Teaching Models to Improve on Tape

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Teaching Language Models to Self-Improve by Learning from Language Feedback

Teaching Large Language Models to Self-Debug

Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models

SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights

Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations

Small Language Models Need Strong Verifiers to Self-Correct Reasoning

Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study

Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Personalization

Large Language Models Can Self-Improve in Long-context Reasoning

Targeted training for numerical reasoning with large language models

TPD: Enhancing Student Language Model Reasoning via Principle Discovery and Guidance

Advancing Large Language Model Attribution through Self-Improving