Abstract: Large Language Models (LLMs), when used in educational settings without pedagogical fine-tuning, often provide immediate answers rather than guiding students through the problem-solving process. This approach falls short of pedagogical best practices and limits their effectiveness as educational tools. We refer to the objective of training LLMs to emulate effective teaching strategies as "pedagogical alignment." In this paper, we investigate Learning from Human Preferences (LHP) algorithms to achieve this alignment objective. A key challenge in this process is the scarcity of high-quality preference datasets to guide the alignment. To address this, we propose a novel approach for constructing a large-scale dataset using synthetic data generation techniques, eliminating the need for time-consuming and costly manual annotation. Leveraging this dataset, our experiments with Llama and Mistral models demonstrate that LHP methods outperform standard supervised fine-tuning (SFT), improving pedagogical alignment accuracy by 13.1% and 8.7%, respectively. Existing evaluation methods also lack quantitative metrics to adequately measure the pedagogical alignment of LLMs. To address this gap, we propose novel perplexity-based metrics that quantify LLMs' tendency to provide scaffolded guidance versus direct answers, offering a robust measure of pedagogical alignment. Our analysis provides compelling evidence for the superiority of LHP methods over SFT in optimizing LLMs' behavior, underscoring the potential of LHP methods to better align LLMs with educational objectives and foster effective learning experiences. Code and models are available at https://github.com/luffycodes/Tutorbot-Spock.
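The abstract does not spell out how the perplexity-based metrics are defined, so the following is only a minimal sketch of one plausible reading, not the authors' metric. It assumes that, for each student prompt, you already have the model's per-token natural-log probabilities for a scaffolded (guiding) response and for a direct-answer response. It computes perplexity as the exponentiated mean negative log-likelihood and reports the fraction of prompts where the model finds the scaffolded response less surprising. The function names `perplexity` and `scaffolding_preference_rate` and the toy log-probabilities are hypothetical and purely illustrative.

```python
import math
from typing import List, Sequence, Tuple

def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of one response from its per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood per token, then exponentiate.
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

def scaffolding_preference_rate(
    pairs: List[Tuple[Sequence[float], Sequence[float]]],
) -> float:
    """Hypothetical metric: share of (scaffolded, direct) pairs in which the
    scaffolded response has lower perplexity than the direct answer."""
    if not pairs:
        raise ValueError("need at least one (scaffolded, direct) pair")
    wins = sum(
        1 for scaffolded, direct in pairs
        if perplexity(scaffolded) < perplexity(direct)
    )
    return wins / len(pairs)

# Toy, made-up log-probabilities for illustration only (not real model outputs).
toy_pairs = [
    ([-0.5, -0.7, -0.4], [-1.2, -2.0]),  # scaffolded PPL ~1.70 < direct PPL ~4.95
    ([-1.5, -1.1], [-0.3, -0.2, -0.4]),  # scaffolded PPL ~3.67 > direct PPL ~1.35
]
print(scaffolding_preference_rate(toy_pairs))  # → 0.5
```

Under this assumed definition, a value closer to 1.0 would indicate a model that leans toward scaffolded guidance rather than direct answers; averaging log-perplexity differences instead of counting wins would be an equally reasonable variant.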

Unintended Impacts of LLM Alignment on Global Representation

Understanding the Learning Dynamics of Alignment with Human Feedback

MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

Aligning Language Models to User Opinions

Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks

Aligning LLMs with Individual Preferences via Interaction

RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

Your Weak LLM is Secretly a Strong Teacher for Alignment

Aligning Large Language Models with Human Preferences through Representation Engineering

The benefits, risks and bounds of personalizing the alignment of large language models to individuals

Dissecting Human and LLM Preferences

On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization

Aligning (Medical) LLMs for (Counterfactual) Fairness

Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game

One fish, two fish, but not the whole sea: Alignment reduces language models' conceptual diversity

Self-Alignment: Improving Alignment of Cultural Values in LLMs via In-Context Learning

PURE: Aligning LLM Via Pluggable Query Reformulation for Enhanced Helpfulness

Personalized soups: Personalized large language model alignment via post-hoc parameter merging

Pedagogical Alignment of Large Language Models

Aligners: Decoupling LLMs and Alignment