Abstract: Large Language Models (LLMs), when used in educational settings without pedagogical fine-tuning, often provide immediate answers rather than guiding students through the problem-solving process. This behavior falls short of pedagogical best practices and limits their effectiveness as educational tools. We term the objective of training LLMs to emulate effective teaching strategies 'pedagogical alignment.' In this paper, we investigate Learning from Human Preferences (LHP) algorithms to achieve this alignment objective. A key challenge in this process is the scarcity of high-quality preference datasets to guide the alignment. To address this, we propose a novel approach for constructing a large-scale dataset using synthetic data generation techniques, eliminating the need for time-consuming and costly manual annotation. Leveraging this dataset, our experiments with Llama and Mistral models demonstrate that LHP methods outperform standard supervised fine-tuning (SFT), improving pedagogical alignment accuracy by 13.1% and 8.7%, respectively. Existing evaluation methods also lack quantitative metrics to adequately measure the pedagogical alignment of LLMs. To address this gap, we propose novel perplexity-based metrics that quantify LLMs' tendency to provide scaffolded guidance versus direct answers, offering a robust measure of pedagogical alignment. Our analysis provides compelling evidence for the superiority of LHP methods over SFT in optimizing LLMs' behavior, underscoring the potential of LHP methods to better align LLMs with educational objectives and foster effective learning experiences. Code and models are available at https://github.com/luffycodes/Tutorbot-Spock.
