Abstract: The Superficial Alignment Hypothesis posits that almost all of a language model's abilities and knowledge are learned during pre-training, while post-training is about giving a model the right style and format. We re-examine these claims by empirically studying the scaling behavior of post-training with increasing finetuning examples and evaluating them using objective task-specific standardized benchmarks. Through experiments with the Llama-3, Mistral, and Llama-2 model families of multiple sizes, we observe that, similar to the pre-training scaling laws, post-training task performance scales as a power law against the number of finetuning examples. This power law relationship holds across a broad array of capabilities, including mathematical reasoning, coding, instruction following, and multihop reasoning. In addition, for tasks like math and multihop reasoning, we observe that a handful of examples merely align the model stylistically but do not saturate performance on the benchmarks. Model performance is instead correlated with its reasoning ability and it improves significantly with more examples, illustrating the need for holistic evaluation programs leveraging objective benchmarks in addition to measurement of alignment to human preferences. We also observe that language models are not necessarily limited to using knowledge learned during pre-training. With appropriate post-training, a model's ability to integrate new knowledge greatly improves on downstream tasks like multihop question-answering. Taken together, these results shed new light on the Superficial Alignment Hypothesis, suggesting that it is, at best, an over-simplification.

What problem does this paper attempt to address?

This paper re-examines the "Superficial Alignment Hypothesis" and investigates how large language models (LLMs) behave during fine-tuning and how their capabilities improve at that stage. Specifically, the researchers focus on the following key questions:

1. **How does the performance of the fine-tuned model change with the size of the dataset?**
   - The researchers observed empirically a power-law relationship between the performance of the fine-tuned model and the amount of data, that is:
     \[ P \propto D^{1/b} \]
     where \(P\) is the task performance, \(D\) is the number of fine-tuning samples, and \(b\) is a constant.
2. **Does the model significantly improve task-related abilities, or does it only learn the response style?**
   - Through in-depth analysis of tasks such as mathematical reasoning and multihop reasoning, the researchers found that fine-tuning improves not only format and style but also the model's reasoning and task-execution ability.
3. **Can the model integrate new knowledge beyond the pre-training knowledge cut-off date?**
   - Experiments show that through appropriate fine-tuning or retrieval-augmented generation (RAG), the model can effectively learn and use new knowledge, especially in multihop reasoning tasks.

### Main contributions of the paper

- **Re-evaluating the Superficial Alignment Hypothesis**: The research shows that the Superficial Alignment Hypothesis is overly simplified and ignores the improvement in the model's reasoning ability and its ability to integrate new knowledge during fine-tuning.
- **Proposing a more comprehensive evaluation method**: The paper emphasizes using objective task-specific benchmarks to evaluate model performance rather than relying solely on subjective win-rate comparisons.
- **Demonstrating the substantial improvement of the model's ability by fine-tuning**: Experiments on multiple model families and tasks show that fine-tuning improves not only the model's style and format but also its reasoning and task-execution ability.
- **Exploring the learning and integration of new knowledge**: The paper shows that fine-tuning can help the model overcome the pre-training knowledge cut-off and make better use of new knowledge.

### Conclusion

This research shows that fine-tuning does more than adapt the model to a certain style or format: it can also significantly improve reasoning and task-execution ability. Therefore, future fine-tuning work should pay more attention to improving task-specific abilities rather than just superficial alignment. The research also points to effective methods for introducing new knowledge, such as further fine-tuning and retrieval-augmented generation, which matter for expanding the knowledge boundaries of the model.
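
To make the power-law relationship \(P \propto D^{1/b}\) from the first question concrete, here is a minimal sketch of how such a curve could be fit. It is not the authors' procedure: it assumes a simple least-squares fit in log-log space, where \(\log P = \log a + (1/b) \log D\) is a straight line. The function name `fit_power_law`, the prefactor `a`, and the data points are all hypothetical and synthetic, chosen only for illustration.

```python
import numpy as np


def fit_power_law(num_examples, scores):
    """Fit scores ~ a * num_examples**k by linear least squares in log-log space.

    Returns (a, k). In the summary's notation, k corresponds to 1/b.
    """
    log_d = np.log(np.asarray(num_examples, dtype=float))
    log_p = np.log(np.asarray(scores, dtype=float))
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    k, log_a = np.polyfit(log_d, log_p, deg=1)
    return float(np.exp(log_a)), float(k)


# Synthetic data (NOT results from the paper), generated from an exact
# power law with a = 0.05 and exponent 1/b = 0.25.
d = np.array([100, 1_000, 10_000, 100_000], dtype=float)
p = 0.05 * d ** 0.25

a, k = fit_power_law(d, p)
b = 1.0 / k
print(f"a = {a:.4f}, exponent 1/b = {k:.4f}, b = {b:.4f}")
```

On real benchmark scores the points will not fall exactly on a line, so the fitted exponent is only an estimate, and a log-log fit implicitly weights relative rather than absolute errors.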

Revisiting the Superficial Alignment Hypothesis

An Emulator for Fine-Tuning Large Language Models using Small Language Models

The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning

I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses

Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities

On the Impact of Fine-Tuning on Chain-of-Thought Reasoning

Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

Progress or Regress? Self-Improvement Reversal in Post-training

Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models

Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models

Pedagogical Alignment of Large Language Models

Making Large Language Models Better Reasoners with Alignment

Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks

Unfamiliar Finetuning Examples Control How Language Models Hallucinate

A Post-Training Enhanced Optimization Approach for Small Language Models

Aligning the Pretraining and Finetuning Objectives of Language Models

Multilingual Pretraining and Instruction Tuning Improve Cross-Lingual Knowledge Alignment, But Only Shallowly

Recursive Introspection: Teaching Language Model Agents How to Self-Improve

One fish, two fish, but not the whole sea: Alignment reduces language models' conceptual diversity

RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold