Abstract: Instruction tuning has become an important step for finetuning pretrained language models to better follow human instructions and generalize across tasks. As pretrained language models grow larger, full-parameter finetuning becomes prohibitively costly. Parameter Efficient Finetuning (PEFT) has therefore emerged as a cost-effective practice for instruction tuning because of its significantly smaller computational, memory, and storage costs compared to full finetuning. Despite the widespread adoption of PEFT methods, the vast hyperparameter space, the number of PEFT methods, and the differing focus of instruction tuning capabilities make it difficult to disentangle the impact of each aspect. This study systematically investigates several representative PEFT methods, surveying the effect of hyperparameter choices (both training hyperparameters and PEFT-specific hyperparameters), how model size and the number of instruction tasks affect performance, in-task-distribution memorization, and open instruction-following capability. Our empirical study shows that only LoRA and adapter can get close to full finetuning, and only under ideal training settings: an appropriate learning rate, the largest LoRA rank or adapter size allowed, and diverse training tasks. When these conditions are not met, LoRA and adapter suffer from training instability. Additionally, LoRA requires a greater number of tasks for effective unseen-task generalization and exhibits slower learning speed. Moreover, LoRA has weaker task-level memorization. Lastly, LoRA and adapter fall short of full finetuning in complex reasoning, coding, and long-form generation in open instruction tuning settings, although LoRA shows stronger capabilities than adapter.

Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance

On the loss of context-awareness in general instruction fine-tuning

LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms

Instruction Following without Instruction Tuning

Context-dependent Instruction Tuning for Dialogue Response Generation

Understanding Catastrophic Forgetting in Language Models via Implicit Inference

Do Models Really Learn to Follow Instructions? An Empirical Study of Instruction Tuning

Exploring the Relationship between In-Context Learning and Instruction Tuning

Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods

Parameter Efficient Instruction Tuning: An Empirical Study

Learning or Self-aligning? Rethinking Instruction Fine-tuning

Exploring Format Consistency for Instruction Tuning

Learning Dynamics of LLM Finetuning

A Closer Look at the Limitations of Instruction Tuning

Contrastive Instruction Tuning

Deeper Insights Without Updates: The Power of In-Context Learning Over Fine-Tuning

From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning

Instruction Tuning With Loss Over Instructions

What Happens During Finetuning of Vision Transformers: An Invariance Based Investigation

On Instruction-Finetuning Neural Machine Translation Models

Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?