Abstract: Pretrained large language models (LLMs) are general-purpose problem solvers that can be applied to a diverse set of tasks through prompts. They can be further improved on a specific task by fine-tuning on a specialized dataset. However, fine-tuning usually makes the model narrowly specialized to this dataset and reduces its general in-context learning performance, which is undesirable whenever the fine-tuned model must handle additional tasks for which no fine-tuning data is available. In this work, we first demonstrate that fine-tuning on a single task indeed decreases LLMs' general in-context learning performance. We identify one important cause of such forgetting, format specialization, where the model overfits to the format of the fine-tuned task. We further show that format specialization happens at the very beginning of fine-tuning. To solve this problem, we propose Prompt Tuning with MOdel Tuning (ProMoT), a simple yet effective two-stage fine-tuning framework that reduces format specialization and improves generalization. ProMoT offloads task-specific format learning into additional, removable parameters by first doing prompt tuning and then fine-tuning the model itself with this soft prompt attached. With experiments on several fine-tuning tasks and 8 in-context evaluation tasks, we show that ProMoT achieves performance comparable to standard fine-tuning on the fine-tuned tasks, but with much less loss of in-context learning performance across a broad range of out-of-domain evaluation tasks. More importantly, ProMoT can even enhance generalization on in-context learning tasks that are semantically related to the fine-tuned task; e.g., ProMoT on En-Fr translation significantly improves performance on other language pairs, and ProMoT on NLI improves performance on summarization. Experiments also show that ProMoT can improve the generalization performance of multi-task training.
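
The two-stage schedule described in the abstract can be illustrated with a deliberately tiny toy analog. This is a sketch under stated assumptions, not the authors' implementation: a single scalar weight `w` stands in for the pretrained model, an additive offset `prompt` stands in for the removable soft prompt (in actual prompt tuning, the soft prompt is a set of trainable embeddings prepended to the input), and the fine-tuning targets `1.2 * x + 5.0` are invented so that the `+ 5.0` offset plays the role of the task "format". The helper `fit` and all constants are hypothetical.

```python
# Toy analog of a two-stage "prompt tuning, then model tuning" schedule.
# Assumptions (not from the paper): scalar weight `w` = the model,
# additive offset `prompt` = the removable soft prompt.

def fit(xs, ys, w, prompt, train, lr=0.05, steps=500):
    """Gradient descent on mean squared error for y ~ w * x + prompt.

    Only the parameter named by `train` ("w" or "prompt") is updated;
    the other one stays frozen.
    """
    n = len(xs)
    for _ in range(steps):
        errs = [w * x + prompt - y for x, y in zip(xs, ys)]
        if train == "w":
            w -= lr * 2 * sum(e * x for e, x in zip(errs, xs)) / n
        else:
            prompt -= lr * 2 * sum(errs) / n
    return w, prompt


# Hypothetical fine-tuning data: a small "semantic" change (1.0 -> 1.2)
# plus a large constant "format" offset (+5.0).
xs = [1.0, 2.0, 3.0]
ys = [1.2 * x + 5.0 for x in xs]
W_PRETRAINED = 1.0  # stands in for the pretrained model's general ability

# Baseline: standard fine-tuning, no prompt, so `w` must absorb the offset.
w_standard, _ = fit(xs, ys, W_PRETRAINED, 0.0, train="w")

# Stage 1: prompt tuning. The model is frozen; only the prompt is trained.
_, prompt = fit(xs, ys, W_PRETRAINED, 0.0, train="prompt")

# Stage 2: model tuning with the trained prompt attached and frozen.
w_promot, _ = fit(xs, ys, W_PRETRAINED, prompt, train="w")

# Drift of the model weight away from its pretrained value. The prompt
# can be detached afterwards, leaving only this drift in the model.
drift_standard = abs(w_standard - W_PRETRAINED)
drift_promot = abs(w_promot - W_PRETRAINED)
print(f"drift, standard fine-tuning: {drift_standard:.3f}")
print(f"drift, two-stage schedule:   {drift_promot:.3f}")
```

In this toy setting the offset is learned by the prompt in stage 1, so the model weight moves far less in stage 2 than it does under the baseline. That mirrors the abstract's claim only qualitatively; it says nothing about the magnitude of the effect in real LLMs.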

MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning

Less-forgetting Multi-lingual Fine-tuning

LoRAMoE: Alleviating World Knowledge Forgetting in Large Language Models via MoE-Style Plugin

Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning

SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture

Two-stage LLM Fine-tuning with Less Specialization and More Generalization

MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures

Full Parameter Fine-tuning for Large Language Models with Limited Resources

Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models

Simultaneous Computation and Memory Efficient Zeroth-Order Optimizer for Fine-Tuning Large Language Models

Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

ZO-AdaMU Optimizer: Adapting Perturbation by the Momentum and Uncertainty in Zeroth-Order Optimization

Condense, Don't Just Prune: Enhancing Efficiency and Performance in MoE Layer Pruning

MoExtend: Tuning New Experts for Modality and Task Extension

MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts

Understanding the Performance and Estimating the Cost of LLM Fine-Tuning

Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning

An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning