LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction

Bo Zou, Chao Yang, Yu Qiao, Chengbin Quan, Youjian Zhao

2024-04-01

Abstract: Existing methods to fine-tune LLMs, like Adapter, Prefix-tuning, and LoRA, which introduce extra modules or additional input sequences to inject new skills or knowledge, may compromise the innate abilities of LLMs. In this paper, we propose LLaMA-Excitor, a lightweight method that stimulates the LLMs' potential to better follow instructions by gradually paying more attention to worthwhile information. Specifically, the LLaMA-Excitor does not directly change the intermediate hidden state during the self-attention calculation of the transformer structure. We designed the Excitor block as a bypass module for the similarity score computation in LLMs' self-attention to reconstruct keys and change the importance of values by learnable prompts. LLaMA-Excitor ensures a self-adaptive allocation of additional attention to input instructions, thus effectively preserving LLMs' pre-trained knowledge when fine-tuning LLMs on low-quality instruction-following datasets. Furthermore, we unify the modeling of multi-modal tuning and language-only tuning, extending LLaMA-Excitor to a powerful visual instruction follower without the need for complex multi-modal alignment. Our proposed approach is evaluated in language-only and multi-modal tuning experimental scenarios. Notably, LLaMA-Excitor is the only method that maintains basic capabilities while achieving a significant improvement (+6%) on the MMLU benchmark. In visual instruction tuning, we achieve a new state-of-the-art image captioning performance of 157.5 CIDEr on MSCOCO, and a comparable performance (88.39%) on ScienceQA to cutting-edge models with more parameters and extensive vision-language pre-training.

Computer Vision and Pattern Recognition, Artificial Intelligence, Computation and Language

What problem does this paper attempt to address?

This paper aims to address the issues encountered by large language models (LLMs) during fine-tuning, specifically how to enhance instruction-following capabilities without sacrificing their pre-trained abilities. Existing fine-tuning methods such as Adapter, Prefix-tuning, and LoRA, while capable of injecting new skills or knowledge, may impair the inherent capabilities of LLMs, leading to catastrophic forgetting and other issues. To tackle these problems, the paper proposes LLaMA-Excitor, a lightweight approach that stimulates the potential of LLMs through indirect feature interaction, enabling them to better follow instructions. LLaMA-Excitor introduces learnable prompts in the self-attention mechanism, gradually increasing the focus on valuable information without directly altering the intermediate hidden states. This method ensures that the pre-trained knowledge of LLMs is effectively preserved during fine-tuning on low-quality or non-target datasets. Additionally, LLaMA-Excitor unifies the fine-tuning of multimodal and pure text tasks, allowing language models to be cost-effectively extended into powerful vision-language models without complex multimodal alignment. This is particularly notable in tasks such as image caption generation and visual question answering.
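The idea of "indirect feature interaction" can be illustrated with a toy, single-head attention layer. The sketch below is an assumption-laden reading of the summary above, not the paper's actual formulation: the names `excitor_scores`, `excited_attention`, `prompts`, and `gate`, the way keys are rebuilt from prompts, and the scalar gate are all hypothetical, and causal masking and multi-head structure are omitted. It only shows the general pattern: learnable prompts add a bypass term to the similarity scores, while the output remains a weighted average of the original value vectors.

```python
import math
import random

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # (n x k) @ (k x m) on plain nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def attention_weights(q, k, extra_scores=None):
    # Scaled dot-product scores, optionally shifted by a bypass term.
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    if extra_scores is not None:
        scores = [[s + e for s, e in zip(r, er)] for r, er in zip(scores, extra_scores)]
    return [softmax(row) for row in scores]

def excitor_scores(q, k, prompts, gate):
    # Hypothetical bypass: each key is rebuilt as a softmax-weighted mix of
    # the learnable prompts, and the queries are scored against those
    # rebuilt keys. The scalar gate controls how much the bypass contributes.
    scale = math.sqrt(len(q[0]))
    mix = [softmax([s / scale for s in row]) for row in matmul(k, transpose(prompts))]
    rebuilt_keys = matmul(mix, prompts)
    extra = matmul(q, transpose(rebuilt_keys))
    return [[gate * s / scale for s in row] for row in extra]

def excited_attention(q, k, v, prompts, gate):
    # Prompts only influence the attention weights; the values v are untouched.
    weights = attention_weights(q, k, excitor_scores(q, k, prompts, gate))
    return weights, matmul(weights, v)

def random_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
q, k, v = random_matrix(4, 8), random_matrix(4, 8), random_matrix(4, 8)
prompts = random_matrix(3, 8)  # stand-in for trained prompt vectors

baseline = matmul(attention_weights(q, k), v)
_, out_closed = excited_attention(q, k, v, prompts, gate=0.0)
weights, out_open = excited_attention(q, k, v, prompts, gate=0.5)
```

With the gate at zero the layer reproduces ordinary attention, so in this sketch tuning starts from the pre-trained behavior. With a non-zero gate the attention weights shift, but every output row is still a convex combination of the rows of `v`, which is one way to picture re-weighting existing features rather than injecting new ones into the hidden state.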

Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning

AgentTuning: Enabling Generalized Agent Abilities for LLMs

Pay Attention to What Matters

From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning

TAIA: Large Language Models are Out-of-Distribution Data Learners

Multimodal Instruction Tuning with Conditional Mixture of LoRA

Label Supervised LLaMA Finetuning

From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning

Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization

CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models

PILLOW: Enhancing Efficient Instruction Fine-tuning via Prompt Matching

Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE

Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

LLM as a Complementary Optimizer to Gradient Descent: A Case Study in Prompt Tuning

SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning