Abstract: Parameter-efficient finetuning (PEFT) methods effectively adapt large language models (LLMs) to diverse downstream tasks, reducing storage and GPU memory demands. Despite these advantages, several applications pose new challenges to PEFT beyond mere parameter efficiency. One notable challenge involves the efficient deployment of LLMs equipped with multiple task- or user-specific adapters, particularly when different adapters are needed for distinct requests within the same batch. Another challenge is the interpretability of LLMs, which is crucial for understanding how LLMs function. Previous studies introduced various approaches to address different challenges. In this paper, we introduce a novel method, RoAd, which employs a straightforward 2D rotation to adapt LLMs and addresses all the above challenges: (1) RoAd is remarkably parameter-efficient, delivering optimal performance on GLUE, eight commonsense reasoning tasks and four arithmetic reasoning tasks with $<0.1\%$ trainable parameters; (2) RoAd facilitates the efficient serving of requests requiring different adapters within a batch, with an overhead comparable to element-wise multiplication instead of batch matrix multiplication; (3) RoAd enhances LLM's interpretability through integration within a framework of distributed interchange intervention, demonstrated via composition experiments.

What problem does this paper attempt to address?

This paper attempts to solve the following three main problems:

1. **Parameter-efficient finetuning (PEFT)**:
   - Although existing PEFT methods significantly reduce storage and GPU memory requirements, they run into difficulty when serving multiple task- or user-specific adapters. Specifically, when different requests in the same batch require different adapters, deploying these adapters efficiently becomes a key issue.
   - The paper points out that existing methods usually rely on batch matrix multiplication in this situation, which introduces large computational overhead.
2. **Interpretability of LLMs**:
   - Large language models (LLMs) contain billions of parameters, which makes it very difficult to understand how they work. PEFT methods improve the interpretability of models by limiting the number of trainable parameters.
   - Although existing methods improve interpretability to a certain extent, there is still room for improvement, in particular in combining PEFT methods with an intervention framework to improve interpretability further.
3. **Batching efficiency and composability**:
   - When multiple users submit requests simultaneously, each request in a batch may require a different adapter, so processing them together efficiently is a challenge for batching.
   - In addition, designing a method in which the weights learned for different tasks can be combined to achieve multi-task ability is also an important issue.

To solve these problems, the authors propose a new method, **2D Rotary Adaptation (RoAd)**. RoAd addresses them as follows:

- **Parameter-efficient**: RoAd achieves optimal performance with less than 0.1% trainable parameters on the GLUE benchmark, commonsense reasoning tasks and arithmetic reasoning tasks.
- **Batching-efficient**: RoAd serves requests that need different adapters through element-wise multiplication instead of batch matrix multiplication, thereby significantly improving batching efficiency, with a throughput twice that of LoRA.
- **Interpretability**: RoAd can enhance the interpretability of the model through the distributed interchange intervention framework, which is demonstrated by composing the weights learned for different tasks.

In summary, RoAd not only performs well in parameter efficiency, but also makes significant progress in batching efficiency and model interpretability, thus addressing several challenges faced by existing PEFT methods.
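The batching claim can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation: the pairing of adjacent dimensions, the per-pair angle parameterization, and the names `rotate_pairs`, `rotation_matrix`, `matvec` and `serve_batch` are all assumptions made for illustration. The sketch shows that rotating a hidden vector pair by pair costs a few multiplications per element, while the equivalent dense block-diagonal matrix, which a batched matrix multiplication would have to materialize for every request, is needed only as a reference.

```python
import math

def rotate_pairs(h, theta):
    """Rotate each adjacent pair (h[2i], h[2i+1]) by angle theta[i].

    Assumption: dimensions are paired adjacently; the paper's pairing may differ.
    The work is O(d) element-wise multiplications and additions.
    """
    out = []
    for i, t in enumerate(theta):
        x, y = h[2 * i], h[2 * i + 1]
        c, s = math.cos(t), math.sin(t)
        out += [c * x - s * y, s * x + c * y]
    return out

def rotation_matrix(theta):
    """Dense block-diagonal matrix equivalent to rotate_pairs (reference only)."""
    d = 2 * len(theta)
    R = [[0.0] * d for _ in range(d)]
    for i, t in enumerate(theta):
        c, s = math.cos(t), math.sin(t)
        R[2 * i][2 * i], R[2 * i][2 * i + 1] = c, -s
        R[2 * i + 1][2 * i], R[2 * i + 1][2 * i + 1] = s, c
    return R

def matvec(R, h):
    """Plain matrix-vector product, O(d^2) work."""
    return [sum(r * x for r, x in zip(row, h)) for row in R]

def serve_batch(batch, adapters, adapter_ids):
    """Apply a different (hypothetical) adapter to each request in one batch."""
    return [rotate_pairs(h, adapters[a]) for h, a in zip(batch, adapter_ids)]

# Two requests in the same batch, each using its own adapter angles.
adapters = {"task_a": [0.0, math.pi / 2], "task_b": [math.pi, 0.3]}
batch = [[1.0, 0.0, 1.0, 0.0], [1.0, 2.0, 3.0, 4.0]]
outputs = serve_batch(batch, adapters, ["task_a", "task_b"])
```

In a tensor library the per-request angles would be stacked into one array so that the whole batch is handled by broadcasted element-wise operations. A rotation also preserves the norm of each pair, so under these assumptions the adapter changes the direction of a hidden vector without rescaling it.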

3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability

LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models

MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts

Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision

Parameter-Efficient Fine-Tuning With Adapters

Composing Parameter-Efficient Modules with Arithmetic Operations

Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning for Versatile Multimodal Modeling

Parameter-efficient Tuning for Large Language Model Without Calculating Its Gradients

RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning

X-PEFT: eXtremely Parameter-Efficient Fine-Tuning for Extreme Multi-Profile Scenarios

LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models

Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models

Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation

MoRe Fine-Tuning with 10x Fewer Parameters

Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization

Delving into Parameter-Efficient Fine-Tuning in Code Change Learning: an Empirical Study

Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs

Inducing Generalization across Languages and Tasks using Featurized Low-Rank Mixtures

Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-Tuning