3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability

Baohao Liao, Christof Monz
2024-08-28
Abstract: Parameter-efficient finetuning (PEFT) methods effectively adapt large language models (LLMs) to diverse downstream tasks, reducing storage and GPU memory demands. Despite these advantages, several applications pose new challenges to PEFT beyond mere parameter efficiency. One notable challenge involves the efficient deployment of LLMs equipped with multiple task- or user-specific adapters, particularly when different adapters are needed for distinct requests within the same batch. Another challenge is the interpretability of LLMs, which is crucial for understanding how LLMs function. Previous studies introduced various approaches to address different challenges. In this paper, we introduce a novel method, RoAd, which employs a straightforward 2D rotation to adapt LLMs and addresses all the above challenges: (1) RoAd is remarkably parameter-efficient, delivering optimal performance on GLUE, eight commonsense reasoning tasks and four arithmetic reasoning tasks with $<0.1\%$ trainable parameters; (2) RoAd facilitates the efficient serving of requests requiring different adapters within a batch, with an overhead comparable to element-wise multiplication instead of batch matrix multiplication; (3) RoAd enhances LLM's interpretability through integration within a framework of distributed interchange intervention, demonstrated via composition experiments.
Machine Learning, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
This paper addresses three main problems:

1. **Parameter-efficient finetuning (PEFT)**:
   - Existing PEFT methods significantly reduce storage and GPU memory requirements, but they run into difficulty when many task- or user-specific adapters must be served. The key issue is how to deploy these adapters efficiently when different requests in the same batch require different adapters.
   - Existing methods usually rely on batch matrix multiplication in this situation, which introduces substantial computational overhead.
2. **Interpretability of LLMs**:
   - LLMs contain billions of parameters, which makes it very difficult to understand how they work. PEFT methods can aid interpretability by limiting the number of trained parameters.
   - There is still room for improvement, in particular in how PEFT methods can be combined with an intervention framework to improve interpretability further.
3. **Batching efficiency and composability**:
   - When multiple users submit requests simultaneously, each request may require a different adapter, which makes it hard to process all of them efficiently in one batch.
   - A further question is how to design a method whose weights, trained on different tasks, can be combined to obtain multi-task ability.

To solve these problems, the authors propose a new method, **2D Rotary Adaptation (RoAd)**. RoAd addresses them as follows:

- **Parameter-efficient**: RoAd delivers optimal performance on GLUE, eight commonsense reasoning tasks and four arithmetic reasoning tasks with less than 0.1% trainable parameters.
- **Batching-efficient**: RoAd serves requests that need different adapters through element-wise multiplication instead of batch matrix multiplication, which significantly improves batching efficiency, with a throughput twice that of LoRA.
- **Interpretability**: RoAd enhances the interpretability of the model by integrating with the distributed interchange intervention framework, and composition experiments demonstrate that weights learned for different tasks can be combined.

In summary, RoAd is not only parameter-efficient but also improves batching efficiency and model interpretability, addressing several challenges faced by existing PEFT methods.
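The batching claim can be made concrete with a small sketch. It is not the authors' implementation: the function names, the pairing of adjacent dimensions, and the per-pair scale `alpha` are assumptions chosen for illustration. It shows that a block-diagonal matrix of 2D rotations can be applied with element-wise arithmetic on pairs, so each request in a batch can use its own adapter without a per-request dense matrix product.

```python
import math

def rotate_elementwise(h, theta, alpha):
    """Rotate each adjacent pair (h[2i], h[2i+1]) by theta[i], scale by alpha[i].

    Work is O(d) per request. Pairing adjacent dimensions and using a
    per-pair scale are assumptions of this sketch.
    """
    out = []
    for i, (t, a) in enumerate(zip(theta, alpha)):
        x, y = h[2 * i], h[2 * i + 1]
        c, s = math.cos(t), math.sin(t)
        out += [a * (c * x - s * y), a * (s * x + c * y)]
    return out

def rotate_dense(h, theta, alpha):
    """Reference: build the d x d block-diagonal matrix, then a full mat-vec (O(d^2))."""
    d = len(h)
    R = [[0.0] * d for _ in range(d)]
    for i, (t, a) in enumerate(zip(theta, alpha)):
        c, s = a * math.cos(t), a * math.sin(t)
        R[2 * i][2 * i], R[2 * i][2 * i + 1] = c, -s
        R[2 * i + 1][2 * i], R[2 * i + 1][2 * i + 1] = s, c
    return [sum(R[r][k] * h[k] for k in range(d)) for r in range(d)]

def serve_batch(hidden, adapter_ids, adapters):
    """Apply a possibly different adapter to every request in the batch."""
    return [rotate_elementwise(h, *adapters[a]) for h, a in zip(hidden, adapter_ids)]

# Two hypothetical adapters, each stored as (angles, scales) of length d/2.
adapters = {
    "user_a": ([math.pi / 2, 0.0], [1.0, 2.0]),
    "user_b": ([0.0, math.pi], [1.0, 1.0]),
}
hidden = [[1.0, 0.0, 3.0, 4.0], [1.0, 2.0, 1.0, 0.0]]
batch_out = serve_batch(hidden, ["user_a", "user_b"], adapters)
```

Each adapter here stores only `d/2` angles and `d/2` scales, and the per-request work grows linearly with the hidden size. The dense reference needs a `d x d` matrix per adapter, which is the kind of cost that batch matrix multiplication incurs when requests in one batch use different adapters.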