VeRA: Vector-based Random Matrix Adaptation

Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano
2024-01-17
Abstract: Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable parameters when finetuning large language models, but still faces acute storage challenges when scaling to even larger models or deploying numerous per-user or per-task adapted models. In this work, we present Vector-based Random Matrix Adaptation (VeRA), which significantly reduces the number of trainable parameters compared to LoRA, yet maintains the same performance. It achieves this by using a single pair of low-rank matrices shared across all layers and learning small scaling vectors instead. We demonstrate its effectiveness on the GLUE and E2E benchmarks and on image classification tasks, and show its application in instruction-tuning of 7B and 13B language models.
Computation and Language
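The mechanism described in the abstract (one frozen pair of random low-rank matrices shared by every layer, plus small trainable scaling vectors per layer) can be sketched in NumPy as below. This is a minimal illustration, not the authors' code. Several details are assumptions: the update form h = W0 x + diag(b) B diag(d) A x, the fan-in-scaled uniform initialization of A and B, zero initialization of b, the starting value `d_init = 0.1`, and that all adapted layers share one shape. The class names `SharedRandomPair` and `VeRALinear` are hypothetical.

```python
import numpy as np


class SharedRandomPair:
    """Frozen random low-rank matrices A and B, shared by every adapted layer.

    Assumption: a uniform init scaled by fan-in (Kaiming-uniform style);
    the authors' exact initialization may differ.
    """

    def __init__(self, d_out: int, d_in: int, rank: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        bound_a = np.sqrt(6.0 / d_in)
        bound_b = np.sqrt(6.0 / rank)
        self.A = rng.uniform(-bound_a, bound_a, size=(rank, d_in))   # d_in -> rank
        self.B = rng.uniform(-bound_b, bound_b, size=(d_out, rank))  # rank -> d_out


class VeRALinear:
    """One adapted linear layer: h = W0 x + diag(b) B diag(d) A x (assumed form)."""

    def __init__(self, W0: np.ndarray, shared: SharedRandomPair, d_init: float = 0.1):
        self.W0 = W0          # frozen pretrained weight, shape (d_out, d_in)
        self.shared = shared  # frozen, the same object for all layers
        rank = shared.A.shape[0]
        # The only trainable parameters of this layer (d_init is an assumed value):
        self.d = np.full(rank, d_init)   # scales the rank dimension
        self.b = np.zeros(W0.shape[0])   # zero init, so the adapter starts as a no-op

    def num_trainable(self) -> int:
        return self.d.size + self.b.size

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Apply the update as a chain of matrix-vector products.
        return self.W0 @ x + self.b * (self.shared.B @ (self.d * (self.shared.A @ x)))

    def merged_weight(self) -> np.ndarray:
        # diag(b) B diag(d) A via broadcasting, folded into W0 for a single matmul.
        delta = (self.b[:, None] * self.shared.B * self.d[None, :]) @ self.shared.A
        return self.W0 + delta


# Usage: three layers reuse one shared pair; each keeps only its own b and d.
rng = np.random.default_rng(1)
d_model, rank = 16, 4
shared = SharedRandomPair(d_model, d_model, rank, seed=42)
layers = [VeRALinear(rng.normal(size=(d_model, d_model)), shared) for _ in range(3)]

x = rng.normal(size=d_model)
# With b = 0 the adapted layer reproduces the frozen layer exactly.
assert np.allclose(layers[0].forward(x), layers[0].W0 @ x)

# Pretend training changed the vectors; the merged weight gives the same output.
layers[0].b = rng.normal(size=d_model)
layers[0].d = rng.normal(size=rank)
assert np.allclose(layers[0].forward(x), layers[0].merged_weight() @ x)
print(layers[0].num_trainable())  # 20 = d_model + rank
```

Because A and B are never updated, each layer stores only d_out + r numbers, and merging the update into W0 after training means inference runs a single matrix multiply, as with LoRA.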
What problem does this paper attempt to address?
This paper addresses how to fine-tune large language models for specific tasks more efficiently, focusing on the storage cost and the large number of trainable parameters that each adapted model carries. It proposes Vector-based Random Matrix Adaptation (VeRA), which needs far fewer trainable parameters than the widely used Low-Rank Adaptation (LoRA) method while maintaining comparable performance. VeRA reduces the parameter count by sharing a single pair of frozen low-rank matrices across all layers and training only small scaling vectors, which allows more adapted versions of a model to fit in limited GPU memory. The main contributions of the paper are:
1. A new fine-tuning method, VeRA, which uses fewer trainable parameters than LoRA and, because the learned update can be merged into the original weights, adds no runtime cost at inference.
2. A comparison of VeRA with LoRA and other parameter-efficient adaptation methods on the GLUE and E2E benchmarks and on image classification tasks.
3. An ablation study examining the contribution of each component of VeRA.
The paper also points out that although existing methods such as LoRA reduce the parameter count, they still leave a large number of trainable parameters. VeRA goes further by using random matrices that stay frozen and shared, so that adaptation happens only through the scaling vectors. Experimental results show that VeRA performs comparably to LoRA on multiple tasks with a significantly smaller parameter count, making it well suited to fine-tuning large and complex models.
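To make the parameter savings concrete, the following back-of-the-envelope sketch counts trainable parameters under two assumptions: LoRA trains a (d_out × r) and an (r × d_in) matrix per adapted weight, while VeRA trains only a length-d_out vector and a length-r vector per adapted weight, with the shared random matrices excluded because they are frozen. The model configuration (32 blocks, two 4096 × 4096 projections adapted per block) and the rank values are hypothetical choices for illustration, not figures reported in the paper. The functions `lora_trainable` and `vera_trainable` are likewise hypothetical.

```python
def lora_trainable(shapes, rank):
    """LoRA trains B (d_out x r) and A (r x d_in) for every adapted matrix."""
    return sum(rank * (d_in + d_out) for d_out, d_in in shapes)


def vera_trainable(shapes, rank):
    """VeRA trains only b (length d_out) and d (length r) per adapted matrix;
    the shared random A and B are frozen, so they are not counted."""
    return sum(d_out + rank for d_out, _ in shapes)


# Hypothetical configuration: 32 blocks, adapting two 4096 x 4096 projections each.
shapes = [(4096, 4096)] * (32 * 2)

print(lora_trainable(shapes, rank=8))     # 4194304
print(vera_trainable(shapes, rank=256))   # 278528
print(vera_trainable(shapes, rank=1024))  # 327680
```

The design point this illustrates is that LoRA's count grows multiplicatively with the rank, whereas VeRA's grows only additively, so VeRA can afford a much higher shared rank while still training an order of magnitude fewer parameters.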