VeRA: Vector-based Random Matrix Adaptation

Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano

2024-01-17

Abstract: Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable parameters when finetuning large language models, but still faces acute storage challenges when scaling to even larger models or deploying numerous per-user or per-task adapted models. In this work, we present Vector-based Random Matrix Adaptation (VeRA), which significantly reduces the number of trainable parameters compared to LoRA, yet maintains the same performance. It achieves this by using a single pair of low-rank matrices shared across all layers and learning small scaling vectors instead. We demonstrate its effectiveness on the GLUE and E2E benchmarks, image classification tasks, and show its application in instruction-tuning of 7B and 13B language models.
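To make the parameter saving concrete, the following is a rough counting sketch rather than a figure from the paper. It assumes LoRA trains two rank-r factors per adapted weight matrix, while VeRA trains one length-r vector and one length-d_out vector per adapted matrix and keeps the shared matrices frozen. The function names, layer count, dimensions, and ranks are hypothetical values chosen only for illustration.

```python
def lora_trainable_params(num_adapted, d_in, d_out, rank):
    """Trainable parameters when each adapted matrix gets its own
    rank-`rank` factors of shape (d_out, rank) and (rank, d_in)."""
    return num_adapted * rank * (d_in + d_out)


def vera_trainable_params(num_adapted, d_out, rank):
    """Trainable parameters when each adapted matrix only gets two
    scaling vectors (length `rank` and length `d_out`); the shared
    low-rank matrices are frozen and therefore not counted."""
    return num_adapted * (rank + d_out)


# Hypothetical setup: 24 adapted square matrices of size 1024 x 1024.
lora_count = lora_trainable_params(num_adapted=24, d_in=1024, d_out=1024, rank=8)
vera_count = vera_trainable_params(num_adapted=24, d_out=1024, rank=256)

print(f"LoRA (rank 8):   {lora_count:,} trainable parameters")
print(f"VeRA (rank 256): {vera_count:,} trainable parameters")
```

Under these assumed sizes the VeRA count stays well below the LoRA count even with a much larger rank, because the rank only adds one vector entry per adapted matrix instead of a full row and column of the factors.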

Computation and Language

What problem does this paper attempt to address?

This paper addresses how to adapt large language models to specific tasks more efficiently, targeting the storage cost and high trainable-parameter count of fine-tuning. It proposes Vector-based Random Matrix Adaptation (VeRA), which trains far fewer parameters than the widely used Low-rank Adaptation (LoRA) method while maintaining similar performance. VeRA reduces the parameter count by sharing a single pair of frozen low-rank matrices across all layers and training only small scaling vectors, so that more adapted versions of a model can be stored in limited GPU memory.

The main contributions of the paper are as follows:
1. It proposes a new fine-tuning method, VeRA, which incurs no additional runtime cost during inference and uses fewer trainable parameters than LoRA.
2. It compares VeRA with LoRA and other parameter-efficient adaptation methods on the GLUE and E2E benchmarks and on image classification tasks.
3. It conducts an ablation study to understand the impact of each component of the VeRA method.

The paper also points out that although existing methods such as LoRA reduce the parameter count, they still leave a large number of trainable parameters. VeRA reduces this number further by using random matrices that are shared and kept fixed, and adapting only the scaling vectors. Experimental results show that VeRA performs similarly to LoRA on multiple tasks with a significantly smaller parameter count, making it suitable for fine-tuning large and complex models.
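The mechanism described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes the update to a frozen weight W0 has the form diag(b) B diag(d) A, where A and B are the shared frozen random matrices and d and b are the per-layer trainable scaling vectors. The class name, helper functions, dimensions, and initial values (0.1 for d, zeros for b) are assumptions made for the example.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def random_matrix(rows, cols, rng):
    """Matrix with standard normal entries (illustrative initialisation)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class VeRALayer:
    """Hypothetical layer: frozen W0 plus a vector-scaled shared low-rank update."""

    def __init__(self, W0, A, B):
        self.W0, self.A, self.B = W0, A, B   # all frozen; A and B are shared
        self.d = [0.1] * len(A)              # trainable, length rank (assumed init)
        self.b = [0.0] * len(B)              # trainable, length d_out (zero init)

    def forward(self, x):
        low = [d * z for d, z in zip(self.d, matvec(self.A, x))]   # diag(d) A x
        up = [b * z for b, z in zip(self.b, matvec(self.B, low))]  # diag(b) B (...)
        return [w + u for w, u in zip(matvec(self.W0, x), up)]

    def merged_weight(self):
        """Fold the update into W0: W0 + diag(b) B diag(d) A."""
        rank, cols = len(self.A), len(self.W0[0])
        return [
            [self.W0[i][j] + self.b[i] * sum(
                self.B[i][k] * self.d[k] * self.A[k][j] for k in range(rank))
             for j in range(cols)]
            for i in range(len(self.W0))
        ]

    def num_trainable(self):
        return len(self.d) + len(self.b)


rng = random.Random(0)
d_in, d_out, rank = 6, 5, 3
A = random_matrix(rank, d_in, rng)    # shared by every layer, never trained
B = random_matrix(d_out, rank, rng)   # shared by every layer, never trained
layers = [VeRALayer(random_matrix(d_out, d_in, rng), A, B) for _ in range(2)]

x = [1.0] * d_in
layers[0].b = [rng.gauss(0.0, 1.0) for _ in range(d_out)]  # stand-in for training
adapted = layers[0].forward(x)
merged = matvec(layers[0].merged_weight(), x)
```

With b initialised to zero, a layer's output equals that of the frozen weight alone, so adaptation starts from the pretrained model. Because the update can be folded into a single matrix via `merged_weight`, the sketch is consistent with the claim of no additional runtime cost during inference: after merging, `merged` matches `adapted` up to floating-point error.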

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

LoRA-Mini : Adaptation Matrices Decomposition and Selective Training

VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks

Matrix-Transformation Based Low-Rank Adaptation (MTLoRA): A Brain-Inspired Method for Parameter-Efficient Fine-Tuning

One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

Batched Low-Rank Adaptation of Foundation Models

Sparse Low-rank Adaptation of Pre-trained Language Models

PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation

OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models

AdvLoRA: Adversarial Low-Rank Adaptation of Vision-Language Models

ASLoRA: Adaptive Sharing Low-Rank Adaptation Across Layers

LoRA+: Efficient Low Rank Adaptation of Large Models

GeLoRA: Geometric Adaptive Ranks For Efficient LoRA Fine-tuning

GeoLoRA: Geometric integration for parameter efficient fine-tuning

HyperLoRA: Efficient Cross-task Generalization Via Constrained Low-Rank Adapters Generation

LoRTA: Low Rank Tensor Adaptation of Large Language Models

CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models

Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices

ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models

Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning