Abstract: Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.

What problem does this paper attempt to address?

The paper primarily focuses on the effective fine-tuning of large foundational models, particularly on how to reduce the number of trainable parameters while maintaining model performance. To address this issue, the paper proposes a new method called "Butterfly Orthogonal Fine-Tuning" (BOFT).

### Research Background

As large foundational models like ChatGPT and Stable Diffusion demonstrate exceptional generalization capabilities, the number of parameters in these models has also increased dramatically (for example, GPT-3 has about 175 billion parameters). This makes training these models from scratch extremely expensive and difficult to achieve. Therefore, efficiently adapting these powerful pre-trained models to downstream tasks becomes particularly important. Currently, common efficient task adaptation methods include model fine-tuning, adapter fine-tuning, and prompt fine-tuning.

### Main Contributions

1. **Orthogonal Fine-Tuning from the Perspective of Information Transmission**: The authors first re-examine Orthogonal Fine-Tuning (OFT) from the perspective of information transmission and identify several key requirements to achieve better parameter efficiency. Inspired by the butterfly structure in the Cooley-Tukey Fast Fourier Transform algorithm, they propose an efficient orthogonal parameterization method based on the butterfly structure.
2. **Butterfly Orthogonal Fine-Tuning (BOFT)**: By applying the butterfly structure to OFT, a new parameter-efficient fine-tuning method called BOFT is created. This method not only significantly reduces the number of trainable parameters but also retains good expressive power and training stability.
3. **Theoretical Insights**: The paper provides several theoretical insights into why BOFT can maintain good expressiveness and training stability while significantly reducing the number of trainable parameters. Additionally, through matrix decomposition, BOFT also exhibits an interesting weight interpolation property.
4. **Extensive Application Demonstration**: The paper is the first to apply orthogonal fine-tuning to various tasks beyond controllable text-to-image generation, demonstrating its great potential as a general model fine-tuning method. Specifically, BOFT is applied to downstream tasks in multiple fields such as computer vision and natural language processing, showing significant advantages over existing state-of-the-art methods.

### Core Innovations

- **Application of Butterfly Structure**: Utilizing the butterfly structure to enhance the parameter efficiency of OFT, thereby enabling the construction of dense orthogonal matrices without losing parameter efficiency.
- **Theoretical and Empirical Analysis**: Not only theoretically proving that BOFT has higher expressiveness compared to OFT but also conducting extensive experimental validation on multiple downstream tasks, demonstrating its superior parameter efficiency and generalization capability.

In summary, the paper proposes a new fine-tuning method, BOFT, which significantly reduces the number of trainable parameters while ensuring model performance, making it highly significant for the effective application of large-scale foundational models.
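
To make the butterfly idea in the summary above more concrete, here is a minimal NumPy sketch of how a product of sparse orthogonal factors can yield a dense orthogonal matrix. It is an illustration under simplifying assumptions, not the paper's implementation: the dimension `d` is a power of two, every block is a 2×2 rotation parameterized directly by an angle, and the names `butterfly_factor` and `butterfly_orthogonal` are hypothetical helpers introduced here.

```python
import numpy as np

def butterfly_factor(d, stride, angles):
    """Sparse orthogonal d x d matrix rotating each index pair (i, i + stride).

    Assumption for illustration: each block is a plain 2x2 rotation.
    """
    B = np.zeros((d, d))
    pair = 0
    for i in range(d):
        if i & stride:
            # i is the upper index of a pair already filled in with its partner
            continue
        j = i | stride
        c, s = np.cos(angles[pair]), np.sin(angles[pair])
        B[i, i], B[i, j] = c, -s
        B[j, i], B[j, j] = s, c
        pair += 1
    return B

def butterfly_orthogonal(d, angles):
    """Multiply log2(d) butterfly factors; angles has shape (log2(d), d // 2)."""
    m = d.bit_length() - 1
    assert 1 << m == d, "this sketch assumes d is a power of two"
    R = np.eye(d)
    for k in range(m):
        # Stage k mixes indices that differ only in bit k, as in an FFT stage
        R = butterfly_factor(d, 1 << k, angles[k]) @ R
    return R

rng = np.random.default_rng(0)
d = 8
m = d.bit_length() - 1
# Angles kept away from 0 and pi/2 so no rotation entry is exactly zero
angles = rng.uniform(0.1, 1.0, size=(m, d // 2))
R = butterfly_orthogonal(d, angles)

orth_error = np.abs(R.T @ R - np.eye(d)).max()   # deviation from orthogonality
num_nonzero = int((np.abs(R) > 1e-12).sum())     # number of nonzero entries in R
params_butterfly = angles.size                   # (d / 2) * log2(d) angles
params_full = d * (d - 1) // 2                   # degrees of freedom of a full rotation
print(orth_error, num_nonzero, params_butterfly, params_full)
```

Each factor has only two nonzeros per row, yet after log2(d) stages every input index reaches every output index, which is the information transmission property the summary refers to. In this simplified setting the parameter count grows as (d/2)·log2(d) rather than d(d-1)/2; for d = 1024 that would be 5,120 angles instead of 523,776. Setting all angles to zero returns the identity matrix, so multiplying a pretrained weight by `R` would leave it unchanged at the start of finetuning.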

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation

See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

Parameter-Efficient Fine-Tuning with Discrete Fourier Transform

Parameter-Efficient Fine-Tuning via Selective Discrete Cosine Transform

Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies

Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision

An Empirical Study of Parameter-Efficient Fine-Tuning Methods for Pre-Trained Code Models

Deep Neural Network Hyperparameter Optimization with Orthogonal Array Tuning

Visual Fourier Prompt Tuning

Sparse Orthogonal Parameters Tuning for Continual Learning

BIPEFT: Budget-Guided Iterative Search for Parameter Efficient Fine-Tuning of Large Pretrained Language Models

HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy

An Empirical Study of Parameter Efficient Fine-tuning on Vision-Language Pre-train Model

Large Convolutional Model Tuning via Filter Subspace

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

Rethinking Efficient Tuning Methods from a Unified Perspective

Increasing Model Capacity for Free: A Simple Strategy for Parameter Efficient Fine-tuning

Parameter-Efficient Fine-Tuning With Adapters

Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications