Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation

Xinyu Ma, Xu Chu, Zhibang Yang, Yang Lin, Xin Gao, Junfeng Zhao

2024-06-07

Abstract: With the increasingly powerful performance and enormous scale of pretrained models, promoting parameter efficiency in fine-tuning has become a crucial need for effective and efficient adaptation to various downstream tasks. One representative line of fine-tuning methods is Orthogonal Fine-tuning (OFT), which rigorously preserves the angular distances within the parameter space to preserve the pretrained knowledge. Despite its empirical effectiveness, OFT still suffers from low parameter efficiency at $\mathcal{O}(d^2)$ and limited capability of downstream adaptation. Inspired by Givens rotation, in this paper, we propose quasi-Givens Orthogonal Fine-Tuning (qGOFT) to address these problems. We first use $\mathcal{O}(d)$ Givens rotations to accomplish arbitrary orthogonal transformation in $SO(d)$ with provable equivalence, reducing parameter complexity from $\mathcal{O}(d^2)$ to $\mathcal{O}(d)$. Then we introduce flexible norm and relative angular adjustments under soft orthogonality regularization to enhance the capability of adapting to downstream semantic deviations. Extensive experiments on various tasks and pretrained models validate the effectiveness of our methods.
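To make the Givens-rotation idea concrete, here is a minimal NumPy sketch. It is an illustration only: the helper names `givens` and `chain_of_givens`, the choice of adjacent coordinate pairs, and the random stand-in weight `W` are assumptions, and the sketch does not reproduce the paper's construction or its equivalence proof. It only shows that a product of d-1 Givens rotations is orthogonal, needs d-1 angles rather than d^2 entries, and leaves the norms and pairwise angles of the weight columns unchanged.

```python
import numpy as np

def givens(d, i, j, theta):
    """Return the d x d Givens rotation acting on coordinates i and j."""
    G = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c
    G[j, j] = c
    G[i, j] = -s
    G[j, i] = s
    return G

def chain_of_givens(thetas):
    """Compose len(thetas) Givens rotations on adjacent pairs (i, i+1).

    The adjacent-pair layout is an assumption made for this sketch.
    """
    d = len(thetas) + 1
    R = np.eye(d)
    for i, theta in enumerate(thetas):
        R = givens(d, i, i + 1, theta) @ R
    return R

rng = np.random.default_rng(0)
d = 8
thetas = rng.uniform(-np.pi, np.pi, size=d - 1)  # one angle per rotation
R = chain_of_givens(thetas)

W = rng.standard_normal((d, d))  # hypothetical stand-in for a pretrained weight
W_new = R @ W                    # orthogonally transformed weight

# R is orthogonal with determinant +1, so it lies in SO(d).
orth_error = np.abs(R.T @ R - np.eye(d)).max()
det_R = np.linalg.det(R)

# The Gram matrix W^T W holds all column norms and pairwise inner products,
# so its invariance means norms and angles are preserved.
gram_error = np.abs(W_new.T @ W_new - W.T @ W).max()

dense_params = d * d   # entries of an unconstrained d x d matrix
givens_params = d - 1  # angles used by the chain above
print(orth_error, det_R, gram_error, dense_params, givens_params)
```

The Gram-matrix check is the useful one here: it is a compact way to confirm that every column norm and every pairwise angle survives the transformation.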

Machine Learning, Artificial Intelligence, Computation and Language

What problem does this paper attempt to address?

The paper addresses parameter-efficient fine-tuning of pretrained models for downstream tasks. Specifically, it targets two problems:

1. **Improving the parameter efficiency of Orthogonal Fine-Tuning (OFT) methods**: Existing OFT methods retain the knowledge of pretrained models effectively, but their parameter complexity is O(d^2), where d is the hidden dimension of the linear layer. The resulting parameter count makes adaptation costly. The paper proposes a method based on Givens rotations (referred to as GOFT), which can achieve arbitrary orthogonal transformations using only O(d) parameters, reducing the parameter complexity from O(d^2) to O(d).

2. **Enhancing the adaptability to semantic shifts in downstream tasks**: OFT strictly maintains the relative angles and norms of the weight vectors, which helps it retain pretrained knowledge but limits its adaptability to subtle semantic changes in downstream tasks. To address this, the paper further proposes quasi-Givens Orthogonal Fine-Tuning (qGOFT), which allows flexible adjustments of the norms and relative angles of weight vectors under soft orthogonality regularization, enhancing the model's ability to learn semantic shifts in downstream tasks.

In summary, the main contributions of the paper are two new fine-tuning methods: GOFT and qGOFT. They improve parameter efficiency during fine-tuning and enhance the model's adaptability to downstream tasks. The paper supports both methods with theoretical analysis and with experiments on various tasks and pretrained models.
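One way to picture the soft orthogonality regularization is to let each 2x2 rotation block become a free 2x2 matrix and penalize its distance from orthogonality. The sketch below is hedged: the penalty form ||B^T B - I||_F^2 and the function name `soft_orthogonality_penalty` are assumptions chosen for illustration, not the paper's stated loss.

```python
import numpy as np

def soft_orthogonality_penalty(blocks):
    """Sum of ||B^T B - I||_F^2 over a stack of 2x2 blocks, shape (n, 2, 2).

    Assumed penalty form for this sketch; it is zero exactly when every
    block is orthogonal.
    """
    eye = np.eye(2)
    # gram[n] = blocks[n].T @ blocks[n]
    gram = np.einsum("nji,njk->nik", blocks, blocks)
    return float(((gram - eye) ** 2).sum())

theta = 0.3
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# A relaxed block that also rescales norms by 1.1, so it is no longer orthogonal.
relaxed = 1.1 * rotation

exact_penalty = soft_orthogonality_penalty(rotation[None])
relaxed_penalty = soft_orthogonality_penalty(relaxed[None])
print(exact_penalty, relaxed_penalty)
```

In a training loop, a term like this would be added to the task loss with a small weight, so blocks can drift from exact rotations when the downstream task benefits while staying close to orthogonal.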

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

Optimizing Quantized Neural Networks in a Weak Curvature Manifold

Gradient-based Parameter Selection for Efficient Fine-Tuning

Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization

Effective and Efficient Few-shot Fine-tuning for Vision Transformers

Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models

Generative Parameter-Efficient Fine-Tuning

On the Effectiveness of Parameter-Efficient Fine-Tuning

Orthogonal Finetuning for Direct Preference Optimization

CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection

See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition

HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy

QEFT: Quantization for Efficient Fine-Tuning of LLMs

Efficient coordinate-descent for orthogonal matrices through Givens rotations

Parameter-Efficient Fine-Tuning via Selective Discrete Cosine Transform

Parameter-Efficient Fine-Tuning with Discrete Fourier Transform

LoRA-GA: Low-Rank Adaptation with Gradient Approximation

Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers

AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models

Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning