Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation

Xinyu Ma, Xu Chu, Zhibang Yang, Yang Lin, Xin Gao, Junfeng Zhao
2024-06-07
Abstract: With the increasingly powerful performance and enormous scale of pretrained models, promoting parameter efficiency in fine-tuning has become crucial for effective and efficient adaptation to various downstream tasks. One representative line of fine-tuning methods is Orthogonal Fine-tuning (OFT), which rigorously preserves the angular distances within the parameter space to retain the pretrained knowledge. Despite its empirical effectiveness, OFT still suffers from low parameter efficiency at $\mathcal{O}(d^2)$ and limited capability for downstream adaptation. Inspired by Givens rotation, in this paper we propose quasi-Givens Orthogonal Fine-Tuning (qGOFT) to address these problems. We first use $\mathcal{O}(d)$ Givens rotations to accomplish arbitrary orthogonal transformations in $SO(d)$ with provable equivalence, reducing parameter complexity from $\mathcal{O}(d^2)$ to $\mathcal{O}(d)$. Then we introduce flexible norm and relative angular adjustments under soft orthogonality regularization to enhance the capability to adapt to downstream semantic deviations. Extensive experiments on various tasks and pretrained models validate the effectiveness of our methods.
Machine Learning, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper addresses parameter-efficient fine-tuning of pretrained models for downstream tasks. Specifically, it targets two key problems:

1. **Improving the parameter efficiency of Orthogonal Fine-Tuning (OFT)**: Existing OFT methods retain the knowledge of pretrained models effectively, but their parameter complexity is $\mathcal{O}(d^2)$, where $d$ is the hidden dimension of the linear layer, so the number of trainable parameters grows quadratically with the width of the layer. The paper proposes a method based on Givens rotations, referred to as GOFT. Each Givens rotation acts on a single coordinate plane and is determined by one angle, and the paper shows that $\mathcal{O}(d)$ such rotations can achieve arbitrary orthogonal transformations, reducing the parameter complexity from $\mathcal{O}(d^2)$ to $\mathcal{O}(d)$.
2. **Enhancing adaptability to semantic shifts in downstream tasks**: Although OFT retains pretrained knowledge well, it strictly preserves the norms of the weight vectors and the relative angles between them, which limits its ability to adapt to subtle semantic changes in downstream tasks. To address this, the paper further proposes quasi-Givens Orthogonal Fine-Tuning (qGOFT), which allows flexible adjustments to the norms and relative angles of weight vectors under a soft orthogonality regularization, improving the model's ability to learn downstream semantic shifts.

In summary, the paper's main contributions are two new fine-tuning methods, GOFT and qGOFT, which improve parameter efficiency during fine-tuning and enhance the model's adaptability to downstream tasks. The paper supports both methods with theoretical analysis (the provable equivalence of the Givens-rotation construction) and with experiments on various tasks and pretrained models.
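The parameter-count argument behind GOFT can be illustrated with a small pure-Python sketch. Several assumptions are worth stating: the rotations here act on adjacent coordinate pairs (0, 1), (1, 2), ..., the columns of the hypothetical matrix `W` are treated as the weight vectors, and the helper names (`apply_givens`, `givens_chain`, `gram`) are illustrative only. This chain of $d-1$ rotations is not claimed to be the paper's construction, nor to reach every element of $SO(d)$; it only shows that a product of Givens rotations is orthogonal (so column norms and pairwise angles are unchanged) while the number of trainable angles grows linearly in $d$, compared with $d(d-1)/2$ free entries for a dense skew-symmetric parameterization.

```python
import math
import random

def apply_givens(W, i, j, theta):
    """In place: rotate rows i and j of W by angle theta (one Givens rotation)."""
    c, s = math.cos(theta), math.sin(theta)
    for k in range(len(W[0])):
        wi, wj = W[i][k], W[j][k]
        W[i][k] = c * wi - s * wj
        W[j][k] = s * wi + c * wj

def givens_chain(W, thetas):
    """Return R @ W, where R is the product of len(thetas) Givens rotations on
    adjacent row pairs (0, 1), (1, 2), ... (hypothetical layout for illustration).
    W itself is left unchanged."""
    out = [row[:] for row in W]
    for i, theta in enumerate(thetas):
        apply_givens(out, i, i + 1, theta)
    return out

def gram(W):
    """Pairwise dot products between the columns (weight vectors) of W."""
    cols = list(zip(*W))
    return [[sum(a * b for a, b in zip(x, y)) for y in cols] for x in cols]

d, n = 6, 4
rng = random.Random(0)
W = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(d)]
thetas = [rng.uniform(-math.pi, math.pi) for _ in range(d - 1)]  # d-1 trainable angles
W_new = givens_chain(W, thetas)

# Each Givens rotation is orthogonal with determinant +1, so R lies in SO(d):
# column norms and pairwise angles are preserved, i.e. the Gram matrix is unchanged.
G_old, G_new = gram(W), gram(W_new)
max_err = max(abs(a - b) for ra, rb in zip(G_old, G_new) for a, b in zip(ra, rb))

chain_params = len(thetas)        # grows linearly in d
dense_params = d * (d - 1) // 2   # free entries of a skew-symmetric d x d matrix
print(f"max Gram error: {max_err:.2e}")
print(f"Givens chain: {chain_params} params, dense skew-symmetric: {dense_params} params")
```

Each rotation touches only two rows, so the update can be applied without materializing the full $d \times d$ matrix, and the Gram-matrix comparison is a direct way to confirm that angular distances between weight vectors survive the transformation.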
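The relaxation behind qGOFT can be sketched in the same spirit. The summary does not spell out the exact quasi-Givens parameterization, so the following is a hypothetical stand-in: a 2×2 block that starts as a Givens rotation but whose entries are allowed to drift, together with a soft orthogonality penalty $\lVert B^\top B - I \rVert_F^2$ added to the task loss with a hypothetical weight `lam`. An exact rotation incurs zero penalty and preserves norms and angles; a perturbed block changes both, at a cost the regularizer controls.

```python
import math

def rotation_block(theta):
    """Exact 2x2 Givens rotation block [[cos, -sin], [sin, cos]]."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def soft_orth_penalty(B):
    """Squared Frobenius norm of B^T B - I; zero exactly when B is orthogonal."""
    btb = [[sum(B[k][i] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return sum((btb[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(2) for j in range(2))

def apply_block(B, x):
    """Apply a 2x2 block to the two coordinates it acts on."""
    return [B[0][0] * x[0] + B[0][1] * x[1], B[1][0] * x[0] + B[1][1] * x[1]]

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def angle(x, y):
    """Angle between two vectors, clamped to avoid acos domain errors."""
    cos_xy = sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y))
    return math.acos(max(-1.0, min(1.0, cos_xy)))

def regularized_loss(task_loss, blocks, lam=1e-3):
    """Hypothetical objective: task loss plus lam times the summed orthogonality penalty."""
    return task_loss + lam * sum(soft_orth_penalty(B) for B in blocks)

R = rotation_block(0.3)
# Hypothetical relaxed block (not the paper's exact parameterization): the same
# rotation with two entries rescaled, standing in for a block that has drifted
# away from exact orthogonality.
Q = [[1.1 * R[0][0], R[0][1]], [R[1][0], 0.95 * R[1][1]]]

u, v = [1.0, 0.0], [0.6, 0.8]
for name, B in (("rotation", R), ("relaxed", Q)):
    print(f"{name}: penalty={soft_orth_penalty(B):.4f}, "
          f"|Bu|={norm(apply_block(B, u)):.4f}, "
          f"angle change={angle(apply_block(B, u), apply_block(B, v)) - angle(u, v):+.4f}")
```

Because the penalty is zero only for orthogonal blocks, increasing `lam` pulls the blocks back toward exact rotations (strictly orthogonal, OFT-like behavior), while decreasing it leaves more room for norm and angle adjustments.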