EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs
Song Guo, Fan Wu, Lei Zhang, Xiawu Zheng, Shengchuan Zhang, Fei Chao, Yiyu Shi, Rongrong Ji
2024-02-19
Abstract: Existing methods for fine-tuning sparse LLMs often suffer from
resource-intensive requirements and high retraining costs. Additionally, many
fine-tuning methods rely on approximations or heuristic optimization
strategies, which may lead to suboptimal solutions. To address these issues, we
propose an efficient and fast framework for fine-tuning sparse LLMs based on
minimizing reconstruction error. Our approach samples a small calibration
dataset and uses backpropagation to iteratively minimize the reconstruction
error of each block, one block at a time, aiming for optimal
solutions. Extensive experiments on various benchmarks consistently demonstrate
the superiority of our method over other baselines. For instance, on the
Wikitext2 dataset with LlamaV1-7B at 70% sparsity, our proposed EBFT achieves a
perplexity of 16.88, surpassing the state-of-the-art DSnoT (perplexity
75.14). Moreover, with a structured sparsity ratio of 26%, EBFT achieves a
perplexity of 16.27, outperforming LoRA (perplexity 16.44). Furthermore, the
fine-tuning process of EBFT for LlamaV1-7B takes only about 30 minutes,
and the entire framework can be executed on a single 16GB GPU. The source code
is available at https://github.com/sunggo/EBFT.
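The block-by-block reconstruction procedure described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (that is in the repository linked above). It assumes toy "blocks" that are a single linear layer followed by ReLU, uses magnitude pruning as a stand-in for whatever method produced the sparse model, and assumes each sparse block receives the output of the already fine-tuned sparse blocks before it while being trained to match the dense block's output. The function names `magnitude_mask`, `finetune_block`, and `blockwise_finetune` are hypothetical.

```python
import numpy as np


def magnitude_mask(w, sparsity):
    """Boolean mask keeping the largest-magnitude (1 - sparsity) share of w.

    Magnitude pruning is a stand-in here, not necessarily what EBFT uses.
    """
    k = int(round(w.size * sparsity))  # number of weights to prune
    if k == 0:
        return np.ones(w.shape, dtype=bool)
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.abs(w) > threshold


def block_forward(x, w):
    """Toy 'block': one linear layer followed by ReLU (an assumption)."""
    pre = x @ w
    return np.maximum(pre, 0.0), pre


def mse(a, b):
    """Reconstruction error: mean over samples of the squared L2 distance."""
    return float(np.mean(np.sum((a - b) ** 2, axis=1)))


def finetune_block(x_in, target, w, mask, lr=0.05, steps=200):
    """Gradient descent on one block's reconstruction error with a fixed mask."""
    w = w * mask                                # start from the pruned weights
    n = x_in.shape[0]
    for _ in range(steps):
        out, pre = block_forward(x_in, w)
        grad_out = 2.0 * (out - target) / n     # d mse / d out
        grad_pre = grad_out * (pre > 0)         # backprop through ReLU
        grad_w = x_in.T @ grad_pre              # d mse / d w
        w = w - lr * grad_w * mask              # pruned entries stay exactly zero
    return w


def blockwise_finetune(dense_weights, x_calib, sparsity=0.5, **kw):
    """Fine-tune a stack of blocks one at a time against dense block outputs."""
    x_dense, x_sparse = x_calib, x_calib
    sparse_weights, errors = [], []
    for w in dense_weights:
        target, _ = block_forward(x_dense, w)   # dense output is the teacher signal
        mask = magnitude_mask(w, sparsity)
        before = mse(block_forward(x_sparse, w * mask)[0], target)
        w_sparse = finetune_block(x_sparse, target, w, mask, **kw)
        after = mse(block_forward(x_sparse, w_sparse)[0], target)
        sparse_weights.append(w_sparse)
        errors.append((before, after))
        x_dense = target                                  # input to next dense block
        x_sparse = block_forward(x_sparse, w_sparse)[0]   # input to next sparse block
    return sparse_weights, errors


# Usage on random data: 3 blocks of width 16, 128 calibration samples.
rng = np.random.default_rng(0)
d = 16
dense = [rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, d)) for _ in range(3)]
x_calib = rng.normal(size=(128, d))
sparse_ws, errors = blockwise_finetune(dense, x_calib, sparsity=0.5)
for i, (before, after) in enumerate(errors):
    print(f"block {i}: error {before:.4f} -> {after:.4f}")
```

Because only one block's weights are optimized at a time, the memory needed for gradients in this sketch is bounded by a single block, which is consistent with the abstract's single-GPU claim.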
Computation and Language, Machine Learning, Artificial Intelligence