Selecting Large Language Model to Fine-tune via Rectified Scaling Law

Haowei Lin, Baizhou Huang, Haotian Ye, Qinyu Chen, Zihao Wang, Sujian Li, Jianzhu Ma, Xiaojun Wan, James Zou, Yitao Liang
2024-05-29
Abstract: The ever-growing ecosystem of LLMs has posed a challenge in selecting the most appropriate pre-trained model to fine-tune amidst a sea of options. Given constrained resources, fine-tuning all models and making selections afterward is unrealistic. In this work, we formulate this resource-constrained selection task into predicting fine-tuning performance and illustrate its natural connection with Scaling Law. Unlike pre-training, we find that the fine-tuning scaling curve includes not just the well-known "power phase" but also the previously unobserved "pre-power phase". We also explain why existing Scaling Law fails to capture this phase transition phenomenon both theoretically and empirically. To address this, we introduce the concept of "pre-learned data size" into our Rectified Scaling Law, which overcomes theoretical limitations and fits experimental results much better. By leveraging our law, we propose a novel LLM selection algorithm that selects the near-optimal model with hundreds of times less resource consumption, while other methods may provide negatively correlated selection. The project page is available at http://rectified-scaling-law.github.io.
Machine Learning, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address the problem of how to select the most suitable large-scale language model (LLM) for fine-tuning under resource-constrained conditions. As the ecosystem of large-scale language models grows increasingly vast, choosing the most appropriate model for fine-tuning from numerous pre-trained models becomes highly challenging. Due to resource limitations (such as time, computational power, and storage space), it is impossible to fine-tune all candidate models before making a selection. Additionally, relying on empirical intuition-based selection methods (such as choosing the model with the most parameters or the best zero-shot performance) is also unreliable.

Specifically, the paper focuses on the following aspects:

1. **Model Selection under Resource Constraints**: How to efficiently and accurately predict the performance of models after fine-tuning and select the best model under limited resources.
2. **Limitations of Existing Methods**: Most existing model selection methods are suitable for classification and regression tasks, but not for fine-tuning generative language models.
3. **Fine-Tuning Scaling Laws**: Exploring the laws of performance changes with data volume during fine-tuning, particularly discovering a previously unobserved "pre-power phase."

### Solutions

To address the above problems, the paper proposes the following solutions:

1. **Rectified Scaling Law**: Introducing the concept of "pre-learned data size" to improve existing scaling laws, making them better fit experimental results.
2. **Novel LLM Selection Algorithm**: Based on the rectified scaling law, designing an algorithm called "Accept then Stop" (AtS), which can select a near-optimal model while reducing resource consumption by hundreds of times.

### Main Contributions

1. **Theoretical Analysis**: Explaining why existing scaling laws fail to capture phase transitions during the fine-tuning process and proposing a rectified scaling law.
2. **Experimental Validation**: Extensively validating the effectiveness of the rectified scaling law and the AtS algorithm through experiments, demonstrating their robustness and accuracy across different datasets and resource constraints.
3. **Practical Application**: Providing new methods and tools for selecting suitable LLMs for fine-tuning in resource-constrained environments, significantly improving the efficiency and accuracy of the selection process.

In summary, through theoretical analysis and experimental proof, this paper demonstrates that the proposed methods can efficiently and accurately select the most suitable large-scale language model for fine-tuning under resource-constrained conditions.
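
To make the "pre-power phase" and "power phase" described above more concrete, here is a minimal Python sketch. It assumes the rectified law takes the form L(D) = B / (D_l + D^β) + E, where D is the fine-tuning data size and D_l stands for the "pre-learned data size". This summary does not state the formula, so the functional form, the symbol names, and all parameter values below are assumptions made for illustration only. The sketch measures the local slope of the curve in log-log space: under this assumed form, the slope is close to zero while D^β is much smaller than D_l and approaches -β once D^β dominates, whereas a plain power law (D_l = 0) has slope -β everywhere.

```python
import math


def rectified_loss(d, b, d_l, beta, e):
    """Assumed rectified form: L(D) = B / (D_l + D**beta) + E."""
    return b / (d_l + d ** beta) + e


def vanilla_loss(d, b, beta, e):
    """Plain power law, i.e. the assumed rectified form with D_l = 0."""
    return b / (d ** beta) + e


def loglog_slope(loss_fn, d, e, factor=1.01):
    """Finite-difference slope of log(L - E) against log(D) at data size d."""
    lo = loss_fn(d) - e
    hi = loss_fn(d * factor) - e
    return (math.log(hi) - math.log(lo)) / math.log(factor)


# Hypothetical parameters, chosen only so that both phases are visible.
B, D_L, BETA, E = 100.0, 1000.0, 0.5, 1.0


def rectified(d):
    return rectified_loss(d, B, D_L, BETA, E)


def vanilla(d):
    return vanilla_loss(d, B, BETA, E)


# D**beta << D_l: the curve is nearly flat (pre-power phase).
slope_small = loglog_slope(rectified, 10.0, E)
# D**beta >> D_l: the curve behaves like a power law (power phase).
slope_large = loglog_slope(rectified, 1e12, E)
# The plain power law has the same slope at every data size.
slope_vanilla = loglog_slope(vanilla, 10.0, E)

print(f"rectified slope at D=10:   {slope_small:.4f}")
print(f"rectified slope at D=1e12: {slope_large:.4f}")
print(f"vanilla slope at D=10:     {slope_vanilla:.4f}")
```

Under these assumptions, a plain power law fitted only to small-data measurements cannot reproduce the flat early region, which is one way to read the summary's point that existing scaling laws miss the phase transition. Fitting the parameters to real fine-tuning losses would require a curve-fitting routine and measured data, neither of which is shown here.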