Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models

Minghao Wu, Thuy-Trang Vu, Lizhen Qu, Gholamreza Haffari
2024-10-06
Abstract: Large language models (LLMs) are typically fine-tuned on diverse and extensive datasets sourced from various origins to develop a comprehensive range of skills, such as writing, reasoning, chatting, coding, and more. Each skill has unique characteristics, and these datasets are often heterogeneous and imbalanced, making the fine-tuning process highly challenging. Balancing the development of each skill while ensuring the model maintains its overall performance requires sophisticated techniques and careful dataset curation. In this work, we propose a general, model-agnostic, reinforcement learning framework, Mixture-of-Skills (MoS), that learns to optimize data usage automatically during the fine-tuning process. This framework ensures the optimal comprehensive skill development of LLMs by dynamically adjusting the focus on different datasets based on their current learning state. To validate the effectiveness of MoS, we conduct extensive experiments using three diverse LLM backbones on two widely used benchmarks and demonstrate that MoS substantially enhances model performance. Building on the success of MoS, we propose MoSpec, an adaptation for task-specific fine-tuning, which harnesses the utilities of various datasets for a specific purpose. Our work underlines the significance of dataset rebalancing and presents MoS as a powerful, general solution for optimizing data usage in the fine-tuning of LLMs for various purposes.
Computation and Language
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses how to optimize data usage when fine-tuning large language models (LLMs) so that different skills develop in balance while overall model performance is maintained. Specifically, it focuses on the following challenges:

1. **Heterogeneity and Imbalance of Datasets**: Fine-tuning datasets differ in characteristics and scale, which makes the fine-tuning process highly challenging. Traditional static sampling methods cannot effectively handle this heterogeneity and imbalance.
2. **Maximizing Data Utilization**: Existing methods often cap how much of each dataset is used so that the model is not overwhelmed by the largest datasets, but this prevents full use of all available data.
3. **Dynamic Adjustment of Data Usage**: A framework is needed that adjusts data usage based on the model's current learning state in order to optimize the development of different skills.

To address these issues, the paper proposes a general, model-agnostic reinforcement learning framework called Mixture-of-Skills (MoS), which automatically optimizes data usage during the fine-tuning process. MoS dynamically adjusts the sampling probabilities of the different datasets so that the model develops well across its various skills. The paper also introduces a variant called MoSpec, tailored for task-specific fine-tuning, which further demonstrates the flexibility and effectiveness of MoS in practical applications.
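To make the idea of dynamically adjusted sampling probabilities concrete, here is a minimal Python sketch. It is not the paper's algorithm: this summary does not describe the reward design or update rule that MoS uses, so the sketch swaps in a generic EXP3-style bandit update as a stand-in. The class name `DatasetSampler`, the parameters `lr` and `gamma`, the dataset names, and the `toy_reward` function are all hypothetical and exist only for illustration.

```python
import math
import random


class DatasetSampler:
    """EXP3-style sampler over datasets (illustrative stand-in, not the MoS rule)."""

    def __init__(self, names, lr=0.1, gamma=0.1, seed=0):
        self.names = list(names)
        self.lr = lr          # step size for score updates (assumed value)
        self.gamma = gamma    # uniform exploration weight (assumed value)
        self.scores = {name: 0.0 for name in self.names}
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax over scores (max subtracted for numerical stability),
        # mixed with a uniform distribution so every dataset keeps p > 0.
        top = max(self.scores.values())
        exps = {n: math.exp(s - top) for n, s in self.scores.items()}
        total = sum(exps.values())
        k = len(self.names)
        return {
            n: (1.0 - self.gamma) * exps[n] / total + self.gamma / k
            for n in self.names
        }

    def sample(self):
        # Draw the dataset that supplies the next training batch.
        probs = self.probabilities()
        weights = [probs[n] for n in self.names]
        return self.rng.choices(self.names, weights=weights, k=1)[0]

    def update(self, name, reward):
        # Importance-weighted update: datasets that are sampled rarely
        # receive a proportionally larger step when they are sampled.
        p = self.probabilities()[name]
        self.scores[name] += self.lr * reward / p


def toy_reward(name):
    # Hypothetical reward signal. In a real run this would come from the
    # model's current learning state, which this summary does not specify.
    return {"writing": 0.2, "reasoning": 0.8, "coding": 0.5}[name]


sampler = DatasetSampler(["writing", "reasoning", "coding"])
for _ in range(50):
    chosen = sampler.sample()
    sampler.update(chosen, toy_reward(chosen))
final_probs = sampler.probabilities()
```

In a fine-tuning loop, `sample()` would pick the dataset for each batch and `update()` would be called with whatever reward the training procedure defines. A static sampler corresponds to never calling `update()`, which leaves the probabilities fixed at their initial values.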