Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models

Minghao Wu, Thuy-Trang Vu, Lizhen Qu, Gholamreza Haffari

2024-10-06

Abstract: Large language models (LLMs) are typically fine-tuned on diverse and extensive datasets sourced from various origins to develop a comprehensive range of skills, such as writing, reasoning, chatting, coding, and more. Each skill has unique characteristics, and these datasets are often heterogeneous and imbalanced, making the fine-tuning process highly challenging. Balancing the development of each skill while ensuring the model maintains its overall performance requires sophisticated techniques and careful dataset curation. In this work, we propose a general, model-agnostic, reinforcement learning framework, Mixture-of-Skills (MoS), that learns to optimize data usage automatically during the fine-tuning process. This framework ensures the optimal comprehensive skill development of LLMs by dynamically adjusting the focus on different datasets based on their current learning state. To validate the effectiveness of MoS, we conduct extensive experiments using three diverse LLM backbones on two widely used benchmarks and demonstrate that MoS substantially enhances model performance. Building on the success of MoS, we propose MoSpec, an adaptation for task-specific fine-tuning, which harnesses the utilities of various datasets for a specific purpose. Our work underlines the significance of dataset rebalancing and presents MoS as a powerful, general solution for optimizing data usage in the fine-tuning of LLMs for various purposes.

Computation and Language

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper addresses how to optimize data usage when fine-tuning large language models (LLMs) so that different skills develop in balance while overall model performance is preserved. Specifically, it focuses on the following challenges:

1. **Heterogeneity and Imbalance of Datasets**: Different datasets have varying characteristics and scales, which makes fine-tuning highly challenging. Traditional static sampling methods cannot effectively handle this heterogeneity and imbalance.
2. **Maximizing Data Utilization**: Existing methods often cap how much of each dataset is used so that the model is not overwhelmed by the largest ones, but this prevents full use of all available data.
3. **Dynamic Adjustment of Data Usage**: A framework is needed that adjusts data usage according to the model's current learning state in order to optimize the development of different skills.

To address these issues, the paper proposes a general, model-agnostic reinforcement learning framework called Mixture-of-Skills (MoS), which automatically optimizes data usage during fine-tuning. MoS dynamically adjusts the sampling probabilities of different datasets to ensure optimal development of the model across various skills. The paper also introduces a variant called MoSpec, tailored to task-specific fine-tuning, which further demonstrates the flexibility and effectiveness of MoS in practical applications.
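The idea of adjusting dataset sampling probabilities from a learning signal can be illustrated with a small sketch. This is not the paper's implementation: the softmax parameterization, the REINFORCE-style update, the running-mean baseline, the class name `DatasetSampler`, and the `toy_reward` function are all assumptions made for illustration. The summary above only states that MoS uses reinforcement learning to change sampling probabilities during fine-tuning; how the reward is computed is not described here.

```python
import math
import random


class DatasetSampler:
    """Hypothetical softmax policy over datasets with a REINFORCE-style update."""

    def __init__(self, names, lr=0.1, seed=0):
        self.names = list(names)
        self.logits = [0.0] * len(self.names)  # equal logits give uniform sampling
        self.lr = lr
        self.baseline = 0.0  # running mean of rewards, used to reduce variance
        self.rng = random.Random(seed)

    def probs(self):
        # Numerically stable softmax over the logits.
        m = max(self.logits)
        exps = [math.exp(z - m) for z in self.logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self):
        # Draw one dataset index according to the current probabilities.
        indices = range(len(self.names))
        return self.rng.choices(indices, weights=self.probs(), k=1)[0]

    def update(self, index, reward):
        # Gradient of log softmax w.r.t. logit i is 1[i == index] - p_i.
        p = self.probs()
        advantage = reward - self.baseline
        for i in range(len(self.logits)):
            grad = (1.0 if i == index else 0.0) - p[i]
            self.logits[i] += self.lr * advantage * grad
        self.baseline = 0.9 * self.baseline + 0.1 * reward


def toy_reward(index):
    # Assumed stand-in for a signal derived from the model's learning state.
    return [0.2, 1.0, 0.5][index]


sampler = DatasetSampler(["writing", "coding", "chatting"])
for _ in range(500):
    chosen = sampler.sample()
    # In real fine-tuning, a batch from the chosen dataset would be trained on here.
    sampler.update(chosen, toy_reward(chosen))
```

In an actual fine-tuning loop the reward would come from the model being trained rather than a fixed table, and the sampler would be updated alongside the model. Unlike a static mixture, the probabilities here are free to shift as the rewards change.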

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models

Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation

It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization

Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models

MoExtend: Tuning New Experts for Modality and Task Extension

Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts

Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts

Two-stage LLM Fine-tuning with Less Specialization and More Generalization

MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents

MultiSkill: Evaluating Large Multimodal Models for Fine-grained Alignment Skills

MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts

Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer

Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models

Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models

MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts

DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling

Enhancing Subtask Performance of Multi-modal Large Language Model

A Framework for Fine-Tuning LLMs using Heterogeneous Feedback

Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training