Abstract: The rise of large language models (LLMs) has created a significant disparity: industrial research labs, with their computational resources, expert teams, and advanced infrastructure, can effectively fine-tune LLMs, while individual developers and small organizations face barriers due to limited resources. In this paper, we aim to bridge this gap by presenting a comprehensive study on supervised fine-tuning of LLMs using instruction-tuning datasets spanning diverse knowledge domains and skills. We focus on small-sized LLMs (3B to 7B parameters) for their cost-efficiency and accessibility. We explore various training configurations and strategies across four open-source pre-trained models. We provide detailed documentation of these configurations, revealing findings that challenge several common training practices, including hyperparameter recommendations from TULU and phased training recommended by Orca. Key insights from our work include: (i) larger batch sizes paired with lower learning rates lead to improved model performance on benchmarks such as MMLU, MTBench, and Open LLM Leaderboard; (ii) early-stage training dynamics, such as lower gradient norms and higher loss values, are strong indicators of better final model performance, enabling early termination of sub-optimal runs and significant computational savings; (iii) through a thorough exploration of hyperparameters like warmup steps and learning rate schedules, we provide guidance for practitioners and find that certain simplifications do not compromise performance; and (iv) we observe no significant difference in performance between phased and stacked training strategies, but stacked training is simpler and more sample efficient. With these findings holding robustly across datasets and models, we hope this study serves as a guide for practitioners fine-tuning small LLMs and promotes a more inclusive environment for LLM research.

An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

Label Supervised LLaMA Finetuning

Interactive Multi-fidelity Learning for Cost-effective Adaptation of Language Model with Sparse Human Supervision

LaFFi: Leveraging Hybrid Natural Language Feedback for Fine-tuning Language Models

Instruction Mining: Instruction Data Selection for Tuning Large Language Models

Scaling Instruction-Finetuned Language Models

A Framework for Fine-Tuning LLMs using Heterogeneous Feedback

Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model

Parameter-Efficient Instruction Tuning of Large Language Models For Extreme Financial Numeral Labelling

When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

Fine-tuning Language Models with Generative Adversarial Feedback

Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models

An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

IterSelectTune: An Iterative Training Framework for Efficient Instruction-Tuning Data Selection

Learning Dynamics of LLM Finetuning

SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning

Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities

LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models

Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models