Abstract: The rise of large language models (LLMs) has created a significant disparity: industrial research labs, with their computational resources, expert teams, and advanced infrastructure, can effectively fine-tune LLMs, while individual developers and small organizations face barriers due to limited resources. In this paper, we aim to bridge this gap by presenting a comprehensive study on supervised fine-tuning of LLMs using instruction-tuning datasets spanning diverse knowledge domains and skills. We focus on small LLMs (3B to 7B parameters) for their cost-efficiency and accessibility. We explore various training configurations and strategies across four open-source pre-trained models. We provide detailed documentation of these configurations, revealing findings that challenge several common training practices, including the hyperparameter recommendations from TULU and the phased training recommended by Orca. Key insights from our work include: (i) larger batch sizes paired with lower learning rates lead to improved model performance on benchmarks such as MMLU, MTBench, and the Open LLM Leaderboard; (ii) early-stage training dynamics, such as lower gradient norms and higher loss values, are strong indicators of better final model performance, enabling early termination of sub-optimal runs and significant computational savings; (iii) through a thorough exploration of hyperparameters such as warmup steps and learning rate schedules, we provide guidance for practitioners and find that certain simplifications do not compromise performance; and (iv) we observe no significant difference in performance between phased and stacked training strategies, but stacked training is simpler and more sample efficient. With these findings holding robustly across datasets and models, we hope this study serves as a guide for practitioners fine-tuning small LLMs and promotes a more inclusive environment for LLM research.
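Insight (ii) suggests a simple triage procedure: compare candidate runs on their early gradient norms and stop the ones that look least promising. The following is a minimal sketch of that idea, not the procedure used in the study. The function names, the `window` and `keep` parameters, the run names, and the logged values are all hypothetical, and ranking by mean early gradient norm alone is an assumption made for illustration (the abstract also mentions early loss values, which this sketch ignores).

```python
from statistics import mean


def early_grad_norm(grad_norms, window):
    """Mean gradient norm over the first `window` logged steps of one run."""
    if window <= 0:
        raise ValueError("window must be positive")
    head = grad_norms[:window]
    if not head:
        raise ValueError("run has no logged gradient norms")
    return mean(head)


def triage_runs(runs, window=100, keep=1):
    """Rank runs by early mean gradient norm, lowest first.

    `runs` maps a run name to its list of logged gradient norms.
    Returns (kept, stopped): the `keep` lowest-norm runs and the rest.
    Assumption: a lower early gradient norm marks the more promising run.
    """
    ranked = sorted(runs, key=lambda name: early_grad_norm(runs[name], window))
    return ranked[:keep], ranked[keep:]


# Toy logs, invented for illustration only (not results from the paper).
toy_runs = {
    "run_a": [1.2, 1.1, 1.0, 0.9],
    "run_b": [2.5, 2.0, 1.8, 1.5],
    "run_c": [0.8, 0.9, 0.7, 0.6],
}

kept, stopped = triage_runs(toy_runs, window=3, keep=1)
print("keep:", kept)
print("stop:", stopped)
```

In practice the window length and the number of runs to keep would need to be validated against final benchmark scores for the model and dataset at hand before any run is actually terminated.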

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs