Abstract: The rise of large language models (LLMs) has created a significant disparity: industrial research labs, with their computational resources, expert teams, and advanced infrastructure, can effectively fine-tune LLMs, while individual developers and small organizations face barriers due to limited resources. In this paper, we aim to bridge this gap by presenting a comprehensive study on supervised fine-tuning of LLMs using instruction-tuning datasets spanning diverse knowledge domains and skills. We focus on small-sized LLMs (3B to 7B parameters) for their cost-efficiency and accessibility. We explore various training configurations and strategies across four open-source pre-trained models. We provide detailed documentation of these configurations, revealing findings that challenge several common training practices, including hyperparameter recommendations from TULU and phased training recommended by Orca. Key insights from our work include: (i) larger batch sizes paired with lower learning rates lead to improved model performance on benchmarks such as MMLU, MTBench, and Open LLM Leaderboard; (ii) early-stage training dynamics, such as lower gradient norms and higher loss values, are strong indicators of better final model performance, enabling early termination of sub-optimal runs and significant computational savings; (iii) through a thorough exploration of hyperparameters like warmup steps and learning rate schedules, we provide guidance for practitioners and find that certain simplifications do not compromise performance; and (iv) we observe no significant difference in performance between phased and stacked training strategies, but stacked training is simpler and more sample-efficient. As these findings hold robustly across datasets and models, we hope this study serves as a guide for practitioners fine-tuning small LLMs and promotes a more inclusive environment for LLM research.
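Insight (ii) suggests a simple pruning step when sweeping hyperparameters: log each run's gradient norm and loss for its first few steps, then keep only the runs that look most promising. The sketch below is a hypothetical illustration, not the study's procedure. The names `RunLog`, `early_stage_scores`, `select_runs`, `window`, and `keep` are invented for this example, and combining the two signals by summing their ranks is an assumption; the abstract only says that lower gradient norms and higher losses early on predict better final performance.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunLog:
    """Hypothetical record of one fine-tuning run's earliest logged steps."""
    name: str
    grad_norms: list[float]  # global gradient norm per logged step
    losses: list[float]      # training loss per logged step


def early_stage_scores(run: RunLog, window: int) -> tuple[float, float]:
    """Mean gradient norm and mean loss over the first `window` logged steps."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return mean(run.grad_norms[:window]), mean(run.losses[:window])


def select_runs(runs: list[RunLog], window: int, keep: int) -> list[str]:
    """Return the names of the `keep` most promising runs; the rest are
    candidates for early termination.

    Lower early gradient norm and higher early loss both count as favorable.
    ASSUMPTION: the two signals are combined by summing their ranks; this
    combination rule is illustrative, not taken from the study.
    """
    scores = {run.name: early_stage_scores(run, window) for run in runs}
    # Rank 0 = lowest mean gradient norm.
    grad_rank = {n: i for i, n in enumerate(sorted(scores, key=lambda n: scores[n][0]))}
    # Rank 0 = highest mean loss.
    loss_rank = {n: i for i, n in enumerate(sorted(scores, key=lambda n: -scores[n][1]))}
    # Smaller rank sum = more promising; ties broken by name for determinism.
    ordered = sorted(scores, key=lambda n: (grad_rank[n] + loss_rank[n], n))
    return ordered[:keep]
```

For example, given three toy runs (made-up numbers, not results from the study) whose early mean gradient norms are 0.7, 1.4, and 1.0 and whose early mean losses are 1.8, 1.5, and 1.6, `select_runs(..., keep=2)` keeps the first and third runs and flags the second for termination. Rank summing is used here only because it needs no tuning of relative weights between the two signals; a practitioner could equally threshold either signal on its own.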

EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models

EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

LMTuner: An user-friendly and highly-integrable Training Framework for fine-tuning Large Language Models

Early Exit is a Natural Capability in Transformer-based Models: an Empirical Study on Early Exit Without Joint Optimization

Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning

Parameter-efficient Tuning for Large Language Model Without Calculating Its Gradients

EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting

Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

Understanding the Performance and Estimating the Cost of LLM Fine-Tuning

Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models

CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model

Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models

Tuning Language Models by Mixture-of-Depths Ensemble

Large Language Models for Tuning Evolution Strategies

Learning Global Controller in Latent Space for Parameter-Efficient Fine-Tuning

Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts

LATuner: an LLM-Enhanced Database Tuning System Based on Adaptive Surrogate Model

LinguaLIFT: An Effective Two-stage Instruction Tuning Framework for Low-Resource Language Tasks

LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models