Abstract: The rise of large language models (LLMs) has created a significant disparity: industrial research labs, with their computational resources, expert teams, and advanced infrastructure, can effectively fine-tune LLMs, while individual developers and small organizations face barriers due to limited resources. In this paper, we aim to bridge this gap by presenting a comprehensive study on supervised fine-tuning of LLMs using instruction-tuning datasets spanning diverse knowledge domains and skills. We focus on small LLMs (3B to 7B parameters) for their cost-efficiency and accessibility. We explore various training configurations and strategies across four open-source pre-trained models. We provide detailed documentation of these configurations, revealing findings that challenge several common training practices, including the hyperparameter recommendations from TULU and the phased training recommended by Orca. Key insights from our work include: (i) larger batch sizes paired with lower learning rates lead to improved model performance on benchmarks such as MMLU, MTBench, and the Open LLM Leaderboard; (ii) early-stage training dynamics, such as lower gradient norms and higher loss values, are strong indicators of better final model performance, enabling early termination of sub-optimal runs and significant computational savings; (iii) through a thorough exploration of hyperparameters such as warmup steps and learning rate schedules, we provide guidance for practitioners and find that certain simplifications do not compromise performance; and (iv) we observe no significant difference in performance between phased and stacked training strategies, but stacked training is simpler and more sample-efficient. With these findings holding robustly across datasets and models, we hope this study serves as a guide for practitioners fine-tuning small LLMs and promotes a more inclusive environment for LLM research.
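To make the phased versus stacked distinction in insight (iv) concrete, the following is a minimal sketch of how the two strategies might order training data. The abstract does not define the two terms, so the reading used here is an assumption: "stacked" pools all datasets into a single shuffled phase, while "phased" trains on one dataset after another. The function names `stacked_order` and `phased_order` and the toy dataset names are hypothetical and do not come from the paper.

```python
import random
from typing import Dict, List, Sequence


def stacked_order(datasets: Dict[str, Sequence[str]], seed: int = 0) -> List[str]:
    """Assumed 'stacked' strategy: pool every dataset and shuffle once,
    so the model sees a single mixed training phase."""
    pooled = [example for examples in datasets.values() for example in examples]
    random.Random(seed).shuffle(pooled)
    return pooled


def phased_order(
    datasets: Dict[str, Sequence[str]],
    phase_names: Sequence[str],
    seed: int = 0,
) -> List[List[str]]:
    """Assumed 'phased' strategy: one training phase per dataset, run in the
    given order, with shuffling only inside each phase."""
    rng = random.Random(seed)
    phases = []
    for name in phase_names:
        phase = list(datasets[name])
        rng.shuffle(phase)
        phases.append(phase)
    return phases


if __name__ == "__main__":
    # Toy stand-ins for instruction-tuning datasets (hypothetical names).
    data = {"knowledge": ["k1", "k2"], "skills": ["s1", "s2", "s3"]}
    print("stacked:", stacked_order(data))
    print("phased: ", phased_order(data, ["knowledge", "skills"]))
```

Under this reading, the stacked variant needs only one training run and one set of hyperparameters, which is consistent with the abstract's remark that stacked training is simpler.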

Cross-model Control: Improving Multiple Large Language Models in One-time Training

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models

Mutual Enhancement of Large and Small Language Models with Cross-Silo Knowledge Transfer

It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization

CITI: Enhancing Tool Utilizing Ability in Large Language Models without Sacrificing General Performance

CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

Collaborative Training of Tiny-Large Vision Language Models

Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning

An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

CPM-2: Large-scale Cost-effective Pre-trained Language Models

A Framework to Implement 1+N Multi-task Fine-tuning Pattern in LLMs Using the CGC-LORA Algorithm

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

Unlocking Continual Learning Abilities in Language Models

An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting

CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models