52B to 1T: Lessons Learned via Tele-FLM Series

Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang

2024-07-03

Abstract: Large Language Models (LLMs) represent a significant stride toward Artificial General Intelligence. As scaling laws underscore the potential of increasing model sizes, the academic community has intensified its investigations into LLMs with capacities exceeding 50 billion parameters. This technical report builds on our prior work with Tele-FLM (also known as FLM-2), a publicly available 52-billion-parameter model. We delve into two primary areas: we first discuss our observation of Supervised Fine-tuning (SFT) on Tele-FLM-52B, which supports the "less is more" approach for SFT data construction; second, we demonstrate our experiments and analyses on the best practices for progressively growing a model from 52 billion to 102 billion, and subsequently to 1 trillion parameters. We will open-source a 1T model checkpoint, namely Tele-FLM-1T, to advance further training and research.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The paper mainly addresses two issues:

1. **Exploration of Supervised Fine-tuning Strategies**: The researchers found that supervised fine-tuning (SFT) of a large language model can achieve good results with a relatively small but high-quality dataset. Specifically, they fine-tuned the 52-billion-parameter Tele-FLM model on a small amount of data covering mathematical problems, coding tasks, and multi-turn dialogues, and obtained performance similar to, or even better than, that achieved with larger datasets. This indicates that the strong capabilities of the base model can be unlocked with a small amount of instruction data, especially for conventional language understanding and generation tasks.
2. **Method of Progressively Growing the Model**: The paper also details how the model was expanded from 52 billion to 102 billion and then to 1 trillion parameters while preserving what it had already learned and keeping training effective. The researchers used a technique called "function-preserving growth," which lets the model retain the knowledge acquired in earlier stages while its parameter count increases. In this way they scaled the model from 52 billion to 1 trillion parameters, and they plan to open-source the final 1-trillion-parameter checkpoint, Tele-FLM-1T, to facilitate further research and development.

In summary, this paper explores how to use a small amount of high-quality data to effectively improve the performance of large language models, and describes a method for progressively increasing model size so that ultra-large-scale models can be trained despite limited resources.
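The summary names function-preserving growth without describing it. The following is a minimal, hypothetical sketch of the general idea on a toy residual MLP stack; it is not the set of growth operators used for Tele-FLM, which also covers dimensions (such as the model's hidden size and attention heads) that this toy does not model. The trick shown is zero initialization of the output side: new feed-forward units get zero output weights, and a newly inserted layer gets an all-zero output projection, so the grown network computes exactly the same function at the moment of growth. All names (`forward`, `grow_ffn_width`, `grow_depth`) are illustrative assumptions.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(m, v):
    # m is a list of rows; returns the matrix-vector product m @ v
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def forward(layers, x):
    """Toy residual stack: h <- h + W2 @ relu(W1 @ h) for each (W1, W2) layer."""
    h = list(x)
    for w1, w2 in layers:
        delta = matvec(w2, relu(matvec(w1, h)))
        h = [a + b for a, b in zip(h, delta)]
    return h


def grow_ffn_width(layers, extra, rng):
    """Add `extra` hidden units to every FFN. New input weights are random,
    new output weights are zero, so the new units contribute nothing yet."""
    grown = []
    for w1, w2 in layers:
        d_model = len(w1[0])
        new_w1 = [row[:] for row in w1] + random_matrix(extra, d_model, rng)
        new_w2 = [row[:] + [0.0] * extra for row in w2]
        grown.append((new_w1, new_w2))
    return grown


def grow_depth(layers, position, d_ff, rng):
    """Insert a residual layer whose output projection is all zeros, so it
    acts as the identity until training updates it."""
    d_model = len(layers[0][0][0])
    w1 = random_matrix(d_ff, d_model, rng)
    w2 = [[0.0] * d_ff for _ in range(d_model)]
    return layers[:position] + [(w1, w2)] + layers[position:]


# Demo: a 2-layer toy model grown to 3 layers with wider FFNs.
rng = random.Random(0)
d_model, d_ff = 4, 8
layers = [(random_matrix(d_ff, d_model, rng), random_matrix(d_model, d_ff, rng))
          for _ in range(2)]
x = [1.0, -0.5, 0.25, 2.0]
before = forward(layers, x)
bigger = grow_depth(grow_ffn_width(layers, 4, rng), 1, d_ff + 4, rng)
after = forward(bigger, x)
print(len(bigger), before == after)
```

Keeping the new input weights random matters: the new hidden activations are generally nonzero, so the zero output weights receive nonzero gradients as soon as training resumes and the added capacity can start to be used. Growing the residual (hidden) dimension itself is harder because normalization layers mix all channels, and it is not covered by this sketch.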

Related papers:

Tele-FLM Technical Report

FLM-101B: An Open LLM and How to Train It with $100K Budget

Tele-LLMs: A Series of Specialized Large Language Models for Telecommunications

MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications

Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

Optimizing Distributed Training on Frontier for Large Language Models

FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data

Super Tiny Language Models

GeoGalactica: A Scientific Large Language Model in Geoscience

A Survey of Large Language Models

CodeGen2: Lessons for Training LLMs on Programming and Natural Languages

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Textbooks Are All You Need II: phi-1.5 technical report

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

Xmodel-LM Technical Report

YuLan: An Open-source Large Language Model