Tele-FLM Technical Report

Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang
2024-04-25
Abstract: Large language models (LLMs) have showcased profound capabilities in language understanding and generation, facilitating a wide array of applications. However, there is a notable paucity of detailed, open-sourced methodologies for efficiently scaling LLMs beyond 50 billion parameters with minimal trial-and-error cost and computational resources. In this report, we introduce Tele-FLM (aka FLM-2), a 52B-parameter open-sourced multilingual large language model that features a stable, efficient pre-training paradigm and enhanced factual judgment capabilities. Tele-FLM demonstrates superior multilingual language modeling abilities, measured by bits-per-byte (BPB) on textual corpora. Moreover, in both English and Chinese foundation model evaluations, it is comparable to strong open-sourced models that involve larger pre-training FLOPs, such as Llama2-70B and DeepSeek-67B. In addition to the model weights, we share the core designs, engineering practices, and training details, which we expect to benefit both the academic and industrial communities.
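Bits-per-byte (BPB) normalizes a model's total negative log-likelihood by the number of bytes in the evaluated text rather than by the number of tokens. This makes models with different tokenizers comparable, which matters for multilingual evaluation. The following is a minimal sketch of the common definition. It assumes that per-token negative log-likelihoods in nats are already available from the model and that bytes are counted in UTF-8. The function name `bits_per_byte` is hypothetical, and the report's exact evaluation corpora and setup are not described here.

```python
import math


def bits_per_byte(token_nlls_nats, text):
    """Bits-per-byte: total negative log-likelihood (converted to bits)
    divided by the UTF-8 byte length of the evaluated text.

    Assumption: token_nlls_nats holds the model's per-token negative
    log-likelihoods in nats (natural log) for exactly this text.
    """
    num_bytes = len(text.encode("utf-8"))  # e.g. each Chinese character is 3 bytes
    if num_bytes == 0:
        raise ValueError("text must be non-empty")
    total_bits = sum(token_nlls_nats) / math.log(2)  # nats -> bits
    return total_bits / num_bytes


# Usage: two tokens costing 4 and 7 bits over the 11-byte string "hello world".
print(bits_per_byte([4 * math.log(2), 7 * math.log(2)], "hello world"))
```

Because the denominator counts bytes rather than tokens, a tokenizer that splits text into fewer, larger tokens gains no advantage from the token count alone. Only the total information cost assigned to the text matters.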
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the problem of efficiently scaling large language models (LLMs). Specifically, it asks how to grow an LLM beyond 50 billion parameters while minimizing trial-and-error cost and computational resources. The authors introduce Tele-FLM, a 52-billion-parameter open-source multilingual LLM with a stable pre-training paradigm and enhanced factual judgment capabilities. By sharing the model's core designs, engineering practices, and training details, the paper aims to reduce redundant experimentation and improve resource utilization, benefiting both academia and industry. In English and Chinese benchmark evaluations, Tele-FLM performs comparably to strong open-source models trained with larger pre-training FLOPs, such as Llama2-70B and DeepSeek-67B.