RedCoast: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs

Bowen Tan, Yun Zhu, Lijuan Liu, Hongyi Wang, Yonghao Zhuang, Jindong Chen, Eric Xing, Zhiting Hu
2024-06-13
Abstract: The recent progress of AI can be largely attributed to large language models (LLMs). However, their escalating memory requirements introduce challenges for machine learning (ML) researchers and engineers. Addressing this requires developers to partition a large model to distribute it across multiple GPUs or TPUs. This necessitates considerable coding and intricate configuration efforts with existing model parallel tools, such as Megatron-LM, DeepSpeed, and Alpa. These tools require users' expertise in machine learning systems (MLSys), creating a bottleneck in LLM development, particularly for developers without an MLSys background. In this work, we present RedCoast (Redco), a lightweight and user-friendly tool crafted to automate distributed training and inference for LLMs, as well as to simplify ML pipeline development. The design of Redco emphasizes two key aspects. Firstly, to automate model parallelism, our study identifies two straightforward rules to generate tensor parallel strategies for any given LLM. Integrating these rules into Redco facilitates effortless distributed LLM training and inference, eliminating the need for additional coding or complex configurations. We demonstrate the effectiveness by applying Redco to a set of LLM architectures, such as GPT-J, LLaMA, T5, and OPT, up to the size of 66B. Secondly, we propose a mechanism that allows for the customization of diverse ML pipelines through the definition of merely three functions, avoiding redundant and formulaic code like multi-host related processing. This mechanism proves adaptable across a spectrum of ML algorithms, from foundational language modeling to complex algorithms like meta-learning and reinforcement learning. As a result, Redco implementations exhibit significantly fewer lines of code compared to their official counterparts.
Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper aims to address several key challenges in the distributed training of large language models (LLMs):

1. **High Memory Requirements**: As the number of parameters in LLMs continues to increase, the memory of a single GPU or TPU often cannot meet the model's needs, necessitating the partitioning of the model and distributed training across multiple devices.
2. **Complex Model Parallelism Techniques**: Existing model parallelism tools (such as Megatron-LM, DeepSpeed, and Alpa) provide solutions but require users to have deep knowledge of machine learning systems (MLSys) and involve a significant amount of coding and configuration work.
3. **Low Development Efficiency**: In traditional ML pipeline development, there is a lot of repetitive boilerplate code, such as backpropagation, gradient application, and batch iteration, which increases the complexity and time cost of development.

To address these challenges, the paper introduces RedCoast (Redco), a lightweight and user-friendly tool designed to automate the distributed training and inference of LLMs, simplifying the development process of ML pipelines. Specifically, Redco achieves this goal through the following two key aspects:

1. **Automatic Model Parallelism**: Redco automatically generates tensor parallelism strategies suitable for any given LLM by identifying and integrating two simple rules, thereby eliminating the need for additional coding and complex configuration.
2. **Concise ML Pipeline Development Mechanism**: Redco allows users to design ML pipelines by defining three intuitive functions, while Redco handles all underlying execution details such as data parallelism, multi-host related processing, checkpoint management, and more.

Through these designs, Redco not only improves development efficiency but also makes it easy for users without an MLSys background to use the tool.
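
The idea of a pipeline defined by three user functions can be pictured with a small, dependency-free sketch. Everything here is an assumption made for illustration: the names `collate_fn`, `loss_fn`, `pred_fn`, and `ToyTrainer` are hypothetical and are not presented as Redco's actual API, and the gradient is approximated with finite differences rather than backpropagation so that the example runs with the standard library only. The point is the division of labor: the user supplies the three functions, and the trainer owns the boilerplate (batch iteration, gradient computation, gradient application).

```python
from typing import Callable, Dict, List, Sequence

Params = List[float]
Batch = Dict[str, List[float]]


def collate_fn(examples: Sequence[dict]) -> Batch:
    """User function 1 (hypothetical name): turn raw examples into a batch."""
    return {"x": [ex["x"] for ex in examples], "y": [ex["y"] for ex in examples]}


def pred_fn(params: Params, batch: Batch) -> List[float]:
    """User function 2 (hypothetical name): predictions of a linear model w*x + b."""
    w, b = params
    return [w * x + b for x in batch["x"]]


def loss_fn(params: Params, batch: Batch) -> float:
    """User function 3 (hypothetical name): mean squared error on one batch."""
    preds = pred_fn(params, batch)
    return sum((p - y) ** 2 for p, y in zip(preds, batch["y"])) / len(preds)


class ToyTrainer:
    """Toy stand-in for the framework side: it owns the boilerplate loop."""

    def __init__(self, collate_fn: Callable, loss_fn: Callable, params: Params,
                 lr: float = 0.1, batch_size: int = 4) -> None:
        self.collate_fn = collate_fn
        self.loss_fn = loss_fn
        self.params = list(params)
        self.lr = lr
        self.batch_size = batch_size

    def _grad(self, batch: Batch, eps: float = 1e-5) -> List[float]:
        # Central finite differences; a real framework would use autodiff here.
        grads = []
        for i in range(len(self.params)):
            hi = list(self.params)
            lo = list(self.params)
            hi[i] += eps
            lo[i] -= eps
            grads.append((self.loss_fn(hi, batch) - self.loss_fn(lo, batch)) / (2 * eps))
        return grads

    def fit(self, examples: Sequence[dict], n_epochs: int) -> Params:
        for _ in range(n_epochs):
            # Batch iteration and gradient application live here, not in user code.
            for start in range(0, len(examples), self.batch_size):
                batch = self.collate_fn(examples[start:start + self.batch_size])
                grads = self._grad(batch)
                self.params = [p - self.lr * g for p, g in zip(self.params, grads)]
        return self.params


# Noiseless toy data drawn from y = 2x + 1.
examples = [{"x": 0.25 * i, "y": 2.0 * (0.25 * i) + 1.0} for i in range(8)]
trainer = ToyTrainer(collate_fn, loss_fn, params=[0.0, 0.0])
trained = trainer.fit(examples, n_epochs=500)
print("learned (w, b):", trained)
```

In this sketch, switching to a different task would only require swapping the three functions; the loop in `ToyTrainer.fit` stays untouched, which is the kind of separation the summary attributes to Redco.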