ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation

Zhiyu Mei, Wei Fu, Kaiwei Li, Guangju Wang, Huanchen Zhang, Yi Wu
2024-06-20
Abstract: Reinforcement Learning from Human Feedback (RLHF) stands as a pivotal technique in empowering large language model (LLM) applications. Since RLHF involves diverse computational workloads and intricate dependencies among multiple LLMs, directly adopting parallelization techniques from supervised training can result in sub-optimal performance. To overcome this limitation, we propose a novel approach named parameter ReaLlocation, which dynamically redistributes LLM parameters in the cluster and adapts parallelization strategies during training. Building upon this idea, we introduce ReaLHF, a pioneering system capable of automatically discovering and running efficient execution plans for RLHF training given the desired algorithmic and hardware configurations. ReaLHF formulates the execution plan for RLHF as an augmented dataflow graph. Based on this formulation, ReaLHF employs a tailored search algorithm with a lightweight cost estimator to discover an efficient execution plan. Subsequently, the runtime engine deploys the selected plan by effectively parallelizing computations and redistributing parameters. We evaluate ReaLHF on the LLaMA-2 models with up to $4\times70$ billion parameters and 128 GPUs. The experiment results showcase ReaLHF's substantial speedups of $2.0-10.6\times$ compared to baselines. Furthermore, the execution plans generated by ReaLHF exhibit an average of $26\%$ performance improvement over heuristic approaches based on Megatron-LM. The source code of ReaLHF is publicly available at https://github.com/openpsi-project/ReaLHF.
Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Computation and Language; Machine Learning
What problem does this paper attempt to address?
This paper addresses how to improve training efficiency and resource utilization in reinforcement learning from human feedback (RLHF) for large language models (LLMs). Existing RLHF systems directly adopt the parallelization techniques of supervised training, which leads to two main problems:

1. **Over-parallelization**: When the system applies the same parallelization strategy across every GPU node, it incurs a large amount of synchronization and communication overhead, which reduces overall system performance.
2. **Insufficient resource utilization caused by asymmetric parallelization**: Different computing tasks require different parallelization strategies, but a fixed task allocation leaves some GPUs idle and fails to fully utilize the hardware.

To solve these problems, the paper proposes a new method, parameter reallocation: dynamically adjusting the distribution of model parameters among GPUs during training. This eliminates redundant communication and raises GPU utilization, thereby significantly improving the efficiency of RLHF training.

### Main contributions

1. **Propose a method for dynamically reallocating model parameters**: Dynamically adjust the distribution of model parameters among different GPUs to meet the needs of different computing tasks.
2. **Introduce a general formulation and an effective search algorithm**: Used to discover efficient RLHF execution plans.
3. **Design and implement the ReaLHF system**: The system automatically discovers and runs fast, high-throughput execution plans.
4. **Conduct a comprehensive experimental evaluation**: ReaLHF shows a significant performance improvement over the baseline systems, with speedups of 2.0 to 10.6 times and, in specific cases, a performance improvement of 80%.
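To make the idea of parameter reallocation concrete, here is a minimal sketch in plain Python. It is a toy model, not ReaLHF's implementation: parameters are a flat list, each "GPU" is a sub-list, and the function names `shard` and `reallocate` are hypothetical. The parallelism degrees chosen for the two phases are assumptions for illustration only; in a real cluster the re-partitioning would involve inter-GPU communication rather than list operations.

```python
from typing import List


def shard(params: List[float], degree: int) -> List[List[float]]:
    """Split a flat parameter list into `degree` equal contiguous shards."""
    if len(params) % degree != 0:
        raise ValueError("parameter count must be divisible by the degree")
    size = len(params) // degree
    return [params[i * size:(i + 1) * size] for i in range(degree)]


def reallocate(shards: List[List[float]], new_degree: int) -> List[List[float]]:
    """Re-partition existing shards into `new_degree` shards (toy version)."""
    # Gather the current shards in order, then split under the new layout.
    flat = [p for s in shards for p in s]
    return shard(flat, new_degree)


# Hypothetical example: one phase uses 4-way sharding, the next uses 2-way.
params = [float(i) for i in range(8)]
layout_a = shard(params, 4)          # 4 shards of 2 parameters each
layout_b = reallocate(layout_a, 2)   # 2 shards of 4 parameters each
print(layout_a)
print(layout_b)
```

The point of the sketch is only that the same parameters can live under different partitionings at different points in training, so each task can use the layout that suits it.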
### Technical details

The ReaLHF system consists of two parts:

- **Execution plan generator**: Searches the space of execution plans with a Markov Chain Monte Carlo (MCMC) algorithm, guided by a lightweight cost estimator, to find an efficient execution plan.
- **Runtime engine**: Deploys the generated execution plan by parallelizing the computations and reallocating the model parameters.

Through these innovations, ReaLHF achieves higher efficiency and better resource utilization in RLHF training of large language models.
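The combination of an MCMC search with a cheap cost estimator can be sketched as follows. This is a toy Metropolis sampler under stated assumptions, not the paper's algorithm: the workload table `CALLS`, the candidate `DEGREES`, the cost formula in `estimate_cost`, and the temperature `beta` are all invented for illustration. The search space here is a single parallelism degree per function call, and a real plan space would be far richer.

```python
import math
import random
from typing import Dict, Tuple

# Hypothetical workload: abstract "work units" per RLHF function call.
CALLS: Dict[str, float] = {
    "actor_generate": 8.0,
    "reward_inference": 2.0,
    "actor_train": 6.0,
}
DEGREES = [1, 2, 4, 8]  # assumed candidate parallelism degrees

Plan = Dict[str, int]


def estimate_cost(plan: Plan) -> float:
    """Toy estimator: compute time shrinks with degree, overhead grows."""
    return sum(work / plan[name] + 0.3 * (plan[name] - 1)
               for name, work in CALLS.items())


def propose(plan: Plan, rng: random.Random) -> Plan:
    """Symmetric proposal: re-draw the degree of one randomly chosen call."""
    new_plan = dict(plan)
    name = rng.choice(sorted(CALLS))
    new_plan[name] = rng.choice(DEGREES)
    return new_plan


def mcmc_search(steps: int = 2000, beta: float = 5.0,
                seed: int = 0) -> Tuple[Plan, float]:
    """Metropolis search that tracks the cheapest plan visited."""
    rng = random.Random(seed)
    current = {name: DEGREES[-1] for name in CALLS}  # start fully parallel
    current_cost = estimate_cost(current)
    best, best_cost = current, current_cost
    for _ in range(steps):
        cand = propose(current, rng)
        cand_cost = estimate_cost(cand)
        delta = cand_cost - current_cost
        # Always accept improvements; accept regressions with prob e^(-beta*delta).
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


best_plan, best_cost = mcmc_search()
print(best_plan, round(best_cost, 3))
```

Because each candidate is scored by the estimator rather than by running it on hardware, many plans can be evaluated cheaply. Occasionally accepting worse plans lets the search escape local minima, at the price of offering no guarantee that the result is globally optimal.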