PUZZLE: Efficiently Aligning Large Language Models Through Light-Weight Context Switch

Kinman Lei, Yuyang Jin, Mingshu Zhai, Kezhao Huang, Haoxing Ye, Jidong Zhai
2024-01-01
Abstract: Aligning Large Language Models (LLMs) is currently the primary method for ensuring that AI systems operate in an ethically responsible and socially beneficial manner. Its paradigm differs significantly from standard pre-training or fine-tuning. It involves multiple models and workloads (contexts) and requires frequent switching between them, which introduces significant overhead such as parameter updates and data transfer. This poses a critical challenge: switching efficiently between different models and workloads. To address this challenge, we introduce PUZZLE, an efficient system for LLM alignment. We explore model orchestration as well as light-weight, smooth workload switching in LLM alignment by considering the similarity between different workloads. Specifically, PUZZLE takes a two-dimensional approach to efficient switching, targeting both intra- and inter-stage switching. Within each stage, switching costs are minimized by exploring model affinities and overlapping computation via time-sharing. Across stages, a similarity-oriented strategy is employed to find the optimal inter-stage switch plan with the minimum communication cost. We evaluate PUZZLE on various clusters with up to 32 GPUs. Results show that PUZZLE achieves up to 2.12x speedup compared with the state-of-the-art RLHF training system DeepSpeed-Chat.
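The abstract does not specify how PUZZLE searches for its inter-stage switch plan. As a rough illustration of why similarity between layouts lowers switching cost, the Python sketch below assumes a simplified cost model: each stage places named parameter shards on GPUs, and a switch only has to transfer the shards that a destination GPU does not already hold. It tries every assignment of the next stage's logical ranks to physical GPUs and keeps the cheapest one. The functions `switch_cost` and `best_switch_plan`, the shard names, and the byte sizes are all hypothetical; this is not PUZZLE's actual planner.

```python
from itertools import permutations


def switch_cost(src, dst, shard_bytes):
    """Bytes to transfer when switching from layout `src` to layout `dst`.

    src, dst: dict mapping GPU id -> set of shard ids resident on that GPU.
    Hypothetical cost model: a shard already on the destination GPU is free;
    every missing shard costs its size in bytes.
    """
    cost = 0
    for gpu, needed in dst.items():
        resident = src.get(gpu, set())
        for shard in needed - resident:
            cost += shard_bytes[shard]
    return cost


def best_switch_plan(src, dst_layout, shard_bytes):
    """Exhaustively map the next stage's logical ranks onto physical GPUs.

    dst_layout: dict mapping logical rank -> set of shard ids it needs.
    Returns (min_cost, {rank: gpu}) for the cheapest mapping found.
    """
    gpus = sorted(src)
    ranks = sorted(dst_layout)
    best = None
    for perm in permutations(gpus, len(ranks)):
        dst = {gpu: dst_layout[rank] for rank, gpu in zip(ranks, perm)}
        cost = switch_cost(src, dst, shard_bytes)
        # Keep the first mapping seen with the strictly lowest cost.
        if best is None or cost < best[0]:
            best = (cost, dict(zip(ranks, perm)))
    return best


if __name__ == "__main__":
    src = {0: {"a", "b"}, 1: {"c", "d"}}
    dst_layout = {"r0": {"c", "d"}, "r1": {"a", "b"}}
    sizes = {"a": 1, "b": 1, "c": 1, "d": 1}
    # Placing r0 on GPU 1 and r1 on GPU 0 reuses every resident shard.
    print(best_switch_plan(src, dst_layout, sizes))
```

Because this toy cost is a sum of independent (rank, GPU) terms, the exhaustive search, which grows factorially with the GPU count, could be replaced by a linear assignment solver for larger clusters. A real system would also have to model link bandwidths and parallel layouts that this sketch ignores.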