Mobius: Fine Tuning Large-Scale Models on Commodity GPU Servers

Zijie Tian, Youyou Lu, Yangyang Feng, J. Shu, Minhui Xie, Shuo Wang
DOI: https://doi.org/10.1145/3575693.3575703
2023-01-27
Abstract: Fine-tuning on cheap commodity GPU servers makes large-scale deep learning models accessible to more people. However, the low inter-GPU communication bandwidth and severe communication contention on commodity GPU servers hinder training efficiency. In this paper, we present Mobius, a communication-efficient system for fine-tuning large-scale models on commodity GPU servers. The key idea is a novel pipeline parallelism scheme that enables heterogeneous memory for large-scale model training while incurring less communication than existing systems. Mobius partitions the model into stages and carefully schedules them between GPU memory and DRAM to overlap communication with computation. It formulates pipeline execution as a mixed-integer programming problem to find the optimal pipeline partition. It also features a new stage-to-GPU mapping method, termed cross mapping, to minimize communication contention. Experiments on models of various scales and on various GPU topologies show that Mobius reduces training time by 3.8-5.1× compared with the prior art.
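To make the idea of a pipeline partition concrete, the sketch below splits a list of per-layer costs into contiguous stages so that the most expensive stage is as cheap as possible. It is a toy stand-in only: it uses a small dynamic program rather than the mixed-integer programming formulation the abstract mentions, and it ignores memory limits, GPU-to-DRAM transfers, and cross mapping. The function name `partition_layers` and the cost values are hypothetical and do not come from the paper.

```python
from functools import lru_cache


def partition_layers(costs, num_stages):
    """Split layers into contiguous stages, minimizing the largest stage cost.

    Toy illustration only; not Mobius's mixed-integer programming formulation.
    Returns (bottleneck_cost, list of layer-index lists, one per stage).
    """
    n = len(costs)
    if not 1 <= num_stages <= n:
        raise ValueError("num_stages must be between 1 and the number of layers")

    # prefix[i] holds the total cost of layers 0..i-1, so any range sum is O(1).
    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    @lru_cache(maxsize=None)
    def best(start, stages_left):
        # Best (bottleneck, cut positions) for layers[start:] in stages_left stages.
        if stages_left == 1:
            return prefix[n] - prefix[start], ()
        result = None
        # Leave at least one layer for each of the remaining stages.
        for end in range(start + 1, n - stages_left + 2):
            head = prefix[end] - prefix[start]
            tail, cuts = best(end, stages_left - 1)
            candidate = (max(head, tail), (end,) + cuts)
            if result is None or candidate[0] < result[0]:
                result = candidate
        return result

    bottleneck, cuts = best(0, num_stages)
    bounds = (0,) + cuts + (n,)
    stages = [list(range(bounds[i], bounds[i + 1])) for i in range(num_stages)]
    return bottleneck, stages


if __name__ == "__main__":
    # Hypothetical per-layer compute costs in arbitrary units.
    print(partition_layers([4, 2, 3, 5, 1, 6], 3))
```

Balancing the slowest stage is a common simplification for pipeline parallelism, because the slowest stage bounds steady-state throughput. A formulation like the one the abstract describes would additionally have to account for communication and memory placement.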
Computer Science, Engineering