Mercury: Fast and Optimal Device Placement for Large Deep Learning Models.

Hengwei Xu, Pengyuan Zhou, Haiyong Xie, Yong Liao
DOI: https://doi.org/10.1145/3605573.3605603
2023-01-01
Abstract: Rapidly growing neural network models are becoming increasingly difficult to run on a single device. Hence, model parallelism across multiple devices is critical to training large models efficiently. Recent proposals suffer from either long processing times or poor performance. We therefore propose Mercury, a fast framework for optimizing device placement for large models. Mercury employs a simple but efficient model parallelization strategy in the baseline measurement and generates placement policies through a series of scheduling algorithms. We deploy and evaluate Mercury on numerous large models. The results show that Mercury not only reduces placement policy generation time by 26.4% but also improves model throughput by 218.5% compared with the most advanced methods.