GPU-Efficient Deployment of Ring All-Reduce-Based Distributed Model Training in Tidal Computing Power Network

Yingbo Fan, Yajie Li, Boxin Zhang, Ling Chen, Yahui Wang, Jiaxing Guo, Wei Wang, Yongli Zhao, Jie Zhang
DOI: https://doi.org/10.1109/ACP/POEM59049.2023.10369900
2023-01-01
Abstract: This paper proposes a tidal-aware deployment algorithm for ring all-reduce (RAR)-based distributed model training (DMT) services in a tidal computing power network (CPN). The algorithm's performance is evaluated in both resource-sufficient and resource-constrained cases. Simulation results show that dynamically partitioning training data and allocating GPU resources reduces GPU usage by 20.6%.