SoCFlow: Efficient and Scalable DNN Training on SoC-Clustered Edge Servers

Daliang Xu, Mengwei Xu, Chiheng Lou, Li Zhang, Gang Huang, Xin Jin, Xuanzhe Liu
DOI: https://doi.org/10.1145/3617232.3624847
2024-01-01
Abstract: SoC-Cluster, a novel server architecture composed of massive mobile system-on-chips (SoCs), is gaining popularity in industrial edge computing due to its energy efficiency and compatibility with existing mobile applications. However, we observe that the deployed SoC-Cluster servers are not fully utilized, because the hosted workloads are mostly user-triggered and exhibit significant tidal phenomena. To harvest the free cycles, we propose to co-locate deep learning tasks on them. We present SoCFlow, the first framework that can efficiently train deep learning models on SoC-Cluster. To deal with the intrinsic inadequacy of commercial SoC-Cluster servers, SoCFlow incorporates two novel techniques: (1) group-wise parallelism with delayed aggregation, which can train deep learning models quickly and scalably without being influenced by the network bottleneck; (2) a data-parallel mixed-precision training algorithm that can fully unleash the capability of the heterogeneous processors in mobile SoCs. We have fully implemented SoCFlow and demonstrated its effectiveness through extensive experiments. The experiments show that SoCFlow significantly and consistently outperforms all baselines in training speed while preserving convergence accuracy, e.g., 1.6x-740x convergence speedup with 32 SoCs. Compared to a commodity GPU (NVIDIA V100) under the same power budget, SoCFlow achieves comparable training speed but reduces energy consumption by 2.31x-10.23x with the same convergence accuracy.