Resource-efficient Parallel Split Learning in Heterogeneous Edge Computing

Mingjin Zhang, Jiannong Cao, Yuvraj Sahni, Xiangchun Chen, Shan Jiang
2024-03-23
Abstract: Edge AI has recently been proposed to facilitate the training and deployment of Deep Neural Network (DNN) models in proximity to the sources of data. To enable the training of large models on resource-constrained edge devices and protect data privacy, parallel split learning is becoming a practical and popular approach. However, current parallel split learning neglects the resource heterogeneity of edge devices, which may lead to the straggler issue. In this paper, we propose EdgeSplit, a novel parallel split learning framework to better accelerate distributed model training on heterogeneous and resource-constrained edge devices. EdgeSplit enhances the efficiency of model training on less powerful edge devices by adaptively segmenting the model into varying depths. Our approach focuses on reducing total training time by formulating and solving a task scheduling problem, which determines the most efficient model partition points and bandwidth allocation for each device. We employ a straightforward yet effective alternating algorithm for this purpose. Comprehensive tests conducted with a range of DNN models and datasets demonstrate that EdgeSplit not only facilitates the training of large models on resource-restricted edge devices but also surpasses existing baselines in performance.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper addresses efficient distributed model training on resource-constrained, heterogeneous edge devices. It proposes EdgeSplit, a parallel split learning framework that improves training efficiency in three ways:

1. **Adaptive Model Splitting**: Splits the full model at different depths according to the heterogeneous computational capabilities of the edge devices, tailoring the training workload assigned to each device.
2. **Task Scheduling and Bandwidth Allocation**: Formulates a task scheduling problem and solves it with an alternating algorithm to determine the model split point and bandwidth allocation for each device, with the goal of minimizing overall training time.
3. **Improving Training Speed**: Experimental results show that EdgeSplit trains faster than the baseline methods, achieving up to a 5.5x speedup on ResNet50 without loss of accuracy.

Together, these techniques enable the training of large models on resource-constrained edge devices and reduce total training time by offloading part of the computation to a more powerful Federated Learning (FL) server.
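To make the scheduling idea concrete, the following is a minimal sketch of an alternating scheme for choosing split points and bandwidth shares. It is not the paper's algorithm or cost model. It assumes a simplified setting in which each device's round time is its local compute time, plus the server's compute time for the remaining layers, plus the time to send activations up and gradients down, and in which the round finishes when the slowest device finishes. All names (`round_time`, `best_cut`, `allocate_bandwidth`, `schedule`) and all numbers are hypothetical.

```python
def round_time(k, layer_cost, act_size, dev_speed, srv_speed, bw):
    """Round time for a device that runs layers [0, k) locally (assumed cost model)."""
    local = sum(layer_cost[:k]) / dev_speed
    remote = sum(layer_cost[k:]) / srv_speed
    comm = 2.0 * act_size[k - 1] / bw  # activations up, gradients down
    return local + remote + comm


def best_cut(layer_cost, act_size, dev_speed, srv_speed, bw):
    """Pick the cut in 1..L-1 that minimizes this device's round time."""
    cuts = range(1, len(layer_cost))
    return min(cuts, key=lambda k: round_time(
        k, layer_cost, act_size, dev_speed, srv_speed, bw))


def allocate_bandwidth(compute, volume, total_bw, iters=50):
    """Split total_bw so that max(compute[i] + volume[i] / bw[i]) is minimized.

    Bisects on the target finish time T: device i needs volume[i] / (T - compute[i])
    bandwidth to finish by T, and T is feasible if the needs sum to at most total_bw.
    """
    lo = max(compute)                  # infeasible: would need infinite bandwidth
    hi = lo + sum(volume) / total_bw   # always feasible
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        need = sum(v / (mid - c) for c, v in zip(compute, volume))
        if need > total_bw:
            lo = mid
        else:
            hi = mid
    bw = [v / (hi - c) for c, v in zip(compute, volume)]
    scale = total_bw / sum(bw)         # hand out any leftover bandwidth
    return [b * scale for b in bw]


def schedule(layer_cost, act_size, dev_speeds, srv_speed, total_bw, max_rounds=20):
    """Alternate between choosing cuts (bandwidth fixed) and bandwidth (cuts fixed)."""
    n = len(dev_speeds)
    bw = [total_bw / n] * n            # start from an equal split
    cuts = None
    for _ in range(max_rounds):
        new_cuts = [best_cut(layer_cost, act_size, dev_speeds[i], srv_speed, bw[i])
                    for i in range(n)]
        if new_cuts == cuts:           # no device wants to move its cut: stop
            break
        cuts = new_cuts
        compute = [sum(layer_cost[:k]) / s + sum(layer_cost[k:]) / srv_speed
                   for k, s in zip(cuts, dev_speeds)]
        volume = [2.0 * act_size[k - 1] for k in cuts]
        bw = allocate_bandwidth(compute, volume, total_bw)
    makespan = max(round_time(cuts[i], layer_cost, act_size,
                              dev_speeds[i], srv_speed, bw[i]) for i in range(n))
    return cuts, bw, makespan


if __name__ == "__main__":
    # Toy instance: four layers, one slow and one fast device (made-up numbers).
    cuts, bw, makespan = schedule(
        layer_cost=[4.0, 4.0, 4.0, 4.0], act_size=[8.0, 4.0, 2.0, 1.0],
        dev_speeds=[1.0, 4.0], srv_speed=8.0, total_bw=10.0)
    print(cuts, [round(b, 2) for b in bw], round(makespan, 2))
```

In this toy instance the slower device ends up with the shallower cut and the larger bandwidth share, which matches the intuition behind adaptive splitting. Each step can only keep or lower the slowest device's time under this cost model, but the result is a local solution for the simplified problem, not a guarantee about the formulation used in the paper.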