Model Parallelism Optimization for Distributed DNN Inference on Edge Devices.

Meng Wang, Liang Qian, Na Meng, Yusong Cheng, Weiwei Fang
DOI: https://doi.org/10.1109/PAAP60200.2023.10391646
2023-01-01
Abstract: Deep neural networks (DNNs) have recently gained widespread application in various domains. However, the computational and memory requirements of DNN models make them difficult to deploy on resource-constrained edge devices. With the widespread use of the Internet of Things (IoT), heterogeneous edge devices with diverse computational capabilities and network conditions are increasingly employed for DNN inference. This paper proposes a distributed DNN model deployment scheme for edge device clusters. The DNN model is partitioned using two algorithms: Edge Layer Partitioning (EdgeLP), which partitions a single neural network layer, and Edge Model Partitioning (EdgeMP), which partitions the complete model. Both algorithms consider the computational capabilities and the network conditions of the edge devices. To reduce the transmission overhead between collaborating edge devices, layer fusion and data quantization are applied to shrink the amount of transmitted data. Experimental results show that our method substantially improves the performance of distributed DNN inference in heterogeneous scenarios. Specifically, on a cluster of three edge devices, the proposed scheme speeds up DNN inference by 1.38–1.72× without accuracy loss compared to a state-of-the-art scheme.
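The abstract does not give the details of EdgeLP or the quantization step, so the following is only an illustrative sketch of the two general ideas it mentions, not the authors' algorithms. It assumes that a layer's output rows are split across devices in proportion to a single per-device speed score (network conditions are ignored here), and that tensors are reduced to 8-bit values by simple linear quantization before transmission. The names `split_rows`, `quantize_uint8`, and `dequantize_uint8` are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_rows(total_rows: int, speeds: Sequence[float]) -> List[Tuple[int, int]]:
    """Assign contiguous row ranges [start, end) in proportion to device speed.

    Assumption: `speeds` is one relative throughput score per device.
    """
    if total_rows < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need non-negative rows and positive speeds")
    total_speed = sum(speeds)
    # Ideal (fractional) share of rows for each device.
    ideal = [total_rows * s / total_speed for s in speeds]
    counts = [int(x) for x in ideal]
    # Hand leftover rows to the devices with the largest fractional remainder.
    leftover = total_rows - sum(counts)
    order = sorted(range(len(speeds)), key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    # Convert per-device counts into contiguous ranges.
    ranges, start = [], 0
    for c in counts:
        ranges.append((start, start + c))
        start += c
    return ranges


def quantize_uint8(values: Sequence[float]) -> Tuple[bytes, float, float]:
    """Linearly map floats to one byte each; returns (payload, offset, scale)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale for constant input
    payload = bytes(min(255, max(0, round((v - lo) / scale))) for v in values)
    return payload, lo, scale


def dequantize_uint8(payload: bytes, lo: float, scale: float) -> List[float]:
    """Approximate inverse of quantize_uint8 (this simple scheme is lossy)."""
    return [lo + b * scale for b in payload]


if __name__ == "__main__":
    # Three devices with relative speeds 1 : 2 : 4 sharing 224 output rows.
    print(split_rows(224, [1.0, 2.0, 4.0]))
    payload, lo, scale = quantize_uint8([0.0, 0.5, 1.0])
    print(len(payload), dequantize_uint8(payload, lo, scale))
```

In this sketch, sending one byte per value instead of a 32-bit float cuts the transmitted payload to roughly a quarter, at the cost of a rounding error of at most half a quantization step per value. How the paper avoids accuracy loss is not stated in the abstract.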