A Survey on Deep Neural Network Partition over Cloud, Edge and End Devices

Di Xu, Xiang He, Tonghua Su, Zhongjie Wang
2023-04-20
Abstract: Deep neural network (DNN) partition is a research problem that involves splitting a DNN into multiple parts and offloading them to specific locations. Because of recent advances in multi-access edge computing and edge intelligence, DNN partition is considered a powerful tool for improving DNN inference performance when the computing resources of edge and end devices are limited and the remote transmission of data from these devices to clouds is costly. This paper provides a comprehensive survey of recent advances and challenges in DNN partition approaches over the cloud, edge, and end devices, based on a detailed literature collection. We review how DNN partition works in various application scenarios and provide a unified mathematical model of the DNN partition problem. We develop a five-dimensional classification framework for DNN partition approaches, consisting of deployment locations, partition granularity, partition constraints, optimization objectives, and optimization algorithms. Each existing DNN partition approach can be precisely defined in this framework by instantiating each dimension with specific values. In addition, we suggest a set of metrics for comparing and evaluating DNN partition approaches. Based on this, we identify and discuss research challenges that have not yet been investigated or fully addressed. We hope that this work helps DNN partition researchers by highlighting significant future research directions in this domain.
Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Software Engineering
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address the issue of partitioning deep neural networks (DNNs) across cloud computing, edge computing, and terminal devices. Specifically:

1. **Background and Challenges**:
   - With the increase in Internet of Things (IoT) devices, DNNs need to process a large amount of data.
   - Terminal devices, due to limited hardware resources, cannot meet the real-time inference requirements of complex DNNs.
   - Although cloud servers are resource-rich, they suffer from high latency and bandwidth limitations.
2. **Research Objectives**:
   - Propose a systematic framework to evaluate and compare different DNN partitioning methods.
   - Address performance optimization issues of DNN partitioning at different deployment locations (cloud, edge, and terminal devices).
   - Provide a unified mathematical model to describe the DNN partitioning problem.
   - Introduce a five-dimensional classification framework covering deployment location, partitioning granularity, constraints, optimization objectives, and optimization algorithms.
3. **Application Cases**:
   - Smart Home: such as fall detection systems.
   - Intelligent Transportation: such as edge video aggregation nodes.
   - Industrial Control: such as real-time monitoring and target recognition.
   - Virtual Reality/Augmented Reality (VR/AR): such as multiplayer online games.
4. **Technical Implementation**:
   - Partition the DNN model into multiple microservices and deploy them using container technology.
   - Use tools like Kubernetes to manage containerized applications.
   - Provide a detailed mathematical model to describe partitioning strategies and their performance metrics.
5. **Main Contributions**:
   - Summarize the technical contributions of related research and propose a five-dimensional classification framework.
   - Propose a series of metrics to evaluate and compare DNN partitioning methods.
   - Highlight and discuss the challenges in current research and propose future research directions.
Through these efforts, the paper aims to provide important future research directions for researchers in the field of DNN partitioning.
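
The trade-off behind DNN partitioning (limited compute on terminal devices versus the cost of sending data to a remote server) can be made concrete with a small sketch. The code below is not the paper's unified mathematical model. It is a minimal illustration that assumes a chain-structured DNN, a single split point between one device and one server, latency as the only objective, and a negligible cost for returning the result. The `Layer` class, the `best_partition_point` function, and all timing and size values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    name: str
    device_ms: float  # assumed execution time on the terminal device
    server_ms: float  # assumed execution time on the edge/cloud server
    output_kb: float  # assumed size of this layer's output tensor


def best_partition_point(
    layers: List[Layer], input_kb: float, bandwidth_kb_per_ms: float
) -> Tuple[int, float]:
    """Try every split point k and return (k, latency_ms) with the lowest latency.

    layers[:k] run on the device and layers[k:] run on the server.
    k == 0 offloads the whole model; k == len(layers) runs it all locally.
    """
    best_k, best_latency = 0, float("inf")
    for k in range(len(layers) + 1):
        device_time = sum(layer.device_ms for layer in layers[:k])
        server_time = sum(layer.server_ms for layer in layers[k:])
        if k == len(layers):
            transfer_time = 0.0  # nothing is sent when everything runs locally
        else:
            # Send the raw input (k == 0) or the output of the last local layer.
            sent_kb = input_kb if k == 0 else layers[k - 1].output_kb
            transfer_time = sent_kb / bandwidth_kb_per_ms
        total = device_time + transfer_time + server_time
        if total < best_latency:
            best_k, best_latency = k, total
    return best_k, best_latency


if __name__ == "__main__":
    # Hypothetical three-layer model; none of these numbers come from the paper.
    model = [
        Layer("conv1", device_ms=5.0, server_ms=1.0, output_kb=800.0),
        Layer("conv2", device_ms=10.0, server_ms=2.0, output_kb=50.0),
        Layer("fc", device_ms=40.0, server_ms=2.0, output_kb=1.0),
    ]
    k, latency = best_partition_point(model, input_kb=600.0, bandwidth_kb_per_ms=10.0)
    print(f"split after {k} layer(s), estimated latency {latency} ms")
```

With these made-up numbers, splitting after the second layer wins because `conv2` shrinks the intermediate tensor before transmission, while the expensive final layer runs on the server. The approaches covered by the survey vary along more dimensions than this sketch does, including partition granularity, constraints, objectives other than latency, and more than two deployment locations.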