Abstract: Deep neural network (DNN) partition is a research problem that involves splitting a DNN into multiple parts and offloading them to specific locations. Because of recent advances in multi-access edge computing and edge intelligence, DNN partition has been considered a powerful tool for improving DNN inference performance when the computing resources of edge and end devices are limited and the remote transmission of data from these devices to clouds is costly. This paper provides a comprehensive survey of the recent advances and challenges in DNN partition approaches over the cloud, edge, and end devices based on a detailed literature collection. We review how DNN partition works in various application scenarios, and provide a unified mathematical model of the DNN partition problem. We develop a five-dimensional classification framework for DNN partition approaches, consisting of deployment locations, partition granularity, partition constraints, optimization objectives, and optimization algorithms. Each existing DNN partition approach can be defined in this framework by instantiating each dimension into specific values. In addition, we suggest a set of metrics for comparing and evaluating the DNN partition approaches. Based on this, we identify and discuss research challenges that have not yet been investigated or fully addressed. We hope that this work helps DNN partition researchers by highlighting significant future research directions in this domain.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to address the issue of partitioning deep neural networks (DNNs) across cloud computing, edge computing, and terminal devices. Specifically:

1. **Background and Challenges**:
   - With the increase in Internet of Things (IoT) devices, DNNs need to process a large amount of data.
   - Terminal devices, due to limited hardware resources, cannot meet the real-time inference requirements of complex DNNs.
   - Although cloud servers are resource-rich, they suffer from high latency and bandwidth limitations.
2. **Research Objectives**:
   - Propose a systematic framework to evaluate and compare different DNN partitioning methods.
   - Address performance optimization issues of DNN partitioning at different deployment locations (cloud, edge, and terminal devices).
   - Provide a unified mathematical model to describe the DNN partitioning problem.
   - Introduce a five-dimensional classification framework covering deployment location, partitioning granularity, constraints, optimization objectives, and optimization algorithms.
3. **Application Cases**:
   - Smart Home: such as fall detection systems.
   - Intelligent Transportation: such as edge video aggregation nodes.
   - Industrial Control: such as real-time monitoring and target recognition.
   - Virtual Reality/Augmented Reality (VR/AR): such as multiplayer online games.
4. **Technical Implementation**:
   - Partition the DNN model into multiple microservices and deploy them using container technology.
   - Use tools like Kubernetes to manage containerized applications.
   - Provide a detailed mathematical model to describe partitioning strategies and their performance metrics.
5. **Main Contributions**:
   - Summarize the technical contributions of related research and propose a five-dimensional classification framework.
   - Propose a series of metrics to evaluate and compare DNN partitioning methods.
   - Highlight and discuss the challenges in current research and propose future research directions.

Through these efforts, the paper aims to provide important future research directions for researchers in the field of DNN partitioning.
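
To make the idea of a partition point concrete, here is a minimal sketch of layer-wise partitioning between one end device and one server, with end-to-end latency as the only objective. It is not the survey's unified model: the `Layer` fields, the `best_split` helper, and all numbers are hypothetical, and the sketch assumes a purely sequential network, a fixed uplink bandwidth, and a negligible cost for returning the result to the device.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    name: str
    device_ms: float  # assumed latency of this layer on the end device
    server_ms: float  # assumed latency of this layer on the edge/cloud server
    output_kb: float  # assumed size of this layer's output activation


def best_split(layers: List[Layer], input_kb: float,
               bandwidth_kb_per_ms: float) -> Tuple[int, float]:
    """Return (k, latency): layers[:k] run on the device, layers[k:] on the server."""
    best_k, best_latency = 0, float("inf")
    for k in range(len(layers) + 1):
        device = sum(layer.device_ms for layer in layers[:k])
        server = sum(layer.server_ms for layer in layers[k:])
        if k == len(layers):
            transfer = 0.0  # everything runs locally, nothing is uploaded
        else:
            # Upload the raw input (k == 0) or the last local layer's output.
            sent_kb = input_kb if k == 0 else layers[k - 1].output_kb
            transfer = sent_kb / bandwidth_kb_per_ms
        total = device + transfer + server
        if total < best_latency:
            best_k, best_latency = k, total
    return best_k, best_latency


# Illustrative numbers only; they are not measurements from any paper.
layers = [
    Layer("conv1", device_ms=8.0, server_ms=1.0, output_kb=400.0),
    Layer("conv2", device_ms=12.0, server_ms=1.5, output_kb=100.0),
    Layer("fc", device_ms=30.0, server_ms=2.0, output_kb=1.0),
]

split, latency = best_split(layers, input_kb=600.0, bandwidth_kb_per_ms=10.0)
print(split, latency)
```

With these made-up numbers the search keeps the two convolution layers on the device and offloads the rest, because the activation after `conv2` is much smaller than the raw input. Lowering the bandwidth pushes the split toward fully local execution. The other dimensions named above, such as finer partition granularity, energy or privacy constraints, and more than two deployment locations, would each change the cost function or the search space.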

A Survey on Deep Neural Network Partition over Cloud, Edge and End Devices
