Abstract: For model inference of convolutional neural networks (CNNs), we are currently witnessing a shift from the Cloud to the Edge. Unfortunately, deploying and running inference with large, compute- and memory-intensive CNNs on Internet of Things devices at the Edge is challenging, as these devices typically have limited resources. One approach to address this challenge is to leverage all available resources across multiple edge devices to execute a large CNN by properly partitioning it and running each CNN partition on a separate edge device. However, no design and programming framework currently exists that takes a trained CNN model as input and subsequently allows for efficiently exploring and automatically implementing a range of different CNN partitions on multiple edge devices to facilitate distributed CNN inference. Therefore, in this article, we propose a novel framework that automates the splitting of a CNN model into a set of submodels as well as the code generation needed for the distributed and collaborative execution of these submodels on multiple, possibly heterogeneous, edge devices, while supporting the exploitation of parallelism among and within the edge devices. In addition, since the number of different CNN mapping possibilities on multiple edge devices is vast, our framework also features a multistage and hierarchical design space exploration methodology to efficiently search for (near-)optimal distributed CNN inference implementations. Our experimental results demonstrate that our work allows for rapidly finding and realizing distributed CNN inference implementations with reduced energy consumption and memory usage per edge device, and under certain conditions, with improved system throughput as well.

Automated Exploration and Implementation of Distributed CNN Inference at the Edge