Abstract: The advent of Deep Neural Networks (DNNs) has empowered numerous computer-vision applications. Because DNN models are computationally intensive and Industrial Internet-of-Things (IIoT) devices are resource-constrained, it is generally very challenging to deploy and execute DNNs efficiently in industrial scenarios. Substantial research has focused on model compression or edge-cloud offloading, which trade accuracy for efficiency or depend on high-quality infrastructure support, respectively. In this article, we present EdgeDI, a framework for executing DNN inference in a partitioned, distributed manner on a cluster of IIoT devices. To improve inference performance, EdgeDI exploits two key optimization knobs: (1) model compression based on deep architecture design, which transforms the target DNN model into a compact one that reduces the resource requirements for IIoT devices without sacrificing accuracy; and (2) distributed inference based on adaptive workload partitioning, which achieves high parallelism by adaptively balancing the workload distribution among IIoT devices under heterogeneous resource conditions. We have implemented EdgeDI based on PyTorch and evaluated its performance with the NEU-CLS defect classification task and two typical DNN models (i.e., VGG and ResNet) on a cluster of heterogeneous Raspberry Pi devices. The results indicate that the two proposed optimization approaches significantly outperform the existing solutions in their specific domains. When the two are combined, EdgeDI can provide scalable DNN inference speedups that are very close to, or even much higher than, the theoretical speedup bounds, while still maintaining the desired accuracy.
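
The abstract does not spell out how the workload is balanced across heterogeneous devices. As a rough illustration of the general idea only, the sketch below splits the rows of an input feature map into contiguous slices whose sizes are proportional to each device's relative speed. The function name `partition_rows` and the `speeds` parameter are hypothetical, and this is not EdgeDI's actual partitioning algorithm. A real convolutional partition would also need overlapping boundary rows between neighbouring slices, which this sketch omits.

```python
def partition_rows(total_rows, speeds):
    """Split `total_rows` into contiguous (start, end) row ranges.

    Each range is sized in proportion to the matching entry of `speeds`
    (an assumed per-device relative throughput, e.g. measured offline).
    Ranges are half-open: rows start..end-1 go to that device.
    """
    if total_rows < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need total_rows >= 0 and strictly positive speeds")

    total_speed = sum(speeds)
    bounds = [0]
    cumulative = 0.0
    for s in speeds:
        cumulative += s
        # Rounding the cumulative share keeps slices contiguous and
        # prevents rounding error from accumulating across devices.
        bounds.append(round(total_rows * cumulative / total_speed))
    bounds[-1] = total_rows  # guard against floating-point drift

    return [(bounds[i], bounds[i + 1]) for i in range(len(speeds))]


if __name__ == "__main__":
    # Hypothetical cluster: two slower devices and one twice as fast.
    for device, (start, end) in enumerate(partition_rows(200, [1.0, 1.0, 2.0])):
        print(f"device {device}: rows {start}..{end - 1}")
```

In this toy setup the faster device receives half of the rows and each slower device a quarter, so all three would finish at roughly the same time if per-row cost were uniform. An adaptive scheme would re-estimate `speeds` at run time as resource conditions change.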

Partitioning Convolutional Neural Networks for Inference on Constrained Internet-of-Things Devices

Efficient Partitioning and Communication Scheme-Based Distributed Edge Computing to Accelerate Deep Neural Network

Extendable Multi-Device Collaborative Pipeline Parallel Inference in the Edge-Cloud Scenario

DeeperThings: Fully Distributed CNN Inference on Resource-Constrained Edge Devices

End-Edge Collaborative Inference of Convolutional Fuzzy Neural Networks for Big Data-Driven Internet of Things

DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters

Edge-Host Partitioning of Deep Neural Networks with Feature Space Encoding for Resource-Constrained Internet-of-Things Platforms

Enhancing Distributed In-Situ CNN Inference in the Internet of Things

Distributed Deep Convolutional Neural Networks for the Internet-of-Things

Cooperative Inference with Interleaved Operator Partitioning for CNNs

Learning the Optimal Partition for Collaborative DNN Training with Privacy Requirements

Distributed Inference Acceleration with Adaptive DNN Partitioning and Offloading

Toward Collaborative Inferencing of Deep Neural Networks on Internet-of-Things Devices

Horizontally Distributed Inference of Deep Neural Networks for AI-Enabled IoT

Automated Deep Neural Network Inference Partitioning for Distributed Embedded Systems

Joint Architecture Design and Workload Partitioning for DNN Inference on Industrial IoT Clusters

Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT

Distributed Inference in Resource-Constrained IoT for Real-Time Video Surveillance