Abstract: Deep learning (DL) applications have attracted significant attention with the rapidly growing demand for Internet of Things (IoT) systems. However, performing DL inference on IoT devices is challenging because of the large computational demands of DL models. Edge computing has recently offered a solution by deploying resources near end users. Resources at the edge are still limited, however, so management issues become essential: allocating networking resources and computing capabilities, and configuring devices appropriately for different applications. As the knobs for such edge management, we consider multiple application tasks with different DL model options and hyperparameter settings, along with possible decomposition points based on the split DL concept, and use them to design configuration tables. Layer-level decomposition in split DL provides greater flexibility by splitting a single DL inference model into parts that run on different computing devices, where each part consists of several consecutive layers. We then propose the SplitDL-Image and SplitDL-Video algorithms, based on the Vickrey–Clarke–Groves (VCG) mechanism, which take into account model performance and frames per second (FPS) requirements together with the preferences of heterogeneous IoT devices. The proposed method allocates networking and edge server computing resources according to the designed configuration tables by assigning an appropriate configuration to each IoT device. Simulation results based on real-world applications show that the proposed method allocates more resources to IoT devices with more urgent or important tasks, a preference for better accuracy, or a higher local computational cost. In addition, other desired properties, such as truthful bidding, individual rationality, and weak budget balance, are guaranteed.
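The VCG idea behind the allocation can be illustrated with a toy example. The sketch below is not the SplitDL-Image or SplitDL-Video algorithm; it is a brute-force VCG auction over a hypothetical configuration table. The names `best_allocation`, `vcg`, the device labels, and all numbers are assumptions made for illustration. Each device reports a list of `(edge_resource_demand, value)` options, and the sketch assumes every device includes a local-only option `(0, 0.0)` so that a feasible allocation always exists.

```python
from itertools import product


def best_allocation(bids, capacity):
    """Pick one configuration per device to maximize total reported value.

    bids: {device: [(demand, value), ...]}, one tuple per configuration.
    capacity: total edge resource units available.
    Returns (welfare, {device: chosen_config_index}).
    """
    devices = list(bids)
    best_welfare, best_choice = float("-inf"), None
    # Exhaustive search; fine for a toy table, not for large instances.
    for choice in product(*(range(len(bids[d])) for d in devices)):
        demand = sum(bids[d][c][0] for d, c in zip(devices, choice))
        if demand > capacity:
            continue  # violates the edge resource budget
        welfare = sum(bids[d][c][1] for d, c in zip(devices, choice))
        if welfare > best_welfare:
            best_welfare, best_choice = welfare, dict(zip(devices, choice))
    return best_welfare, best_choice


def vcg(bids, capacity):
    """Return the welfare-maximizing assignment and the VCG payments."""
    welfare, choice = best_allocation(bids, capacity)
    payments = {}
    for d in bids:
        # Welfare the other devices could reach if d were absent.
        others = {k: v for k, v in bids.items() if k != d}
        welfare_without_d, _ = best_allocation(others, capacity)
        # Welfare the other devices actually get with d present.
        others_welfare_with_d = welfare - bids[d][choice[d]][1]
        # d pays the externality it imposes on everyone else.
        payments[d] = welfare_without_d - others_welfare_with_d
    return choice, payments


# Hypothetical configuration table: index 0 is always "run locally".
bids = {
    "cam_a": [(0, 0.0), (2, 5.0), (3, 8.0)],
    "cam_b": [(0, 0.0), (2, 6.0)],
    "sensor_c": [(0, 0.0), (1, 2.0)],
}
choice, payments = vcg(bids, capacity=4)
print(choice)    # {'cam_a': 1, 'cam_b': 1, 'sensor_c': 0}
print(payments)  # {'cam_a': 2.0, 'cam_b': 5.0, 'sensor_c': 0.0}
```

In this toy run no device pays more than the value it reported for its assigned configuration, and no payment is negative, which is the behavior that individual rationality and weak budget balance describe.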

Low Latency Deep Learning Inference Model for Distributed Intelligent IoT Edge Clusters

Extendable Multi-Device Collaborative Pipeline Parallel Inference in the Edge-Cloud Scenario

Efficient Partitioning and Communication Scheme-Based Distributed Edge Computing to Accelerate Deep Neural Network

DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters

DeeperThings: Fully Distributed CNN Inference on Resource-Constrained Edge Devices

EdgeCI: Distributed Workload Assignment and Model Partitioning for CNN Inference on Edge Clusters

Communication-Efficient Separable Neural Network for Distributed Inference on Edge Devices

Design and Prototyping Distributed CNN Inference Acceleration in Edge Computing

Distributed Inference in Resource-Constrained IoT for Real-Time Video Surveillance

DistrEdge: Speeding up Convolutional Neural Network Inference on Distributed Edge Devices

Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy

Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN

Edge–IoT Computing and Networking Resource Allocation for Decomposable Deep Learning Inference

Toward Collaborative Inferencing of Deep Neural Networks on Internet-of-Things Devices

Joint Architecture Design and Workload Partitioning for DNN Inference on Industrial IoT Clusters

Self-aware distributed deep learning framework for heterogeneous IoT edge devices

EdgeKE: An On-Demand Deep Learning IoT System for Cognitive Big Data on Industrial Edge Devices

Hierarchical and Distributed Machine Learning Inference Beyond the Edge

Partitioning and Deployment of Deep Neural Networks on Edge Clusters

Edge-PRUNE: Flexible Distributed Deep Learning Inference