Abstract: Deep learning (DL) applications have attracted significant attention amid the rapidly growing demand for Internet of Things (IoT) systems. However, performing DL inference tasks on IoT devices is challenging because of the large computational demands of DL models. Edge computing offers a solution by deploying resources near end users. However, edge resources are still limited, so management issues, such as allocating networking and computing resources and configuring devices appropriately for different applications, become essential. As the knobs for such edge management, we consider multiple application tasks with different DL model options and hyperparameter settings, along with possible decomposition points based on the split DL concept, and use them to design configuration tables. Layer-level decomposition in split DL provides greater flexibility by splitting a single DL inference model into parts that run on different computing devices, where each part consists of several consecutive layers. We then propose the SplitDL-Image and SplitDL-Video algorithms, based on the Vickrey–Clarke–Groves (VCG) mechanism, which account for model performance and frames-per-second (FPS) requirements together with the preferences of heterogeneous IoT devices. The proposed method allocates networking and edge-server computing resources according to the designed configuration tables by assigning an appropriate configuration to each IoT device. Simulation results based on real-world applications show that the proposed method allocates more resources to IoT devices with more urgent or important tasks, a preference for higher accuracy, or a higher local computational cost. In addition, other desirable properties, such as truthful bidding, individual rationality, and weak budget balance, are also guaranteed.
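To make the VCG idea concrete, here is a minimal sketch of a generic VCG allocation over per-device configuration tables. It is not the SplitDL-Image or SplitDL-Video algorithm. It assumes that each configuration (for example, a model choice plus a split point) needs a fixed number of bandwidth units and edge-server compute units and carries a single scalar value reported by the device. The device names, configuration names, units, and values are all hypothetical. The allocation is found by brute-force search, which is only practical for tiny examples. Each device then pays the Clarke pivot: the welfare the other devices would obtain without it, minus the welfare they actually obtain with it.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical configuration entry:
# (configuration name, bandwidth units, edge-server compute units, reported value)
Config = Tuple[str, int, int, float]


def best_allocation(
    tables: Dict[str, List[Config]],
    bw_cap: int,
    cpu_cap: int,
    exclude: Optional[str] = None,
) -> Tuple[float, Dict[str, Config]]:
    """Exhaustively choose one configuration per device (optionally skipping one
    device) to maximize total reported value under both capacity limits."""
    devices = [d for d in tables if d != exclude]
    best_value: float = float("-inf")
    best_choice: Dict[str, Config] = {}
    for combo in product(*(tables[d] for d in devices)):
        bw = sum(c[1] for c in combo)
        cpu = sum(c[2] for c in combo)
        if bw > bw_cap or cpu > cpu_cap:
            continue  # infeasible: exceeds network or edge-server capacity
        value = sum(c[3] for c in combo)
        if value > best_value:
            best_value, best_choice = value, dict(zip(devices, combo))
    return best_value, best_choice


def vcg(tables: Dict[str, List[Config]], bw_cap: int, cpu_cap: int):
    """Welfare-maximizing allocation plus Clarke-pivot (VCG) payments."""
    total, choice = best_allocation(tables, bw_cap, cpu_cap)
    payments: Dict[str, float] = {}
    for d in tables:
        # Best welfare the other devices could reach if d were absent...
        without_d, _ = best_allocation(tables, bw_cap, cpu_cap, exclude=d)
        # ...minus the welfare the other devices actually get with d present.
        others_with_d = total - choice[d][3]
        payments[d] = without_d - others_with_d
    return choice, payments


# Hypothetical configuration tables. Each includes a zero-resource "local"
# option so that a feasible allocation always exists.
example_tables: Dict[str, List[Config]] = {
    "camera": [("local", 0, 0, 1.0), ("split@L4", 2, 3, 5.0), ("offload", 4, 6, 7.0)],
    "drone": [("local", 0, 0, 0.5), ("split@L8", 3, 2, 4.0)],
}

if __name__ == "__main__":
    choice, payments = vcg(example_tables, bw_cap=5, cpu_cap=6)
    for device, cfg in choice.items():
        print(device, cfg[0], "pays", payments[device])
```

With these numbers, the camera is assigned `split@L4` and pays 0.0, and the drone is assigned `split@L8` and pays 2.0. The drone pays because its presence prevents the camera from fully offloading. Each device's utility (value minus payment) stays nonnegative, and the payments sum to a nonnegative amount, which matches individual rationality and weak budget balance in this small instance.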

HiTDL: High-Throughput Deep Learning Inference at the Hybrid Mobile Edge

Extendable Multi-Device Collaborative Pipeline Parallel Inference in the Edge-Cloud Scenario

HiDP: Hierarchical DNN Partitioning for Distributed Inference on Heterogeneous Edge Platforms

An Adaptive DNN Inference Acceleration Framework with End–edge–cloud Collaborative Computing

Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing

Distributed Assignment With Load Balancing for DNN Inference at the Edge

Cloud-Edge Inference under Communication Constraints: Data Quantization and Early Exit

Edge–IoT Computing and Networking Resource Allocation for Decomposable Deep Learning Inference

Reaching for the Sky: Maximizing Deep Learning Inference Throughput on Edge Devices with AI Multi-Tenancy

The Case for Hierarchical Deep Learning Inference at the Network Edge

Resource-aware Deployment of Dynamic DNNs over Multi-tiered Interconnected Systems

A Fine-Grained End-to-End Latency Optimization Framework for Wireless Collaborative Inference

Dynamic DNN Decomposition for Lossless Synergistic Inference

Joint Architecture Design and Workload Partitioning for DNN Inference on Industrial IoT Clusters

Distributed Inference Acceleration with Adaptive DNN Partitioning and Offloading

Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy

Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls

AppealNet: An Efficient and Highly-Accurate Edge/Cloud Collaborative Architecture for DNN Inference

DNN Deployment, Task Offloading, and Resource Allocation for Joint Task Inference in IIoT

Collaborative Inference for Deep Neural Networks in Edge Environments

Deep Learning for Hybrid 5G Services in Mobile Edge Computing Systems: Learn From a Digital Twin