Abstract: Deep learning (DL) applications have attracted significant attention with the rapidly growing demand for Internet of Things (IoT) systems. However, performing inference for DL applications on IoT devices is challenging because DL models have large computational demands. Edge computing offers a solution by deploying resources near end users. However, edge resources are still limited, so management issues, such as allocating networking and computing resources and configuring devices appropriately for different applications, become essential. As the knobs for such edge management, we design configuration tables that cover multiple application tasks with different DL model options and hyperparameter settings, along with possible decomposition points based on the split DL concept. Layer-level decomposition in split DL provides greater flexibility by splitting a single DL inference model into parts that run on different computing devices, with each part consisting of several consecutive layers. We then propose the SplitDL-Image and SplitDL-Video algorithms, based on the Vickrey–Clarke–Groves (VCG) mechanism, which account for model performance and frames-per-second (FPS) requirements together with the preferences of heterogeneous IoT devices. The proposed method allocates networking and edge-server computing resources according to the designed configuration tables by assigning an appropriate configuration to each IoT device. Simulation results based on real-world applications show that the proposed method allocates more resources to IoT devices with more urgent or important tasks, a preference for higher accuracy, or higher local computational cost. In addition, other desirable properties, such as truthful bidding, individual rationality, and weak budget balance, are guaranteed.
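The abstract allocates edge resources with a VCG mechanism over configuration tables. As an illustration only, here is a minimal Python sketch of generic VCG with Clarke pivot payments; it is not the SplitDL-Image or SplitDL-Video algorithm. It assumes a hypothetical model in which each configuration needs fixed integer amounts of edge compute and bandwidth, each device bids a value for every configuration it can use (or receives no edge resources, worth 0), and the welfare-maximizing assignment is found by exhaustive search, which is practical only for a handful of devices. The names `best_allocation`, `vcg`, `cfg_hi_acc`, `cfg_lo_acc`, and the capacities are illustrative assumptions.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# Hypothetical resource model: configuration name -> (compute_units, bandwidth_units).
Configs = Dict[str, Tuple[int, int]]
# Bids: device -> {configuration name -> declared value}.
Bids = Dict[str, Dict[str, float]]


def best_allocation(
    bids: Bids,
    configs: Configs,
    compute_cap: int,
    bw_cap: int,
    exclude: Optional[str] = None,
) -> Tuple[float, Dict[str, Optional[str]]]:
    """Exhaustively search for the feasible assignment with maximum declared value.

    Each device (except `exclude`) gets one configuration it bid on, or None,
    meaning it receives no edge resources and contributes zero value.
    """
    devices = [d for d in sorted(bids) if d != exclude]
    options = [[None] + sorted(bids[d]) for d in devices]
    best_value: float = 0.0
    best_assign: Dict[str, Optional[str]] = {d: None for d in devices}
    for choice in product(*options):
        compute = sum(configs[c][0] for c in choice if c is not None)
        bw = sum(configs[c][1] for c in choice if c is not None)
        if compute > compute_cap or bw > bw_cap:
            continue  # violates edge compute or bandwidth capacity
        value = sum(bids[d][c] for d, c in zip(devices, choice) if c is not None)
        if value > best_value:
            best_value, best_assign = value, dict(zip(devices, choice))
    return best_value, best_assign


def vcg(bids: Bids, configs: Configs, compute_cap: int, bw_cap: int):
    """Return (assignment, payments) using VCG with Clarke pivot payments."""
    welfare, assign = best_allocation(bids, configs, compute_cap, bw_cap)
    payments: Dict[str, float] = {}
    for d in bids:
        # Best welfare the other devices could reach if d were absent.
        others_without_d, _ = best_allocation(bids, configs, compute_cap, bw_cap, exclude=d)
        own = bids[d][assign[d]] if assign[d] is not None else 0.0
        # Welfare the other devices actually receive in the chosen assignment.
        others_with_d = welfare - own
        # d pays the loss in welfare it imposes on everyone else.
        payments[d] = others_without_d - others_with_d
    return assign, payments


if __name__ == "__main__":
    configs = {"cfg_hi_acc": (3, 2), "cfg_lo_acc": (1, 1)}
    bids = {
        "cam1": {"cfg_hi_acc": 10, "cfg_lo_acc": 6},
        "cam2": {"cfg_hi_acc": 8, "cfg_lo_acc": 5},
    }
    assign, payments = vcg(bids, configs, compute_cap=4, bw_cap=3)
    print(assign)    # {'cam1': 'cfg_hi_acc', 'cam2': 'cfg_lo_acc'}
    print(payments)  # {'cam1': 3, 'cam2': 0}
```

In this toy instance, both devices cannot receive `cfg_hi_acc` at once, so `cam1` wins it and pays 3, the value `cam2` loses by being pushed to `cfg_lo_acc`. Because every device may be assigned nothing, each device's utility (bid minus payment) is nonnegative and every payment is nonnegative; these are the generic VCG counterparts of the individual rationality and weak budget balance properties listed in the abstract.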

DistMind: Efficient Resource Disaggregation for Deep Learning Workloads

Extendable Multi-Device Collaborative Pipeline Parallel Inference in the Edge-Cloud Scenario

DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving

Efficient Architecture Paradigm for Deep Learning Inference As a Service

Decentralized Proactive Model Offloading and Resource Allocation for Split and Federated Learning

Liquid: Intelligent Resource Estimation and Network-Efficient Scheduling for Deep Learning Jobs on Distributed GPU Clusters

PowerAI DDL

DxPU: Large Scale Disaggregated GPU Pools in the Datacenter

P/D-Serve: Serving Disaggregated Large Language Model at Scale

HiTDL: High-Throughput Deep Learning Inference at the Hybrid Mobile Edge

DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters

Edge–IoT Computing and Networking Resource Allocation for Decomposable Deep Learning Inference

RT-mDL

Joint DNN Partition and Resource Allocation Optimization for Energy-Constrained Hierarchical Edge-Cloud Systems

Benchmarking Resource Usage for Efficient Distributed Deep Learning

Joint Architecture Design and Workload Partitioning for DNN Inference on Industrial IoT Clusters

EP4DDL: addressing straggler problem in heterogeneous distributed deep learning

DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution

Missile: Fine-Grained, Hardware-Level GPU Resource Isolation for Multi-Tenant DNN Inference

MuxFlow: Efficient and Safe GPU Sharing in Large-Scale Production Deep Learning Clusters

FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs