Adaptive Workload Distribution for Accuracy-aware DNN Inference on Collaborative Edge Platforms

Zain Taufique, Antonio Miele, Pasi Liljeberg, Anil Kanduri
DOI: https://doi.org/10.1109/ASP-DAC58780.2024.10473987
2023-10-16
Abstract: DNN inference can be accelerated by distributing the workload among a cluster of collaborative edge nodes. Heterogeneity among edge devices and accuracy-performance trade-offs of DNN models present a complex exploration space while catering to the inference performance requirements. In this work, we propose adaptive workload distribution for DNN inference, jointly considering node-level heterogeneity of edge devices, and application-specific accuracy and performance requirements. Our proposed approach combinatorially optimizes heterogeneity-aware workload partitioning and dynamic accuracy configuration of DNN models to ensure performance and accuracy guarantees. We tested our approach on an edge cluster of Odroid XU4, Raspberry Pi4, and Jetson Nano boards and achieved an average gain of 41.52% in performance and 5.2% in output accuracy as compared to state-of-the-art workload distribution strategies.
Distributed, Parallel, and Cluster Computing, Artificial Intelligence, Machine Learning, Performance, Systems and Control
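The heterogeneity-aware workload partitioning mentioned in the abstract can be illustrated with a minimal sketch, assuming each node's inference throughput (frames per second) has been profiled in advance. This is not the authors' method, which this page does not detail: it simply splits a batch of frames in proportion to throughput, using largest-remainder rounding, so that fast and slow boards finish at roughly the same time. The node names `xu4`, `rpi4`, `nano`, the functions `partition_frames` and `makespan`, and all throughput values are hypothetical.

```python
from math import floor


def partition_frames(total_frames: int, throughput: dict[str, float]) -> dict[str, int]:
    """Split total_frames across nodes in proportion to their throughput.

    Largest-remainder rounding keeps every share an integer while making the
    shares sum exactly to total_frames.
    """
    total_thr = sum(throughput.values())
    if total_thr <= 0:
        raise ValueError("at least one node needs a positive throughput")
    # Fractional share each node would get under a perfectly proportional split.
    ideal = {n: total_frames * t / total_thr for n, t in throughput.items()}
    shares = {n: floor(x) for n, x in ideal.items()}
    # Frames lost to rounding down go to the nodes with the largest remainders.
    leftover = total_frames - sum(shares.values())
    by_remainder = sorted(ideal, key=lambda n: ideal[n] - shares[n], reverse=True)
    for n in by_remainder[:leftover]:
        shares[n] += 1
    return shares


def makespan(shares: dict[str, int], throughput: dict[str, float]) -> float:
    """Seconds until the slowest node finishes its share of frames."""
    return max(
        (shares[n] / throughput[n] for n in shares if throughput[n] > 0),
        default=0.0,
    )


if __name__ == "__main__":
    # Hypothetical throughputs (frames/s) for illustration; not values from the paper.
    thr = {"xu4": 6.0, "rpi4": 4.0, "nano": 10.0}
    shares = partition_frames(101, thr)
    print(shares)                 # {'xu4': 30, 'rpi4': 20, 'nano': 51}
    print(makespan(shares, thr))  # 5.1
```

With these illustrative numbers, a naive equal split (34/34/33) would keep the slowest node, `rpi4`, busy for 34 / 4.0 = 8.5 s, versus 5.1 s for the proportional split.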
What problem does this paper attempt to address?
The paper addresses workload distribution for DNN inference on collaborative, heterogeneous edge platforms. Specifically, it focuses on three points:

1. **Heterogeneity**: Edge devices differ in hardware architecture and computational capability, which leads to node-level heterogeneity.
2. **Accuracy-performance trade-off**: Different DNN models offer different trade-offs between output accuracy and inference performance.
3. **Adaptation to dynamic scenarios**: Existing methods handle runtime workload variations and changes in device availability poorly.

The paper proposes an adaptive workload distribution strategy that provides performance and accuracy guarantees by jointly optimizing heterogeneity-aware workload partitioning and dynamic accuracy configuration of the DNN models. Experimental results show that, compared to state-of-the-art workload distribution strategies, the approach improves performance by 41.52% and output accuracy by 5.2% on average. The paper also evaluates the approach under different device availability scenarios and shows that it reduces performance and accuracy violations.
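The accuracy-performance trade-off and the need to adapt to device availability can be sketched together as a small combinatorial search. This is an illustrative stand-in, not the paper's optimizer: each available node offers hypothetical model variants with a profiled accuracy and throughput, frames are split in proportion to throughput, and the search picks the per-node variant combination that meets a latency deadline with the highest expected accuracy. `Variant`, `best_configuration`, `PROFILES`, and every number in them are assumptions made for illustration.

```python
from itertools import product
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    """One DNN model variant as profiled on one node (hypothetical values)."""
    name: str
    accuracy: float    # output accuracy in [0, 1]
    throughput: float  # frames per second on this node


# Hypothetical profiles for illustration only; not measurements from the paper.
PROFILES: dict[str, list[Variant]] = {
    "xu4": [Variant("large", 0.76, 3.0), Variant("small", 0.70, 6.0)],
    "rpi4": [Variant("large", 0.76, 2.0), Variant("small", 0.70, 4.0)],
    "nano": [Variant("large", 0.76, 8.0), Variant("small", 0.70, 12.0)],
}


def best_configuration(
    profiles: dict[str, list[Variant]],
    total_frames: int,
    deadline_s: float,
) -> Optional[tuple[dict[str, Variant], float, float]]:
    """Pick one variant per available node by exhaustive search.

    With throughput-proportional partitioning, every node finishes after
    total_frames / sum(throughputs) seconds, and node i handles a fraction
    throughput_i / sum(throughputs) of the frames, so the expected accuracy
    is the throughput-weighted mean of the chosen variants' accuracies.
    Returns (assignment, makespan_s, accuracy), or None if no combination
    meets the deadline.
    """
    nodes = sorted(profiles)
    if not nodes:
        return None
    best = None
    for combo in product(*(profiles[n] for n in nodes)):
        total_thr = sum(v.throughput for v in combo)
        span = total_frames / total_thr
        if span > deadline_s:
            continue  # would violate the performance (latency) requirement
        acc = sum(v.accuracy * v.throughput for v in combo) / total_thr
        if best is None or acc > best[2]:
            best = (dict(zip(nodes, combo)), span, acc)
    return best


if __name__ == "__main__":
    scenarios = [
        ("all nodes", PROFILES),
        ("rpi4 unavailable", {n: v for n, v in PROFILES.items() if n != "rpi4"}),
    ]
    for label, profiles in scenarios:
        result = best_configuration(profiles, total_frames=100, deadline_s=6.0)
        if result is None:
            print(f"{label}: no configuration meets the deadline")
            continue
        assignment, span, acc = result
        names = {n: v.name for n, v in assignment.items()}
        print(f"{label}: {names}, makespan {span:.2f} s, accuracy {acc:.3f}")
    # all nodes: {'nano': 'large', 'rpi4': 'small', 'xu4': 'small'}, makespan 5.56 s, accuracy 0.727
    # rpi4 unavailable: {'nano': 'small', 'xu4': 'small'}, makespan 5.56 s, accuracy 0.700
```

Exhaustive enumeration grows exponentially with the number of nodes and variants, which is tolerable for a three-board cluster but would call for a heuristic or solver at larger scale. When `rpi4` drops out, the sketch still meets the deadline by falling back to the smaller variant on every remaining node, trading accuracy for performance.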