MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference

Sokratis Nikolaidis, Stylianos I. Venieris, Iakovos S. Venieris
DOI: https://doi.org/10.52953/TBYB6219
2024-12-05
Abstract: Cascade systems, consisting of a lightweight model processing all samples and a heavier, high-accuracy model refining challenging samples, have become a widely-adopted distributed inference approach to achieving high accuracy and maintaining a low computational burden for mobile and IoT devices. As intelligent indoor environments, like smart homes, continue to expand, a new scenario emerges, the multi-device cascade. In this setting, multiple diverse devices simultaneously utilize a shared heavy model hosted on a server, often situated within or close to the consumer environment. This work introduces MultiTASC++, a continuously adaptive multi-tenancy-aware scheduler that dynamically controls the forwarding decision functions of devices to optimize system throughput while maintaining high accuracy and low latency. Through extensive experimentation in diverse device environments and with varying server-side models, we demonstrate the scheduler's efficacy in consistently maintaining a targeted satisfaction rate while providing the highest available accuracy across different device tiers and workloads of up to 100 devices. This demonstrates its scalability and efficiency in addressing the unique challenges of collaborative DNN inference in dynamic and diverse IoT environments.
Machine Learning; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses the challenges that arise when multiple intelligent devices share an edge server for deep learning (DL) inference in a multi-device cascade architecture. As intelligent indoor environments such as smart homes expand, a new scenario emerges in which multiple devices use a shared heavy model simultaneously. The system must then scale while balancing fast response times against high accuracy: traditional methods can overload the system, while relying entirely on local execution gives up the accuracy advantage of the cascade.

### Core Problems of the Paper

1. **Concurrent Access by Multiple Devices**: How to keep the system efficient and responsive when multiple devices simultaneously send requests to the edge server for heavy-model inference.
2. **Resource Allocation and Model Selection**: How to dynamically adjust each device's forwarding decision function in a multi-device environment to optimize system throughput while maintaining high accuracy and low latency.
3. **Dynamic Adaptability**: How to achieve continuously adaptive scheduling under constantly changing workloads and device requirements, across different device tiers and workload conditions.

### Solutions

To solve these problems, the paper proposes **MultiTASC++**, a continuously adaptive multi-tenancy-aware scheduler aimed at optimizing the inference request arrival rate in a multi-device cascade architecture. Its main contributions include:

- **System Model**: Extends the cascade architecture to the multi-device environment and exposes its adjustable parameters, enabling system designers to study its trade-offs systematically.
- **New Scheduler**: Introduces a new multi-tenancy-aware scheduler that reconfigures the forwarding decision function at a finer granularity and takes the latency requirements of each device into account, achieving more effective device-specific adaptation.
- **Continuous Adjustment**: Adjusts the forwarding decision function continuously instead of in discrete steps, thereby improving adaptability.
- **Server Model Switching**: Allows server-side models to be switched dynamically according to different latency-accuracy trade-offs, increasing the flexibility of the architecture.

### Key Formulas

- **BvSB Metric** (Best-versus-Second-Best): \[ \text{BvSB}(f(x)) = P_1 - P_2 \] where \(P_1\) and \(P_2\) are the highest and second-highest values in the softmax output of the model, respectively.
- **Threshold Update Rule**: \[ \Delta \text{thresh} = -a \cdot (SR_{\text{target}} - SR_{\text{update}}) \] where \(\Delta \text{thresh}\) is the threshold adjustment, \(SR_{\text{target}}\) is the target SLO satisfaction rate, \(SR_{\text{update}}\) is the SLO satisfaction rate reported by the device, and \(a\) is a scaling factor.

Through these improvements, MultiTASC++ can better cope with the unique challenges of dynamic and diverse Internet of Things environments in multi-device cascade architectures, providing higher scalability and efficiency.
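The two formulas above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The function names (`bvsb`, `should_forward`, `update_threshold`) are hypothetical. The summary does not spell out the forwarding rule, so the sketch assumes a sample is forwarded to the server when its BvSB margin falls below the device's threshold. It also assumes the threshold is clamped to [0, 1], the range BvSB can take for a softmax output.

```python
def bvsb(probs):
    """Best-versus-Second-Best margin: P_1 - P_2 of a softmax output."""
    if len(probs) < 2:
        raise ValueError("BvSB needs at least two class probabilities")
    p1, p2 = sorted(probs, reverse=True)[:2]
    return p1 - p2


def should_forward(probs, threshold):
    """Assumed forwarding rule (not stated in the summary): a small margin
    means the light model is unsure, so the sample goes to the heavy model."""
    return bvsb(probs) < threshold


def update_threshold(threshold, sr_target, sr_update, a):
    """Apply delta_thresh = -a * (SR_target - SR_update).

    Clamping the result to [0, 1] is an assumption of this sketch.
    """
    delta = -a * (sr_target - sr_update)
    return min(1.0, max(0.0, threshold + delta))


if __name__ == "__main__":
    probs = [0.7, 0.2, 0.1]  # illustrative softmax output
    print("BvSB margin:", bvsb(probs))
    print("forward at threshold 0.6:", should_forward(probs, 0.6))
    # Device reports an SLO satisfaction rate below the target.
    print("new threshold:", update_threshold(0.5, sr_target=0.95, sr_update=0.85, a=0.5))
```

Under the assumed forwarding rule, the sign of the update is intuitive. When a device reports a satisfaction rate below the target, the threshold drops, fewer samples are forwarded, and server load eases. When the reported rate exceeds the target, the threshold rises and more samples benefit from the heavy model.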