System Support and Mechanisms for Adaptive Edge-to-cloud DNN Model Serving

Matthias Reisinger, Pantelis A. Frangoudis, Schahram Dustdar
DOI: https://doi.org/10.1109/ic2e52221.2021.00046
2021-01-01
Abstract: We present an orchestration scheme for Deep Neural Network (DNN) model serving that distributes computation across the device-to-cloud continuum and supports low-latency inference. Our system automatically splits DNN structures layer by layer and adaptively distributes the resulting parts over compute hosts, providing an execution environment for collaborative inference. Model deployment and its self-adaptation at runtime are driven by optimization algorithms that are supported as plug-ins. These algorithms follow service and infrastructure provider criteria and constraints, which are expressed via well-defined interfaces. Our framework can serve diverse neural architectures, including DNNs with early exits, with little or no modification.
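The layer-wise splitting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `split_layers`, `run_segment`, and `collaborative_inference`, the host labels, and the toy scalar "layers" are all hypothetical, chosen only to show that running contiguous segments in order on different hosts should reproduce the output of the unsplit model.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a "layer" is modeled as a plain function on a float.
Layer = Callable[[float], float]


def split_layers(layers: Sequence[Layer], cut_points: Sequence[int]) -> List[List[Layer]]:
    """Split an ordered layer list into contiguous segments at the given indices."""
    bounds = [0, *sorted(cut_points), len(layers)]
    return [list(layers[a:b]) for a, b in zip(bounds, bounds[1:])]


def run_segment(segment: Sequence[Layer], x: float) -> float:
    """Apply the layers of one segment in order."""
    for layer in segment:
        x = layer(x)
    return x


def collaborative_inference(
    segments: Sequence[Sequence[Layer]], hosts: Sequence[str], x: float
) -> Tuple[float, List[str]]:
    """Run each segment on its assigned host, in order.

    The hand-off between hosts stands in for a network transfer of the
    intermediate activation; here it is only recorded in a trace.
    """
    trace: List[str] = []
    for host, segment in zip(hosts, segments):
        x = run_segment(segment, x)
        trace.append(host)
    return x, trace


# Hypothetical four-layer toy model, split after layer 1 and after layer 3.
layers: List[Layer] = [
    lambda v: v + 1.0,
    lambda v: v * 2.0,
    lambda v: v - 3.0,
    lambda v: v * v,
]
segments = split_layers(layers, cut_points=[1, 3])
hosts = ["device", "edge", "cloud"]  # assumed host labels
output, trace = collaborative_inference(segments, hosts, 2.0)
```

In this sketch the cut points are fixed by hand. In the system described above, the choice of where to split and where to place each part would be made by a pluggable optimization algorithm and could change at runtime.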