SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud

Stefanos Laskaridis, Stylianos I. Venieris, Mario Almeida, Ilias Leontiadis, Nicholas D. Lane
DOI: https://doi.org/10.1145/3372224.3419194
2020-08-24
Abstract: Despite the soaring use of convolutional neural networks (CNNs) in mobile applications, uniformly sustaining high-performance inference on mobile has been elusive due to the excessive computational demands of modern CNNs and the increasing diversity of deployed devices. A popular alternative comprises offloading CNN processing to powerful cloud-based servers. Nevertheless, by relying on the cloud to produce outputs, emerging mission-critical and high-mobility applications, such as drone obstacle avoidance or interactive applications, can suffer from the dynamic connectivity conditions and the uncertain availability of the cloud. In this paper, we propose SPINN, a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method to deliver fast and robust CNN inference across diverse settings. The proposed system introduces a novel scheduler that co-optimises the early-exit policy and the CNN splitting at run time, in order to adapt to dynamic conditions and meet user-defined service-level requirements. Quantitative evaluation illustrates that SPINN outperforms its state-of-the-art collaborative inference counterparts by up to 2x in achieved throughput under varying network conditions, reduces the server cost by up to 6.8x and improves accuracy by 20.7% under latency constraints, while providing robust operation under uncertain connectivity conditions and significant energy savings compared to cloud-centric execution.
Machine Learning; Computer Vision and Pattern Recognition; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses the difficulty of running convolutional neural network (CNN) inference efficiently on mobile devices. Modern CNNs have excessive computational requirements, which makes high-performance inference hard to sustain on resource-constrained devices. At the same time, emerging user-facing and mission-critical CNN applications require low-latency processing to ensure a high quality of experience (QoE) and safety. Offloading CNN processing to cloud servers compensates for the lack of on-device computing power, but it makes performance highly dependent on dynamic network conditions and raises privacy and cost concerns. To address these problems, the paper proposes SPINN (Synergistic Progressive Inference of Neural Networks over Device and Cloud), a distributed inference system. SPINN combines coordinated device-cloud computation with a progressive inference method to deliver fast and robust CNN inference across different application scenarios. It introduces a novel scheduler that jointly optimizes the early-exit policy and the CNN split point at run time, in order to adapt to dynamic conditions and meet user-defined service-level requirements. Experimental results show that, compared to state-of-the-art collaborative inference methods, SPINN increases throughput by up to 2x under varying network conditions, reduces server cost by up to 6.8x, and improves accuracy by 20.7% under latency constraints, while providing robust operation under uncertain connectivity and significant energy savings compared to cloud-centric execution.
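To make the idea of jointly choosing a split point and an early-exit policy more concrete, here is a minimal Python sketch. It is not the paper's scheduler: the `Config` fields, the latency model, the selection rule (highest accuracy among configurations that meet a latency bound) and all numbers in `CANDIDATES` are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    """One hypothetical (split point, early-exit threshold) candidate."""
    split_layer: int       # layers before this index run on the device
    exit_threshold: float  # confidence needed to exit early on the device
    device_ms: float       # assumed profiled on-device latency
    server_ms: float       # assumed profiled server latency
    transfer_kb: float     # assumed size of the data sent at the split
    offload_rate: float    # assumed fraction of samples that reach the server
    accuracy: float        # assumed accuracy of this configuration


def expected_latency_ms(cfg: Config, bandwidth_mbps: float) -> float:
    # 1 Mbps is 1 kilobit per millisecond, so kilobytes * 8 / Mbps gives ms.
    transfer_ms = cfg.transfer_kb * 8.0 / bandwidth_mbps
    # Only samples that do not exit early pay for transfer and server time.
    return cfg.device_ms + cfg.offload_rate * (transfer_ms + cfg.server_ms)


def choose_config(configs: List[Config], bandwidth_mbps: float,
                  max_latency_ms: float) -> Optional[Config]:
    """Pick the most accurate candidate that meets the latency bound."""
    feasible = [c for c in configs
                if expected_latency_ms(c, bandwidth_mbps) <= max_latency_ms]
    if not feasible:
        return None  # no candidate satisfies the requirement
    # Prefer higher accuracy; break ties with lower expected latency.
    return max(feasible,
               key=lambda c: (c.accuracy,
                              -expected_latency_ms(c, bandwidth_mbps)))


# Illustrative, made-up candidates: server-only, split, device-only.
CANDIDATES = [
    Config(0, 1.0, 0.0, 20.0, 150.0, 1.0, 0.76),
    Config(8, 0.8, 60.0, 10.0, 50.0, 0.5, 0.74),
    Config(16, 0.6, 90.0, 0.0, 0.0, 0.0, 0.70),
]

if __name__ == "__main__":
    for mbps in (10.0, 4.0, 2.0):
        print(mbps, choose_config(CANDIDATES, mbps, max_latency_ms=150.0))
```

With these assumed numbers and a 150 ms bound, the choice moves from server-only execution at 10 Mbps, to the split configuration at 4 Mbps, to device-only execution at 2 Mbps. A real scheduler would re-run such a selection whenever the measured bandwidth or server load changes.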