SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud

Stefanos Laskaridis, Stylianos I. Venieris, Mario Almeida, Ilias Leontiadis, Nicholas D. Lane

DOI: https://doi.org/10.1145/3372224.3419194

2020-08-24

Abstract: Despite the soaring use of convolutional neural networks (CNNs) in mobile applications, uniformly sustaining high-performance inference on mobile has been elusive due to the excessive computational demands of modern CNNs and the increasing diversity of deployed devices. A popular alternative comprises offloading CNN processing to powerful cloud-based servers. Nevertheless, by relying on the cloud to produce outputs, emerging mission-critical and high-mobility applications, such as drone obstacle avoidance or interactive applications, can suffer from the dynamic connectivity conditions and the uncertain availability of the cloud. In this paper, we propose SPINN, a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method to deliver fast and robust CNN inference across diverse settings. The proposed system introduces a novel scheduler that co-optimises the early-exit policy and the CNN splitting at run time, in order to adapt to dynamic conditions and meet user-defined service-level requirements. Quantitative evaluation illustrates that SPINN outperforms its state-of-the-art collaborative inference counterparts by up to 2x in achieved throughput under varying network conditions, reduces the server cost by up to 6.8x and improves accuracy by 20.7% under latency constraints, while providing robust operation under uncertain connectivity conditions and significant energy savings compared to cloud-centric execution.

Machine Learning; Computer Vision and Pattern Recognition; Distributed, Parallel, and Cluster Computing

What problem does this paper attempt to address?

This paper addresses the difficulty of running convolutional neural network (CNN) inference efficiently on mobile devices. Modern CNNs demand more computation than resource-constrained mobile devices can deliver at high performance. At the same time, emerging user-facing and mission-critical CNN applications need low-latency processing to ensure high quality of experience (QoE) and security. Offloading CNN processing to cloud servers compensates for the limited on-device compute, but it makes performance heavily dependent on dynamic network conditions and raises privacy and cost concerns. To address these problems, the paper proposes SPINN (Synergistic Progressive Inference of Neural Networks over Device and Cloud), a distributed inference system that combines coordinated device-cloud computation with a progressive inference method to deliver fast and robust CNN inference across diverse settings. SPINN introduces a novel scheduler that jointly optimizes the early-exit policy and the CNN split point at run time, in order to adapt to dynamic conditions and meet user-defined service-level requirements (SLAs). Experimental results show that, compared with state-of-the-art collaborative inference methods, SPINN achieves up to 2x higher throughput under varying network conditions, reduces server cost by up to 6.8x, and improves accuracy by 20.7% under latency constraints, while providing robust operation under uncertain connectivity and significant energy savings compared to cloud-centric execution.
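
The joint choice of split point and early-exit policy can be illustrated with a small sketch. This is not SPINN's actual scheduler: the per-layer timings, transfer sizes, exit profile, the single early exit after layer 2, and the names `Config`, `expected_latency_ms` and `choose_config` are all assumptions made up for illustration. The sketch enumerates every (split point, confidence threshold) pair, estimates its expected latency for the current bandwidth, and picks the most accurate pair that meets a latency target.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    split: int        # layers [0, split) run on the device, the rest on the server
    threshold: float  # early-exit confidence threshold


# Hypothetical profile; every number below is invented for illustration.
DEVICE_MS = [8.0, 12.0, 15.0, 20.0]  # per-layer latency on the device
SERVER_MS = [1.0, 1.5, 2.0, 2.5]     # per-layer latency on the server
# Bytes uploaded when splitting before layer i (index 0 is the raw input,
# the last index means fully on-device, so nothing is uploaded).
TRANSFER_BYTES = [150_000, 60_000, 30_000, 10_000, 0]
EXIT_LAYER = 2  # assumed position of the single early exit (after layer 2)
# threshold -> (estimated accuracy, fraction of samples leaving at the early exit)
EXIT_PROFILE = {0.6: (0.70, 0.60), 0.8: (0.74, 0.35), 1.0: (0.76, 0.0)}


def expected_latency_ms(cfg: Config, bandwidth_bytes_per_ms: float) -> float:
    """Average latency over samples that exit early and samples that run fully."""
    _, exit_frac = EXIT_PROFILE[cfg.threshold]

    def path_latency(last_layer: int) -> float:
        # Cost of executing layers [0, last_layer) under this split.
        device = sum(DEVICE_MS[:min(cfg.split, last_layer)])
        server = sum(SERVER_MS[cfg.split:last_layer])
        crosses_network = last_layer > cfg.split
        transfer = (TRANSFER_BYTES[cfg.split] / bandwidth_bytes_per_ms
                    if crosses_network else 0.0)
        return device + transfer + server

    full = path_latency(len(DEVICE_MS))
    early = path_latency(EXIT_LAYER)
    return exit_frac * early + (1.0 - exit_frac) * full


def choose_config(bandwidth_bytes_per_ms: float, latency_target_ms: float) -> Config:
    """Pick the most accurate configuration that meets the latency target."""
    candidates = [Config(s, t)
                  for s, t in product(range(len(DEVICE_MS) + 1), EXIT_PROFILE)]

    def latency(c: Config) -> float:
        return expected_latency_ms(c, bandwidth_bytes_per_ms)

    feasible = [c for c in candidates if latency(c) <= latency_target_ms]
    if not feasible:
        # Nothing meets the target: fall back to the fastest configuration.
        return min(candidates, key=latency)
    # Highest accuracy first; ties are broken by lower latency.
    return max(feasible, key=lambda c: (EXIT_PROFILE[c.threshold][0], -latency(c)))


if __name__ == "__main__":
    print(choose_config(10_000, 30))  # fast link
    print(choose_config(100, 40))     # slow link
```

With these made-up numbers, a fast link favours an early split with no early exiting, while a slow link pushes all layers onto the device and lowers the exit threshold to stay within the latency target. Re-running `choose_config` whenever the measured bandwidth changes is one simple way to approximate run-time adaptation.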

