Abstract: In the past decade, Deep Neural Networks (DNNs) have achieved state-of-the-art performance on a broad range of problems, from object classification and action recognition to smart buildings and healthcare. The flexibility that makes DNNs such a pervasive technology comes at a price: their computational requirements preclude deployment on most of today's resource-constrained edge devices for real-time, real-world tasks. This paper introduces a novel approach to this challenge that combines predefined sparsity with Split Computing (SC) and Early Exit (EE). SC splits a DNN so that part of it is deployed on an edge device and the rest on a remote server. EE, in turn, lets the system skip the remote server and rely solely on the edge device's computation when the answer is already good enough. How to apply predefined sparsity to the SC and EE paradigm has not been studied before. This paper studies this problem and shows that predefined sparsity significantly reduces the computational, storage, and energy burdens of both training and inference, regardless of the hardware platform, making it a valuable approach for enhancing SC and EE applications. Experimental results show reductions exceeding 4x in storage and computational complexity without compromising performance. The source code is available at https://github.com/intelligolabs/sparsity_sc_ee.
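As a rough illustration of how SC and EE fit together at inference time, the pure-Python sketch below treats a network as a list of layer functions. It runs the first `split_index` layers (the "head") on the edge device and feeds the intermediate output to an early-exit classifier. If the exit's maximum softmax probability reaches `threshold`, the edge prediction is used; otherwise the intermediate activation would be sent to the server, which runs the remaining layers (the "tail"). The names `split_inference`, `split_index`, `exit_head`, and `threshold`, the max-softmax confidence rule, and the toy layers are illustrative assumptions, not the paper's implementation; network transmission is not modeled.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A "layer" is any function mapping a vector to a vector (hypothetical toy model).
Layer = Callable[[List[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_layers(layers: Sequence[Layer], x: List[float]) -> List[float]:
    for layer in layers:
        x = layer(x)
    return x


def split_inference(
    layers: Sequence[Layer],
    split_index: int,
    exit_head: Layer,
    threshold: float,
    x: List[float],
) -> Tuple[str, int]:
    """Return (where the decision was taken, predicted class index)."""
    head, tail = layers[:split_index], layers[split_index:]
    z = run_layers(head, x)            # runs on the edge device
    probs = softmax(exit_head(z))      # early-exit classifier, also on the edge
    confidence = max(probs)
    if confidence >= threshold:        # answer "good enough": skip the server
        return "edge", probs.index(confidence)
    # Otherwise z would be transmitted to the server (not modeled here).
    logits = run_layers(tail, z)       # runs on the remote server
    return "server", logits.index(max(logits))


# Toy usage with hypothetical layers.
def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def double(v: List[float]) -> List[float]:
    return [2.0 * a for a in v]


def identity(v: List[float]) -> List[float]:
    return v


toy_layers = [relu, double, double]
print(split_inference(toy_layers, 1, identity, 0.9, [5.0, 0.0]))  # ('edge', 0)
print(split_inference(toy_layers, 1, identity, 0.9, [0.0, 0.1]))  # ('server', 1)
```

Raising `threshold` sends more inputs to the server (better accuracy, more communication); lowering it keeps more decisions on the edge.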

What problem does this paper attempt to address?

The problem this paper addresses is how to reduce the computational, storage, and energy-consumption burdens of deep neural networks (DNNs) in split computing (SC) and early exit (EE) applications through predefined sparsity, thereby improving the performance of these applications.

### Background and Motivation

As deep neural networks (DNNs) have made remarkable progress in fields such as computer vision, speech recognition, and autonomous driving, their size and complexity have grown steadily. This growth brings large computational and storage requirements, which make it very difficult to deploy DNNs on resource-constrained edge devices. A common approach is to train large DNNs on servers and then perform inference on edge devices. Although this can support real-time tasks, it still faces challenges in computation, storage, and energy consumption.

### Solution

This paper proposes a method that combines predefined sparsity, split computing (SC), and early exit (EE):

1. **Predefined Sparsity**: Before training, a sparse connection pattern is fixed and kept unchanged throughout training and inference. This is achieved by fixing the out-degree and in-degree of the connections between consecutive layers, which reduces the number of connections.
2. **Split Computing (SC)**: The DNN is divided into two parts: one is deployed on the edge device and the other on a remote server. This exploits both the low latency of the edge device and the computing power of the remote server.
3. **Early Exit (EE)**: Multiple exit points are inserted at intermediate layers of the network. If an intermediate result is good enough, the edge device's output is used directly, without relying on the remote server.
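Item 1 above (a fixed out-degree and in-degree per layer) can be illustrated with one simple way of building such a connection mask for a fully connected layer. The sketch below uses a cyclic edge assignment followed by a seeded random relabelling of the output neurons; this construction is our assumption, and the paper's exact pattern may differ. It gives every input neuron exactly `d_out` outgoing connections and every output neuron exactly `d_in = n_in * d_out / n_out` incoming ones. Because the mask is built once, before training, and the forward pass only reads weights where the mask is 1, the pattern stays fixed during both training and inference. The function names and layer sizes are hypothetical.

```python
import random
from typing import List


def predefined_sparse_mask(n_in: int, n_out: int, d_out: int, seed: int = 0) -> List[List[int]]:
    """mask[i][j] == 1 iff input neuron i is connected to output neuron j.

    Each input gets exactly d_out outgoing edges and each output exactly
    d_in = n_in * d_out // n_out incoming edges.
    """
    if not 1 <= d_out <= n_out or (n_in * d_out) % n_out != 0:
        raise ValueError("need 1 <= d_out <= n_out and n_in * d_out divisible by n_out")
    perm = list(range(n_out))
    random.Random(seed).shuffle(perm)  # relabelling outputs preserves all degrees
    mask = [[0] * n_out for _ in range(n_in)]
    for i in range(n_in):
        for k in range(d_out):
            # Edge ids i * d_out + k enumerate 0 .. n_in * d_out - 1 once each,
            # so reducing them mod n_out hits every output exactly d_in times.
            mask[i][perm[(i * d_out + k) % n_out]] = 1
    return mask


def sparse_forward(x: List[float], weights: List[List[float]], mask: List[List[int]]) -> List[float]:
    """Fully connected forward pass that only reads weights allowed by the mask."""
    n_out = len(mask[0])
    return [
        sum(x[i] * weights[i][j] for i in range(len(x)) if mask[i][j])
        for j in range(n_out)
    ]


# Hypothetical 16 -> 8 layer with out-degree 2 (so in-degree 4).
mask = predefined_sparse_mask(n_in=16, n_out=8, d_out=2)
print([sum(row) for row in mask])        # out-degrees: all 2
print([sum(col) for col in zip(*mask)])  # in-degrees: all 4
kept = sum(map(sum, mask))
print(kept, 16 * 8, 16 * 8 / kept)       # 32 128 4.0
```

In this toy layer only a quarter of the dense weights are stored and used, so both storage and multiply-accumulate count drop by 4x for that layer.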

### Experimental Results

The experimental results show that applying predefined sparsity significantly reduces storage and computational complexity without sacrificing performance. Specifically:

- Storage complexity is reduced by more than 4x.
- Computational complexity is also reduced by more than 4x.

### Application Scenarios

The paper uses a real-time quality control system for smart manufacturing as an example of the method's practical value. On the production line, the edge device quickly decides whether a product may be defective and temporarily moves suspected defective products to a buffer area. Meanwhile, the remote server completes the remaining inference and makes the final decision on the product's quality.

### Main Contributions

1. A predefined sparsity strategy: a set of sparse neuron connections is defined before training and kept unchanged throughout training and inference.
2. Lower computational and storage complexity, obtained by removing specific connections.
3. Application of this method to split computing (SC) and early exit (EE) scenarios, further improving their performance.

### Conclusion

By combining predefined sparsity, split computing, and early exit, this paper addresses the computational, storage, and energy-consumption problems of deploying DNNs on edge devices, offering new ideas for real-time, efficient distributed deep-learning applications.
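To see how a reduction factor such as the "more than 4x" reported under Experimental Results relates to layer density, note that a dense fully connected layer with `n_in` inputs and `n_out` outputs stores `n_in * n_out` weights and performs that many multiply-accumulates per sample, while a predefined-sparse layer with out-degree `d_out` keeps only `n_in * d_out`. The network-level factor is the ratio of the two sums. The sketch below computes it for hypothetical layer widths, ignoring biases and assuming only fully connected layers; it is not the paper's accounting, which may involve convolutional layers, exit branches, or layers left dense.

```python
from typing import Sequence


def reduction_factor(widths: Sequence[int], out_degrees: Sequence[int]) -> float:
    """Dense weight count / predefined-sparse weight count for a stack of
    fully connected layers (biases ignored).

    widths[k] is the size of layer k; out_degrees[k] is the out-degree of every
    neuron in layer k towards layer k + 1. For fully connected layers the same
    ratio also applies to multiply-accumulate operations per sample.
    """
    if len(out_degrees) != len(widths) - 1:
        raise ValueError("need exactly one out-degree per pair of adjacent layers")
    dense = sum(a * b for a, b in zip(widths, widths[1:]))
    sparse = sum(a * d for a, d in zip(widths, out_degrees))
    return dense / sparse


# Hypothetical 1024 -> 256 -> 12 network.
print(reduction_factor([1024, 256, 12], [64, 3]))   # every layer at density 1/4: 4.0
print(reduction_factor([1024, 256, 12], [64, 12]))  # last layer left dense: below 4
```

Under these assumptions, an overall reduction above 4x means the kept weights are fewer than a quarter of the dense ones, so any layer left dense must be offset by sparser layers elsewhere.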

Enhancing Split Computing and Early Exit Applications through Predefined Sparsity

Split Computing and Early Exiting for Deep Learning Applications: Survey and Research Challenges

Dynamic Split Computing for Efficient Deep EDGE Intelligence

I-SplitEE: Image classification in Split Computing DNNs with Early Exits

MTL-Split: Multi-Task Learning for Edge Devices using Split Computing

Enabling Edge Artificial Intelligence via Goal-oriented Deep Neural Network Splitting

Scission: Performance-driven and Context-aware Cloud-Edge Distribution of Deep Neural Networks

DynaSplit: A Hardware-Software Co-Design Framework for Energy-Aware Inference on Edge

Distilled Split Deep Neural Networks for Edge-Assisted Real-Time Systems

Resource-efficient Parallel Split Learning in Heterogeneous Edge Computing

SparCE: Sparsity aware General Purpose Core Extensions to Accelerate Deep Neural Networks

EdgeSP: Scalable Multi-device Parallel DNN Inference on Heterogeneous Edge Clusters

SplitPlace: AI Augmented Splitting and Placement of Large-Scale Neural Networks in Mobile Edge Environments

Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators

Efficient Communication-Computation Tradeoff for Split Computing: A Multi-Tier Deep Reinforcement Learning Approach

Partitioning and Deployment of Deep Neural Networks on Edge Clusters

Slimmable Encoders for Flexible Split DNNs in Bandwidth and Resource Constrained IoT Systems

Reconfigurable Spatial-Parallel Stochastic Computing for Accelerating Sparse Convolutional Neural Networks

DNN Inference Acceleration with Partitioning and Early Exiting in Edge Computing

A Case For Adaptive Deep Neural Networks in Edge Computing