Luigi Capogrosso, Enrico Fraccaroli, Giulio Petrozziello, Francesco Setti, Samarjit Chakraborty, Franco Fummi, Marco Cristani
Abstract: In the past decade, Deep Neural Networks (DNNs) achieved state-of-the-art performance in a broad range of problems, spanning from object classification and action recognition to smart building and healthcare. The flexibility that makes DNNs such a pervasive technology comes at a price: the computational requirements preclude their deployment on most of the resource-constrained edge devices available today to solve real-time and real-world tasks. This paper introduces a novel approach to address this challenge by combining the concept of predefined sparsity with Split Computing (SC) and Early Exit (EE). In particular, SC aims at splitting a DNN with a part of it deployed on an edge device and the rest on a remote server. Instead, EE allows the system to stop using the remote server and rely solely on the edge device's computation if the answer is already good enough. Specifically, how to apply such a predefined sparsity to a SC and EE paradigm has never been studied. This paper studies this problem and shows how predefined sparsity significantly reduces the computational, storage, and energy burdens during the training and inference phases, regardless of the hardware platform. This makes it a valuable approach for enhancing the performance of SC and EE applications. Experimental results showcase reductions exceeding 4x in storage and computational complexity without compromising performance. The source code is available at https://github.com/intelligolabs/sparsity_sc_ee.
Machine Learning; Computer Vision and Pattern Recognition; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper asks how predefined sparsity can reduce the computational, storage, and energy burdens of deep neural networks (DNNs) in split computing (SC) and early exit (EE) applications, and thereby improve the performance of those applications.
### Background and Motivation
As deep neural networks (DNNs) have made remarkable progress in fields such as computer vision, speech recognition, and autonomous driving, the scale and complexity of these models have grown steadily. This growth brings large computational and storage requirements, which makes DNNs difficult to deploy on resource-constrained edge devices. Existing solutions usually train large-scale DNNs on servers and then run inference on edge devices. Although this approach can support real-time tasks, computation, storage, and energy consumption on the edge device remain challenging.
### Solution
This paper proposes a new method that combines the concepts of predefined sparsity, split computing (SC) and early exit (EE). Specifically:
1. **Predefined Sparsity**: Before training, a sparse connection pattern is defined, and it remains unchanged throughout training and inference. This can be achieved by fixing the out-degree and in-degree of the neurons between each pair of adjacent layers, which reduces the number of connections.
2. **Split Computing (SC)**: The DNN is divided into two parts: one is deployed on the edge device and the other on a remote server. This arrangement exploits both the low latency of the edge device and the greater computing power of the remote server.
3. **Early Exit (EE)**: Multiple exit points are inserted in the intermediate layers of the network. If an intermediate result is accurate enough, the output computed on the edge device is used directly, without relying on the remote server.
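
The fixed out-degree and in-degree idea in item 1 can be illustrated with a small sketch. The cyclic connection pattern below and the helper names `predefined_sparse_mask` and `masked_forward` are assumptions made for illustration only; the summary does not specify the pattern the authors use.

```python
def predefined_sparse_mask(n_in, n_out, out_degree):
    """Build a fixed 0/1 mask of shape (n_out, n_in) before training.

    Hypothetical cyclic pattern: every input neuron connects to exactly
    `out_degree` output neurons, and every output neuron receives exactly
    n_in * out_degree / n_out connections.
    """
    if not 0 < out_degree <= n_out:
        raise ValueError("out_degree must be in 1..n_out")
    if (n_in * out_degree) % n_out != 0:
        raise ValueError("n_in * out_degree must be divisible by n_out")
    mask = [[0] * n_in for _ in range(n_out)]
    for i in range(n_in):
        for k in range(out_degree):
            # Walk the outputs cyclically so edges are spread evenly.
            j = (i * out_degree + k) % n_out
            mask[j][i] = 1
    return mask


def masked_forward(weights, mask, x):
    """Linear layer (no bias) that only uses the connections kept by the mask."""
    return [
        sum(w * m * v for w, m, v in zip(w_row, m_row, x))
        for w_row, m_row in zip(weights, mask)
    ]
```

Because the mask is created once and never updated, the same pattern applies during both training and inference; a real implementation would store and compute only the kept weights rather than multiplying by zeros.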
### Experimental Results
The experimental results show that by applying predefined sparsity, the storage and computational complexity can be significantly reduced without sacrificing performance. Specifically:
- The storage complexity is reduced by a factor of more than 4.
- The computational complexity is also reduced by a factor of more than 4.
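
To see how a reduction of this order can arise, the following sketch counts weights in a fully connected network with and without a fixed out-degree. The layer sizes and out-degrees are made-up illustrative values, not the configurations evaluated in the paper, and biases are ignored.

```python
def dense_weight_count(layer_sizes):
    """Weights in a fully connected network: n_in * n_out per layer."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


def sparse_weight_count(layer_sizes, out_degrees):
    """Weights kept when each neuron in layer l has a fixed out-degree."""
    return sum(n * d for n, d in zip(layer_sizes, out_degrees))


# Illustrative (assumed) sizes: each layer keeps 25% of its connections.
sizes = [1024, 512, 256]
degrees = [128, 64]

dense = dense_weight_count(sizes)
sparse = sparse_weight_count(sizes, degrees)
reduction = dense / sparse
```

With one multiply-accumulate per stored weight, the same ratio applies to the computation of a forward pass through these layers.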
### Application Scenarios
The paper uses a real-time quality control system in smart manufacturing as an example to demonstrate the practical value of the method. On the production line, the edge device quickly determines whether a product may be defective and temporarily moves suspected defective products to a buffer area. Meanwhile, the remote server completes the remaining inference and makes the final decision on the product's quality.
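
A minimal sketch of the routing decision in such a system is shown below. The softmax-confidence threshold and the callables `edge_head` and `remote_tail` are hypothetical stand-ins; the summary does not state the exit criterion the authors use.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_early_exit(features, edge_head, remote_tail, threshold=0.9):
    """Return (label, where) with where in {"edge", "server"}.

    edge_head:   hypothetical callable mapping features to logits on the edge.
    remote_tail: hypothetical callable that finishes inference on the server
                 and returns the final label.
    """
    probs = softmax(edge_head(features))
    confidence = max(probs)
    if confidence >= threshold:
        # Early exit: the edge prediction is trusted, no offloading needed.
        return probs.index(confidence), "edge"
    # Not confident enough: send the intermediate features to the server.
    return remote_tail(features), "server"
```

In the production-line setting, an item routed to `"server"` would correspond to a product held in the buffer area until the remote result arrives.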
### Main Contributions
1. A predefined sparsity strategy is proposed, that is, a set of sparse neuron connections is defined before training and remains unchanged throughout the training and inference processes.
2. By removing specific connections, the computational and storage complexity is reduced.
3. This method is applied to split computing (SC) and early exit (EE) scenarios, further improving the application performance.
### Conclusion
By combining predefined sparsity, split computing, and early exit, this paper addresses the computational, storage, and energy-consumption problems that arise when deploying DNNs on edge devices, and it offers a new direction for real-time, efficient distributed deep-learning applications.