STANNIS: Low-Power Acceleration of Deep Neural Network Training Using Computational Storage

Ali HeydariGorji, Mahdi Torabzadehkashi, Siavash Rezaei, Hossein Bobarshad, Vladimir Alves, Pai H. Chou
DOI: https://doi.org/10.1109/DAC18072.2020.9218687
2020-02-20
Abstract: This paper proposes a framework for distributed, in-storage training of neural networks on clusters of computational storage devices. Such devices not only contain hardware accelerators but also eliminate data movement between the host and storage, resulting in both improved performance and power savings. More importantly, this in-storage processing style of training ensures that private data never leaves the storage while fully controlling the sharing of public data. Experimental results show up to 2.7x speedup and 69% reduction in energy consumption and no significant loss in accuracy.
Distributed, Parallel, and Cluster Computing, Machine Learning
What problem does this paper attempt to address?
The paper addresses three main problems:

1. **Bottlenecks in data processing and transmission**: With the exponential growth of data generation, the traditional approach of sending data to a data center for processing and then returning the results to the client faces challenges such as high latency, identity management, and data protection. Processing efficiency becomes a key issue in applications that require real-time or near-real-time processing of large amounts of data, such as self-driving cars.
2. **High energy consumption and long duration of neural network training**: Neural network training is both time-consuming and energy-intensive. For large-scale neural networks, training may take weeks even on the most powerful processors, which limits the wide application and development of neural networks.
3. **Data privacy and security**: In distributed training, data security and privacy protection are important issues. In traditional methods, data must be transmitted between different nodes, which increases the risk of data leakage.

To solve these problems, the paper proposes STANNIS, a distributed, in-storage training framework based on Computational Storage Devices (CSDs). The main contributions of this framework include:

- Developing a new low-power CSD named Newport with enhanced processing capabilities.
- Proposing a framework named Stannis that can effectively parallelize training tasks on CSD clusters.
- Designing an adjustment algorithm to maximize the utilization rate of heterogeneous systems.
- Protecting data privacy by combining private and public data and ensuring that private data does not leave the storage system.

Experimental results show that the STANNIS framework can achieve a speedup of up to 2.7x and reduce energy consumption by 69% without significantly reducing accuracy. These improvements not only increase the processing speed and energy efficiency but also enhance data security and privacy protection.
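
The summary does not describe how the adjustment algorithm for heterogeneous systems works. One common way to keep a fast host and slower CSDs equally busy in synchronous data-parallel training is to give each device a share of the global batch proportional to its measured throughput, so that all devices finish a step at roughly the same time. The sketch below illustrates that general idea only; it is not the paper's algorithm, and the function names, device names, and throughput figures are hypothetical.

```python
def balance_batch_sizes(throughputs, global_batch):
    """Split global_batch across devices in proportion to throughput.

    throughputs: dict mapping device name -> samples per second
                 (hypothetical measurements, not values from the paper).
    Returns a dict mapping device name -> integer batch size.
    """
    total = sum(throughputs.values())
    # Ideal (fractional) share of the global batch for each device.
    shares = {name: global_batch * t / total for name, t in throughputs.items()}
    # Round down, then hand out the leftover samples one at a time,
    # starting with the devices that lost the most to rounding.
    sizes = {name: int(share) for name, share in shares.items()}
    leftover = global_batch - sum(sizes.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - sizes[n], reverse=True)
    for name in by_remainder[:leftover]:
        sizes[name] += 1
    return sizes


def step_times(sizes, throughputs):
    """Estimated time per training step on each device, in seconds."""
    return {name: sizes[name] / throughputs[name] for name in sizes}


# Hypothetical cluster: one fast host and two slower storage devices.
throughputs = {"host": 100.0, "csd0": 10.0, "csd1": 10.0}
sizes = balance_batch_sizes(throughputs, global_batch=240)
print(sizes)
print(step_times(sizes, throughputs))
```

With these made-up numbers, every device gets a batch it can finish in the same amount of time, so no device sits idle waiting for the slowest one at the synchronization point.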