Expanding the Edge: Enabling Efficient Winograd CNN Inference with Deep Reuse on Edge Device

Feng Zhang, Ruofan Wu, Jiawei Guan, Zhen Zheng, Xiaoguang Guo, Xiao Zhang, Xiaoyong Du, Xipeng Shen
DOI: https://doi.org/10.1109/tkde.2023.3269017
IF: 9.235
2023-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Deep learning on edge devices is becoming increasingly important, especially with the explosion of IoT devices. For example, the total number of devices connected to IoT reached 29 billion in 2022. Convolutional neural networks (CNNs), as common deep learning representatives, are among the most popular neural networks in knowledge and data engineering. However, CNNs are computationally intensive. Compared with training, inference is more often performed on low-power computing equipment, such as edge devices. Limited computing resources and heavy computational load restrict the effective use of CNN algorithms at the edge. Fortunately, a minimal filtering algorithm called Winograd can reduce the cost of convolution by minimizing the number of multiplications. We find that Winograd convolution can be accelerated further by the deep reuse technique, which reuses similar data and computation. In this paper, we propose a new inference method, called DREW, which combines deep reuse with Winograd to further accelerate CNNs. DREW addresses three difficulties. First, it detects similarities in the complex minimal filtering patterns by clustering. Second, it keeps the online clustering cost within a reasonable range. Third, it provides an adjustable clustering granularity that balances performance and accuracy. We evaluate DREW on Raspberry Pi and NVIDIA Jetson AGX Xavier edge devices, and experiments on five popular networks show that: 1) DREW further accelerates Winograd convolution by 8.27× on average. Even for the highly parallel Winograd implementation, DREW still provides a 2.21× speedup. 2) When applied to end-to-end Winograd CNN inference, DREW achieves an average speedup of 5.94× with almost no (<0.4%) accuracy loss. 3) Energy consumption is an important factor for inference in practice. DREW reduces the number of convolution operations to 10% of the original, thereby achieving up to 60% energy-efficiency benefits over the original Winograd inference.
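To illustrate the claim that Winograd minimal filtering reduces multiplications, the sketch below shows the standard one-dimensional F(2,3) case: two outputs of a 3-tap filter computed with 4 multiplications instead of 6. This is a textbook illustration only, not the DREW implementation; the function names `direct_f23` and `winograd_f23` are hypothetical, and the deep reuse clustering step is not shown.

```python
def direct_f23(d, g):
    """Direct sliding-window filter: 2 outputs from 4 inputs, 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using 4 input-dependent multiplications."""
    # Filter transform: depends only on the filter, so it can be
    # computed once and reused for every input tile.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform followed by 4 element-wise multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    tile = [1.0, 2.0, 3.0, 4.0]     # illustrative input tile
    taps = [0.5, -1.0, 0.25]        # illustrative filter
    print(direct_f23(tile, taps))
    print(winograd_f23(tile, taps))
```

Both functions should agree up to floating-point rounding. The saving grows in the two-dimensional case, where F(2×2, 3×3) needs 16 multiplications per tile instead of 36.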