Performance Analysis of Different Convolution Algorithms in GPU Environment

Rui Xu, Sheng Ma, Yang Guo
DOI: https://doi.org/10.1109/NAS.2018.8515695
2018-01-01
Abstract: Convolutional neural networks (CNNs) have a wide range of applications in image and video recognition, recommender systems, and natural language processing. However, CNNs are computationally intensive, and their computational cost is often prohibitive. To speed up computation, researchers focus on optimizing convolution, which accounts for most of the operations in a CNN, and many algorithms have been proposed to accelerate convolutional layers. Each algorithm has its advantages and disadvantages, however, and no single algorithm handles all situations. In this paper, we examine the performance of various convolution algorithms in a GPU environment. By building a customized CNN model, we thoroughly explore the impact of the network structure on the performance of these algorithms, including inference and training speed, memory consumption, and power consumption. In addition to the algorithms themselves, we examine how their implementations in a GPU environment affect their performance, and we trace the kernel functions of these implementations to further generalize the characteristics of the algorithms. Finally, we summarize the characteristics of each algorithm and design a strategy that assigns the appropriate implementation to each convolutional layer in a CNN. With our strategy, AlexNet runs 1.2× to 2.8× faster than with other strategies in a GPU environment. This work helps in understanding these algorithms and may provide insights for further optimization of GPU and accelerator architectures.