Optimizing FFT-Based Convolution on ARMv8 Multi-core CPUs

Qinglin Wang, Dongsheng Li, Xiandong Huang, Siqi Shen, Songzhu Mei, Jie Liu
DOI: https://doi.org/10.1007/978-3-030-57675-2_16
2020-01-01
Abstract: Convolutional Neural Networks (CNNs) are widely used in machine learning applications but are very time-consuming to execute. Most of a CNN's execution time is spent in convolutional layers. A common approach to implementing convolutions is the FFT-based one, which reduces the arithmetic complexity of convolutions without losing too much precision. As the performance of ARMv8 multi-core CPUs improves, they can also be used to run CNNs, as Intel x86 CPUs are. In this paper, we present a new parallel FFT-based convolution implementation on ARMv8 multi-core CPUs. The implementation makes efficient use of ARMv8 multi-core CPUs through a series of computation and memory optimizations. Experimental results on two ARMv8 multi-core CPUs demonstrate that our new implementation gives much better performance than two existing approaches in most cases.
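To make the FFT-based approach concrete, the following is a minimal single-threaded 1D sketch in plain Python. It is not the paper's implementation: the function names (fft, direct_convolve, fft_convolve) are hypothetical, the transform is a textbook radix-2 Cooley-Tukey FFT, and CNN layers operate on 2D multi-channel data that this sketch does not cover. It only illustrates the underlying idea that transforming both inputs, multiplying pointwise, and transforming back yields the same result as direct convolution, up to floating-point rounding.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def direct_convolve(signal, kernel):
    """Linear convolution by definition: len(signal) * len(kernel) multiplies."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def fft_convolve(signal, kernel):
    """Linear convolution via the convolution theorem (assumed 1D real inputs)."""
    out_len = len(signal) + len(kernel) - 1
    size = 1
    while size < out_len:
        size *= 2
    # Zero-pad both inputs so circular convolution equals linear convolution.
    fa = fft(list(signal) + [0.0] * (size - len(signal)))
    fb = fft(list(kernel) + [0.0] * (size - len(kernel)))
    # Pointwise product in the frequency domain replaces the nested loops.
    product = [x * y for x, y in zip(fa, fb)]
    # The inverse transform above is unnormalized, so divide by size.
    result = fft(product, inverse=True)
    return [v.real / size for v in result[:out_len]]
```

For inputs of comparable length n, the direct version performs on the order of n^2 multiplications, while the FFT version is dominated by three transforms of O(n log n) each. The small differences between the two outputs come from floating-point rounding, which is the precision trade-off the abstract alludes to.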