Optimizing Half Precision Winograd Convolution on ARM Many-Core Processors

Dedong Xie, Zhen Jia, Zili Zhang, Xin Jin
DOI: https://doi.org/10.1145/3546591.3547529
2022-01-01
Abstract: Convolutional Neural Networks (CNNs) are widely used in real-world applications, e.g., computer vision. Winograd-based convolution is usually applied because of its low computational complexity. As for the underlying hardware, ARM many-core CPUs are favored by cloud providers such as Amazon Web Services (AWS) for their price-performance. However, existing Winograd convolution implementations for the ARM architecture are mostly optimized for mobile devices and usually cannot fully utilize the hardware resources of many-core processors. In this paper, we propose HAWC, an optimized half-precision floating-point (FP16) Winograd convolution implementation for ARM many-core processors. HAWC employs a series of optimization methods suited to the ARM NEON architecture and combines them into a complete solution to improve performance. Our evaluation shows that HAWC achieves on average 10.74X and up to 27.56X speedup on representative convolution layers over state-of-the-art solutions.
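To illustrate the "low computational complexity" the abstract refers to, here is a minimal sketch of the standard one-dimensional Winograd minimal filtering algorithm F(2,3). It computes two outputs of a 3-tap filter with 4 multiplications instead of the 6 needed by the direct method. This is not HAWC's implementation: the function names `direct_f23` and `winograd_f23` are hypothetical, and the sketch uses ordinary Python floats rather than FP16 or NEON instructions.

```python
def direct_f23(d, g):
    """Direct 1-D convolution: 2 outputs from 4 inputs and a 3-tap filter (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using 4 element-wise multiplications."""
    # Filter transform; in practice this can be precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform: additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # The only multiplications that involve the input data (4 of them).
    m0 = v0 * u0
    m1 = v1 * u1
    m2 = v2 * u2
    m3 = v3 * u3

    # Output transform: additions and subtractions only.
    return [m0 + m1 + m2, m1 - m2 - m3]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]   # input tile (illustrative values)
    g = [0.5, 1.0, -1.0]       # filter taps (illustrative values)
    print(direct_f23(d, g))
    print(winograd_f23(d, g))
```

The 2-D variants used for convolution layers nest this transform along both spatial dimensions; the extra additions in the transforms are the price paid for the reduced multiplication count.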