Automatic Mapping and Code Optimization for OpenCL Kernels on FT-matrix Architecture (WIP Paper)

Xiaolei Zhao, Mei Wen, Zhaoyun Chen, Yang Shi, Chunyuan Zhang
DOI: https://doi.org/10.1145/3461648.3463845
2021-01-01
Abstract: FT-Matrix is a typical vector-SIMD architecture that refines the cooperation between scalar and vector units. Architectures of this kind are widely used in digital signal processing, high-performance computing, and artificial intelligence, among other fields. FT-Matrix currently adopts a C vector extension as its main programming model, which improves SIMD utilization by exposing an explicit vector extension API. However, it is difficult to port the parallel programs that users already have (OpenCL, CUDA) to this model efficiently. This paper proposes an automatic mapping and code optimization method for OpenCL kernels on the FT-Matrix architecture. The proposed approach addresses this porting challenge by means of work-item coalescing, slicing and rotation, and instruction-level code optimization. Preliminary results show that the method can achieve high performance and good hardware utilization for OpenCL kernels, while also reducing the programming difficulty on FT-Matrix.
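To make the term work-item coalescing concrete, the following is a minimal sketch in plain C of the general idea as it is commonly understood: the body of an OpenCL kernel, written for a single work item, is wrapped in a loop over all work items of a work-group so that it can run on a core without hardware threads. This is not the paper's implementation. The names `vadd_work_item`, `vadd_work_group`, and the lane count `VLEN` are hypothetical, and the inner lane loop only stands in for what a vector instruction would do; slicing, rotation, and the FT-Matrix vector API are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SIMD lane count, chosen for illustration only. */
#define VLEN 16

/*
 * Reference OpenCL kernel that the C code below emulates:
 *
 *   __kernel void vadd(__global const float *a,
 *                      __global const float *b,
 *                      __global float *c) {
 *       size_t i = get_global_id(0);
 *       c[i] = a[i] + b[i];
 *   }
 */

/* Body of one work item, with the global id passed in explicitly
 * instead of being read through get_global_id(0). */
static void vadd_work_item(const float *a, const float *b, float *c,
                           size_t gid)
{
    c[gid] = a[gid] + b[gid];
}

/* Coalesced form: one call executes every work item of one work-group.
 * The outer loop steps through the group in chunks of VLEN work items;
 * the inner loop marks the lanes that a vector unit could process
 * together. Assumes local_size is a multiple of VLEN. */
void vadd_work_group(const float *a, const float *b, float *c,
                     size_t group_id, size_t local_size)
{
    assert(local_size % VLEN == 0);
    size_t base = group_id * local_size; /* first global id of the group */
    for (size_t lid = 0; lid < local_size; lid += VLEN) {
        for (size_t lane = 0; lane < VLEN; ++lane) {
            vadd_work_item(a, b, c, base + lid + lane);
        }
    }
}
```

A host-side driver would call `vadd_work_group` once per work-group. Because this kernel has no barriers, the work items can be serialized in any order; kernels with barriers need the loop split at each barrier, which this sketch does not cover.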