Design and implementation of two-dimensional matrix convolution based on vector processor

Junyang ZHANG, Yang GUO
DOI: https://doi.org/10.11887/j.cn.201803011
2018-01-01
Abstract: To accelerate the computation of convolutional neural network (CNN) models and to facilitate the deployment of large-scale neural network models on embedded microprocessors, this work took the FT-matrix2000 vector processor architecture as its research background. Based on an analysis of the multi-core vector processor architecture and the CNN algorithm, a data layout scheme was proposed in which the smaller convolution kernel is placed in the scalar memory bank and the larger convolution matrix is placed in the vector memory bank. To address the difficulty of reusing data in matrix convolution, a dynamic shuffle pattern was proposed whose configurable parameters vary with the stride of the convolution kernel; by applying different shift operations to the convolution matrix elements, it greatly improved the reuse rate of convolution matrix data. To address the difficulty of parallelizing two-dimensional matrix convolution across multiple cores caused by data dependencies, a multi-core parallel scheme was proposed in which the convolution matrix is shared by all cores while each core holds its own exclusive convolution kernel matrix. Two computing modes were designed, one with a fixed convolution kernel size and a varying convolution matrix size, and one with a fixed convolution matrix size and a varying convolution kernel size, and their performance was compared and analyzed on mainstream CPU, GPU, TI6678 and FT-matrix2000 platforms. The experimental results show that, compared with a multi-core CPU, the computation can be accelerated by up to 238 times; compared with the TI6678, by 21 times; and compared with a high-performance GPU, by 663 805 times.
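The abstract describes the data layout, the shift-based reuse and the multi-core split only at a high level. The Python sketch below is an illustrative emulation under stated assumptions, not the authors' FT-matrix2000 implementation: each kernel element is treated as a scalar (as if read from scalar memory) and broadcast across a vector of convolution-matrix elements; the "shuffle" is modeled as a strided selection of a matrix row that depends on the kernel column and the stride; convolution is computed as valid (no padding) cross-correlation, as is usual in CNNs; and the "cores" are simulated sequentially, each owning its own kernels while sharing the input matrix. All function names are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]


def shuffle_select(row: List[float], offset: int, stride: int, out_len: int) -> List[float]:
    """Hypothetical shuffle: lane j receives row[j * stride + offset].

    Models shifting one loaded matrix row so it can be reused for every
    kernel column instead of reloading overlapping windows.
    """
    return [row[j * stride + offset] for j in range(out_len)]


def conv2d_scalar_broadcast(matrix: Matrix, kernel: Matrix, stride: int = 1) -> Matrix:
    """Valid 2D cross-correlation built from scalar-times-vector multiply-accumulates."""
    h, w = len(matrix), len(matrix[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out: Matrix = []
    for i in range(out_h):
        acc = [0.0] * out_w  # vector accumulator holding one output row
        for u in range(kh):
            row = matrix[i * stride + u]  # one convolution-matrix row (vector data)
            for v in range(kw):
                weight = kernel[u][v]  # kernel element (scalar data), broadcast to all lanes
                lanes = shuffle_select(row, v, stride, out_w)
                acc = [a + weight * x for a, x in zip(acc, lanes)]
        out.append(acc)
    return out


def multicore_conv(matrix: Matrix, kernels: List[Matrix], num_cores: int,
                   stride: int = 1) -> List[Matrix]:
    """Shared input matrix; each kernel is owned by exactly one (simulated) core."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    results: List[Optional[Matrix]] = [None] * len(kernels)
    for core in range(num_cores):
        # Round-robin assignment: core c handles kernels c, c + num_cores, ...
        for k in range(core, len(kernels), num_cores):
            results[k] = conv2d_scalar_broadcast(matrix, kernels[k], stride)
    return [r for r in results if r is not None]
```

For a 4x4 matrix holding 1..16 row by row and the 2x2 kernel [[1, 0], [0, 1]], the stride-1 result is [[7, 9, 11], [15, 17, 19], [23, 25, 27]] and the stride-2 result is [[7, 11], [23, 27]]. Because the kernel stays in scalar form and only the row vector is shifted, each loaded matrix row is reused for every kernel column; the round-robin split avoids data dependencies between cores because no core writes another core's output.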