A Versatile Acceleration Framework for Machine Learning Algorithms

Xianfeng Li, Yuanxun Wang
DOI: https://doi.org/10.1109/ispa-bdcloud-sustaincom-socialcom48970.2019.00076
2019-01-01
Abstract: Current Machine Learning (ML) accelerators are custom designs targeted at specific algorithms, such as the NPU coprocessor for neural networks. We propose a Versatile Acceleration Framework based on a key concept called Performance Semantics, an abstraction of the data-level parallel behavior of execution kernels nested in loops. The framework comprises a Versatile Accelerator Pipeline and a software library of common Performance Semantics. ML programmers only need to invoke the library functions in their algorithms, and the acceleration is applied transparently. We implement the framework on an FPGA with an embedded ARM CPU and test it with a set of popular ML algorithms. The results show that the framework covers the computation kernels in these ML algorithms and achieves speedups ranging from 15x to 40x over the ARM CPU and 2.x to 12.x over an x86 CPU.
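To make the idea of a library of data-parallel loop patterns more concrete, here is a minimal software-only sketch. It is not the paper's API: the names `ps_map`, `ps_map_reduce`, `dot`, and `squared_distance` are hypothetical, and the pure-Python bodies stand in for whatever an accelerator-backed library would dispatch to hardware. It only illustrates how an ML kernel might be written as a call into a small set of reusable patterns instead of a hand-written loop.

```python
from functools import reduce
from typing import Callable, List, Sequence

# Hypothetical pattern library (names are assumptions, not from the paper).
# Each function captures one data-parallel loop shape; an accelerated
# implementation could replace the body without changing the callers.


def ps_map(fn: Callable[[float, float], float],
           xs: Sequence[float],
           ys: Sequence[float]) -> List[float]:
    """Element-wise pattern: out[i] = fn(xs[i], ys[i])."""
    if len(xs) != len(ys):
        raise ValueError("input sequences must have the same length")
    return [fn(x, y) for x, y in zip(xs, ys)]


def ps_map_reduce(map_fn: Callable[[float, float], float],
                  reduce_fn: Callable[[float, float], float],
                  xs: Sequence[float],
                  ys: Sequence[float],
                  init: float = 0.0) -> float:
    """Map-then-reduce pattern: fold reduce_fn over the mapped elements."""
    return reduce(reduce_fn, ps_map(map_fn, xs, ys), init)


def dot(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Dot product expressed as multiply (map) followed by add (reduce)."""
    return ps_map_reduce(lambda x, y: x * y, lambda a, b: a + b, xs, ys)


def squared_distance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Euclidean distance, as used in k-means or k-NN kernels."""
    return ps_map_reduce(lambda x, y: (x - y) ** 2, lambda a, b: a + b, xs, ys)


if __name__ == "__main__":
    print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(squared_distance([0.0, 0.0], [3.0, 4.0]))
```

In this sketch the algorithm code (`dot`, `squared_distance`) never contains an explicit loop, so swapping the pattern implementations for accelerated ones would leave the callers unchanged, which is the kind of transparency the abstract describes.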