Efficient Processing of Sparse Tensor Decomposition via Unified Abstraction and PE-Interactive Architecture

Bangyan Wang, Lei Deng, Zheng Qu, Shuangchen Li, Zheng Zhang, Yuan Xie
DOI: https://doi.org/10.1109/tc.2020.3046617
IF: 3.183
2022-02-01
IEEE Transactions on Computers
Abstract: We propose a novel architecture to efficiently perform sparse tensor decomposition/completion. As a generalization of vectors and matrices, tensors are widely used to process high-dimensional data. Sparse tensor decomposition (SpTD) is not only an emerging tensor analysis technique but also an effective tool for reducing the storage and computation costs of tensors. However, conventional general-purpose processors are inefficient at performing SpTD, mainly because of i) variable degrees of sparsity and flexible buffer size requirements; and ii) the difficulty of fusing multiple execution kernels to achieve better performance. For designers of domain-specific accelerators, on the other hand, the diversity of decomposition algorithms is another important problem that must be considered. To address these challenges, we propose a unified abstraction for SpTD algorithms and design a specialized accelerator. First, we formulate two types of core kernels (SpLrMM and LrSampling) that serve as a standard form fitting a broad range of SpTD algorithms. Second, we design a sparse tensor engine (STE) to efficiently perform SpTD. STE uses a processing element (PE)-interactive architecture in which PEs can be flexibly grouped together via a Network-on-Chip (NoC) to share buffer capacity, bandwidth, and compute resources. In extensive experiments, our accelerator achieves an average speedup of 45× over CPU and 29× over GPU.
Engineering, Electrical & Electronic; Computer Science, Hardware & Architecture