A-Tucker: Fast Input-Adaptive and Matricization-Free Tucker Decomposition of Higher-Order Tensors on GPUs

Lian Duan, Chuanfu Xiao, Min Li, Mingshuo Ding, Chao Yang
DOI: https://doi.org/10.1007/s42514-022-00119-7
2022-01-01
Abstract: Tucker decomposition is one of the most popular models for analyzing and compressing large-scale tensorial data. Existing Tucker decomposition algorithms usually rely on a single solver to compute the factor matrices and intermediate tensor in a predetermined order, and are not flexible enough to adapt to the diversity of input data and hardware. Moreover, to exploit highly efficient matrix multiplication kernels, most Tucker decomposition implementations rely on explicit matricizations, which can introduce extra data-conversion costs. In this paper, we present a-Tucker, a new framework for input-adaptive and matricization-free Tucker decomposition of higher-order tensors on GPUs. A two-level flexible Tucker decomposition algorithm is proposed to enable switching between different calculation orders and different factor solvers, and a machine-learning adaptive order-solver selector is applied to cope automatically with changes in application scenarios. To further improve performance, we implement a-Tucker in a fully matricization-free manner, without any conversion between tensors and matrices. Experiments show that a-Tucker can substantially outperform existing works while maintaining similar accuracy on a variety of synthetic and real-world tensors.
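To make the ideas in the abstract concrete, the following is a minimal CPU sketch in NumPy, not the a-Tucker GPU implementation. It assumes a sequentially truncated HOSVD with a Gram-matrix eigendecomposition as the single factor solver; the function names (`mode_gram`, `mode_product`, `st_hosvd`, `reconstruct`) and the `order` parameter are hypothetical and chosen only for illustration. The contractions are written with `np.tensordot` directly on the tensor, so no unfolding is formed in user code, although NumPy may still reshape internally.

```python
import numpy as np


def mode_gram(x, mode):
    """Gram matrix of the mode-`mode` unfolding, without building the unfolding."""
    other = [ax for ax in range(x.ndim) if ax != mode]
    # Contract every axis except `mode`; the result has shape (I_mode, I_mode).
    return np.tensordot(x, x, axes=(other, other))


def mode_product(x, mat, mode):
    """Multiply tensor `x` by `mat` (shape r x I_mode) along axis `mode`."""
    out = np.tensordot(mat, x, axes=(1, mode))  # new axis of size r lands at position 0
    return np.moveaxis(out, 0, mode)


def st_hosvd(x, ranks, order=None):
    """Sequentially truncated HOSVD; `order` selects the order in which modes are processed."""
    order = list(range(x.ndim)) if order is None else list(order)
    factors = [None] * x.ndim
    core = x
    for mode in order:
        gram = mode_gram(core, mode)
        _, vecs = np.linalg.eigh(gram)          # eigenvalues in ascending order
        u = vecs[:, ::-1][:, :ranks[mode]]      # keep the leading eigenvectors
        factors[mode] = u
        core = mode_product(core, u.T, mode)    # shrink the tensor before the next mode
    return core, factors


def reconstruct(core, factors):
    """Rebuild the full tensor from the core tensor and factor matrices."""
    x = core
    for mode, u in enumerate(factors):
        x = mode_product(x, u, mode)
    return x
```

Calling `st_hosvd(x, ranks, order=(2, 0, 1))` instead of the default order changes which mode is truncated first and therefore how much work the later modes require. In this sketch the order is a manual choice; the abstract describes selecting it, together with the solver, automatically.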