SGD_Tucker: A Novel Stochastic Optimization Strategy for Parallel Sparse Tucker Decomposition

Hao Li, Zixuan Li, Kenli Li, Jan S. Rellermeyer, Lydia Chen, Keqin Li
DOI: https://doi.org/10.1109/TPDS.2020.3047460
IF: 5.3
2021-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: Sparse Tucker Decomposition (STD) algorithms learn a core tensor and a group of factor matrices to obtain an optimal low-rank feature representation of a High-Order, High-Dimension, and Sparse Tensor (HOHDST). However, existing STD algorithms suffer from an explosion of intermediate variables, because forming those variables (via the Khatri-Rao product, the Kronecker product, and matrix-matrix multiplication) involves all the elements of the sparse tensor. This problem prevents the deep fusion of efficient computation with big data platforms. To overcome this bottleneck, a novel stochastic optimization strategy (SGD_Tucker) is proposed for STD, which automatically divides the high-dimension intermediate variables into small batches of intermediate matrices. Specifically, SGD_Tucker operates only on small, randomly selected samples rather than on all elements, while maintaining overall accuracy and convergence rate. In practice, SGD_Tucker offers two distinct advancements over the state of the art. First, it can prune the communication overhead for the core tensor in distributed settings. Second, its low data dependence enables fine-grained parallelization, which gives SGD_Tucker lower computational overhead at the same accuracy. Experimental results show that SGD_Tucker runs at least 2X faster than the state of the art.
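The sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is plain mini-batch SGD with L2 regularization on a third-order Tucker model, and every name in it (`predict`, `sgd_tucker_sketch`, the hyperparameters, the synthetic data) is a hypothetical choice made for illustration. What it is meant to show is that each update touches only a small batch of sampled nonzeros, so every intermediate array has `batch` rows rather than one row per nonzero of the tensor.

```python
import numpy as np

def predict(G, A, idx):
    """Tucker-model predictions for the entries listed in idx (shape m x 3)."""
    a, b, c = (A[n][idx[:, n]] for n in range(3))  # each is m x J_n
    return np.einsum("pqr,mp,mq,mr->m", G, a, b, c)

def sgd_tucker_sketch(idx, vals, shape, ranks, steps=2000, batch=64,
                      lr=0.05, reg=1e-4, seed=0):
    """Illustrative mini-batch SGD for a 3rd-order sparse Tucker model.

    idx  : (nnz, 3) integer coordinates of the observed entries (COO format)
    vals : (nnz,) observed values
    Hyperparameters are assumptions for this sketch, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    A = [rng.normal(scale=0.5, size=(I, J)) for I, J in zip(shape, ranks)]
    G = rng.normal(scale=0.5, size=ranks)
    for _ in range(steps):
        s = rng.integers(0, len(vals), size=batch)   # sample a small batch
        rows = idx[s]
        a, b, c = (A[n][rows[:, n]] for n in range(3))
        err = np.einsum("pqr,mp,mq,mr->m", G, a, b, c) - vals[s]
        # All intermediates below are batch-sized, never nnz-sized.
        grad_G = np.einsum("m,mp,mq,mr->pqr", err, a, b, c) / batch + reg * G
        grad_a = err[:, None] * np.einsum("pqr,mq,mr->mp", G, b, c) + reg * a
        grad_b = err[:, None] * np.einsum("pqr,mp,mr->mq", G, a, c) + reg * b
        grad_c = err[:, None] * np.einsum("pqr,mp,mq->mr", G, a, b) + reg * c
        G -= lr * grad_G
        for n, g in enumerate((grad_a, grad_b, grad_c)):
            # add.at accumulates correctly when a row index repeats in the batch
            np.add.at(A[n], rows[:, n], -lr * g)
    return G, A

# Synthetic usage example: a low-rank tensor observed at 2000 sampled coordinates.
rng = np.random.default_rng(1)
shape, ranks = (20, 20, 20), (3, 3, 3)
G_true = rng.normal(size=ranks)
A_true = [rng.normal(scale=0.5, size=(I, J)) for I, J in zip(shape, ranks)]
idx = rng.integers(0, 20, size=(2000, 3))
vals = predict(G_true, A_true, idx)

def rmse(G, A):
    return float(np.sqrt(np.mean((predict(G, A, idx) - vals) ** 2)))

rmse_init = rmse(*sgd_tucker_sketch(idx, vals, shape, ranks, steps=0))
rmse_final = rmse(*sgd_tucker_sketch(idx, vals, shape, ranks))
```

Because each step reads only the factor rows indexed by the sampled entries, batches that touch disjoint rows could in principle be processed in parallel; the sketch itself is serial and makes no claim about the paper's parallelization or communication scheme.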