Abstract: Low-rank Tucker and CP tensor decompositions are powerful tools in data analytics. The widely used alternating least squares (ALS) method, which solves a sequence of over-determined least squares subproblems, is costly for large and sparse tensors. We propose a fast and accurate sketched ALS algorithm for Tucker decomposition, which solves a sequence of sketched rank-constrained linear least squares subproblems. Theoretical sketch size upper bounds are provided to achieve $O(\epsilon)$ relative error for each subproblem with two sketching techniques, TensorSketch and leverage score sampling. Experimental results show that this new ALS algorithm, combined with a new initialization scheme based on randomized range finder, yields up to $22.0\%$ relative decomposition residual improvement compared to the state-of-the-art sketched randomized algorithm for Tucker decomposition of various synthetic and real datasets. This Tucker-ALS algorithm is further used to accelerate CP decomposition, by using randomized Tucker compression followed by CP decomposition of the Tucker core tensor. Experimental results show that this algorithm not only converges faster, but also yields more accurate CP decompositions.
What problem does this paper attempt to address?
The main problem this paper addresses is how to improve the computational efficiency and accuracy of low-rank Tucker and CP decompositions of large and sparse tensors. Specifically:
1. **Problem Background**:
- Tucker and CP decompositions are powerful data analysis tools, but the traditional alternating least squares (ALS) method has a high computational cost when dealing with large and sparse tensors.
- ALS solves a sequence of over-determined least squares subproblems, and for large tensors each of these subproblems is expensive to solve.
2. **Bottlenecks of Existing Methods**:
- For Tucker decomposition, the bottleneck of the ALS algorithm (known as higher-order orthogonal iteration, HOOI) is the tensor-times-matrix chain (TTMc) operation, whose cost per sweep is \( \Omega(\text{nnz}(\mathcal{T})R) \), where \(\mathcal{T}\) is the input tensor, \(\text{nnz}(\mathcal{T})\) is its number of non-zero elements, and \(R\) is the target rank.
- For CP decomposition, the bottleneck of the ALS algorithm is the matricized tensor times Khatri-Rao product (MTTKRP) operation, which is also computationally expensive.
3. **Solutions Proposed in the Paper**:
- A fast and accurate sketched ALS algorithm for Tucker decomposition is proposed, which accelerates the computation by solving a sequence of sketched rank-constrained linear least squares subproblems.
- For two sketching techniques, TensorSketch and leverage score sampling, upper bounds on the sketch size are derived that guarantee \( O(\epsilon) \) relative error for each subproblem.
- An initialization scheme based on the randomized range finder (RRF) is introduced to improve the robustness and accuracy of leverage score sampling.
- The proposed Tucker-ALS algorithm is applied to accelerate CP decomposition. By first performing randomized Tucker compression and then performing CP decomposition on the Tucker core tensor, the efficiency and accuracy of CP decomposition are improved.
4. **Main Contributions**:
- A new sketched ALS algorithm for Tucker decomposition is proposed, together with theoretical upper bounds on the sketch size. Experimental results show that this algorithm, combined with the new initialization scheme, yields up to 22.0% improvement in relative decomposition residual compared with the state-of-the-art sketched randomized algorithm for Tucker decomposition on synthetic and real datasets.
- The advantages and disadvantages of TensorSketch and leverage score sampling in terms of efficiency and accuracy are compared. Theoretical analysis shows that leverage score sampling is superior in both aspects.
- An initialization scheme based on RRF is proposed to improve the accuracy of leverage score sampling on highly coherent tensors.
- It is shown that CP decomposition based on the randomized Tucker + CP method is more efficient and accurate than directly applying randomized CP-ALS.
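The randomized range finder behind the initialization scheme can be illustrated on a plain matrix. The following is a minimal sketch, not the paper's implementation: it assumes a dense matrix `M` (one might think of it as a mode unfolding of the tensor, which is an assumption here), and the function name `randomized_range_finder` and the `oversample` parameter are hypothetical names chosen for illustration.

```python
import numpy as np

def randomized_range_finder(M, rank, oversample, rng):
    """Return an orthonormal basis Q that approximately spans range(M).

    Illustrative only: multiplies M by a Gaussian test matrix and
    orthonormalizes the result with a reduced QR factorization.
    """
    omega = rng.standard_normal((M.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(M @ omega)  # Q has rank + oversample orthonormal columns
    return Q

rng = np.random.default_rng(0)
# Synthetic test matrix of exact rank 5 (an assumption for the demo).
M = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 300))
Q = randomized_range_finder(M, rank=5, oversample=5, rng=rng)

# Relative error of projecting M onto the span of Q.
rel_err = np.linalg.norm(M - Q @ (Q.T @ M)) / np.linalg.norm(M)
```

Because the demo matrix has exact rank 5, the projection error is at the level of floating-point round-off; for matrices with slowly decaying singular values the error depends on the rank and oversampling chosen.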
In summary, this paper aims to significantly improve the efficiency and accuracy of low-rank decomposition of large and sparse tensors by introducing randomized techniques and optimizing the algorithm structure.
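As one concrete instance of the randomized techniques involved, leverage score sampling can be shown on an ordinary over-determined least squares problem. This is a simplified sketch under stated assumptions: it solves an unconstrained problem with a single right-hand side, whereas the paper's subproblems are rank-constrained, and it computes exact leverage scores from a QR factorization of a dense matrix. The names `leverage_scores` and `sampled_least_squares` are hypothetical.

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores: squared row norms of an orthonormal basis of range(A)."""
    Q, _ = np.linalg.qr(A)  # reduced QR
    return np.sum(Q**2, axis=1)

def sampled_least_squares(A, b, num_samples, rng):
    """Solve min ||Ax - b|| on rows sampled proportionally to leverage scores."""
    scores = leverage_scores(A)
    probs = scores / scores.sum()
    idx = rng.choice(A.shape[0], size=num_samples, replace=True, p=probs)
    # Rescale sampled rows so the sketched normal equations are unbiased.
    scale = 1.0 / np.sqrt(num_samples * probs[idx])
    SA = A[idx] * scale[:, None]
    Sb = b[idx] * scale
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 10))
b = A @ rng.standard_normal(10) + 0.01 * rng.standard_normal(5000)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sampled_least_squares(A, b, num_samples=500, rng=rng)

res_exact = np.linalg.norm(A @ x_exact - b)
res_sketch = np.linalg.norm(A @ x_sketch - b)
```

Comparing `res_sketch` with `res_exact` shows how close the sampled solution comes to the optimal residual while solving a problem with a tenth of the rows.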