LFTF: A Framework for Efficient Tensor Analytics at Scale
Fan Yang, Fanhua Shang, Yuzhen Huang, James Cheng, Jinfeng Li, Yunjian Zhao, Ruihao Zhao
DOI: https://doi.org/10.14778/3067421.3067424
IF: 2.5
2017-01-01
Proceedings of the VLDB Endowment
Abstract: Tensors are higher-order generalizations of matrices that model multi-aspect data, e.g., a set of purchase records with the schema (user_id, product_id, timestamp, feedback). Tensor factorization is a powerful technique for generating a model from a tensor, just as matrix factorization generates a model from a matrix, but with higher accuracy and richer information because a higher-order tensor carries more attributes than a matrix. The data model obtained by tensor factorization can be used for classification, recommendation, anomaly detection, and so on. Despite this broad range of applications, tensor factorization has not been adopted as widely as matrix factorization, which is widely used in recommender systems, mainly because of the high computational cost and poor scalability of existing tensor factorization methods. Efficient and scalable tensor factorization is particularly challenging because real-world tensor data are mostly sparse and massive. In this paper, we propose a novel distributed algorithm, called Lock-Free Tensor Factorization (LFTF), which significantly improves the efficiency and scalability of distributed tensor factorization by exploiting asynchronous execution in a re-formulated problem. Our experiments show that LFTF achieves much higher CPU and network throughput than existing methods, converges at least 17 times faster, and scales to much larger datasets.
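To make the idea of factorizing a sparse tensor concrete, the following is a minimal single-machine sketch in Python. It is not the LFTF algorithm and does not reflect the paper's re-formulated problem or its asynchronous, distributed execution. It assumes a rank-R CP-style model trained with plain sequential SGD, and it drops the feedback attribute from the tensor modes and uses it as the cell value. The function names, toy records, and hyperparameters (`lr`, `reg`, `rank`) are all hypothetical choices for illustration.

```python
import random

def init_factors(shape, rank, seed=0):
    """One factor matrix per mode: shape[m] rows, each with `rank` entries."""
    rng = random.Random(seed)
    return [[[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
            for n in shape]

def predict(factors, index):
    """CP-style prediction: sum over components of the product of one row per mode."""
    rank = len(factors[0][0])
    total = 0.0
    for r in range(rank):
        prod = 1.0
        for mode, i in enumerate(index):
            prod *= factors[mode][i][r]
        total += prod
    return total

def rmse(factors, entries):
    """Root-mean-square error over the observed (non-missing) cells only."""
    sq = sum((v - predict(factors, idx)) ** 2 for idx, v in entries)
    return (sq / len(entries)) ** 0.5

def sgd_epoch(factors, entries, lr=0.01, reg=0.01):
    """One sequential SGD pass over the observed cells (assumed update rule)."""
    rank = len(factors[0][0])
    for idx, v in entries:
        err = v - predict(factors, idx)
        # Snapshot the rows so every mode is updated from the same old values.
        rows = [list(factors[m][i]) for m, i in enumerate(idx)]
        for m, i in enumerate(idx):
            for r in range(rank):
                others = 1.0
                for m2 in range(len(idx)):
                    if m2 != m:
                        others *= rows[m2][r]
                factors[m][i][r] += lr * (err * others - reg * rows[m][r])

# Hypothetical toy records: (user_id, product_id, timestamp) -> feedback.
entries = [((0, 0, 0), 5.0), ((0, 1, 1), 3.0), ((1, 0, 1), 4.0),
           ((1, 2, 0), 1.0), ((2, 1, 2), 2.0), ((2, 2, 1), 4.0)]
shape = (3, 3, 3)  # number of users, products, and time slots

factors = init_factors(shape, rank=2)
before = rmse(factors, entries)
for _ in range(500):
    sgd_epoch(factors, entries)
after = rmse(factors, entries)
print(f"RMSE before: {before:.3f}, after: {after:.3f}")
```

Only the observed cells are stored and visited, which is what makes sparse tensors tractable at all. Each update touches just one row per mode, so updates on cells that share no indices are independent of each other.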