Research on Distributed Heterogeneous Data PCA Algorithm Based on Cloud Platform

Jin Zhang, Gang Huang
DOI: https://doi.org/10.1063/1.5038988
2018-01-01
Abstract: Principal component analysis (PCA) of heterogeneous data sets can address the limited scalability of centralized data. To reduce the intermediate data and error components generated for distributed heterogeneous data sets, a PCA algorithm for heterogeneous data sets on a cloud platform is proposed. The algorithm performs the eigenvalue computation using Householder tridiagonalization and QR factorization in order to calculate the error component of the heterogeneous database associated with the public key, from which the intermediate data set and the lost information are obtained. Experiments on distributed DBM heterogeneous data sets show that the method is feasible and reliable in terms of execution time and accuracy.
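The eigenvalue step named in the abstract can be illustrated with a minimal single-machine sketch. It reduces a sample covariance matrix to tridiagonal form with Householder reflections and then applies a plain, unshifted QR iteration. The function names (householder_tridiagonalize, qr_eigen, pca), the iteration count, and the tolerance are assumptions made for illustration, not the authors' implementation. The distributed cloud setting, the error component, and the public-key association are not reproduced here.

```python
import numpy as np


def householder_tridiagonalize(A):
    """Reduce a symmetric matrix A to tridiagonal T with T = Q.T @ A @ Q."""
    T = np.array(A, dtype=float)
    n = T.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = T[k + 1:, k].copy()
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue  # nothing to eliminate below the subdiagonal
        # Choose the sign that avoids cancellation when forming v.
        alpha = -np.copysign(norm_x, x[0])
        v = x
        v[0] -= alpha
        v /= np.linalg.norm(v)
        # Householder reflector acting on rows/columns k+1 .. n-1.
        H = np.eye(n)
        H[k + 1:, k + 1:] -= 2.0 * np.outer(v, v)
        T = H @ T @ H
        Q = Q @ H
    return T, Q


def qr_eigen(T, Q, iters=500, tol=1e-12):
    """Unshifted QR iteration (assumed variant); returns eigenvalues, eigenvectors."""
    A = T.copy()
    V = Q.copy()
    for _ in range(iters):
        Qk, Rk = np.linalg.qr(A)
        A = Rk @ Qk          # similarity transform, eigenvalues preserved
        V = V @ Qk           # accumulate eigenvectors of the original matrix
        if np.linalg.norm(np.tril(A, -1)) < tol:
            break
    return np.diag(A).copy(), V


def pca(X, n_components):
    """PCA of the rows of X via tridiagonalization plus QR iteration."""
    Xc = X - X.mean(axis=0)
    C = (Xc.T @ Xc) / (X.shape[0] - 1)   # sample covariance matrix
    T, Q = householder_tridiagonalize(C)
    eigvals, eigvecs = qr_eigen(T, Q)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data with well-separated variances along each axis.
    X = rng.normal(size=(200, 4)) * np.array([5.0, 3.0, 1.5, 0.5])
    variances, components = pca(X, n_components=2)
    print(variances)
    print(components)
```

The unshifted iteration is chosen here only for brevity. It converges slowly when eigenvalues are close in magnitude, so a production solver would normally add shifts and deflation.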