A Holistic Approach for Distributed Dimensionality Reduction of Big Data

Liwei Kuang,Laurence T. Yang,Jinjun Chen,Fei Hao,Changqing Luo
DOI: https://doi.org/10.1109/tcc.2015.2449855
IF: 5.697
2015-01-01
IEEE Transactions on Cloud Computing
Abstract:With the exponential growth of data volume, big data have placed an unprecedented burden on current computing infrastructure. Dimensionality reduction of big data attracts a great deal of attention in recent years as an efficient method to extract the core data which is smaller to store and faster to process. This paper aims at addressing the three fundamental problems closely related to distributed dimensionality reduction of big data, i.e., big data fusion, dimensionality reduction algorithm and construction of distributed computing platform. A chunk tensor method is presented to fuse the unstructured, semi-structured and structured data as a unified model in which all characteristics of the heterogeneous data are appropriately arranged along the tensor orders. A Lanczos based high order singular value decomposition algorithm is proposed to reduce dimensionality of the unified model. Theoretical analyses of the algorithm are provided in terms of storage scheme, convergence property and computation cost. To execute the dimensionality reduction task, this paper employs the transparent computing paradigm to construct a distributed computing platform as well as utilizes a four-objectives optimization model to schedule the tasks. Experimental results demonstrate that the proposed holistic approach is efficient for distributed dimensionality reduction of big data.
What problem does this paper attempt to address?