Research on Dimensional Reduction of Sparse Matrix Data Based on Information Entropy

HE Xing-gao, LI Chan-juan, WANG Rui-jin, DENG Fu-hu, LIU Xing
DOI: https://doi.org/10.3969/j.issn.1001-0548.2018.02.012
2018-01-01
Abstract: Data dimensionality reduction is a necessary step in mining useful information from high-dimensional data. When the traditional principal component analysis (PCA) algorithm is applied to high-dimensional sparse data, all data features cannot be read into memory at once for analysis and computation. The improved block-processing PCA algorithm avoids this limitation, but it is too time-consuming to meet practical requirements. In this paper, we propose the E-PCA algorithm, which improves PCA by introducing the concept of information entropy. First, useless features are eliminated through feature selection based on information entropy; then PCA is used to reduce the dimensionality of the large, high-dimensional sparse data. The experimental results show that, while retaining the same proportion of the raw data, the proposed information-entropy-based E-PCA algorithm outperforms the block-processing PCA algorithm in memory usage, running time, and dimensionality-reduction results.
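The two-stage idea in the abstract (entropy-based feature selection followed by PCA) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define how feature entropy is computed, so equal-width histogram binning with Shannon entropy is assumed here; the threshold, the bin count, and the function names `feature_entropy`, `select_by_entropy`, `pca`, and `e_pca` are hypothetical; and dense NumPy arrays are used for brevity instead of a sparse or out-of-core layout. PCA is done with an SVD of the centered data.

```python
import numpy as np


def feature_entropy(column, n_bins=10):
    """Shannon entropy (bits) of one feature after equal-width binning.

    ASSUMPTION: histogram binning is an illustrative choice; the paper's
    exact entropy definition is not given in the abstract.
    """
    counts, _ = np.histogram(column, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    # sum of p * log2(1/p) is non-negative; a constant column gives 0.0
    return float(np.sum(p * np.log2(1.0 / p)))


def select_by_entropy(X, threshold, n_bins=10):
    """Return indices of features whose entropy exceeds the (hypothetical) threshold."""
    entropies = np.array([feature_entropy(X[:, j], n_bins) for j in range(X.shape[1])])
    keep = np.flatnonzero(entropies > threshold)
    return keep, entropies


def pca(X, n_components):
    """Project centered data onto its top principal axes (rows of Vt from the SVD)."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = (S ** 2) / np.sum(S ** 2)  # share of variance per axis
    return Xc @ components.T, explained_ratio[:n_components]


def e_pca(X, entropy_threshold, n_components, n_bins=10):
    """Hypothetical E-PCA sketch: drop low-entropy features, then run PCA."""
    keep, _ = select_by_entropy(X, entropy_threshold, n_bins)
    Z, ratio = pca(X[:, keep], n_components)
    return Z, keep, ratio


# Usage: 200 samples x 50 features; only the first 20 features carry
# sparse random values, the other 30 are all-zero (zero entropy).
rng = np.random.default_rng(0)
X = np.zeros((200, 50))
X[:, :20] = rng.random((200, 20)) * (rng.random((200, 20)) < 0.3)

Z, keep, ratio = e_pca(X, entropy_threshold=0.0, n_components=5)
print(Z.shape, len(keep))  # (200, 5) 20
```

Dropping zero-entropy (constant) columns before the SVD shrinks the matrix that PCA has to hold in memory, which is the kind of saving the abstract attributes to E-PCA; a real implementation on large sparse data would compute per-feature entropies column by column from a sparse format rather than from a dense array.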