Matrix-based Kernel Principal Component Analysis for Large-Scale Data Set.

Weiya Shi, Yue-Fei Guo, Xiangyang Xue
DOI: https://doi.org/10.1109/ijcnn.2009.5178692
2009-01-01
Abstract: Kernel Principal Component Analysis (KPCA) is a nonlinear feature extraction approach that generally requires eigen-decomposing the kernel matrix. Because the size of the kernel matrix scales with the number of data points, storing and computing it becomes infeasible for large-scale data sets. To overcome these computational and storage problems, a new framework, Matrix-based Kernel Principal Component Analysis (M-KPCA), is proposed. The large-scale data set is divided into small subsets, and the autocorrelation matrix of each subset is treated as a special computational unit. A novel polynomial-matrix kernel function is adopted to compute the similarity between data matrices in place of vectors. It is also proved that the polynomial kernel is an extreme case of the polynomial-matrix kernel. The proposed M-KPCA greatly reduces the size of the kernel matrix, which makes its computation feasible. Its effectiveness is demonstrated by experimental results on artificial and real data sets.
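The subset-based construction described in the abstract can be illustrated with a small sketch. The abstract does not give the polynomial-matrix kernel formula, so the form used here, the Frobenius inner product of two autocorrelation matrices raised to the power degree/2, is an assumption chosen only because it reduces to the homogeneous polynomial kernel when every subset holds a single point. The function names (`autocorrelation`, `poly_matrix_kernel`, `reduced_kernel_matrix`) and the consecutive-chunk split are likewise hypothetical, not taken from the paper.

```python
import random


def autocorrelation(subset):
    """Return sum_k x_k x_k^T for a subset given as a list of equal-length vectors."""
    dim = len(subset[0])
    return [[sum(x[a] * x[b] for x in subset) for b in range(dim)]
            for a in range(dim)]


def frobenius_inner(A, B):
    """Frobenius inner product <A, B>_F = sum over entries of A[a][b] * B[a][b]."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def poly_matrix_kernel(A, B, degree=2):
    """ASSUMED kernel form (not the paper's definition): <A, B>_F ** (degree / 2)."""
    if degree % 2 != 0:
        raise ValueError("this sketch only handles even degrees")
    return frobenius_inner(A, B) ** (degree // 2)


def reduced_kernel_matrix(data, subset_size, degree=2):
    """Split data into consecutive subsets and build the subset-level kernel matrix."""
    subsets = [data[i:i + subset_size] for i in range(0, len(data), subset_size)]
    units = [autocorrelation(s) for s in subsets]  # one computational unit per subset
    return [[poly_matrix_kernel(A, B, degree) for B in units] for A in units]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(12)]
    K_small = reduced_kernel_matrix(data, subset_size=4)  # 3 subsets of 4 points
    K_full = reduced_kernel_matrix(data, subset_size=1)   # one point per subset
    print(len(K_small), len(K_full))  # prints: 3 12
```

With 12 points and subsets of 4, the kernel matrix shrinks from 12 x 12 to 3 x 3. With `subset_size=1`, each entry under the assumed kernel equals (x_i . x_j)^degree, which matches the extreme-case relationship the abstract mentions. In a full KPCA pipeline the resulting matrix would still need to be centered and eigen-decomposed.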