Fuzzy Granular Principal Curves Algorithm For Large Data Sets

Hongyun Zhang, Duoqian Miao, Witold Pedrycz
DOI: https://doi.org/10.1109/IFSA-NAFIPS.2013.6608529
2013-01-01
Abstract: Principal curves, a nonlinear generalization of principal components, are a common tool in multivariate analysis for purposes such as dimensionality reduction and feature extraction. However, existing principal curves algorithms are often inefficient on large data sets owing to their high computational complexity. In this paper, a new method based on the idea of "information granulation and fuzzy sets" is proposed to improve efficiency and noise robustness. First, large amounts of numerical data are granulated into C interval (granular) data using fuzzy C-means clustering and two granulation criteria, which significantly reduces the amount of data to be processed in the later step. Then, granular principal curves are constructed from the upper and lower bounds of the interval data. Finally, we introduce a quantitative index based on the parameter alpha to evaluate the fuzziness of the granular principal curves output, where alpha is a positive parameter that provides some flexibility when optimizing the information granule. A series of numerical studies on synthetic data provides useful insight into the effectiveness of the proposed algorithm.
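The granulation step described in the abstract can be illustrated with a minimal Python sketch, under several assumptions. It runs a textbook fuzzy C-means and then summarizes each cluster by an axis-aligned interval (lower and upper bound per dimension). The abstract does not spell out the paper's two granulation criteria or how alpha enters the optimization, so this sketch substitutes a simple stand-in rule: each point goes to its highest-membership cluster. The function names `fuzzy_c_means` and `granulate`, the evenly spaced center initialization, and the fuzzifier `m=2.0` are illustrative choices, not taken from the paper.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy C-means. Returns (centers, memberships)."""
    n, dim = len(points), len(points[0])
    # Simplifying assumption: seed the centers with evenly spaced data points.
    centers = [list(points[(k * n) // c]) for k in range(c)]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        for i, p in enumerate(points):
            d = [math.dist(p, ctr) for ctr in centers]
            zeros = [k for k in range(c) if d[k] == 0.0]
            if zeros:
                # The point sits exactly on a center: give it crisp membership.
                u[i] = [1.0 / len(zeros) if k in zeros else 0.0 for k in range(c)]
            else:
                inv = [dk ** (-2.0 / (m - 1.0)) for dk in d]
                total = sum(inv)
                u[i] = [v / total for v in inv]
        # Center update: mean of the data weighted by u_ik^m.
        new_centers = []
        for k in range(c):
            w = [row[k] ** m for row in u]
            w_sum = sum(w)
            new_centers.append(
                [sum(wi * p[j] for wi, p in zip(w, points)) / w_sum for j in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


def granulate(points, centers, memberships):
    """Return one axis-aligned interval (lower, upper) per cluster."""
    c, dim = len(centers), len(points[0])
    groups = [[] for _ in range(c)]
    for p, row in zip(points, memberships):
        # Stand-in rule (not the paper's criteria): highest membership wins.
        groups[max(range(c), key=lambda k: row[k])].append(p)
    granules = []
    for ctr, group in zip(centers, groups):
        if not group:
            group = [ctr]  # empty cluster: fall back to a degenerate interval
        lower = [min(p[j] for p in group) for j in range(dim)]
        upper = [max(p[j] for p in group) for j in range(dim)]
        granules.append((lower, upper))
    return granules


# Usage: 200 noisy samples along a sine arc shrink to C = 5 intervals.
rng = random.Random(0)
data = []
for i in range(200):
    x = math.pi * i / 199
    data.append((x + rng.gauss(0, 0.05), math.sin(x) + rng.gauss(0, 0.05)))
centers, u = fuzzy_c_means(data, c=5)
for lower, upper in granulate(data, centers, u):
    print([round(v, 2) for v in lower], [round(v, 2) for v in upper])
```

Only the C lower and upper bound vectors are passed on, which is where the reduction in data volume comes from. Fitting the granular principal curves to those bounds and computing the alpha-based fuzziness index are not covered by this sketch.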