A Clustering Algorithm for High-Dimensional Nonlinear Feature Data with Applications

Hongquan JIANG, Gang WANG, Jianmin GAO, Zhiyong GAO, Ruiqi GAO, Qi GUO
DOI: https://doi.org/10.7652/xjtuxb201712008
2017-01-01
Abstract: Nonlinear relations between the attributes of high-dimensional data cause several problems in cluster analysis, such as uneven data distribution, failure of traditional similarity measures, and difficulty in accurately representing the resulting classes. To address these problems, a clustering algorithm for high-dimensional nonlinear feature data is proposed based on kernel principal component analysis (KPCA) and density-based clustering (DBSCAN). To extract the nonlinear characteristics of high-dimensional data, KPCA is used to map the original data to a higher-dimensional space, which yields a set of directions in the principal component space (PCS) that capture the nonlinear characteristics of the data and reduce its dimensionality. A similarity distance for data in the PCS is defined to improve the traditional DBSCAN clustering algorithm, and 3δ statistical theory is used to characterize the clustering results. A case study of hypertension group clustering illustrates the feasibility of the proposed method. The results show that the method can effectively extract the nonlinear characteristics of high-dimensional data and perform both cluster analysis and knowledge expression of cluster centers, overcoming the difficulties of the traditional DBSCAN method in clustering high-dimensional data.
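The pipeline outlined in the abstract (KPCA projection, density clustering in the PCS, statistical description of each cluster) can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the RBF kernel and its `gamma`, plain Euclidean distance in the PCS standing in for the paper's own similarity distance, reading "3δ" as a mean ± 3 standard deviations interval per attribute, the toy data, and all function names.

```python
import math

def rbf_kernel_matrix(X, gamma):
    # Assumed kernel: k(x, y) = exp(-gamma * ||x - y||^2)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
             for y in X] for x in X]

def center_kernel(K):
    # Center the data in feature space: K - 1K - K1 + 1K1
    n = len(K)
    row = [sum(r) / n for r in K]
    total = sum(row) / n
    return [[K[i][j] - row[i] - row[j] + total for j in range(n)]
            for i in range(n)]

def top_eigenpairs(A, k, iters=200):
    # Power iteration with deflation; adequate for a small symmetric PSD matrix
    n = len(A)
    A = [r[:] for r in A]
    pairs = []
    for c in range(k):
        v = [math.sin(i + 1 + c) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        for i in range(n):
            for j in range(n):
                A[i][j] -= lam * v[i] * v[j]
    return pairs

def kpca_project(X, gamma, k):
    # Coordinate of training point i on component c is sqrt(lambda_c) * v_c[i]
    Kc = center_kernel(rbf_kernel_matrix(X, gamma))
    comps = [[math.sqrt(max(lam, 0.0)) * vi for vi in v]
             for lam, v in top_eigenpairs(Kc, k)]
    return [list(p) for p in zip(*comps)]

def dbscan(points, eps, min_pts):
    # Standard DBSCAN; Euclidean distance is an assumption (see lead-in)
    n = len(points)
    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
    labels = [None] * n  # None = unvisited, -1 = noise
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # only core points expand the cluster
                queue.extend(nb)
    return labels

def three_sigma_summary(X, labels):
    # Per cluster and original attribute: (mean, mean - 3*std, mean + 3*std)
    summary = {}
    for c in sorted(set(labels) - {-1}):
        members = [x for x, lab in zip(X, labels) if lab == c]
        stats = []
        for col in zip(*members):
            mu = sum(col) / len(col)
            sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col))
            stats.append((mu, mu - 3 * sd, mu + 3 * sd))
        summary[c] = stats
    return summary

# Toy data (hypothetical): two tight groups of five 2-D points each
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
Y = kpca_project(X, gamma=1.0, k=2)     # coordinates in the PCS
labels = dbscan(Y, eps=0.5, min_pts=3)  # cluster in the PCS, not in input space
summary = three_sigma_summary(X, labels)
```

On this toy data the two groups come back as two clusters with no noise points, and `summary` gives each cluster's center and interval in the original attribute space. Real data would need `gamma`, `eps`, and `min_pts` tuned, and a library eigensolver in place of power iteration.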