Data Visualization with Probabilistic Clustering and Neighbor Embedding

Xiaohui Liao, Jingqi Yan
DOI: https://doi.org/10.23919/chicc.2018.8482651
2018-01-01
Abstract: In the era of information explosion, processing and analyzing large-scale, high-dimensional data sets has become a major challenge for data mining and machine learning. To obtain and intuitively understand the information underlying big data, an effective visualization technique is in demand. Many successful visualization techniques project high-dimensional data sets into low-dimensional spaces so that data points can be presented in scatter plots, histograms, or parallel coordinate plots. In this paper, we propose a new algorithm called PCNE. The algorithm first performs probabilistic clustering to obtain a coarse classification of the data set, and then reconstructs the joint probability using the heuristic information provided by the classification results and the neighborhood relationships. Our experimental results on public data sets demonstrate that PCNE outperforms classical embedding algorithms in revealing both the local and global structures of data.
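The abstract does not spell out how the joint probability is reconstructed, so the following is only a rough sketch of the general idea, not the authors' PCNE implementation. It assumes soft k-means as the probabilistic clustering step, a fixed-bandwidth Gaussian kernel for the neighborhood relationship, and a simple mixing weight `alpha` between the two; the function names and the parameters `beta`, `sigma`, and `alpha` are hypothetical choices made for illustration.

```python
import numpy as np

def soft_assignments(X, k, n_iter=20, beta=1.0, seed=0):
    """Soft k-means (assumed stand-in for the probabilistic clustering step).

    Returns an (n, k) matrix whose rows are cluster membership probabilities.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        R = np.exp(logits)
        R /= R.sum(axis=1, keepdims=True)
        # Re-estimate centers as responsibility-weighted means.
        centers = (R.T @ X) / (R.sum(axis=0)[:, None] + 1e-12)
    return R

def neighbor_affinities(X, sigma=1.0):
    """Gaussian affinities between all pairs of points, zero on the diagonal."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A

def joint_probability(X, k, alpha=0.5, sigma=1.0):
    """Combine neighborhood affinities with cluster co-membership.

    alpha is a hypothetical weight: 0 ignores the clustering entirely,
    1 scales each affinity by the probability that both points share a cluster.
    """
    R = soft_assignments(X, k)
    same_cluster = R @ R.T  # probability that points i and j share a cluster
    P = neighbor_affinities(X, sigma) * ((1.0 - alpha) + alpha * same_cluster)
    return P / P.sum()  # normalize to a joint distribution over pairs

# Example: two well-separated groups of points in 3 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (10, 3)), rng.normal(5, 0.5, (10, 3))])
P = joint_probability(X, k=2)
```

A matrix like `P` could then play the role of the high-dimensional joint distribution that a neighbor-embedding method tries to match in the low-dimensional space. How PCNE actually weights the two sources of information is described in the paper itself, not here.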