Learning the Distribution of Data for Embedding

Yunpeng Shen, Pengfei Ren, Taiping Zhang, Yuan Yan Tang
DOI: https://doi.org/10.1109/ccbd.2016.020
2016-01-01
Abstract: One of the central problems in machine learning and pattern recognition is how to deal with high-dimensional data, whether for visualization or for classification and clustering. Most dimensionality reduction techniques, designed to cope with the curse of dimensionality, are based on the Euclidean distance metric. In this work, we propose an unsupervised nonlinear dimensionality reduction method, called distribution preserving embedding (DPE), that attempts to preserve the distribution of the input data. This is done by minimizing the dissimilarity between the densities estimated in the original and embedded spaces. In theory, patterns in data can be described effectively by the distribution of the data. DPE is therefore able to discover the intrinsic pattern (structure) of the data, including both global and local structures. Additionally, DPE extends naturally to the out-of-sample problem. Extensive experiments on different data sets, with comparisons against competing methods, are reported to demonstrate the effectiveness of the proposed approach.
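As a rough illustration of the density-matching idea stated in the abstract, the objective can be sketched as below. This is only a sketch under assumptions: the abstract does not name the density estimator or the dissimilarity measure, so the kernel density estimates, the kernels $K_h$ and $K_{h'}$ with bandwidths $h$ and $h'$, and the generic dissimilarity $D$ are placeholders rather than the authors' actual choices. Here $x_1, \dots, x_n$ are the input points and $y_1, \dots, y_n$ their low-dimensional embeddings.

```latex
% Assumed kernel density estimates in the original and embedded spaces
\hat{p}(x_i) = \frac{1}{n} \sum_{j=1}^{n} K_h(x_i - x_j),
\qquad
\hat{q}(y_i) = \frac{1}{n} \sum_{j=1}^{n} K_{h'}(y_i - y_j)

% Embedding chosen to minimize a (hypothetical) dissimilarity D between them
Y^{*} = \arg\min_{Y = \{y_1, \dots, y_n\}} D\left(\hat{p}, \hat{q}\right)
```

Under this reading, the embedding is driven by agreement between the two estimated densities rather than by preserving pairwise Euclidean distances directly.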