FVec2vec: A Fast Nonlinear Dimensionality Reduction Approach for General Data

Xiaoli Ren, Kefeng Deng, Kaijun Ren, Junqiang Song, Xiaoyong Li, Qing Xu
DOI: https://doi.org/10.1109/BigData55660.2022.10020682
IF: 4.426
2022-01-01
Big Data
Abstract: Dimensionality reduction is a fundamental technique for addressing the curse of dimensionality in real-world big datasets. However, most existing methods either target only raw datasets that contain explicit relationships between data points, or construct a complete neighborhood graph of the dataset by calculating pairwise similarities and then generate contexts of data points through random walks to capture the structure of the dataset, a process that is computationally expensive. In this paper, we propose FVec2vec, a fast nonlinear locality-preserving dimensionality reduction approach that extends the Skip-gram model to the embedding of general numerical matrices. Specifically, instead of constructing a neighborhood graph by calculating pairwise similarities between data points, we approximate the k-nearest neighbors (kNN) of each data point by first exploring its neighbors’ neighbors. Then, we design a novel sampling algorithm that randomly samples on the approximate kNN to depict the structure of the dataset. Experimental results show that FVec2vec is faster than most existing methods while achieving acceptable accuracy, and under certain similarity metrics its accuracy even exceeds that of the state-of-the-art method.
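The abstract outlines three stages: approximating each point's kNN by exploring neighbors' neighbors, sampling on the kNN lists to produce contexts, and feeding those contexts to a Skip-gram model. The sketch below is a minimal illustration of that pipeline under stated assumptions, not the authors' FVec2vec implementation. The neighbors'-neighbors refinement follows the generic NN-Descent idea; the paper's "novel sampling algorithm" is not described in the abstract, so plain uniform random walks over the kNN lists stand in for it; Euclidean distance is assumed as the similarity metric. All function names (`approx_knn`, `sample_walks`, `skipgram_pairs`) are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def approx_knn(data, k, iters=5, seed=0):
    """Approximate kNN lists by repeatedly checking neighbors' neighbors.

    NN-Descent-style heuristic (an assumption, not the paper's exact method):
    start from random neighbor lists, then, for each point, rank its current
    neighbors plus their neighbors and keep the k closest.
    """
    n = len(data)
    if k >= n:
        raise ValueError("k must be smaller than the number of points")
    rng = random.Random(seed)
    # Random initial neighbor lists, excluding the point itself.
    knn = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    for _ in range(iters):
        changed = False
        for i in range(n):
            # Candidates: current neighbors and the neighbors of each neighbor.
            candidates = set(knn[i])
            for j in knn[i]:
                candidates.update(knn[j])
            candidates.discard(i)
            # Keep the k closest candidates, sorted by ascending distance.
            best = sorted(candidates, key=lambda c: euclidean(data[i], data[c]))[:k]
            if set(best) != set(knn[i]):
                changed = True
            knn[i] = best
        if not changed:  # converged: no neighbor list changed this round
            break
    return knn


def sample_walks(knn, walks_per_node, walk_length, seed=0):
    """Uniform random walks on the kNN lists (a stand-in sampling scheme)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in range(len(knn)):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(knn[walk[-1]]))
            walks.append(walk)
    return walks


def skipgram_pairs(walks, window):
    """(center, context) pairs within a sliding window, as Skip-gram consumes."""
    pairs = []
    for walk in walks:
        for pos, center in enumerate(walk):
            lo = max(0, pos - window)
            hi = min(len(walk), pos + window + 1)
            for ctx_pos in range(lo, hi):
                if ctx_pos != pos:
                    pairs.append((center, walk[ctx_pos]))
    return pairs


if __name__ == "__main__":
    # Two small 1-D clusters as toy input.
    data = [[0.0], [0.1], [0.2], [0.3], [10.0], [10.1], [10.2], [10.3]]
    knn = approx_knn(data, k=2)
    walks = sample_walks(knn, walks_per_node=2, walk_length=4)
    print(knn)
    print(len(skipgram_pairs(walks, window=1)))
```

The candidate set always includes the current neighbors, so each refinement round can only keep or shorten a point's neighbor distances, and no full pairwise distance matrix is ever built. To obtain actual embeddings, the walks could be handed to an existing Skip-gram trainer, for example gensim's `Word2Vec(sentences=[[str(v) for v in w] for w in walks], vector_size=2, window=1, sg=1, min_count=1)`; this is an illustrative substitute, not the training procedure reported in the paper.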
What problem does this paper attempt to address?