Manifold-based Denoising, Outlier Detection, and Dimension Reduction Algorithm for High-Dimensional Data

Guanghua Zhao, Tao Yang, Dongmei Fu
DOI: https://doi.org/10.1007/s13042-023-01873-y
2023-01-01
International Journal of Machine Learning and Cybernetics
Abstract: Manifold learning, which has emerged in recent years, plays an increasingly important role in machine learning. However, unavoidable noise and outliers destroy the manifold structure of the data and degrade the dimensionality reduction that manifold learning can achieve. This paper therefore proposes a denoising algorithm for high-dimensional data based on manifold learning. The algorithm first projects the noisy sample vectors onto the local manifold, which reduces the noise. It then performs a statistical analysis of the noise to obtain a data boundary. Because all the data come from the same background and follow the same distribution, sample vectors that fall outside the data boundary are marked as outliers and removed. Finally, dimension reduction is applied to the denoised, outlier-free data. Experimental results show that the algorithm can, to some extent, effectively reduce the interference of noise and outliers with manifold learning on high-dimensional datasets.
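The three stages described in the abstract (local projection, a statistical boundary on the noise, then dimension reduction) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how the local manifold is estimated or how the boundary is derived, so the sketch assumes local PCA on the k nearest neighbours for the projection, a median-plus-MAD threshold on the projection residuals for the boundary, and plain PCA for the final reduction. The function names, the parameters `n_neighbors`, `d`, and `n_sigma`, and the synthetic data are all hypothetical.

```python
import numpy as np

def local_pca_denoise(X, n_neighbors=10, d=1):
    """Project each sample onto a d-dim affine subspace fit to its neighbours (assumed stand-in)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    np.fill_diagonal(D, np.inf)                      # a point is not its own neighbour
    idx = np.argsort(D, axis=1)[:, :n_neighbors]
    X_proj = np.empty_like(X)
    for i in range(X.shape[0]):
        nbrs = X[idx[i]]
        mu = nbrs.mean(axis=0)
        # Top-d right singular vectors approximate the local tangent directions.
        _, _, Vt = np.linalg.svd(nbrs - mu, full_matrices=False)
        B = Vt[:d]
        X_proj[i] = mu + (X[i] - mu) @ B.T @ B
    residuals = np.linalg.norm(X - X_proj, axis=1)   # distance moved by the projection
    return X_proj, residuals

def flag_outliers(residuals, n_sigma=3.0):
    """Assumed boundary: median + n_sigma * scaled MAD of the projection residuals."""
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med))
    return residuals > med + n_sigma * 1.4826 * mad

def pca_reduce(X, d):
    """Plain PCA, standing in for whichever manifold learner is applied last."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

# Synthetic example: a noisy line in 3-D plus three points placed far from it.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0, 200)
clean = np.stack([t, 2 * t, -t], axis=1)
noisy = clean + rng.normal(scale=0.01, size=clean.shape)
outliers = np.array([[2.0, 2.0, 0.0], [4.0, 3.0, -2.0], [2.0, 6.0, -4.0]])
X = np.vstack([noisy, outliers])

X_den, res = local_pca_denoise(X, n_neighbors=10, d=1)  # stage 1: denoise
mask = flag_outliers(res)                                # stage 2: mark outliers
Z = pca_reduce(X_den[~mask], d=1)                        # stage 3: reduce
```

Excluding each point from its own neighbourhood is a deliberate choice in this sketch: a distant outlier would otherwise pull the local fit toward itself and shrink its own residual. The threshold rule and the neighbourhood size would need tuning for real data.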