Improved Algorithm and Bounds for Successive Projection

Jiashun Jin, Zheng Tracy Ke, Gabriel Moryoussef, Jiajun Tang, Jingming Wang
2024-03-17
Abstract: Given a $K$-vertex simplex in a $d$-dimensional space, suppose we measure $n$ points on the simplex with noise (hence, some of the observed points fall outside the simplex). Vertex hunting is the problem of estimating the $K$ vertices of the simplex. A popular vertex hunting algorithm is the successive projection algorithm (SPA). However, SPA is observed to perform unsatisfactorily under strong noise or outliers. We propose pseudo-point SPA (pp-SPA). It uses a projection step and a denoise step to generate pseudo-points and feeds them into SPA for vertex hunting. We derive error bounds for pp-SPA, leveraging extreme value theory of (possibly) high-dimensional random vectors. The results suggest that pp-SPA has faster rates and better numerical performance than SPA. Our analysis includes an improved non-asymptotic bound for the original SPA, which is of independent interest.
Machine Learning, Statistics Theory
What problem does this paper attempt to address?
This paper addresses vertex hunting: estimating the $K$ vertices of a simplex in a (possibly high-dimensional) space from noisy observations, especially in the presence of strong noise or outliers. The authors propose a new algorithm, the pseudo-point successive projection algorithm (pp-SPA), to improve on the existing successive projection algorithm (SPA). SPA performs poorly under strong noise or outliers, whereas pp-SPA first generates pseudo-points through a projection step and a denoising step, and then feeds these pseudo-points into SPA for vertex estimation.

The main contributions of the paper are:

1. The pp-SPA algorithm, which first estimates the hyperplane in which the data points lie, projects all points onto this hyperplane, and then applies a denoising step that reduces the impact of noise through nearest-neighbor averaging.
2. An improved non-asymptotic error bound for the original SPA that is tighter than existing results. In particular, in some cases the new bound depends on singular values associated with the true hyperplane, whereas the old bound depends on a singular value that may be zero.
3. An analysis of the rate of pp-SPA, showing that it achieves a faster rate than SPA, especially when the dimension $d$ is much larger than the number of vertices $K$.

The paper also discusses potential applications of pp-SPA in fields such as hyperspectral unmixing, archetypal analysis, network community analysis, and topic modeling. In addition, it compares pp-SPA with other improved variants of SPA, highlighting its advantages in theory and in practice.

In conclusion, the paper addresses the problem of estimating the vertices of a simplex more accurately in the presence of noise and outliers, and proposes a new, efficient algorithm for doing so.
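The pipeline summarized above (project onto an estimated hyperplane, denoise by local averaging, then run SPA) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that the hyperplane is fitted by PCA, that denoising averages all points within a fixed radius, and that points with too few neighbors are dropped as likely outliers. The names `spa`, `pp_spa`, `radius`, and `min_neighbors` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def spa(X, K):
    """Successive projection: return the row indices of K estimated vertices."""
    R = np.array(X, dtype=float)              # working copy of the points
    picked = []
    for _ in range(K):
        sq_norms = np.einsum("ij,ij->i", R, R)
        j = int(np.argmax(sq_norms))          # remaining point with the largest norm
        picked.append(j)
        u = R[j] / np.sqrt(sq_norms[j])       # unit vector along the picked point
        R = R - np.outer(R @ u, u)            # project onto the orthogonal complement of u
    return picked


def pp_spa(X, K, radius, min_neighbors):
    """Sketch of pseudo-point SPA; returns a (K, d) array of estimated vertices.

    Assumptions (not taken from the paper's text): PCA hyperplane fit,
    fixed-radius neighborhoods, and a neighbor-count cutoff for outliers.
    """
    X = np.asarray(X, dtype=float)
    center = X.mean(axis=0)

    # Projection step: fit a (K-1)-dimensional hyperplane through the mean.
    _, _, Vt = np.linalg.svd(X - center, full_matrices=False)
    basis = Vt[:K - 1]                        # (K-1, d), orthonormal rows
    Z = (X - center) @ basis.T                # coordinates inside the hyperplane

    # Denoise step: average each point with its neighbors inside the hyperplane.
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    near = dist <= radius                     # each point counts as its own neighbor
    counts = near.sum(axis=1)
    keep = counts >= min_neighbors            # drop isolated points
    if keep.sum() < K:
        raise ValueError("too few points survive denoising; relax the parameters")
    Z_pseudo = (near[keep].astype(float) @ Z) / counts[keep, None]

    # Map the pseudo-points back to the ambient space and run plain SPA on them.
    pseudo = center + Z_pseudo @ basis
    return pseudo[spa(pseudo, K)]
```

Calling `pp_spa(X, K, radius, min_neighbors)` on an `(n, d)` data matrix returns one estimated vertex per row. The pairwise-distance matrix takes O(n²) memory, so this sketch suits only moderate `n`; both tuning parameters would need to be chosen relative to the noise level.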