Abstract: The successive projection algorithm (SPA) is a workhorse algorithm to learn the $r$ vertices of the convex hull of a set of $(r-1)$-dimensional data points, a.k.a. a latent simplex, which has numerous applications in data science. In this paper, we revisit the robustness to noise of SPA and several of its variants. In particular, when $r \geq 3$, we prove the tightness of the existing error bounds for SPA and for two more robust preconditioned variants of SPA. We also provide significantly improved error bounds for SPA, by a factor proportional to the conditioning of the $r$ vertices, in two special cases: for the first extracted vertex, and when $r \leq 2$. We then provide further improvements for the error bounds of a translated version of SPA proposed by Arora et al. ("A practical algorithm for topic modeling with provable guarantees", ICML, 2013) in two special cases: for the first two extracted vertices, and when $r \leq 3$. Finally, we propose a new more robust variant of SPA that first shifts and lifts the data points in order to minimize the conditioning of the problem. We illustrate our results on synthetic data.


### What problem does this paper attempt to solve?

This paper studies and improves the robustness to noise of the **Successive Projection Algorithm (SPA)** and several of its variants. Its objectives are:

1. **Re-evaluate the robustness of SPA and its variants**: The paper re-analyzes how SPA and two more robust preconditioned variants behave under noise, and proves that the existing error bounds are tight when $r \geq 3$.
2. **Provide improved error bounds**: In two special cases (the first extracted vertex, and $r \leq 2$), the error bounds for SPA are significantly improved: they scale with the condition number $K(W)$ of the vertices rather than with its square.
3. **Improve the error bounds of the translated version of SPA (T-SPA)**: For the translated version of SPA proposed by Arora et al., the error bounds are further improved in two special cases (the first two extracted vertices, and $r \leq 3$).
4. **Propose a new robust variant**: A new SPA variant first shifts and then lifts the data points in order to minimize the conditioning of the problem, which improves robustness.
5. **Verify theoretical results**: Numerical experiments on synthetic data compare the SPA variants and illustrate the theoretical findings.

### Background and Motivation

Simplex-Structured Matrix Factorization (SSMF) is a fundamental problem in signal processing, data analysis, and machine learning, with applications including chemometrics, hyperspectral imaging, audio source separation, topic modeling, and community detection. The goal of SSMF is to recover the latent simplex from noisy observed data points. SPA and its existing variants are sensitive to noise, which motivates a closer study of their robustness and of ways to improve it.

### Main Contributions

1. **Improved error bounds**: The error bounds for the first step of SPA, and for SPA when $r \leq 2$, are improved from $O(\epsilon K^{2}(W))$ to $O(\epsilon K(W))$.
2. **Improvement of T-SPA**: Similar improvements are obtained for the translated version of SPA in its own special cases (the first two extracted vertices, and $r \leq 3$).
3. **New robust variant**: A new SPA variant is proposed that preprocesses the data by shifting and lifting the data points, which improves robustness.
4. **Numerical validation**: The theoretical results are illustrated by numerical experiments on synthetic data, in which the new method performs better in adversarial settings.

### Summary

By re-evaluating and improving the robustness of SPA and its variants, this paper provides more reliable tools for simplex-structured matrix factorization on noisy data.
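For readers unfamiliar with SPA, the sketch below illustrates the basic procedure as it is commonly described: repeatedly select the data point with the largest Euclidean norm, then project all points onto the orthogonal complement of the selected one. It is a minimal, dependency-free illustration and not the paper's code; the function name `spa`, the row-wise data layout, and the toy data are assumptions made for this example, and none of the preconditioned, translated, or shift-and-lift variants discussed above is implemented.

```python
def spa(points, r):
    """Return the indices of up to r points chosen by successive projection.

    points: list of data points, each a list of floats of equal length.
    Assumes the r vertices are linearly independent (hypothetical toy setting).
    """
    # Work on a copy: the residuals are overwritten at every step.
    residuals = [list(p) for p in points]
    selected = []
    for _ in range(r):
        # Select the residual with the largest squared Euclidean norm.
        norms = [sum(x * x for x in row) for row in residuals]
        j = max(range(len(residuals)), key=norms.__getitem__)
        if norms[j] == 0.0:
            break  # all residuals vanished: fewer than r independent points
        selected.append(j)
        # Unit vector along the selected residual.
        scale = norms[j] ** 0.5
        u = [x / scale for x in residuals[j]]
        # Project every residual onto the orthogonal complement of u.
        for row in residuals:
            dot = sum(a * b for a, b in zip(row, u))
            for k in range(len(row)):
                row[k] -= dot * u[k]
    return selected


# Noiseless toy data: three vertices and three convex combinations of them.
points = [
    [1.5, 1.0, 0.0],      # midpoint of vertices at indices 1 and 3
    [3.0, 0.0, 0.0],      # vertex
    [1.0, 2 / 3, 1 / 3],  # centroid of the three vertices
    [0.0, 2.0, 0.0],      # vertex
    [0.0, 1.0, 0.5],      # midpoint of vertices at indices 3 and 5
    [0.0, 0.0, 1.0],      # vertex
]
print(spa(points, 3))  # → [1, 3, 5]
```

In this noiseless example the three selected indices are exactly the vertices. The error bounds summarized above concern how far the selected points can drift from the true vertices once noise is added to the data.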

On the Robustness of the Successive Projection Algorithm

Successive Projection Algorithm Robust to Outliers

Improved Algorithm and Bounds for Successive Projection

Convergence of Projected Subgradient Method with Sparse or Low-Rank Constraints

Constructing the L2-Graph for Robust Subspace Learning and Subspace Clustering

A Nonconvex Projection Method for Robust PCA

A re-examination to the SCoTLASS problems for SPCA and two projection-based methods for them

Successive Nonnegative Projection Algorithm for Robust Nonnegative Blind Source Separation

Smoothed separable nonnegative matrix factorization

Bridging Convex and Nonconvex Optimization in Robust PCA: Noise, Outliers, and Missing Data

Robust Learning from Noisy Side-information by Semidefinite Programming

Probabilistic Recovery of Multiple Subspaces in Point Clouds by Geometric lp Minimization

Robust Statistical Estimation and Segmentation of Multiple Subspaces

Efficient Low-Rank Semidefinite Programming With Robust Loss Functions

Solving sparse principal component analysis with global support

A Robust Recovery Algorithm with Smoothing Strategies

New nonasymptotic convergence rates of stochastic proximal point algorithm for convex optimization problems

Robust K-Subspaces Recovery With Combinatorial Initialization

Weakly Convex Regularized Robust Sparse Recovery Methods with Theoretical Guarantees

Adaptive Stochastic Gradient Descent on the Grassmannian for Robust Low-Rank Subspace Recovery

Projected Randomized Smoothing for Certified Adversarial Robustness