Abstract: The problem of determining the configuration of points from partial distance information, known as the Euclidean Distance Geometry (EDG) problem, is fundamental to many tasks in the applied sciences. In this paper, we propose two algorithms grounded in the Riemannian optimization framework to address the EDG problem. Our approach formulates the problem as a low-rank matrix completion task over the Gram matrix, using partial measurements represented as expansion coefficients of the Gram matrix in a non-orthogonal basis. For the first algorithm, under a uniform sampling with replacement model for the observed distance entries, we demonstrate that, with high probability, a Riemannian gradient-like algorithm on the manifold of rank-$r$ matrices converges linearly to the true solution, given initialization via a one-step hard thresholding. This holds provided the number of samples, $m$, satisfies $m \geq \mathcal{O}(n^{7/4}r^2 \log(n))$. With a more refined initialization, achieved through resampled Riemannian gradient-like descent, we further improve this bound to $m \geq \mathcal{O}(nr^2 \log(n))$. Our analysis for the first algorithm leverages a non-self-adjoint operator and depends on deriving eigenvalue bounds for an inner product matrix of restricted basis matrices, leveraging sparsity properties for tighter guarantees than previously established. The second algorithm introduces a self-adjoint surrogate for the sampling operator. This algorithm demonstrates strong numerical performance on both synthetic and real data. Furthermore, we show that optimizing over manifolds of higher-than-rank-$r$ matrices yields superior numerical results, consistent with recent literature on overparameterization in the EDG problem.
What problem does this paper attempt to address?
This paper addresses **the problem of determining the configuration of points from partial distance information, known as the Euclidean Distance Geometry (EDG) problem**. The EDG problem is fundamental to many tasks in the applied sciences: given only a subset of the pairwise distances, the goal is to accurately recover the positions of the points in space.
### Problem Background
In many practical applications, measurement data may be incomplete because of geography, climate, or other factors. For example, in protein structure prediction, nuclear magnetic resonance (NMR) spectroscopy experiments provide distance information only between neighboring protons, so the distance data are incomplete. Similarly, in sensor networks, we may know only the distances between mobile nodes and fixed anchor points. In such settings, the key problem is to infer the spatial configuration of the points from partial distance information.
### Mathematical Description
To describe this problem mathematically, let \(\{p_i\}_{i = 1}^n\subset\mathbb{R}^r\) represent the positions of \(n\) points in \(\mathbb{R}^r\). Define the matrix \(P=[p_1,p_2,\ldots,p_n]\in\mathbb{R}^{r\times n}\), where each column represents a point. Two important mathematical objects are:
- **Gram matrix** \(X\in\mathbb{R}^{n\times n}\), defined as \(X = P^{\top}P\).
- **Squared distance matrix** \(D\in\mathbb{R}^{n\times n}\), defined as \(D_{ij}=\|p_i - p_j\|^2_2\).
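The two objects above are linked by the standard identity \(D_{ij} = X_{ii} + X_{jj} - 2X_{ij}\), which is what allows distance measurements to be read as information about the Gram matrix. The following is a minimal NumPy sketch of that relationship; the function names and the random test points are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gram_matrix(P):
    """Gram matrix X = P^T P for points stored as the columns of P (r x n)."""
    return P.T @ P

def squared_distance_matrix(P):
    """Squared distances D_ij = ||p_i - p_j||_2^2, computed directly from the points."""
    diff = P[:, :, None] - P[:, None, :]  # shape (r, n, n)
    return np.sum(diff ** 2, axis=0)

def gram_to_squared_distances(X):
    """Apply the identity D_ij = X_ii + X_jj - 2 X_ij."""
    d = np.diag(X)
    return d[:, None] + d[None, :] - 2.0 * X

# Illustrative data: n = 6 random points in R^3 (an assumption for the demo).
rng = np.random.default_rng(0)
P = rng.standard_normal((3, 6))
X = gram_matrix(P)
D = squared_distance_matrix(P)
```

Here `gram_to_squared_distances(X)` reproduces `D`, and `X` has rank at most \(r\), which is the low-rank structure the completion formulation relies on.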
When the complete distance matrix is available, the configuration of points can be recovered, up to rigid motions, by classical multidimensional scaling (classical MDS). In practice, however, the distance matrix is usually incomplete, so methods that can handle missing entries are needed.
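For reference, the complete-data case can be sketched in a few lines. This is the textbook classical MDS procedure (double centering followed by a truncated eigendecomposition), not one of the paper's algorithms; the helper names and test data are assumptions made for illustration.

```python
import numpy as np

def classical_mds(D, r):
    """Recover an r x n point configuration from a full squared distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D @ J                     # Gram matrix of the centered points
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:r]            # indices of the r largest eigenvalues
    w_top = np.clip(w[idx], 0.0, None)       # guard against tiny negative values
    return (V[:, idx] * np.sqrt(w_top)).T    # columns are the recovered points

def squared_distances(P):
    """Squared distance matrix of the columns of P."""
    G = P.T @ P
    d = np.diag(G)
    return d[:, None] + d[None, :] - 2.0 * G

# Illustrative data: n = 8 random points in R^2 (an assumption for the demo).
rng = np.random.default_rng(1)
P_true = rng.standard_normal((2, 8))
D = squared_distances(P_true)
P_hat = classical_mds(D, r=2)
```

The recovered `P_hat` generally differs from `P_true` by a translation and an orthogonal transformation, but it reproduces the same pairwise distances.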
### Main Contributions of the Paper
The paper proposes two non-convex iterative algorithms based on the Riemannian optimization framework to solve the EDG problem. The main features of these algorithms include:
1. **Low-rank matrix completion**: The problem is recast as the completion of a low-rank Gram matrix.
2. **Initialization method**: Two different initialization methods are proposed, with proven bounds on the error between the initialization and the true solution.
3. **Convergence and sample complexity**: Theoretical analysis shows that one of the algorithms converges locally to the true solution with high probability, and establishes the sample complexity required by the initialization methods.
Through these methods, the paper aims to provide a provable non-convex algorithm, together with its initialization, for recovering the configuration of points from partial distance information.
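To make the matrix completion view concrete, the sketch below shows how each observed squared distance can be written as an inner product of the Gram matrix with a matrix \(w_{ij} = (e_i - e_j)(e_i - e_j)^{\top}\), and that two such matrices sharing an index are not orthogonal. It also draws index pairs uniformly with replacement, as in the sampling model the abstract describes. The function names, the choice of sampling over pairs \(i < j\), and the test data are assumptions for illustration; the paper's exact operators and normalizations may differ.

```python
import numpy as np

def basis_matrix(i, j, n):
    """w_ij = (e_i - e_j)(e_i - e_j)^T, so that <X, w_ij> = X_ii + X_jj - 2 X_ij."""
    v = np.zeros(n)
    v[i], v[j] = 1.0, -1.0
    return np.outer(v, v)

def sample_pairs(n, m, rng):
    """Draw m index pairs (i < j) uniformly with replacement (assumed model)."""
    iu, ju = np.triu_indices(n, k=1)
    picks = rng.integers(0, iu.size, size=m)
    return list(zip(iu[picks], ju[picks]))

# Illustrative data: n = 7 random points in R^2, m = 10 samples (assumptions).
rng = np.random.default_rng(2)
n, r, m = 7, 2, 10
P = rng.standard_normal((r, n))
X = P.T @ P

pairs = sample_pairs(n, m, rng)
# Each measurement is the inner product of X with the matching basis matrix.
measurements = np.array([np.sum(X * basis_matrix(i, j, n)) for i, j in pairs])
# The same quantities computed directly as squared distances between points.
direct = np.array([np.sum((P[:, i] - P[:, j]) ** 2) for i, j in pairs])
# Inner product of two basis matrices that share index 0: nonzero, so not orthogonal.
overlap = np.sum(basis_matrix(0, 1, n) * basis_matrix(0, 2, n))
```

The arrays `measurements` and `direct` agree, which is the sense in which partial distance data are linear measurements of the unknown low-rank Gram matrix.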