Abstract: For a set $X$ of $N$ points in $\mathbb{R}^D$, the Johnson-Lindenstrauss lemma provides random linear maps that approximately preserve all pairwise distances in $X$, up to multiplicative error $(1\pm \epsilon)$ with high probability, using a target dimension of $O(\epsilon^{-2}\log(N))$. Certain known point sets actually require a target dimension this large: any smaller dimension forces at least one distance to be stretched or compressed too much. What happens to the remaining distances? If we only allow a fraction $\eta$ of the distances to be distorted beyond tolerance $(1\pm \epsilon)$, we show a target dimension of $O(\epsilon^{-2}\log(4e/\eta)\log(N)/R)$ is sufficient for the remaining distances. With the stable rank of a matrix $A$ defined as $\lVert A\rVert_F^2/\lVert A\rVert^2$, the parameter $R$ is the minimal stable rank over certain $\log(N)$-sized subsets of $X-X$ or their unit-normalized versions, involving each point of $X$ exactly once. The linear maps may be taken as random matrices with i.i.d. zero-mean unit-variance sub-gaussian entries. When the data is sampled i.i.d. as a given random vector $\xi$, refined statements are provided; the most improvement happens when $\xi$ or the unit-normalized $\widehat{\xi-\xi'}$ is isotropic, with $\xi'$ an independent copy of $\xi$, and includes the case of i.i.d. coordinates.
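To make the two quantities in the abstract concrete, here is a minimal Python sketch (assuming NumPy) that computes the stable rank $\lVert A\rVert_F^2/\lVert A\rVert^2$ and empirically measures the fraction of pairwise distances that a scaled random matrix distorts beyond $(1\pm\epsilon)$. The function names `stable_rank` and `distorted_fraction` are hypothetical; Gaussian entries stand in for the general sub-gaussian case, and the sketch does not reproduce the specific subsets of $X-X$ the paper uses to define $R$. Ignoring the unspecified constants, the ratio of the bulk bound to the JL bound is $\log(4e/\eta)/R$, so the bulk bound helps most when $R$ is large relative to $\log(4e/\eta)$.

```python
import itertools

import numpy as np


def stable_rank(A: np.ndarray) -> float:
    """Stable rank ||A||_F^2 / ||A||^2, with ||A|| the spectral norm."""
    fro_sq = np.linalg.norm(A, "fro") ** 2
    spec = np.linalg.norm(A, 2)  # largest singular value
    return float(fro_sq / spec**2)


def distorted_fraction(X: np.ndarray, k: int, eps: float,
                       rng: np.random.Generator) -> float:
    """Fraction of pairwise distances between rows of X that the map
    x -> G x / sqrt(k), G a k x D matrix of i.i.d. N(0, 1) entries,
    stretches or compresses beyond the factor (1 +/- eps)."""
    N, D = X.shape
    G = rng.standard_normal((k, D)) / np.sqrt(k)
    Y = X @ G.T  # projected points, one row per point, shape (N, k)
    bad = 0
    pairs = 0
    for i, j in itertools.combinations(range(N), 2):
        ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
        if not (1 - eps <= ratio <= 1 + eps):
            bad += 1
        pairs += 1
    return bad / pairs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 1000))  # data with i.i.d. coordinates
    print("stable rank of X:", stable_rank(X))
    # Shrinking k below the JL dimension distorts a growing fraction of pairs.
    for k in (10, 50, 200):
        print(k, distorted_fraction(X, k=k, eps=0.2, rng=rng))
```

Only the distorted fraction is reported here; the paper's contribution is a guarantee that this fraction stays below $\eta$ for the stated $k$, which a simulation like this can illustrate but not prove.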

What problem does this paper attempt to address?

The main problem this paper addresses is how to approximately preserve the distances between most pairs of points when dimension reduction uses a relatively small target dimension. Specifically, the paper asks whether at least a fraction $(1 - \eta)$ of the pairwise distances can still be preserved within the tolerance $(1\pm\epsilon)$ when the target dimension $k$ is smaller than the dimension $D_{JL}=O(\epsilon^{-2}\log(N))$ required by the classical Johnson-Lindenstrauss (JL) lemma. Here $\eta \in (0,1)$ is the fraction of distances that are allowed to fall outside the tolerance.

### Background and Motivation

The Johnson-Lindenstrauss lemma provides random linear maps that project a set of points in a high-dimensional space into a lower-dimensional space while approximately preserving all pairwise distances. A target dimension $k = O(\epsilon^{-2}\log(N))$ suffices for all distances to be preserved within $(1\pm\epsilon)$ with high probability, where $\epsilon$ is the error tolerance and $N$ is the number of points; for certain known point sets, this dimension is also necessary. However, for algorithms whose cost grows exponentially with the dimension (such as some nearest-neighbor search methods), the target dimension may still be too large after JL preprocessing to be practical. This motivates asking whether most distances can still be approximately preserved in a smaller target dimension.

### Main Contributions

1. **Theoretical results**:
   - The paper proves that a target dimension $k = O\left(\frac{\epsilon^{-2}\log(4e/\eta)\log(N)}{R}\right)$ suffices for a fraction $(1 - \eta)$ of the distances to be approximately preserved. Here $R$ is the minimal stable rank, where the stable rank of a matrix $A$ is $\frac{\|A\|_F^{2}}{\|A\|^{2}}$, taken over certain $\log(N)$-sized subsets of $X-X$ (or their unit-normalized versions) that involve each point of $X$ exactly once.
   - For independently and identically distributed (i.i.d.) data, the paper gives refined results, with the largest improvement when the data vector, or the unit-normalized difference of two independent copies, is isotropic; this includes the case of i.i.d. coordinates.
2. **Technical tools**:
   - The paper uses the Walecki construction, a classical decomposition of the complete graph $K_N$ (for odd $N$) into $(N-1)/2$ edge-disjoint Hamiltonian cycles, each passing through all $N$ vertices. This lets the analysis control distance preservation in smaller batches.
   - The paper also relies on the stable rank and on probabilistic tools such as the Hanson-Wright inequality to analyze how random matrices behave in dimension reduction.

### Conclusion

The paper shows that suitable random linear maps approximately preserve most pairwise distances in a smaller target dimension than the JL lemma requires for all distances. This can improve the efficiency of high-dimensional data processing, especially at large scale.
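The Walecki construction is named above without detail. The following minimal Python sketch implements the classical zigzag version for odd $N$: vertices $0,\dots,N-2$ sit on a circle, vertex $N-1$ is a hub, and rotating one zigzag path yields $(N-1)/2$ Hamiltonian cycles that together use every edge of $K_N$ exactly once. The function names `walecki_cycles` and `cycle_edges` are hypothetical, and how the paper groups the resulting edges (difference vectors $x_a - x_b$) into batches is not reproduced here.

```python
from itertools import combinations


def walecki_cycles(N: int) -> list[list[int]]:
    """Decompose K_N (N odd, N >= 3) into (N - 1) // 2 edge-disjoint
    Hamiltonian cycles via Walecki's zigzag construction.

    Vertices 0..N-2 lie on a circle; vertex N-1 is the hub. Each cycle is
    a list of N vertices; the closing edge back to the first is implicit.
    """
    if N < 3 or N % 2 == 0:
        raise ValueError("this sketch handles odd N >= 3 only")
    m = (N - 1) // 2      # number of cycles
    n_circle = N - 1      # 2m vertices on the circle
    hub = N - 1
    cycles = []
    for i in range(m):
        # Zigzag i, i+1, i-1, i+2, i-2, ..., ending at the antipode i+m.
        path = [i]
        for j in range(1, m):
            path.append((i + j) % n_circle)
            path.append((i - j) % n_circle)
        path.append((i + m) % n_circle)
        cycles.append([hub] + path)
    return cycles


def cycle_edges(cycle: list[int]) -> set[frozenset]:
    """Undirected edges of a cycle, including the closing edge."""
    n = len(cycle)
    return {frozenset((cycle[t], cycle[(t + 1) % n])) for t in range(n)}


if __name__ == "__main__":
    N = 7
    covered = set()
    for c in walecki_cycles(N):
        covered |= cycle_edges(c)
    # Every one of the N(N-1)/2 edges of K_N is covered.
    print(covered == {frozenset(p) for p in combinations(range(N), 2)})
```

For example, `walecki_cycles(5)` returns `[[4, 0, 1, 3, 2], [4, 1, 2, 0, 3]]`; read cyclically, these two 5-cycles cover all 10 edges of $K_5$ with no edge repeated.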

Bulk Johnson-Lindenstrauss Lemmas
