Terminal Embeddings in Sublinear Time
Yeshwanth Cherapanamjeri, Jelani Nelson
DOI: https://doi.org/10.46298/theoretics.24.6
2021-10-17
Abstract: Elkin, Filtser, and Neiman (2017) recently introduced the concept of a {\it
terminal embedding} from one metric space $(X,d_X)$ to another $(Y,d_Y)$ with a
set of designated terminals $T\subset X$. Such an embedding $f$ is said to have
distortion $\rho\ge 1$ if $\rho$ is the smallest value such that there exists a
constant $C>0$ satisfying
\begin{equation*}
\forall x\in T\ \forall q\in X,\ C d_X(x, q) \le d_Y(f(x), f(q)) \le C \rho
d_X(x, q) .
\end{equation*}
When $X,Y$ are both Euclidean metrics with $Y$ being $m$-dimensional,
Narayanan and Nelson (2019), following work of Mahabadi, Makarychev,
Makarychev, and Razenshteyn (2018), showed that distortion $1+\epsilon$ is
achievable via such a terminal embedding with $m = O(\epsilon^{-2}\log n)$ for
$n := |T|$. This generalizes the Johnson-Lindenstrauss lemma, which only
preserves distances within $T$ and not from $T$ to the rest of the space. The
downside of the prior work is that evaluating its embedding on a point $q\in
\mathbb{R}^d$ required solving a semidefinite program with $\Theta(n)$
constraints in~$m$ variables, and thus required some superlinear
$\mathrm{poly}(n)$ runtime. Our main contribution in this work is to give a new
data structure for computing terminal embeddings. We show how to pre-process
$T$ to obtain an almost linear-space data structure that supports computing the
terminal embedding image of any $q\in\mathbb{R}^d$ in sublinear time $O^*
(n^{1-\Theta(\epsilon^2)} + d)$. To accomplish this, we leverage tools
developed in the context of approximate nearest neighbor search.
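
The distortion definition above can be checked numerically on finite samples. The following is a minimal sketch, not the data structure from the paper: the names `euclidean` and `empirical_terminal_distortion` are hypothetical, and the map `f` is supplied by the caller. For each terminal-query pair it computes the ratio $d_Y(f(x), f(q)) / d_X(x, q)$; taking $C$ to be the smallest ratio, the smallest admissible $\rho$ on the sample is the largest ratio divided by the smallest. Since only finitely many queries are sampled, the result is a lower bound on the true terminal distortion.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def empirical_terminal_distortion(terminals, queries, f):
    """Smallest rho such that C*d(x,q) <= d(f(x),f(q)) <= C*rho*d(x,q)
    holds for some C > 0 over all sampled (terminal, query) pairs.

    `f` is any caller-supplied map from points to points (an assumption
    of this sketch). Pairs at distance zero are skipped.
    """
    ratios = []
    for x in terminals:
        fx = f(x)
        for q in queries:
            d = euclidean(x, q)
            if d == 0:
                continue  # the inequality is trivially satisfied here
            ratios.append(euclidean(fx, f(q)) / d)
    if not ratios:
        raise ValueError("no terminal-query pair with positive distance")
    lo, hi = min(ratios), max(ratios)
    if lo == 0:
        return math.inf  # f collapses a pair, so no finite rho works
    return hi / lo


# Example maps: uniform scaling (an isometry up to the constant C)
# and a projection that drops the second coordinate.
def scale_by_three(p):
    return tuple(3 * c for c in p)


def drop_second(p):
    return (p[0],)


terminals = [(0.0, 0.0)]
queries = [(1.0, 0.0), (3.0, 4.0)]
rho_scale = empirical_terminal_distortion(terminals, queries, scale_by_three)
rho_proj = empirical_terminal_distortion(terminals, queries, drop_second)
```

Uniform scaling yields distortion 1 because the constant $C$ absorbs the scale factor, while the coordinate projection shrinks the second pair more than the first and so has distortion greater than 1 on this sample.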
Machine Learning, Computational Geometry, Data Structures and Algorithms