Fitting a manifold to data in the presence of large noise
Charles Fefferman, Sergei Ivanov, Matti Lassas, Hariharan Narayanan
2023-12-19
Abstract: We assume that $M_0$ is a $d$-dimensional $C^{2,1}$-smooth submanifold of $R^n$. Let $K_0$ be the convex hull of $M_0$, and let $B^n_1(0)$ be the unit ball. We assume that $M_0 \subseteq \partial K_0 \subseteq B^n_1(0)$. We also suppose that $M_0$ has volume ($d$-dimensional Hausdorff measure) less than or equal to $V$ and reach (i.e., normal injectivity radius) greater than or equal to $\tau$.
Moreover, we assume that $M_0$ is $R$-exposed, that is, tangent to $M_0$ at every point $x \in M_0$ there is a closed ball of radius $R$ that contains $M_0$. Let $x_1, \dots, x_N$ be independent random variables sampled from the uniform distribution on $M_0$ and
$\zeta_1, \dots, \zeta_N$ be a sequence of i.i.d. Gaussian random variables in $R^n$ that are independent of $x_1, \dots, x_N$ and have mean zero and covariance $\sigma^2 I_n$. We assume that we are given the noisy sample points $y_i$, defined by $$ y_i = x_i + \zeta_i,\quad \hbox{ for }i = 1, 2, \dots,N. $$ Let $\epsilon,\eta>0$ be real numbers and $k\geq 2$. Given the points $y_i$, $i=1,2,\dots,N$, we produce a $C^k$-smooth function whose zero set is a manifold $M_{rec}\subseteq R^n$ such that, with probability at least $1 - \eta$, the Hausdorff distance between $M_{rec}$ and $M_0$ is at most $\epsilon$ and the reach of $M_{rec}$ is bounded below by $c\tau/d^6$. Assuming that $d < c \sqrt{\log \log n}$ and that all the other parameters are positive constants independent of $n$, the number of arithmetic operations needed is polynomial in $n$. In the present work, we allow the noise magnitude $\sigma$ to be an arbitrarily large constant, thus overcoming a drawback of previous work.
Statistics Theory, Differential Geometry
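To make the sampling model $y_i = x_i + \zeta_i$ from the abstract concrete, the following is a minimal sketch in Python with NumPy. It is not the reconstruction algorithm of the paper. As an illustrative assumption, $M_0$ is taken to be the unit $d$-sphere placed in the first $d+1$ coordinates of $R^n$, which lies in the closed unit ball and on the boundary of its convex hull. The function names `sample_unit_sphere`, `add_gaussian_noise`, and `distance_to_sphere`, and the parameter values, are hypothetical choices made for this example only.

```python
import numpy as np


def sample_unit_sphere(N, d, n, rng):
    """Draw N points uniformly from the unit d-sphere embedded in the
    first d+1 coordinates of R^n (an illustrative choice of M_0)."""
    g = rng.standard_normal((N, d + 1))
    # Normalizing an isotropic Gaussian vector gives a uniform point on the sphere.
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    x = np.zeros((N, n))
    x[:, : d + 1] = g
    return x


def add_gaussian_noise(x, sigma, rng):
    """Return y_i = x_i + zeta_i with zeta_i ~ N(0, sigma^2 I_n), i.i.d."""
    return x + sigma * rng.standard_normal(x.shape)


def distance_to_sphere(y, d):
    """Euclidean distance from each row of y to the embedded unit d-sphere."""
    p, q = y[:, : d + 1], y[:, d + 1:]
    radial = np.linalg.norm(p, axis=1) - 1.0
    return np.sqrt(radial**2 + np.sum(q**2, axis=1))


# Assumed example parameters; sigma is a constant that is not small
# relative to the size of the manifold.
rng = np.random.default_rng(0)
N, d, n, sigma = 1000, 2, 50, 1.0

x = sample_unit_sphere(N, d, n, rng)   # clean samples x_i on M_0
y = add_gaussian_noise(x, sigma, rng)  # noisy observations y_i

print("mean distance of noisy points to M_0:", distance_to_sphere(y, d).mean())
```

In this setup with $\sigma = 1$ and $n = 50$, a typical noisy point lies much farther from the sphere than the sphere's own radius, which illustrates why individual samples carry little local information and why the large-noise regime is hard.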