Abstract: In the Big Data era, with the ubiquity of geolocation sensors in particular, massive datasets exhibiting a possibly complex spatial dependence structure are becoming increasingly available. In this context, the standard probabilistic theory of statistical learning does not apply directly, and guarantees on the generalization capacity of predictive rules learned from such data remain to be established. We analyze here the simple Kriging task from a statistical learning perspective, i.e. by carrying out a nonparametric finite-sample predictive analysis. Given $d\geq 1$ values taken by a realization of a square integrable random field $X=\{X_s\}_{s\in S}$, $S\subset \mathbb{R}^2$, with unknown covariance structure, at sites $s_1,\; \ldots,\; s_d$ in $S$, the goal is to predict the unknown values it takes at any other location $s\in S$ with minimum quadratic risk. The prediction rule is derived from a training spatial dataset: a single realization $X'$ of $X$, independent of those to be predicted, observed at $n\geq 1$ locations $\sigma_1,\; \ldots,\; \sigma_n$ in $S$. Despite the connection of this minimization problem with kernel ridge regression, establishing the generalization capacity of empirical risk minimizers is far from straightforward, because the training data $X'_{\sigma_1},\; \ldots,\; X'_{\sigma_n}$ involved in the learning procedure are not independent and identically distributed. In this article, non-asymptotic bounds of order $O_{\mathbb{P}}(1/\sqrt{n})$ are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer in the case of isotropic stationary Gaussian processes observed at locations forming a regular grid in the learning stage. These theoretical results are illustrated by various numerical experiments on simulated data and on real-world datasets.
What problem does this paper attempt to address?
This paper asks how statistical learning theory can be applied, in the Big Data era, to prediction tasks on large-scale datasets with complex spatial dependence structures, in particular the Simple Kriging task. The standard probabilistic theory of statistical learning does not apply directly to such data, because the observations are generally not independent and identically distributed (i.i.d.). The main objective of the paper is therefore to analyze Simple Kriging from a statistical learning perspective and to establish non-asymptotic bounds on its generalization capacity.
Specifically, the paper considers a square-integrable random field \(X = \{X_s\}_{s\in S}\), where \(S\subset\mathbb{R}^2\) and the covariance structure of \(X\) is unknown. Given the values of \(X\) at \(d\geq1\) sites \(s_1,\ldots,s_d\in S\), the goal is to predict the value of \(X\) at any other site \(s\in S\) with minimum quadratic risk. The prediction rule is learned from a training dataset: a single realization \(X'\) of \(X\), independent of the values to be predicted, observed at \(n\geq1\) locations \(\sigma_1,\ldots,\sigma_n\in S\).
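To make the prediction target concrete, here is a minimal sketch of the Simple Kriging predictor when the covariance is known. It assumes a zero-mean field and a hypothetical isotropic exponential covariance \(C(h)=\sigma^2 e^{-h/\rho}\); in the paper the covariance is unknown, so this is the oracle rule a learned predictor tries to mimic. With \(\Sigma_{ij}=C(\|s_i-s_j\|)\) and \(k_i=C(\|s-s_i\|)\), the weights \(\lambda=\Sigma^{-1}k\) minimize the quadratic risk among linear predictors, and the minimal risk is \(C(0)-k^\top\Sigma^{-1}k\). For a zero-mean Gaussian field this linear predictor coincides with the conditional expectation. The function names and parameter values are illustrative only.

```python
import numpy as np


def exp_cov(h, sigma2=1.0, rho=0.3):
    """Hypothetical isotropic exponential covariance C(h) = sigma2 * exp(-h / rho)."""
    return sigma2 * np.exp(-np.asarray(h) / rho)


def simple_kriging(obs_sites, obs_values, target, cov=exp_cov):
    """Simple Kriging of X_target from X at obs_sites, assuming a zero-mean field.

    Returns (prediction, minimal quadratic risk, weights).
    """
    obs_sites = np.asarray(obs_sites, dtype=float)  # shape (d, 2)
    target = np.asarray(target, dtype=float)        # shape (2,)
    # Sigma_ij = C(||s_i - s_j||): covariance among the observed sites
    dists = np.linalg.norm(obs_sites[:, None, :] - obs_sites[None, :, :], axis=-1)
    sigma = cov(dists)
    # k_i = C(||s - s_i||): covariance between the target site and each observed site
    k = cov(np.linalg.norm(obs_sites - target, axis=-1))
    weights = np.linalg.solve(sigma, k)             # lambda = Sigma^{-1} k
    prediction = float(weights @ np.asarray(obs_values, dtype=float))
    risk = float(cov(0.0) - k @ weights)            # C(0) - k^T Sigma^{-1} k
    return prediction, risk, weights


# Usage with three hypothetical observed sites and values
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
values = [0.5, -0.2, 1.1]
pred, risk, w = simple_kriging(sites, values, target=(0.5, 0.5))
print(pred, risk, w)
```

Predicting at an already observed site returns the observed value with zero risk, which is a quick sanity check of the weight formula.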
Although this problem is related to Kernel Ridge Regression (KRR), establishing the generalization capacity of empirical risk minimizers is not straightforward, because the training data \(X'_{\sigma_1},\ldots,X'_{\sigma_n}\) are not i.i.d. The main contribution of the paper is a non-asymptotic bound of order \(O_{\mathbb{P}}(1/\sqrt{n})\) on the excess risk of a plug-in predictive rule that mimics the true minimizer, for isotropic stationary Gaussian processes observed at locations forming a regular grid during the learning stage.
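The plug-in idea can be illustrated with the following sketch, which is a stand-in and not the estimator analyzed in the paper. Assuming a known zero mean and prediction sites that also lie on the training grid (a simplifying, hypothetical choice), the covariance at each integer grid lag is estimated from the single training realization \(X'\) by a moment estimator with divisor \(n\); this divisor, rather than the number of pairs, keeps the estimated covariance matrix positive semidefinite. The estimates are then plugged into the Kriging weights. Under the true covariance, any linear predictor with weights \(\lambda\) has excess risk \((\lambda-\lambda^*)^\top\Sigma(\lambda-\lambda^*)\), where \(\lambda^*=\Sigma^{-1}k\), and the last function computes this quantity. The simulated field, grid size, covariance and sites are all hypothetical.

```python
import numpy as np


def true_cov(h, sigma2=1.0, rho=0.3):
    """Hypothetical 'true' isotropic exponential covariance, used only to simulate data."""
    return sigma2 * np.exp(-np.asarray(h) / rho)


def simulate_grid_field(m, cov, rng):
    """One zero-mean Gaussian realization on an m x m regular grid of [0, 1]^2 (Cholesky method)."""
    ticks = np.linspace(0.0, 1.0, m)
    xx, yy = np.meshgrid(ticks, ticks, indexing="ij")
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    chol = np.linalg.cholesky(cov(dists) + 1e-10 * np.eye(len(pts)))  # tiny jitter for stability
    return (chol @ rng.standard_normal(len(pts))).reshape(m, m)


def empirical_cov(field, lag):
    """Moment estimate of C(lag): sum of X'_sigma * X'_(sigma+lag) over in-grid pairs, divided by n."""
    a, b = int(lag[0]), int(lag[1])
    m1, m2 = field.shape
    # Slices chosen so that both sigma and sigma + lag stay inside the grid
    x = field[max(0, -a):m1 - max(0, a), max(0, -b):m2 - max(0, b)]
    y = field[max(0, a):m1 - max(0, -a), max(0, b):m2 - max(0, -b)]
    return float(np.sum(x * y) / field.size)


def plugin_weights(field, obs_idx, target_idx):
    """Plug-in Kriging weights: empirical covariances substituted for the unknown ones."""
    obs_idx = np.asarray(obs_idx)
    target_idx = np.asarray(target_idx)
    d = len(obs_idx)
    sigma_hat = np.array([[empirical_cov(field, obs_idx[j] - obs_idx[i]) for j in range(d)]
                          for i in range(d)])
    k_hat = np.array([empirical_cov(field, target_idx - obs_idx[i]) for i in range(d)])
    return np.linalg.solve(sigma_hat, k_hat)


def excess_risk(weights, sigma, k):
    """Excess quadratic risk under the true covariance: (w - w*)^T Sigma (w - w*)."""
    diff = weights - np.linalg.solve(sigma, k)
    return float(diff @ sigma @ diff)


# Learning stage: a single training realization X' on a 15 x 15 grid (n = 225)
rng = np.random.default_rng(0)
m = 15
field = simulate_grid_field(m, true_cov, rng)

# Prediction stage: hypothetical grid-indexed sites, converted to coordinates in [0, 1]^2
obs_idx = [(3, 3), (3, 5), (5, 3)]
target_idx = (4, 4)
w_hat = plugin_weights(field, obs_idx, target_idx)

step = 1.0 / (m - 1)
obs_xy = np.asarray(obs_idx) * step
target_xy = np.asarray(target_idx) * step
Sigma = true_cov(np.linalg.norm(obs_xy[:, None, :] - obs_xy[None, :, :], axis=-1))
k = true_cov(np.linalg.norm(obs_xy - target_xy, axis=-1))
print("excess risk of the plug-in rule:", excess_risk(w_hat, Sigma, k))
```

Because the learned weights are applied to a realization independent of \(X'\), the excess risk depends only on the weights and the true covariance, so no second realization has to be simulated to evaluate it.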
These theoretical results are illustrated by numerical experiments on both simulated data and real-world datasets, and they contribute to a statistical learning theory for spatially dependent data.