Abstract: With the advent of massive data sets, much of the computational science and engineering community has moved toward data-intensive approaches in regression and classification. However, these present significant challenges due to the increasing size, complexity, and dimensionality of the problems. In particular, covariance matrices are in many cases numerically unstable: it is well known from numerical analysis that ill-conditioned matrices often cannot be inverted accurately on a finite-precision computer. A common ad hoc approach to stabilizing a matrix is the application of a so-called nugget. However, this can change the model and introduce error into the original solution. In this paper, we develop a multilevel computational method that scales well with the number of observations and dimensions. A multilevel basis is constructed that is adapted to a kd-tree partitioning of the observations. Numerically unstable covariance matrices with large condition numbers can be transformed into well-conditioned multilevel ones without compromising accuracy. Moreover, it is shown that the multilevel prediction exactly solves the best linear unbiased predictor (BLUP) and generalized least squares (GLS) model while remaining numerically stable. The multilevel method is tested on numerically unstable problems of up to 25 dimensions. Numerical results show speedups of up to 42,050 times for solving the BLUP problem, with the same accuracy as the traditional iterative approach. For very ill-conditioned cases, the speedup is infinite. In addition, decay estimates of the multilevel covariance matrices are derived based on high-dimensional interpolation techniques from the field of numerical analysis. This work lies at the intersection of statistics, uncertainty quantification, high performance computing, and computational applied mathematics.
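The nugget trade-off mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the multilevel method of the paper; it only shows, under assumed settings, that adding a nugget lowers the condition number while changing the linear system being solved. The Gaussian covariance function, the length scale, the nugget size, the sample locations, and all variable names are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def gaussian_covariance(x, length_scale):
    """Gaussian (squared-exponential) covariance matrix for 1-D locations x.

    The covariance model is an assumption chosen for illustration only.
    """
    d = x[:, None] - x[None, :]
    return np.exp(-(d / length_scale) ** 2)


rng = np.random.default_rng(0)

# Hypothetical observation locations and data values.
x = np.sort(rng.uniform(0.0, 1.0, size=60))
y = np.sin(2.0 * np.pi * x)

# Smooth covariance functions on closely spaced points give
# severely ill-conditioned matrices.
C = gaussian_covariance(x, length_scale=0.5)

# Ad hoc stabilization: add a small multiple of the identity (the nugget).
nugget = 1e-6
C_nugget = C + nugget * np.eye(len(x))

cond_raw = np.linalg.cond(C)
cond_nugget = np.linalg.cond(C_nugget)

# Solve the stabilized system, then check how well the weights satisfy
# the ORIGINAL system C w = y. Since (C + nugget * I) w = y, the residual
# of the original system is -nugget * w, which is nonzero in general.
w = np.linalg.solve(C_nugget, y)
residual_norm = np.linalg.norm(C @ w - y)
model_shift = nugget * np.linalg.norm(w)

print(f"condition number without nugget: {cond_raw:.3e}")
print(f"condition number with nugget:    {cond_nugget:.3e}")
print(f"residual in original system:     {residual_norm:.3e}")
print(f"nugget * ||w||:                  {model_shift:.3e}")
```

The two condition numbers show the stabilizing effect, and the last two printed values show its cost: the weights solve a modified system, so the mismatch with the original one scales with the nugget.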

Human-in-the-loop Active Covariance Learning for Improving Prediction in Small Data Sets

Large-Dimensional Positive Definite Covariance Estimation for High Frequency Data via Low-rank and Sparse Matrix Decomposition

A Variance Minimization Criterion to Feature Selection Using Laplacian Regularization

TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression

Spatial best linear unbiased prediction: a computational mathematics approach for high dimensional massive datasets

Covariance prediction via convex optimization

Self-Supervised Learning for Covariance Estimation

Covariance Regression with High-Dimensional Predictors

Efficient Covariance Estimation from Temporal Data

A Best Linear Empirical Bayes Method for High-Dimensional Covariance Matrix Estimation

Learning Large Causal Structures from Inverse Covariance Matrix via Sparse Matrix Decomposition

High-Dimensional Multivariate Forecasting with Low-Rank Gaussian Copula Processes

Covariance Discriminative Learning: A Natural and Efficient Approach to Image Set Classification

Large covariance matrix estimation via penalized log-det heuristics

Learning from a lot: Empirical Bayes in high-dimensional prediction settings

User Modelling for Avoiding Overfitting in Interactive Knowledge Elicitation for Prediction

A Projection Approach to Local Regression with Variable-Dimension Covariates

Local Learning for Covariate Selection in Nonparametric Causal Effect Estimation with Latent Variables

Bayesian Nonparametric Covariance Regression

On the Benefits of Active Data Collection in Operator Learning

Fast and Positive Definite Estimation of Large Covariance Matrix for High-Dimensional Data Analysis