Robust estimation of precision matrices under cellwise contamination

Garth Tarr, Samuel Müller, Neville C. Weber

DOI: https://doi.org/10.1016/j.csda.2015.02.005

2015-01-09

Abstract: There is a great need for robust techniques in data mining and machine learning contexts where many standard techniques such as principal component analysis and linear discriminant analysis are inherently susceptible to outliers. Furthermore, standard robust procedures assume that less than half the observation rows of a data matrix are contaminated, which may not be a realistic assumption when the number of observed features is large. This work looks at the problem of estimating covariance and precision matrices under cellwise contamination. We consider using a robust pairwise covariance matrix as an input to various regularisation routines, such as the graphical lasso, QUIC and CLIME. To ensure the input covariance matrix is positive semidefinite, we use a method that transforms a symmetric matrix of pairwise covariances to the nearest covariance matrix. The result is a potentially sparse precision matrix that is resilient to moderate levels of cellwise contamination. Since this procedure is not based on subsampling it scales well as the number of variables increases.
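
The abstract does not say how the pairwise covariances are computed. As a minimal sketch of one standard construction, the Python below uses the Gnanadesikan-Kettenring identity, cov(X, Y) = (s(X + Y)^2 - s(X - Y)^2) / 4, with the normalised median absolute deviation (MAD) as the robust scale s. The identity, the choice of MAD, the simulated data and every function name are assumptions made for illustration, not the authors' implementation.

```python
import random
from statistics import median, variance

# Constant that makes the MAD consistent for the standard deviation
# when the data are normally distributed.
MAD_CONSISTENCY = 1.4826


def mad_scale(values):
    """Robust scale estimate: normalised median absolute deviation."""
    centre = median(values)
    return MAD_CONSISTENCY * median(abs(v - centre) for v in values)


def gk_covariance(x, y, scale=mad_scale):
    """Pairwise covariance from the Gnanadesikan-Kettenring identity."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return (scale(plus) ** 2 - scale(minus) ** 2) / 4.0


def pairwise_covariance_matrix(columns, scale=mad_scale):
    """Symmetric matrix built one pair of variables at a time.

    Nothing here forces the result to be positive semidefinite.
    """
    p = len(columns)
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        cov[j][j] = scale(columns[j]) ** 2
        for k in range(j + 1, p):
            cov[j][k] = cov[k][j] = gk_covariance(columns[j], columns[k], scale)
    return cov


def simulate_contaminated(n=2000, rho=0.8, cell_rate=0.02, outlier=50.0, seed=0):
    """Two correlated standard normal columns (hypothetical test data).

    Each cell is independently replaced by a gross outlier with
    probability cell_rate, which mimics cellwise contamination.
    """
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x.append(z1)
        y.append(rho * z1 + (1 - rho ** 2) ** 0.5 * z2)

    def contaminate(col):
        return [outlier if rng.random() < cell_rate else v for v in col]

    return contaminate(x), contaminate(y)


if __name__ == "__main__":
    x, y = simulate_contaminated()
    print("classical variance of x:", variance(x))
    print("robust pairwise matrix:", pairwise_covariance_matrix([x, y]))
```

In this simulation the clean variances are 1 and the clean covariance is 0.8. The classical variance is inflated by the outlying cells, while the pairwise estimates stay close to the clean values. Each entry needs only two columns at a time, so no row has to be discarded because a single cell is bad.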

Methodology

What problem does this paper attempt to address?

The paper addresses two related weaknesses of standard multivariate methods in data mining and machine learning, particularly with high-dimensional data. First, techniques such as principal component analysis (PCA) and linear discriminant analysis (LDA) are sensitive to outliers. Second, standard robust estimators assume that fewer than half of the rows of the data matrix are contaminated, which may be unrealistic when the number of features is large. Under cellwise contamination, individual cells rather than whole rows are corrupted, so a small per-cell rate can spoil most rows: if 1% of cells are contaminated independently across 100 variables, about 63% of rows contain at least one contaminated cell.

The paper therefore studies robust estimation of covariance and precision matrices under cellwise contamination. It proposes using a robust pairwise covariance matrix as the input to regularisation routines such as the graphical lasso, QUIC and CLIME. Because a matrix assembled from pairwise estimates need not be positive semidefinite, it is first transformed to the nearest covariance matrix. The result is a potentially sparse precision matrix that remains resilient under moderate levels of cellwise contamination, and because the procedure does not rely on subsampling, it scales well as the number of variables increases. A simulation study evaluates several precision matrix estimators across different scenarios and contamination levels using a range of performance measures. It finds that the pairwise approach tolerates higher levels of cellwise contamination than existing classical robust estimators.
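
The positive semidefinite repair step mentioned above can be sketched as follows. The summary does not name the algorithm the paper uses, so this example swaps in eigenvalue clipping, which gives the Frobenius-nearest positive semidefinite matrix to a symmetric input. The ridge inverse at the end is only a placeholder for the graphical lasso, QUIC or CLIME step. The example matrix, the function names and the ridge value are all hypothetical.

```python
import numpy as np


def nearest_psd(sym, floor=0.0):
    """Project a symmetric matrix onto the PSD cone by eigenvalue clipping."""
    sym = np.asarray(sym, dtype=float)
    sym = (sym + sym.T) / 2.0                 # guard against small asymmetries
    eigvals, eigvecs = np.linalg.eigh(sym)    # eigendecomposition of a symmetric matrix
    clipped = np.maximum(eigvals, floor)      # raise negative eigenvalues to the floor
    return (eigvecs * clipped) @ eigvecs.T    # reassemble V diag(clipped) V^T


def ridge_precision(cov, ridge=0.1):
    """Placeholder for a sparse precision estimator: invert cov + ridge * I."""
    p = cov.shape[0]
    return np.linalg.inv(cov + ridge * np.eye(p))


# Hypothetical pairwise estimates: every entry is a valid correlation on its
# own, yet the matrix as a whole is indefinite (eigenvalues 1.9, 1.9, -0.8).
pairwise = np.array([[1.0, 0.9, -0.9],
                     [0.9, 1.0, 0.9],
                     [-0.9, 0.9, 1.0]])

repaired = nearest_psd(pairwise)
precision = ridge_precision(repaired)

print("eigenvalues before repair:", np.linalg.eigvalsh(pairwise))
print("eigenvalues after repair: ", np.linalg.eigvalsh(repaired))
```

Eigenvalue clipping does not preserve the diagonal, so the repaired variances differ slightly from the pairwise ones. Projections that also constrain the diagonal exist, and whether the paper uses one is not stated here.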


Robust high-dimensional precision matrix estimation

Robust covariance estimation with missing values and cell-wise contamination

Robust Online Covariance and Sparse Precision Estimation Under Arbitrary Data Corruption

Low Rank Matrix Recovery with Simultaneous Presence of Outliers and Sparse Corruption

Robust Sparse Estimation for Gaussians with Optimal Error under Huber Contamination

Robust Covariance Estimation for High-dimensional Compositional Data with Application to Microbial Communities Analysis

Adjusting For High-Dimensional Covariates In Sparse Precision Matrix Estimation By L(1)-Penalization

Fast robust correlation for high-dimensional data

An overview of the estimation of large covariance and precision matrices

Partial correlation screening for estimating large precision matrices, with applications to classification

Robust covariance estimation and explainable outlier detection for matrix-valued data

Penalized Likelihood Approach to Covariance Matrix Estimation From Data With Cell Outliers

Adjusting for high-dimensional covariates in sparse precision matrix estimation by ℓ1-penalization

Covariance Matrix Estimation for High-Throughput Biomedical Data with Interconnected Communities

Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices Via Convex Optimization

A New Sparse and Robust Adaptive Lasso Estimator for the Independent Contamination Model

Covariate-adjusted Precision Matrix Estimation with an Application in Genetical Genomics

Outlier-robust sparse/low-rank least-squares regression and robust matrix completion

Robust Regression with Covariate Filtering: Heavy Tails and Adversarial Contamination