Partial correlation screening for estimating large precision matrices, with applications to classification
Shiqiong Huang, Jiashun Jin, Zhigang Yao
DOI: https://doi.org/10.1214/15-aos1392
2016-10-01
The Annals of Statistics
Abstract: Given $n$ samples $X_{1},X_{2},\ldots,X_{n}$ from $N(0,\Sigma)$, we are interested in estimating the $p\times p$ precision matrix $\Omega=\Sigma^{-1}$; we assume $\Omega$ is sparse in that each row has relatively few nonzeros. We propose Partial Correlation Screening (PCS) as a new row-by-row approach. To estimate the $i$th row of $\Omega$, $1\leq i\leq p$, PCS uses a Screen step and a Clean step. In the Screen step, PCS recruits a (small) subset of indices using a stage-wise algorithm, where in each stage, the algorithm updates the set of recruited indices by adding the index $j$ that has the largest empirical partial correlation (in magnitude) with $i$, given the set of indices recruited so far. In the Clean step, PCS reinvestigates all recruited indices, removes false positives and uses the resultant set of indices to reconstruct the $i$th row. PCS is computationally efficient and modest in memory use: to estimate a row of $\Omega$, it only needs a few rows (determined sequentially) of the empirical covariance matrix. PCS can estimate a large $\Omega$ (e.g., $p=10K$) in a few minutes. Higher Criticism Thresholding (HCT) is a recent classifier that enjoys optimality, but to exploit its full potential, we need a good estimate of $\Omega$. Note that given an estimate of $\Omega$, we can always combine it with HCT to build a classifier (e.g., HCT-PCS, HCT-glasso). We have applied HCT-PCS to two microarray data sets ($p=8K$ and $10K$) for classification, where it not only significantly outperforms HCT-glasso, but is also competitive with the Support Vector Machine (SVM) and Random Forest (RF). These results suggest that PCS gives more useful estimates of $\Omega$ than the glasso; we study this carefully and have gained some interesting insights. We show that in a broad context, PCS fully recovers the support of $\Omega$ and HCT-PCS is optimal in classification. Our theoretical study sheds interesting light on the behavior of stage-wise procedures.
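As a rough illustration of the Screen and Clean steps described in the abstract, the following Python sketch estimates a single row of $\Omega$ from an empirical covariance matrix. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`partial_corr`, `pcs_row`), the thresholds `screen_thresh` and `clean_thresh`, the cap `max_size`, the stopping rule and the particular thresholding rule in the Clean step are all hypothetical choices made for illustration; the abstract does not specify them. For simplicity the sketch also takes the full covariance matrix as input, although the abstract notes that only a few of its rows are needed.

```python
import numpy as np

def partial_corr(S_hat, i, j, cond):
    """Empirical partial correlation of i and j given the indices in `cond`,
    read off the inverse of the covariance submatrix on {i, j} and cond."""
    idx = [i, j] + list(cond)
    P = np.linalg.inv(S_hat[np.ix_(idx, idx)])
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

def pcs_row(S_hat, i, screen_thresh, clean_thresh, max_size):
    """Sketch of a Screen step and a Clean step for row i (assumed tuning rules)."""
    p = S_hat.shape[0]

    # Screen: stage-wise, recruit the index with the largest partial
    # correlation (in magnitude) with i, given the indices recruited so far.
    recruited = []
    while len(recruited) < max_size:
        candidates = [j for j in range(p) if j != i and j not in recruited]
        if not candidates:
            break
        rhos = [abs(partial_corr(S_hat, i, j, recruited)) for j in candidates]
        best = int(np.argmax(rhos))
        if rhos[best] < screen_thresh:  # assumed stopping rule
            break
        recruited.append(candidates[best])

    # Clean: re-examine the recruited indices and drop those whose
    # coefficient is small (assumed rule for removing false positives).
    idx = [i] + recruited
    first_row = np.linalg.inv(S_hat[np.ix_(idx, idx)])[0]
    kept = [i] + [j for j, v in zip(recruited, first_row[1:])
                  if abs(v) > clean_thresh]

    # Reconstruct row i from the covariance submatrix on the retained indices.
    row = np.zeros(p)
    row[kept] = np.linalg.inv(S_hat[np.ix_(kept, kept)])[0]
    return row
```

Calling `pcs_row` once for each $i$ and stacking the results gives a row-by-row estimate of $\Omega$; the result need not be symmetric, and this sketch does not attempt to symmetrize it.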
statistics & probability