Mutation frequency and type during ageing in mouse seminiferous tubules

S. Martin,C. Hopkins,Anne Naumer,M. Dollé,J. Vijg

DOI: https://doi.org/10.1016/S0047-6374(01)00267-6

IF: 5.498

2001-09-01

Mechanisms of Ageing and Development

Sparse permutation invariant covariance estimation

Adam J. Rothman,Peter J. Bickel,Elizaveta Levina,Ji Zhu

DOI: https://doi.org/10.1214/08-EJS176

2008-06-26

Abstract:The paper proposes a method for constructing a sparse estimator for the inverse covariance (concentration) matrix in high-dimensional settings. The estimator uses a penalized normal likelihood approach and forces sparsity by using a lasso-type penalty. We establish a rate of convergence in the Frobenius norm as both data dimension $p$ and sample size $n$ are allowed to grow, and show that the rate depends explicitly on how sparse the true concentration matrix is. We also show that a correlation-based version of the method exhibits better rates in the operator norm. We also derive a fast iterative algorithm for computing the estimator, which relies on the popular Cholesky decomposition of the inverse but produces a permutation-invariant estimator. The method is compared to other estimators on simulated data and on a real data example of tumor tissue classification using gene expression data.

Statistics Theory
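The penalized normal likelihood in this abstract can be made concrete by evaluating it for a candidate concentration matrix. The minimal pure-Python sketch below assumes the objective has the form $\mathrm{tr}(\Omega S) - \log\det\Omega + \lambda\sum_{i\neq j}|\omega_{ij}|$, with the lasso penalty on off-diagonal entries only. It scores a given $\Omega$ and does not implement the authors' Cholesky-based iterative solver. All function names are hypothetical.

```python
import math

# Hypothetical helpers for illustration; not the authors' code.

def cholesky(a):
    """Return lower-triangular L with a = L L^T; raise ValueError if a is not positive definite."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][j] = math.sqrt(d)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def log_det(a):
    # log det A = 2 * sum(log L_ii) for the Cholesky factor L of A
    l = cholesky(a)
    return 2.0 * sum(math.log(l[i][i]) for i in range(len(a)))

def penalized_neg_loglik(omega, s, lam):
    """tr(Omega S) - log det Omega + lam * sum_{i != j} |omega_ij| (assumed objective form)."""
    p = len(omega)
    trace = sum(omega[i][k] * s[k][i] for i in range(p) for k in range(p))
    penalty = sum(abs(omega[i][j]) for i in range(p) for j in range(p) if i != j)
    return trace - log_det(omega) + lam * penalty
```

Minimizing this score over positive definite $\Omega$ is the estimation problem. The Cholesky factor is used here only to compute $\log\det\Omega$ and to reject candidates that are not positive definite.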

Towards a sparse, scalable, and stably positive definite (inverse) covariance estimator

Sang-Yun Oh,Bala Rajaratnam,Joong-Ho Won

DOI: https://doi.org/10.48550/arXiv.1502.00471

2016-06-27

Abstract:High-dimensional covariance estimation and graphical models are a contemporary topic in statistics and machine learning with widespread applications. An important line of research in this regard is to shrink the extreme spectrum of covariance matrix estimators. A separate line of research in the literature has considered sparse inverse covariance estimation, which in turn gives rise to graphical models. In practice, however, a sparse covariance or inverse covariance matrix that is simultaneously well-conditioned and computationally tractable is desired. There has been little research at the confluence of these three topics. In this paper we consider imposing a condition number constraint on various types of losses used in covariance and inverse covariance matrix estimation. When the loss function can be decomposed as a sum of an orthogonally invariant function of the estimate and its inner product with a function of the sample covariance matrix, we show that a solution path algorithm can be derived, involving a series of ordinary differential equations. The path algorithm is attractive because it provides the entire family of estimates for all possible values of the condition number bound, at the same computational cost as a single estimate with a fixed upper bound. An important finding is that the proximal operator for the condition number constraint, which turns out to be very useful in regularizing loss functions that are not orthogonally invariant and may yield non-positive-definite estimates, can be efficiently computed by this path algorithm. As a concrete illustration of its practical importance, we develop an operator-splitting algorithm that guarantees well-conditioning as well as positive definiteness for recently proposed convex pseudo-likelihood-based graphical model selection methods.

Methodology

Penalized Sparse Covariance Regression with High Dimensional Covariates

Yuan Gao,Zhiyuan Zhang,Zhanrui Cai,Xuening Zhu,Tao Zou,Hansheng Wang

DOI: https://doi.org/10.1080/07350015.2024.2415109

2024-01-01

Abstract:Covariance regression offers an effective way to model a large covariance matrix using auxiliary similarity matrices. In this work, we propose a sparse covariance regression (SCR) approach to handle potentially high-dimensional predictors (i.e., similarity matrices). Specifically, we use a penalization method to identify the informative predictors and estimate their associated coefficients simultaneously. We first investigate the Lasso estimator and subsequently consider folded concave penalized estimation methods (e.g., SCAD and MCP). However, the theoretical analysis of existing penalization methods is primarily based on i.i.d. data, which is not directly applicable to our scenario. To address this difficulty, we establish non-asymptotic error bounds by exploiting the spectral properties of the covariance matrix and similarity matrices. Then, we derive the estimation error bound for the Lasso estimator and establish the desirable oracle property of the folded concave penalized estimator. Extensive simulation studies are conducted to corroborate our theoretical results. We also illustrate the usefulness of the proposed method by applying it to a Chinese stock market dataset.

Nonparametric estimation of large covariance matrices with conditional sparsity

Hanchao Wang,Bin Peng,Degui Li,Chenlei Leng

DOI: https://doi.org/10.1016/j.jeconom.2020.09.002

IF: 3.363

2021-07-01

Journal of Econometrics

Abstract:This paper studies estimation of covariance matrices with conditional sparse structure. We overcome the challenge of estimating dense matrices using a factor structure, the challenge of estimating large-dimensional matrices by postulating sparsity on covariance of random noises, and the challenge of estimating varying matrices by allowing factor loadings to smoothly change. A kernel-weighted estimation approach combined with generalised shrinkage is proposed. Under some technical conditions, we derive uniform consistency for the developed estimation method and obtain convergence rates. Numerical studies including simulation and an empirical application are presented to examine the finite-sample performance of the developed methodology.

economics,social sciences, mathematical methods,mathematics, interdisciplinary applications
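Generalised shrinkage of a covariance estimate is often illustrated with soft thresholding of the off-diagonal entries. The minimal sketch below shows only that step on an already computed covariance matrix; the kernel weighting and the factor structure described in the abstract are omitted, and the threshold `tau` is an assumed, user-supplied value rather than a data-driven choice. Function names are hypothetical.

```python
# Hypothetical illustration of one generalised-shrinkage rule (soft thresholding);
# kernel weighting and factor structure are intentionally left out.

def soft_threshold(x, tau):
    """Soft-thresholding operator: sign(x) * max(|x| - tau, 0)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

def threshold_covariance(sigma, tau):
    """Shrink off-diagonal covariances toward zero; keep the diagonal (variances) unchanged."""
    p = len(sigma)
    return [[sigma[i][j] if i == j else soft_threshold(sigma[i][j], tau)
             for j in range(p)] for i in range(p)]
```

Entries with magnitude at most `tau` become exactly zero, which is what produces sparsity. Thresholding alone does not guarantee a positive definite result, so in practice it is paired with conditions or corrections that ensure one.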

Orthogonal Sparse PCA and Covariance Estimation via Procrustes Reformulation

Konstantinos Benidis,Ying Sun,Prabhu Babu,Daniel P. Palomar

DOI: https://doi.org/10.1109/tsp.2016.2605073

IF: 4.875

2016-12-01

IEEE Transactions on Signal Processing

Abstract:The problem of estimating sparse eigenvectors of a symmetric matrix has attracted considerable attention in many applications, especially those involving high-dimensional datasets. While classical eigenvectors can be obtained as the solution of a maximization problem, existing approaches formulate this problem by adding a penalty term into the objective function that encourages a sparse solution. However, the vast majority of the resulting methods achieve sparsity at the expense of sacrificing the orthogonality property. In this paper, we develop a new method to estimate dominant sparse eigenvectors without trading off their orthogonality. The problem is highly nonconvex and hard to handle. We apply the minorization-maximization framework, wherein we iteratively maximize a tight lower bound (surrogate function) of the objective function over the Stiefel manifold. The inner maximization problem turns out to be a rectangular Procrustes problem, which has a closed-form solution. In addition, we propose a method to improve covariance estimation when the underlying eigenvectors are known to be sparse. We use the eigenvalue decomposition of the covariance matrix to formulate an optimization problem wherein we impose sparsity on the corresponding eigenvectors. Numerical experiments show that the proposed eigenvector extraction algorithm outperforms existing algorithms in terms of support recovery and explained variance, whereas the covariance estimation algorithms significantly improve on the sample covariance estimator.

engineering, electrical & electronic

Covariance Structure Estimation with Laplace Approximation

Bongjung Sung,Jaeyong Lee

DOI: https://doi.org/10.48550/arXiv.2111.02637

2021-12-06

Abstract:The Gaussian covariance graph model is a popular model for revealing underlying dependency structures among random variables. A Bayesian approach to the estimation of covariance structures uses priors that force zeros on some off-diagonal entries of covariance matrices and put a positive definite constraint on matrices. In this paper, we consider a spike and slab prior on off-diagonal entries, which uses a mixture of a point mass and a normal distribution. The point mass naturally introduces sparsity to covariance structures, so that the resulting posterior from this prior enables covariance structure learning. Under this prior, we calculate posterior model probabilities of covariance structures using Laplace approximation. We show that the error due to Laplace approximation becomes asymptotically negligible at a rate depending on the posterior convergence rate of the covariance matrix under the Frobenius norm. With the approximated posterior model probabilities, we propose a new framework for estimating a covariance structure. Since the Laplace approximation is done around the mode of the conditional posterior of the covariance matrix, which cannot be obtained in closed form, we propose a block coordinate descent algorithm to find the mode and show that the covariance matrix can be estimated using this algorithm once the structure is chosen. Through a simulation study based on five numerical models, we show that the proposed method outperforms the graphical lasso and the sample covariance matrix in terms of root mean squared error, max norm, spectral norm, specificity, and sensitivity. The proposed method also achieves higher accuracy than its competitors when applied to linear discriminant analysis (LDA) classification of a breast cancer diagnostic dataset.

Methodology,Statistics Theory

Large-Dimensional Positive Definite Covariance Estimation for High Frequency Data via Low-rank and Sparse Matrix Decomposition

Liyuan Cui,Yongmiao Hong,Yingxing Li,Junhui Wang

DOI: https://doi.org/10.2139/ssrn.3414910

2019-01-01

Abstract:This paper proposes a novel covariance estimator, based on a machine learning approach, for settings in which both the sampling frequency and the covariance dimension are large. Assuming that a large covariance matrix can be decomposed into low-rank and sparse components, our method provides consistent estimates of both components simultaneously in a one-step procedure. Moreover, in the presence of microstructure noise and asynchronous trading, the covariance estimator is guaranteed to be positive definite with the optimal rate of convergence. Taking into account the serial dependence of financial data, we further provide a data-driven algorithm to select the optimal tuning parameters in practice. We apply the proposed estimator to vast portfolio allocations, which achieve significantly improved out-of-sample portfolio risk and Sharpe ratios. The success of our approach helps justify the role that machine learning techniques play in finance.

Large covariance matrix estimation via penalized log-det heuristics

Enrico Bernardi,Matteo Farnè

DOI: https://doi.org/10.48550/arXiv.2209.04867

2022-09-11

Abstract:This paper provides a comprehensive estimation framework for large covariance matrices via a log-det heuristics augmented by a nuclear norm plus $l_{1}$ norm penalty. We develop the model framework, which includes high-dimensional approximate factor models with a sparse residual covariance. The underlying assumptions allow for non-pervasive latent eigenvalues and a prominent residual covariance pattern. We prove that the aforementioned log-det heuristics is locally convex with a Lipschitz-continuous gradient, so that a proximal gradient algorithm can be formulated to solve the problem numerically while controlling the threshold parameters. The proposed optimization strategy recovers, with high probability, the covariance matrix components, the latent rank, and the residual sparsity pattern, and systematically performs no worse than the corresponding estimators employing Frobenius loss in place of the log-det heuristics. The error bounds for the ensuing low-rank and sparse covariance matrix estimators are established, and the identifiability condition for the latent geometric manifolds is provided. The validity of the outlined results is highlighted by means of an exhaustive simulation study and a real financial data example involving euro zone banks.

Statistics Theory,Methodology

Regularization of the Kernel Matrix via Covariance Matrix Shrinkage Estimation

Tomer Lancewicki

DOI: https://doi.org/10.48550/arXiv.1707.06156

2017-07-19

Computation

Abstract:The kernel trick concept, formulated as an inner product in a feature space, facilitates powerful extensions to many well-known algorithms. While the kernel matrix involves inner products in the feature space, the sample covariance matrix of the data requires outer products. Therefore, their spectral properties are tightly connected. This allows us to examine the kernel matrix through the sample covariance matrix in the feature space and vice versa. The use of kernels often involves a large number of features, compared to the number of observations. In this scenario, the sample covariance matrix is not well-conditioned nor is it necessarily invertible, mandating a solution to the problem of estimating high-dimensional covariance matrices under small sample size conditions. We tackle this problem through the use of a shrinkage estimator that offers a compromise between the sample covariance matrix and a well-conditioned matrix (also known as the "target") with the aim of minimizing the mean-squared error (MSE). We propose a distribution-free kernel matrix regularization approach that is tuned directly from the kernel matrix, avoiding the need to address the feature space explicitly. Numerical simulations demonstrate that the proposed regularization is effective in classification tasks.
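The compromise between the sample covariance matrix and a well-conditioned target can be sketched as a convex combination. The example below assumes the common scaled-identity target $(\mathrm{tr}(S)/p)\,I$ and takes the shrinkage intensity `rho` as given; the MSE-optimal, distribution-free tuning from the kernel matrix described in the abstract is not reproduced. The function name is hypothetical.

```python
# Hypothetical sketch: linear shrinkage toward a scaled identity target with a fixed rho.

def shrink_to_identity(s, rho):
    """Return (1 - rho) * S + rho * (tr(S) / p) * I for 0 <= rho <= 1."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    p = len(s)
    mu = sum(s[i][i] for i in range(p)) / p  # average variance sets the target's scale
    return [[(1.0 - rho) * s[i][j] + (rho * mu if i == j else 0.0)
             for j in range(p)] for i in range(p)]
```

Because the target has the same trace as $S$, the trace is preserved. For $\rho > 0$ and a positive semidefinite $S$ with $\mu > 0$, every eigenvalue of the result is at least $\rho\mu$, so the shrunk matrix is invertible even when $S$ is singular.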

Shrinkage MMSE estimators of covariances beyond the zero-mean and stationary variance assumptions

Olivier Flasseur,Eric Thiébaut,Loïc Denis,Maud Langlois

2024-06-27

Abstract:We tackle covariance estimation in low-sample scenarios, employing a structured covariance matrix with shrinkage methods. These involve convexly combining a low-bias/high-variance empirical estimate with a biased regularization estimator, striking a bias-variance trade-off. The literature provides optimal settings of the regularization amount by minimizing the risk between the true covariance and its shrunk counterpart. Such estimators were derived for zero-mean statistics with i.i.d. diagonal regularization matrices accounting for the average sample variance solely. We extend these results to regularization matrices accounting for the sample variances, both for centered and non-centered samples. In the latter case, the empirical estimate of the true mean is incorporated into our shrinkage estimators. Introducing confidence weights into the statistics also enhances estimator robustness against outliers. We compare our estimators to other shrinkage methods both on numerical simulations and on real data to solve a detection problem in astronomy.

Instrumentation and Methods for Astrophysics,Methodology
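For non-centered samples, the empirical mean is subtracted before forming the sample covariance, and the regularization target can keep the individual sample variances (its diagonal) instead of their average. The sketch below illustrates that structure with a fixed shrinkage weight `rho` and optional per-sample confidence weights; both are assumptions for illustration, and the optimal shrinkage amount derived in the paper is not computed here. Function names are hypothetical.

```python
# Hypothetical sketch: weighted, mean-centered sample covariance shrunk toward diag(S).

def weighted_mean_cov(samples, weights=None):
    """Weighted empirical mean and (biased, weight-normalized) covariance of p-dimensional samples."""
    n, p = len(samples), len(samples[0])
    w = [1.0] * n if weights is None else list(weights)
    total = sum(w)
    mean = [sum(w[k] * samples[k][i] for k in range(n)) / total for i in range(p)]
    cov = [[sum(w[k] * (samples[k][i] - mean[i]) * (samples[k][j] - mean[j]) for k in range(n)) / total
            for j in range(p)] for i in range(p)]
    return mean, cov

def shrink_to_diagonal(cov, rho):
    """(1 - rho) * S + rho * diag(S): off-diagonals shrink, sample variances are kept."""
    p = len(cov)
    return [[cov[i][j] if i == j else (1.0 - rho) * cov[i][j] for j in range(p)] for i in range(p)]
```

Down-weighting a sample (for example, a suspected outlier) reduces its influence on both the mean and the covariance. With the diagonal target, the variances themselves are never shrunk; only the correlations are pulled toward zero.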

Model-based Clustering with Sparse Covariance Matrices

Michael Fop,Thomas Brendan Murphy,Luca Scrucca

DOI: https://doi.org/10.48550/arXiv.1711.07748

2018-09-23

Abstract:Finite Gaussian mixture models are widely used for model-based clustering of continuous data. Nevertheless, since the number of model parameters scales quadratically with the number of variables, these models can be easily over-parameterized. For this reason, parsimonious models have been developed via covariance matrix decompositions or assuming local independence. However, these remedies do not allow for direct estimation of sparse covariance matrices nor do they take into account that the structure of association among the variables can vary from one cluster to the other. To this end, we introduce mixtures of Gaussian covariance graph models for model-based clustering with sparse covariance matrices. A penalized likelihood approach is employed for estimation and a general penalty term on the graph configurations can be used to induce different levels of sparsity and incorporate prior knowledge. Model estimation is carried out using a structural-EM algorithm for parameters and graph structure estimation, where two alternative strategies based on a genetic algorithm and an efficient stepwise search are proposed for inference. With this approach, sparse component covariance matrices are directly obtained. The framework results in a parsimonious model-based clustering of the data via a flexible model for the within-group joint distribution of the variables. Extensive simulated data experiments and application to illustrative datasets show that the method attains good classification performance and model quality.

Methodology,Computation

Sparse factor models of high dimension

Benjamin Poignard,Yoshikazu Terada

2023-07-12

Abstract:We consider the estimation of a factor model-based variance-covariance matrix when the factor loading matrix is assumed to be sparse. To do so, we rely on a system of penalized estimating functions to account for the identification issue of the factor loading matrix while fostering sparsity in potentially all its entries. We prove the oracle property of the penalized estimator for the factor model when the dimension is fixed. That is, the penalization procedure can recover the true sparse support, and the estimator is asymptotically normally distributed. Consistency and recovery of the true zero entries are established when the number of parameters is diverging. These theoretical results are supported by simulation experiments, and the relevance of the proposed method is illustrated by an application to portfolio allocation.

Statistics Theory
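Under the usual orthogonal factor model (uncorrelated, unit-variance factors and diagonal idiosyncratic variances $\Psi$), the implied covariance is $\Sigma = \Lambda\Lambda^\top + \Psi$. The sketch below, which assumes that form, only assembles $\Sigma$ from a given sparse loading matrix to show how zeros in $\Lambda$ propagate to zero covariances; the penalized estimating functions of the paper are not implemented. The function name is hypothetical.

```python
# Hypothetical sketch: covariance implied by an orthogonal factor model with diagonal Psi.

def factor_covariance(loadings, psi):
    """Sigma = Lambda Lambda^T + diag(psi) for a p x m loading matrix Lambda."""
    p, m = len(loadings), len(loadings[0])
    return [[sum(loadings[i][k] * loadings[j][k] for k in range(m)) + (psi[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]
```

For example, with loadings `[[1.0, 0.0], [0.5, 0.0], [0.0, 2.0]]` and unit idiosyncratic variances, the first two variables load only on factor 1 and the third only on factor 2, so the covariance between the two groups is exactly zero.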

Fast Covariance Estimation for Sparse Functional Data

Luo Xiao,Cai Li,William Checkley,Ciprian M. Crainiceanu

DOI: https://doi.org/10.48550/arXiv.1603.05758

2017-04-06

Abstract:Smoothing of noisy sample covariances is an important component in functional data analysis. We propose a novel covariance smoothing method based on penalized splines and provide associated software. The proposed method is a bivariate spline smoother that is designed for covariance smoothing and can be used for sparse functional or longitudinal data. We propose a fast algorithm for covariance smoothing using leave-one-subject-out cross-validation. Our simulations show that the proposed method compares favorably against several commonly used methods. The method is applied to a study of child growth led by one of the coauthors and to a public dataset of longitudinal CD4 counts.

Methodology

Sparse covariance estimation in logit mixture models

Youssef M Aboutaleb,Mazen Danaf,Yifei Xie,Moshe E Ben-Akiva

DOI: https://doi.org/10.1093/ectj/utab008

2021-03-19

Abstract:This paper introduces a new data-driven methodology for estimating sparse covariance matrices of the random coefficients in logit mixture models. Researchers typically specify covariance matrices in logit mixture models under one of two extreme assumptions: either an unrestricted full covariance matrix (allowing correlations between all random coefficients), or a restricted diagonal matrix (allowing no correlations at all). Our objective is to find optimal subsets of correlated coefficients for which we estimate covariances. We propose a new estimator, called MISC (mixed integer sparse covariance), that uses a mixed-integer optimization (MIO) program to find an optimal block diagonal structure specification for the covariance matrix, corresponding to subsets of correlated coefficients, for any desired sparsity level using Markov Chain Monte Carlo (MCMC) posterior draws from the unrestricted full covariance matrix. The optimal sparsity level of the covariance matrix is determined using out-of-sample validation. We demonstrate the ability of MISC to correctly recover the true covariance structure from synthetic data. In an empirical illustration using a stated preference survey on modes of transportation, we use MISC to obtain a sparse covariance matrix indicating how preferences for attributes are related to one another.

Entropic covariance models

Piotr Zwiernik

2024-05-08

Abstract:In covariance matrix estimation, one of the challenges lies in finding a suitable model and an efficient estimation method. Two commonly used modelling approaches in the literature involve imposing linear restrictions on the covariance matrix or its inverse. Another approach considers linear restrictions on the matrix logarithm of the covariance matrix. In this paper, we present a general framework for linear restrictions on different transformations of the covariance matrix, including the mentioned examples. Our proposed estimation method solves a convex problem and yields an $M$-estimator, allowing for relatively straightforward asymptotic (in general) and finite sample analysis (in the Gaussian case). In particular, we recover standard $\sqrt{n/d}$ rates, where $d$ is the dimension of the underlying model. Our geometric insights allow us to extend various recent results in covariance matrix modelling. This includes providing unrestricted parametrizations of the space of correlation matrices, an alternative to a recent result utilizing the matrix logarithm.

Statistics Theory,Machine Learning

Minimax estimation of functionals in sparse vector model with correlated observations

Yuhao Wang,Pengkun Yang,Alexandre B. Tsybakov

2024-07-20

Abstract:We consider observations of an unknown $s$-sparse vector ${\boldsymbol \theta}$ corrupted by Gaussian noise with zero mean and unknown covariance matrix ${\boldsymbol \Sigma}$. We propose minimax optimal methods of estimating the $\ell_2$ norm of ${\boldsymbol \theta}$ and testing the hypothesis $H_0: {\boldsymbol \theta}=0$ against sparse alternatives when only partial information about ${\boldsymbol \Sigma}$ is available, such as an upper bound on its Frobenius norm and the values of its diagonal entries to within an unknown scaling factor. We show that the minimax rates of estimation and testing are determined not by the dimension of the problem but by the value of the Frobenius norm of ${\boldsymbol \Sigma}$.

Statistics Theory

Fast covariance estimation for multivariate sparse functional data

Cai Li,Luo Xiao,Sheng Luo

DOI: https://doi.org/10.1002/sta4.245

2020-01-01

Stat

Abstract:Covariance estimation is essential yet underdeveloped for analysing multivariate functional data. We propose a fast covariance estimation method for multivariate sparse functional data using bivariate penalized splines. The tensor‐product B‐spline formulation of the proposed method enables a simple spectral decomposition of the associated covariance operator and explicit expressions of the resulting eigenfunctions as linear combinations of B‐spline bases, thereby dramatically facilitating subsequent principal component analysis. We derive a fast algorithm for selecting the smoothing parameters in covariance smoothing using leave‐one‐subject‐out cross‐validation. The method is evaluated with extensive numerical studies and applied to an Alzheimer's disease study with multiple longitudinal outcomes.

statistics & probability

Covariance-Free Sparse Bayesian Learning

Alexander Lin,Andrew H. Song,Berkin Bilgic,Demba Ba

DOI: https://doi.org/10.1109/TSP.2022.3186185

2022-04-08

Abstract:Sparse Bayesian learning (SBL) is a powerful framework for tackling the sparse coding problem while also providing uncertainty quantification. The most popular inference algorithms for SBL exhibit prohibitively large computational costs for high-dimensional problems due to the need to maintain a large covariance matrix. To resolve this issue, we introduce a new method for accelerating SBL inference -- named covariance-free expectation maximization (CoFEM) -- that avoids explicit computation of the covariance matrix. CoFEM solves multiple linear systems to obtain unbiased estimates of the posterior statistics needed by SBL. This is accomplished by exploiting innovations from numerical linear algebra such as preconditioned conjugate gradient and a little-known diagonal estimation rule. For a large class of compressed sensing matrices, we provide theoretical justifications for why our method scales well in high-dimensional settings. Through simulations, we show that CoFEM can be up to thousands of times faster than existing baselines without sacrificing coding accuracy. Through applications to calcium imaging deconvolution and multi-contrast MRI reconstruction, we show that CoFEM enables SBL to tractably tackle high-dimensional sparse coding problems of practical interest.

Signal Processing,Machine Learning
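The abstract mentions a diagonal estimation rule that yields the needed posterior statistics from linear-system solves, without forming the covariance matrix. One standard rule of this kind averages $z \odot (Az)$ over random Rademacher probe vectors $z$; the sketch below assumes that form and is not the paper's implementation, which obtains $Az$-type quantities through preconditioned conjugate gradient solves. Here an explicit matrix is wrapped as `matvec` purely for illustration, and the function names are hypothetical.

```python
import random

# Hypothetical sketch of a stochastic diagonal estimator that only needs matrix-vector products.

def estimate_diagonal(matvec, p, num_probes=100, seed=0):
    """Estimate diag(A) as the average of z * (A z) over Rademacher probes z (entries +/-1)."""
    rng = random.Random(seed)
    acc = [0.0] * p
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(p)]
        az = matvec(z)
        for i in range(p):
            acc[i] += z[i] * az[i]
    # For Rademacher probes z_i * z_i = 1, so no extra normalization by z * z is needed.
    return [a / num_probes for a in acc]

def dense_matvec(a):
    """Wrap an explicit matrix as a matvec closure (illustration only)."""
    return lambda x: [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]
```

For a diagonal matrix the estimate is exact with any number of probes; otherwise each entry carries an error term from the off-diagonal entries that averages out as `num_probes` grows.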

Covariance Matrix Estimation Via Network Structure

Wei Lan,Zheng Fang,Hansheng Wang,Chih-Ling Tsai

DOI: https://doi.org/10.1080/07350015.2016.1173558

2016-01-01

Journal of Business and Economic Statistics

Abstract:In this article, we employ a regression formulation to estimate the high-dimensional covariance matrix for a given network structure. Using prior information contained in the network relationships, we model the covariance as a polynomial function of the symmetric adjacency matrix. Accordingly, the problem of estimating a high-dimensional covariance matrix is converted to one of estimating low dimensional coefficients of the polynomial regression function, which we can accomplish using ordinary least squares or maximum likelihood. The resulting covariance matrix estimator based on the maximum likelihood approach is guaranteed to be positive definite even in finite samples. Under mild conditions, we obtain the theoretical properties of the resulting estimators. A Bayesian information criterion is also developed to select the order of the polynomial function. Simulation studies and empirical examples illustrate the usefulness of the proposed methods.
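The abstract models the covariance as a polynomial in the symmetric adjacency matrix $W$, i.e. $\Sigma = \beta_0 I + \beta_1 W + \dots + \beta_K W^K$. The sketch below evaluates this form for given coefficients using Horner's rule. Estimating the $\beta$'s by ordinary least squares or maximum likelihood, and choosing $K$ by the Bayesian information criterion, are not shown; the coefficient values and function names are illustrative assumptions.

```python
# Hypothetical sketch: evaluate Sigma = sum_k beta_k W^k for given coefficients (Horner's rule).

def mat_mul(a, b):
    n, m, q = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(q)] for i in range(n)]

def identity(p):
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]

def network_covariance(w, betas):
    """Sigma = sum_k betas[k] * W^k, where betas[0] multiplies the identity."""
    p = len(w)
    result = [[betas[-1] * v for v in row] for row in identity(p)]
    for beta in reversed(betas[:-1]):
        result = mat_mul(result, w)  # multiply the running polynomial by W
        for i in range(p):
            result[i][i] += beta     # then add beta * I
    return result
```

For two connected nodes, `network_covariance([[0, 1], [1, 0]], [2.0, 0.5])` gives variances of 2.0 and a covariance of 0.5 between the linked nodes. Arbitrary coefficients need not yield a positive definite matrix, which is why the abstract emphasizes that the maximum likelihood version guarantees positive definiteness.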

Two new algorithms for maximum likelihood estimation of sparse covariance matrices with applications to graphical modeling

Ghania Fatima,Prabhu Babu,Petre Stoica

2023-05-11

Abstract:In this paper, we propose two new algorithms for maximum-likelihood estimation (MLE) of high dimensional sparse covariance matrices. Unlike most state-of-the-art methods, which either use regularization techniques or penalize the likelihood to impose sparsity, we solve the MLE problem based on an estimated covariance graph. More specifically, we propose a two-stage procedure: in the first stage, we determine the sparsity pattern of the target covariance matrix (in other words, the marginal independence in the covariance graph under a Gaussian graphical model) using the multiple hypothesis testing method of false discovery rate (FDR), and in the second stage we use either a block coordinate descent approach to estimate the non-zero values or a proximal distance approach that penalizes the distance between the estimated covariance graph and the target covariance matrix. Doing so gives rise to two different methods, each with its own advantage: the coordinate descent approach does not require tuning of any hyper-parameters, whereas the proximal distance approach is computationally fast but requires careful tuning of the penalty parameter. Both methods are effective even in cases where the number of observed samples is less than the dimension of the data. For performance evaluation, we test the proposed methods on both simulated and real-world data and show that they provide more accurate estimates of the sparse covariance matrix than two state-of-the-art methods.

Methodology,Signal Processing
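The first stage selects the sparsity pattern by controlling the false discovery rate across the tests for the off-diagonal entries. A common FDR procedure is the Benjamini-Hochberg step-up rule; the sketch below assumes that procedure and assumes a p-value for each off-diagonal entry (for example, from a test of zero correlation) has already been computed. The paper may use a different FDR variant, and the function name is hypothetical.

```python
# Hypothetical sketch: Benjamini-Hochberg step-up procedure on precomputed p-values.

def benjamini_hochberg(pvalues, q=0.05):
    """Return booleans: True where the null is rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank  # keep the largest rank that satisfies the BH condition
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Entries whose null is rejected are kept as nonzero in the estimated covariance graph, and the second stage then estimates only those entries. Because the rule is step-up, a p-value can be rejected even if a smaller one fails its own threshold: with `[0.04, 0.03]` and `q=0.05`, both are rejected.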
