GGLasso -- a Python package for General Graphical Lasso computation

Fabian Schaipp, Christian L. Müller, Oleg Vlasovets
DOI: https://doi.org/10.48550/arXiv.2110.10521
2021-10-20
Abstract: We introduce GGLasso, a Python package for solving General Graphical Lasso problems. The Graphical Lasso scheme, introduced by (Friedman 2007) (see also (Yuan 2007; Banerjee 2008)), estimates a sparse inverse covariance matrix $\Theta$ from multivariate Gaussian data $\mathcal{X} \sim \mathcal{N}(\mu, \Sigma) \in \mathbb{R}^p$. Originally proposed by (Dempster 1972) under the name Covariance Selection, this estimation framework has been extended to include latent variables in (Chandrasekaran 2012). Recent extensions also include the joint estimation of multiple inverse covariance matrices; see, e.g., (Danaher 2013; Tomasi 2018). The GGLasso package contains methods for solving a general problem formulation, including important special cases such as the single (latent variable) Graphical Lasso, the Group Graphical Lasso, and the Fused Graphical Lasso.
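To make the estimation target concrete, here is a minimal sketch of the standard single Graphical Lasso objective, $-\log\det\Theta + \langle S, \Theta\rangle + \lambda_1 \sum_{i \neq j} |\Theta_{ij}|$, evaluated with NumPy. It is an illustration only and does not use the GGLasso API. The function name `sgl_objective`, the tridiagonal test matrix, and the value `lambda1 = 0.1` are assumptions chosen for the example.

```python
import numpy as np


def sgl_objective(theta, S, lambda1):
    """Evaluate -log det(theta) + <S, theta> + lambda1 * sum_{i != j} |theta_ij|.

    Returns np.inf if theta is not positive definite, since log det is
    undefined there. (Hypothetical helper, not part of GGLasso.)
    """
    eigvals = np.linalg.eigvalsh(theta)  # theta is assumed symmetric
    if eigvals.min() <= 0:
        return np.inf
    logdet = np.log(eigvals).sum()
    off_diag = theta - np.diag(np.diag(theta))  # zero out the diagonal
    return -logdet + np.trace(S @ theta) + lambda1 * np.abs(off_diag).sum()


# Illustrative data: a sparse (tridiagonal) precision matrix and samples from it.
rng = np.random.default_rng(0)
p, n = 4, 500
true_theta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(true_theta), size=n)
S = np.cov(X, rowvar=False)  # empirical covariance matrix

# Compare the objective at the true precision matrix and at the identity.
print("objective at true Theta:", sgl_objective(true_theta, S, lambda1=0.1))
print("objective at identity:  ", sgl_objective(np.eye(p), S, lambda1=0.1))
```

A solver searches for the positive definite $\Theta$ that minimizes this value. A larger `lambda1` penalizes off-diagonal entries more heavily and so favors sparser estimates.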
Computation
What problem does this paper attempt to address?
The paper addresses the lack of a Python package that can solve General Graphical Lasso problems within a single framework. Existing tools each cover only part of this problem family:

1. **Single Graphical Lasso (SGL)**: `scikit-learn` and `skggm` can solve the standard SGL problem, but their functionality beyond this case is limited.
2. **Graphical Lasso with latent variables**: Existing tools such as `regain` handle some extended cases but are not comprehensive.
3. **Group Graphical Lasso (GGL) and Fused Graphical Lasso (FGL)**: These more complex, multi-instance problems are not well supported by existing tools.

To close these gaps, the authors developed GGLasso, a Python package designed specifically for the General Graphical Lasso problem. Its main contributions are:

- A unified framework for solving various Graphical Lasso problems, including SGL, GGL, FGL, and their latent-variable versions.
- Implementations of several optimization algorithms (ADMM, PPDNA, and Block-ADMM) for solving the different problem types efficiently.
- Support for nonconforming GGL problems, that is, problems in which some variables are missing from some of the instances.

Together, these features are intended to give a more comprehensive and efficient solution than existing tools, especially for complex data structures and latent variables.

### Formula Summary

The General Graphical Lasso problem in the paper has the following mathematical form:

\[
\min_{\Theta, L \in S_+^K} \; \sum_{k = 1}^K \left( -\log \det(\Theta^{(k)} - L^{(k)}) + \langle S^{(k)}, \Theta^{(k)} - L^{(k)} \rangle \right) + P(\Theta) + \sum_{k = 1}^K \mu_{1,k} \|L^{(k)}\|_\star
\]

where:

- \( S_+^K \) represents the K-fold product of the space of symmetric positive definite matrices.
- \( \Theta = (\Theta^{(1)}, \ldots, \Theta^{(K)}) \) collects the sparse parts of the K inverse covariance matrices.
- \( L = (L^{(1)}, \ldots, L^{(K)}) \) collects the low-rank components induced by latent variables.
- \( P(\Theta) \) is a regularization function used to induce the desired sparsity structure.
- \( \|\cdot\|_\star \) represents the nuclear norm, weighted for each instance by the regularization parameter \( \mu_{1,k} \).

### Special Cases

1. **Single Graphical Lasso (SGL)**, the case \( K = 1 \):

\[
P(\Theta) = \lambda_1 \sum_{i \neq j} |\Theta_{ij}|
\]

2. **Group Graphical Lasso (GGL)**:

\[
P(\Theta) = \lambda_1 \sum_{k = 1}^K \sum_{i \neq j} |\Theta_{ij}^{(k)}| + \lambda_2 \sum_{i \neq j} \left( \sum_{k = 1}^K |\Theta_{ij}^{(k)}|^2 \right)^{1/2}
\]

3. **Fused Graphical Lasso (FGL)**:

\[
P(\Theta) = \lambda_1 \sum_{k = 1}^K \sum_{i \neq j} |\Theta_{ij}^{(k)}| + \lambda_2 \sum_{k = 2}^K \