Algorithm xxx: A Covariate-Dependent Approach to Gaussian Graphical Modeling in R

Jacob Helwig, Sutanoy Dasgupta, Peng Zhao, Bani K. Mallick, Debdeep Pati
DOI: https://doi.org/10.1145/3659206
IF: 2.464
2024-04-30
ACM Transactions on Mathematical Software
Abstract: Graphical models are used to capture complex multivariate relationships and have applications in diverse disciplines such as biology, physics, and economics. Within this field, Gaussian graphical models aim to identify the pairs of variables whose dependence is maintained even after conditioning on the remaining variables in the data, known as the conditional dependence structure of the data. There are many existing software packages for Gaussian graphical modeling; however, they often make restrictive assumptions that reduce their flexibility for modeling data that are not identically distributed. Conversely, covdepGE is an R implementation of a variational weighted pseudo-likelihood algorithm for modeling the conditional dependence structure as a continuous function of an extraneous covariate. To build on the efficiency of this algorithm, covdepGE leverages parallelism and C++ integration with R. Additionally, covdepGE provides fully automated and data-driven hyperparameter specification while maintaining flexibility for the user to decide key components of the estimation procedure. Through an extensive simulation study spanning diverse settings, covdepGE is demonstrated to be top of its class in recovering the ground-truth conditional dependence structure while efficiently managing computational overhead.
computer science, software engineering; mathematics, applied
What problem does this paper attempt to address?
This paper addresses how to model the conditional dependence structure (CDS) in Gaussian graphical models when the data are not homogeneously distributed. Many existing Gaussian graphical modeling packages assume that a single precision matrix holds for the entire data set, which limits their flexibility on non-homogeneous data. The `covdepGE` package proposed in this paper relaxes that assumption by introducing an extraneous covariate and allowing the precision matrix to vary as a continuous function of it. Specifically, the paper addresses the following key issues:

1. **Modeling non-homogeneous data**: Existing Gaussian graphical modeling methods usually assume that all observations share the same precision matrix. Real data are often non-homogeneous, and the conditional dependence relationships may differ from one observation to another. `covdepGE` lets the precision matrix change with the value of the extraneous covariate, so that such data can be modeled more faithfully.
2. **Efficiency improvement**: `covdepGE` implements a variational weighted pseudo-likelihood algorithm and accelerates inference through parallel computing and C++ integration with R.
3. **Automation and flexibility**: `covdepGE` provides fully automated, data-driven hyperparameter selection, which reduces the burden on users, while still allowing them to specify key modeling components manually when needed.
4. **Performance verification**: In extensive simulation studies, `covdepGE` performs well in recovering the true conditional dependence structure while keeping computational overhead manageable. The simulations cover a variety of settings, including data and covariates of different dimensions and different types of functional relationships.

In summary, the main objective of this paper is to provide an efficient and flexible tool for modeling the conditional dependence structure of non-homogeneous data and to demonstrate its performance across a range of simulated settings.
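To make the idea of a covariate-dependent precision matrix concrete, the following Python sketch illustrates a weighted pseudo-likelihood under simplifying assumptions. It is not the `covdepGE` implementation and does not use its API: the package uses a variational approach, whereas this sketch substitutes a plain weighted ridge regression followed by thresholding. All function names, the Gaussian kernel, the bandwidth, and the threshold are hypothetical choices made for illustration. The sketch relies on the standard fact that, for Gaussian data with precision matrix Ω, the coefficient of variable k in the regression of variable j on the others is −Ω_jk/Ω_jj, so a zero coefficient corresponds to a missing edge.

```python
import numpy as np

def kernel_weights(z, z0, bandwidth):
    """Gaussian kernel weights: observations whose covariate is near z0 count more.
    (Kernel and normalization are illustrative assumptions.)"""
    w = np.exp(-0.5 * ((z - z0) / bandwidth) ** 2)
    return w * len(z) / w.sum()  # rescale so the weights sum to n

def weighted_neighborhood(X, w, ridge=1e-2):
    """Weighted ridge regression of each variable on all the others.
    B[j, k] is the coefficient of variable k when predicting variable j."""
    n, p = X.shape
    B = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        A = X[:, others]
        Aw = A * w[:, None]  # apply observation weights row by row
        gram = A.T @ Aw + ridge * np.eye(p - 1)
        B[j, others] = np.linalg.solve(gram, Aw.T @ X[:, j])
    return B

def estimate_edges(X, z, z0, bandwidth=0.2, threshold=0.4):
    """Boolean adjacency matrix of the graph estimated at covariate value z0.
    An edge is kept only if both regression directions exceed the threshold."""
    B = weighted_neighborhood(X, kernel_weights(z, z0, bandwidth))
    strength = np.minimum(np.abs(B), np.abs(B.T))
    return strength > threshold

def simulate(n=2000, seed=0):
    """Toy data: 3 variables whose 0-1 edge strengthens continuously with z."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(-1.0, 1.0, size=n)
    X = np.empty((n, 3))
    for i in range(n):
        omega = np.eye(3)
        omega[0, 1] = omega[1, 0] = 0.4 * (z[i] + 1.0)  # 0 at z=-1, 0.8 at z=1
        L = np.linalg.cholesky(omega)  # omega = L @ L.T
        # Solving L.T x = e with e ~ N(0, I) gives x with covariance inv(omega).
        X[i] = np.linalg.solve(L.T, rng.standard_normal(3))
    return X, z

if __name__ == "__main__":
    X, z = simulate()
    print(estimate_edges(X, z, z0=0.8).astype(int))
    print(estimate_edges(X, z, z0=-0.8).astype(int))
```

In this toy setting, the estimated graph near the upper end of the covariate range should contain the edge between variables 0 and 1, while the graph near the lower end should not. A method that fits one precision matrix to the whole data set would report a single graph for both regimes.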