HighDimMixedModels.jl: Robust High Dimensional Mixed Models across Omics Data

Evan Gorstein, Rosa Aghdam, Claudia Solis-Lemus
DOI: https://doi.org/10.1101/2024.05.09.593305
2024-05-10
Abstract: High dimensional mixed-effect models are an increasingly important form of regression in modern biology, where the number of variables often matches or exceeds the number of samples and the samples are collected in groups or clusters. The penalized likelihood approach to fitting these models relies on a coordinate gradient descent (CGD) algorithm that lacks guarantees of convergence to a global optimum. Here, we empirically study the behavior of the algorithm across several study types common in modern omics. In particular, we study the empirical performance of high dimensional mixed-effect models fit to data simulated to mimic the features of transcriptome, genome-wide association, and microbiome data. In addition, we study the performance of the model on real data from each of these study types. To facilitate these simulations, we implement the algorithm in an open-source Julia package, HighDimMixedModels.jl. We compare the performance of two commonly used penalties, namely LASSO and SCAD, within the HighDimMixedModels.jl framework. Our results demonstrate that the SCAD penalty consistently outperforms LASSO in terms of both variable selection and estimation accuracy across omics data. Through this analysis, we illuminate the relationship between algorithmic behavior, penalty selection, and dataset properties such as the correlation structure among features, providing insights for researchers employing high dimensional mixed-effect models in biological investigations.
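As a rough illustration of the two penalties compared in the abstract, the sketch below evaluates the standard LASSO penalty and the standard SCAD penalty for a single coefficient. It is written in Python for illustration only and is not the HighDimMixedModels.jl API; the function names, the parameter names `lam` and `a`, and the default `a = 3.7` (a conventional choice in the SCAD literature) are assumptions, not taken from the paper.

```python
def lasso_penalty(beta, lam):
    """LASSO penalty for one coefficient: grows linearly in |beta|."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty for one coefficient (requires a > 2).

    Matches LASSO for small |beta|, then tapers quadratically, and is
    constant once |beta| exceeds a * lam, so large coefficients are not
    shrunk further.
    """
    b = abs(beta)
    if b <= lam:
        # Linear region: identical to LASSO
        return lam * b
    if b <= a * lam:
        # Quadratic transition region
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    # Flat region: penalty no longer depends on |beta|
    return lam * lam * (a + 1) / 2
```

For a small coefficient the two penalties coincide, while for a large coefficient the SCAD penalty levels off and the LASSO penalty keeps growing. That flat region is the usual explanation for SCAD's lower estimation bias on large effects.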
Systems Biology