Abstract: The EM algorithm is a popular tool for maximum likelihood estimation but has not been used much for high-dimensional regularization problems in linear mixed-effects models. In this paper, we introduce the EMLMLasso algorithm, which combines the EM algorithm and the popular and efficient R package glmnet for Lasso variable selection of fixed effects in linear mixed-effects models. We compare the performance of our proposed EMLMLasso algorithm with the one implemented in the well-known R package glmmLasso through analyses of both simulated data and real-world applications. The simulations and applications demonstrated good properties, such as consistency, and the effectiveness of the proposed variable selection procedure, for both $p < n$ and $p > n$. Moreover, in all evaluated scenarios, the EMLMLasso algorithm outperformed glmmLasso. The proposed method is quite general and can be easily extended to ridge and elastic net penalties in linear mixed-effects models.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses the regularization problem in high-dimensional linear mixed-effects models (LMM). Specifically, it focuses on how to select fixed-effect variables effectively when the number of predictors $p$ is greater than the number of observations $n$. This situation is common in practice, especially in fields such as genetics, health, finance, ecology, and image processing. Although some statistical methods have been proposed for variable selection, selecting fixed effects in high-dimensional data remains a challenge.

### Background and Motivation

1. **Linear Mixed-Effects Models (LMM)**:
   - LMM is a class of statistical models used to describe the relationship between response variables and covariates, particularly suitable for clustered or longitudinal data.
   - With the increase in data volume, LMM has become increasingly important in many fields.
2. **High-Dimensional Data**:
   - When the number of predictors $p$ is much greater than the number of observations $n$, the task is referred to as a high-dimensional variable selection problem.
   - Even with the continuous advancement of computational, statistical, and technological tools, selecting fixed effects in high-dimensional data remains a difficult problem.
3. **Existing Methods**:
   - Existing methods include penalized maximum likelihood (PML) estimation with an L1 penalty, which has been applied in some studies.
   - However, the performance of these methods on high-dimensional data is not always satisfactory.

### Proposed Method

The authors propose the EMLMLasso algorithm, which combines the EM algorithm and the Lasso variable selection method from the R package `glmnet`, for selecting fixed effects in high-dimensional linear mixed-effects models. The specific steps are as follows:

1. **Initialization**:
   - Set initial parameter values, including the fixed-effect coefficients $\beta$, the error variance $\sigma^2$, and the random-effect covariance matrix $D$.
2. **E Step**:
   - Calculate the conditional expectation of the complete-data log-likelihood function, given the current parameter estimates.
3. **M Step**:
   - Update the parameter values by maximizing the conditional expectation function.
4. **Tuning Parameter Selection**:
   - Use the Bayesian Information Criterion (BIC) to select the optimal tuning parameter $\lambda$.

### Experimental Results

1. **Simulation Experiments**:
   - The authors validated the effectiveness of the EMLMLasso algorithm through simulation experiments and compared it with the existing glmmLasso algorithm.
   - The results show that, across the scenarios considered, the EMLMLasso algorithm outperforms the glmmLasso algorithm in terms of variable selection and parameter estimation.
2. **Real Data Applications**:
   - The authors applied the EMLMLasso algorithm to two real datasets: the Framingham cholesterol data and the riboflavin production gene data.
   - The results indicate that the EMLMLasso algorithm performs well on both datasets.

### Conclusion

This paper proposes a new EMLMLasso algorithm that combines the EM algorithm and the Lasso penalty to select fixed effects in high-dimensional linear mixed-effects models. Simulation experiments and real data applications demonstrate the strong performance of the algorithm on high-dimensional data.
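
The EM steps summarized under "Proposed Method" can be illustrated with a minimal sketch. This is not the authors' EMLMLasso implementation. It assumes a random-intercept model only (so $D$ reduces to a scalar `d`), replaces `glmnet` with a small hand-written coordinate-descent Lasso on a $1/(2N)$-scaled squared-error loss, and keeps $\lambda$ fixed instead of selecting it by BIC. The function names (`soft_threshold`, `lin_pred`, `lasso_cd`, `em_lasso_random_intercept`) and the simulated data are hypothetical and chosen only for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lin_pred(X, beta):
    """Fixed-effect linear predictor x_i' beta for every row of X."""
    return [sum(xj * bj for xj, bj in zip(row, beta)) for row in X]


def lasso_cd(X, y, lam, beta, sweeps=20):
    """Minimise (1/(2N)) * ||y - X beta||^2 + lam * ||beta||_1 (warm-started)."""
    n, p = len(X), len(beta)
    beta = list(beta)
    fit = lin_pred(X, beta)
    resid = [y[i] - fit[i] for i in range(n)]
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def em_lasso_random_intercept(X, y, groups, lam, iters=30):
    """EM-type loop for y_ij = x_ij' beta + b_i + e_ij,
    with b_i ~ N(0, d) and e_ij ~ N(0, sigma2) (assumed model)."""
    n, p = len(y), len(X[0])
    ids = sorted(set(groups))
    rows = {g: [i for i in range(n) if groups[i] == g] for g in ids}
    beta, sigma2, d = [0.0] * p, 1.0, 1.0  # initialization
    for _ in range(iters):
        # E step: posterior mean and variance of each random intercept.
        fit = lin_pred(X, beta)
        b_hat, v = {}, {}
        for g in ids:
            denom = sigma2 + len(rows[g]) * d
            b_hat[g] = d * sum(y[i] - fit[i] for i in rows[g]) / denom
            v[g] = d * sigma2 / denom
        # M step (beta): Lasso on the response with predicted intercepts removed.
        y_tilde = [y[i] - b_hat[groups[i]] for i in range(n)]
        beta = lasso_cd(X, y_tilde, lam, beta)
        # M step (variances): closed-form updates from expected sufficient statistics.
        fit = lin_pred(X, beta)
        rss = sum((y_tilde[i] - fit[i]) ** 2 for i in range(n))
        sigma2 = (rss + sum(len(rows[g]) * v[g] for g in ids)) / n
        d = sum(b_hat[g] ** 2 + v[g] for g in ids) / len(ids)
    return beta, sigma2, d


# Hypothetical simulated data: 20 clusters of 5 observations, 8 predictors,
# of which only the 1st and 4th have nonzero fixed effects.
random.seed(0)
true_beta = [2.0, 0.0, 0.0, -1.5, 0.0, 0.0, 0.0, 0.0]
X, y, groups = [], [], []
for g in range(20):
    b_g = random.gauss(0.0, 1.0)  # random intercept for cluster g
    for _ in range(5):
        x = [random.gauss(0.0, 1.0) for _ in true_beta]
        X.append(x)
        y.append(sum(a * b for a, b in zip(x, true_beta)) + b_g + random.gauss(0.0, 0.5))
        groups.append(g)

beta_hat, sigma2_hat, d_hat = em_lasso_random_intercept(X, y, groups, lam=0.25)
print([round(b, 3) for b in beta_hat], round(sigma2_hat, 3), round(d_hat, 3))
```

In this sketch the Lasso step is the only place where the penalty enters, so swapping the soft-thresholding update for a ridge or elastic net update would, under the same assumptions, give the corresponding penalized variants. A tuning step in the spirit of the summary would rerun the loop over a grid of `lam` values and keep the fit with the smallest BIC.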

The use of the EM algorithm for regularization problems in high-dimensional linear mixed-effects models

Semiparametric Expectile Regression for High-dimensional Heavy-tailed and Heterogeneous Data

Regularized EM Algorithms: A Unified Framework and Statistical Guarantees

Regularized EM algorithm

Accelerating L1-penalized expectation maximization algorithm for latent variable selection in multidimensional two-parameter logistic models

Bayesian high-dimensional covariate selection in non-linear mixed-effects models using the SAEM algorithm

A Relaxation Approach to Feature Selection for Linear Mixed Effects Models

HighDimMixedModels.jl: Robust High Dimensional Mixed Models across Omics Data

A globally convergent algorithm for lasso-penalized mixture of linear regression models

Leveraging independence in high-dimensional mixed linear regression

Sparse high-dimensional linear mixed modeling with a partitioned empirical Bayes ECM algorithm

AN EXTENDED LINEARIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS FOR FUSED-LASSO PENALIZED LINEAR REGRESSION

Regularized EM algorithm for sparse parameter estimation in nonlinear dynamic systems with application to gene regulatory network inference

Linear mixed-effects models for the analysis of high-density electromyography with application to diabetic peripheral neuropathy

Using Regularization to Identify Measurement Bias Across Multiple Background Characteristics: A Penalized Expectation–Maximization Algorithm

Variable Selection for Linear Mixed Effects Model Via Penalization Approaches

Efficient computation of high-dimensional penalized generalized linear mixed models by latent factor modeling of the random effects

Variable Selection for High-dimensional Generalized Linear Models using an Iterated Conditional Modes/Medians Algorithm

Elastic Net Procedure for Partially Linear Models

Nonnegative Adaptive Lasso for Ultra-High Dimensional Regression Models and A Two-Stage Method Applied in Financial Modeling

Fast Implementation for Normal Mixed Effects Models With Censored Response.