Abstract: Mixed-effects models are very popular for analyzing data with a hierarchical structure, e.g. repeated observations within subjects in a longitudinal design, or patients nested within centers in a multicenter design. However, owing to recent medical advances, the number of fixed-effect covariates collected from each patient can be quite large, e.g. gene expression data for each patient, and not all of these variables are necessarily important for the outcome. It is therefore important to select the relevant covariates correctly in order to obtain optimal inference for the overall study. On the other hand, the relevant random effects will often be low-dimensional and pre-specified. In this paper, we consider regularized selection of important fixed-effect variables in linear mixed-effects models, along with maximum penalized likelihood estimation of both fixed- and random-effect parameters based on general non-concave penalties. Asymptotic consistency and variable selection consistency, together with oracle properties, are proved for low-dimensional cases as well as for high-dimensional cases in which the number of parameters is of non-polynomial order of the sample size (i.e., much larger than the sample size). We also provide a suitable, computationally efficient algorithm for implementation. Additionally, all the theoretical results are proved for a general non-convex optimization problem that applies to several important situations well beyond the mixed-model set-up (such as finite mixtures of regressions), illustrating the wide range of applicability of our proposal.

What problem does this paper attempt to address?

This paper addresses the problem of fixed-effect variable selection in linear mixed-effects models for high-dimensional datasets. With the progress of medical research, a large number of fixed-effect covariates (such as gene expression data) can be collected for each patient, but not all of these variables are important for the outcome. Selecting the relevant covariates correctly is therefore crucial for obtaining the best inference in the overall study. However, most existing methods are limited to the classical low-dimensional setting (the sample size is larger than the number of parameters) and perform poorly on modern high-dimensional datasets (the number of parameters is much larger than the sample size). To solve this problem, the paper proposes a regularized selection method that uses general non-concave penalty functions and estimates the fixed-effect and random-effect parameters simultaneously. The method applies to both the classical low-dimensional case and the high-dimensional case. The paper proves the consistency of the maximum penalized likelihood estimators (MPLEs) and the oracle property of variable selection in both cases, and provides a computationally efficient algorithm for computing the estimators.

### Main contributions

1. **Asymptotic theory for general non-convex loss functions and non-concave penalty functions**:
   - The paper provides the asymptotic theory of maximum penalized likelihood estimation under general non-convex loss functions and non-concave penalty functions, covering the classical low-dimensional case (\(P < n\)) and the high-dimensional case (\(P \gg n\)). This general theory applies to a variety of non-standard statistical models, such as finite-mixture regression models, and so extends the scope of the existing literature.
2. **Asymptotic distribution for high-dimensional linear mixed-effects models**:
   - In the high-dimensional case, the paper provides the asymptotic distribution of the maximum penalized likelihood estimator under general non-concave penalty functions, which is not covered in the existing literature.
3. **Application advantages**:
   - The paper shows through simulation and real-data examples that, in linear mixed-effects models, the SCAD penalty is superior to the traditional L1 penalty (LASSO) for selecting and estimating fixed-effect variables. This provides important guidance for practical applications.

### Method overview

- **Non-concave penalty functions**:
  - Non-concave penalty functions (such as SCAD) are used to select the important fixed-effect variables. Such penalties yield estimators with desirable properties such as unbiasedness (for large coefficients), sparsity, and continuity, and can select variables effectively in high-dimensional datasets.
- **Maximum penalized likelihood estimation**:
  - Fixed-effect and random-effect parameters are estimated simultaneously by minimizing the negative log-likelihood plus non-concave penalty terms. Since this is a non-convex optimization problem, the paper proposes suitable quadratic approximations and iterative algorithms to solve it.

### Theoretical results

- **Consistency**:
  - The paper proves the consistency of the maximum penalized likelihood estimator and the oracle property of variable selection in both the low- and high-dimensional cases.
- **Asymptotic distribution**:
  - The paper provides the asymptotic distribution of the estimators of the fixed-effect and random-effect parameters, which is important for estimating standard errors and variance parameters.
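To make the SCAD penalty and the quadratic approximation mentioned under the method overview more concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes the standard SCAD definition of Fan and Li with the conventional default `a = 3.7`, and the function names (`scad_penalty`, `scad_derivative`, `lqa_weights`) are illustrative choices, not taken from the paper or any library. The summary does not give the paper's exact parameterization or its treatment of the random-effect parameters, so those are left out.

```python
import numpy as np


def scad_penalty(beta, lam, a=3.7):
    """Standard SCAD penalty evaluated at |beta| (assumes a > 2, lam > 0)."""
    t = np.abs(np.asarray(beta, dtype=float))
    linear = lam * t                                              # |beta| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))      # lam < |beta| <= a*lam
    const = lam**2 * (a + 1) / 2                                  # |beta| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))


def scad_derivative(beta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |beta|."""
    t = np.abs(np.asarray(beta, dtype=float))
    # Equals lam near zero (LASSO-like), decays linearly, then is exactly 0.
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1))


def lqa_weights(beta0, lam, a=3.7, eps=1e-8):
    """Weights w_j of a local quadratic approximation around beta0.

    Near beta0 the penalty is approximated, up to a constant, by
    0.5 * w_j * beta_j**2 with w_j = p'(|beta0_j|) / |beta0_j|.
    eps is a hypothetical guard against division by zero.
    """
    t0 = np.abs(np.asarray(beta0, dtype=float))
    return scad_derivative(t0, lam, a) / np.maximum(t0, eps)


if __name__ == "__main__":
    current = np.array([0.05, 0.8, 5.0])   # illustrative coefficient values
    print(scad_penalty(current, lam=0.5))
    print(lqa_weights(current, lam=0.5))
```

In this sketch, coefficients larger than `a * lam` in absolute value get a zero quadratic weight and are therefore not shrunk, while coefficients close to zero get a very large weight that pushes them toward zero. That is the intuition behind the unbiasedness and sparsity properties described above.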
### Application examples

- **Simulation and real data**:
  - The effectiveness of the proposed method is verified through simulation experiments and real data. In high-dimensional datasets in particular, the SCAD penalty performs better than the L1 penalty.

In summary, this paper addresses fixed-effect variable selection in high-dimensional linear mixed-effects models by combining non-concave penalty functions with maximum penalized likelihood estimation, and it provides both theoretical support and guidance for practical application.

Non-Concave Penalization in Linear Mixed-Effects Models and Regularized Selection of Fixed Effects
