Abstract: We consider high-dimensional regression with a count response modeled by a Poisson or negative binomial generalized linear model (GLM). We propose a penalized maximum likelihood estimator with a properly chosen complexity penalty and establish its adaptive minimaxity across models of various sparsity. To make the procedure computationally feasible for high-dimensional data, we consider its LASSO and SLOPE convex surrogates. Their performance is illustrated through simulated and real-data examples.

What problem does this paper attempt to address?

This paper addresses high-dimensional regression with a count response. Specifically, it considers how to carry out regression when the response variable is a count (for example, the number of times an event occurs) and the number of features \(d\) is much larger than the sample size \(n\). In this setting, traditional regression methods are hard to apply directly, because the curse of dimensionality leads to overfitting or to prohibitive computational cost. The paper proposes the following:

1. **Penalized maximum likelihood estimation**: A model is selected by adding an appropriate complexity penalty to the likelihood, which yields adaptive minimaxity. The resulting estimator attains the optimal Kullback-Leibler risk across models with different degrees of sparsity.
2. **Convex surrogates**: To make the procedure computationally feasible for high-dimensional data, the paper considers two convex surrogates, LASSO and SLOPE. They replace the non-convex model selection problem with a convex optimization problem that is easier to solve.
   - **LASSO**: Uses the L1 norm as the penalty and produces sparse solutions, in which some regression coefficients are exactly zero.
   - **SLOPE**: Uses the sorted (ordered) L1 norm as the penalty. According to the summary, it handles correlated predictors better and usually performs better in dense models.
3. **Theoretical and empirical analysis**: The paper provides theoretical guarantees and illustrates the proposed methods on simulated and real-data examples. In particular, SLOPE performs well with correlated predictors and dense models, while LASSO may select models that are too sparse in some cases.

In summary, the main contribution of the paper is a model selection method for high-dimensional regression with count data, supported by both theoretical analysis and empirical study.
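
The LASSO and SLOPE penalties described above can be made concrete with a small sketch for the Poisson case. This is an illustration only, not the paper's implementation: it assumes a log link, drops the constant \(\log(y!)\) term from the likelihood, and uses a plain proximal-gradient (ISTA) loop with a fixed step size. The function names, the step size and the iteration count are all hypothetical choices made for this example.

```python
import math

def poisson_nll(beta, X, y):
    """Average Poisson negative log-likelihood with a log link.
    The log(y!) constant is dropped because it does not depend on beta."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        total += math.exp(eta) - yi * eta
    return total / len(y)

def lasso_penalty(beta, lam):
    """L1 penalty: lam times the sum of absolute coefficients."""
    return lam * sum(abs(b) for b in beta)

def slope_penalty(beta, lams):
    """Sorted L1 penalty: the largest |beta_j| is paired with the largest
    lambda. `lams` is assumed to be non-increasing."""
    abs_sorted = sorted((abs(b) for b in beta), reverse=True)
    return sum(l * a for l, a in zip(lams, abs_sorted))

def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def poisson_lasso_ista(X, y, lam, step=0.01, n_iter=500):
    """Proximal gradient descent for the L1-penalized Poisson objective.
    The fixed step size is an assumption and may need tuning, since the
    Poisson gradient is not globally Lipschitz."""
    n, d = len(y), len(X[0])
    beta = [0.0] * d
    for _ in range(n_iter):
        # Residuals exp(x_i' beta) - y_i give the gradient of the smooth part.
        resid = [math.exp(sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                for j in range(d)]
        # Gradient step followed by coordinate-wise soft-thresholding.
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return beta
```

When all entries of `lams` are equal, `slope_penalty` reduces to `lasso_penalty`, which is one way to see SLOPE as a generalization of LASSO. Fitting SLOPE itself requires the proximal operator of the sorted L1 norm, which is omitted here.
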

High-dimensional regression with a count response

Semiparametric Expectile Regression for High-dimensional Heavy-tailed and Heterogeneous Data

High-dimensional prediction for count response via sparse exponential weights

Penalized Independence Rule for Testing High-Dimensional Hypotheses

High-dimensional classification by sparse logistic regression

High-Dimensional Regression with Gaussian Mixtures and Partially-Latent Response Variables

High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

High-Dimensional Censored Regression via the Penalized Tobit Likelihood

On High dimensional Poisson models with measurement error: hypothesis testing for nonlinear nonconvex optimization

Adaptive Lasso and group-Lasso for functional Poisson regression

Optimal Poisson subsampling decorrelated score for high-dimensional generalized linear models

Ultra high-dimensional semiparametric longitudinal data analysis

High-Dimensional Sparse Additive Hazards Regression

A Flexible Regression Model for Count Data

Penalized Sparse Covariance Regression with High Dimensional Covariates

High-dimensional generalized linear models and the lasso

Low-rank regression models for multiple binary responses and their applications to cancer cell-line encyclopedia data

High Dimensional Logistic Regression Under Network Dependence

L2RM: Low-Rank Linear Regression Models for High-Dimensional Matrix Responses

Beta regression for double‐bounded response with correlated high‐dimensional covariates

Statistical Inference in High-dimensional Poisson Regression with Applications to Mediation Analysis