High-dimensional regression with a count response

Or Zilberman, Felix Abramovich
2024-09-13
Abstract: We consider high-dimensional regression with a count response modeled by a Poisson or negative binomial generalized linear model (GLM). We propose a penalized maximum likelihood estimator with a properly chosen complexity penalty and establish its adaptive minimaxity across models of various sparsity. To make the procedure computationally feasible for high-dimensional data we consider its LASSO and SLOPE convex surrogates. Their performance is illustrated through simulated and real-data examples.
Methodology, Statistics Theory
What problem does this paper attempt to address?
This paper studies regression with a count response (for example, the number of times an event occurs) in the high-dimensional setting, where the number of features \(d\) is much larger than the sample size \(n\). In this regime, classical regression methods cannot be applied directly: without some form of regularization the fitted model overfits, and an exhaustive search over candidate models is computationally infeasible. To address this, the authors propose the following:

1. **Penalized maximum likelihood estimation**: The model is selected by maximizing the likelihood minus a suitably chosen complexity penalty. The resulting estimator is adaptively minimax: its Kullback-Leibler risk is of the optimal (minimax) order simultaneously across models with different degrees of sparsity, without the sparsity being known in advance.
2. **Convex surrogates**: Because the complexity-penalized estimator requires a search over subsets of predictors, the authors make it computationally feasible for high-dimensional data by considering two convex surrogates, LASSO and SLOPE. These replace the non-convex model selection problem with a convex optimization problem that can be solved efficiently.
   - **LASSO** uses an L1-norm penalty, which yields sparse solutions in which some regression coefficients are exactly zero.
   - **SLOPE** uses a sorted (ordered) L1-norm penalty, which assigns larger weights to larger coefficients in absolute value. It tends to handle correlated predictors better and often performs better in dense models.
3. **Theoretical and empirical analysis**: The paper provides theoretical guarantees and illustrates the methods on simulated and real data. In these examples, SLOPE performs well with correlated predictors and dense models, while LASSO can produce overly sparse models in some cases.
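The two convex surrogates can be sketched for the Poisson case as a penalized negative log-likelihood minimized by proximal gradient descent. This is a minimal illustration under our own assumptions, not the authors' implementation: it uses numpy, a log link with no intercept, a generic proximal-gradient solver with backtracking, soft-thresholding as the LASSO prox, and a pool-adjacent-violators routine for the sorted-L1 (SLOPE) prox. The helper names (`fit_poisson`, `prox_slope`, etc.) and the tuning values `lam` and `lams` are hypothetical choices for illustration, not the paper's recommended constants.

```python
import math
import numpy as np


def poisson_nll(beta, X, y):
    """Average Poisson negative log-likelihood with log link (log y! dropped)."""
    eta = X @ beta
    with np.errstate(over="ignore"):
        return float(np.mean(np.exp(eta) - y * eta))


def poisson_grad(beta, X, y):
    """Gradient of poisson_nll: X^T (exp(X beta) - y) / n."""
    return X.T @ (np.exp(X @ beta) - y) / X.shape[0]


def prox_lasso(v, lam):
    """Prox of lam * ||.||_1, i.e. componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def prox_slope(v, lams):
    """Prox of the sorted-L1 norm sum_j lams[j] * |v|_(j), lams non-increasing.

    Sort |v| in decreasing order, subtract the weights, project onto
    non-increasing sequences by pool-adjacent-violators, clip at zero,
    then restore the original order and signs.
    """
    sign, u = np.sign(v), np.abs(v)
    order = np.argsort(-u)
    w = u[order] - lams
    blocks = []  # each block: [start, end, sum of w over start..end]
    for i, wi in enumerate(w):
        blocks.append([i, i, wi])
        # Merge while the previous block's mean is not larger than the last one's.
        while len(blocks) > 1:
            s0, e0, t0 = blocks[-2]
            s1, e1, t1 = blocks[-1]
            if t0 / (e0 - s0 + 1) > t1 / (e1 - s1 + 1):
                break
            blocks[-2:] = [[s0, e1, t0 + t1]]
    x_sorted = np.empty_like(w)
    for s, e, t in blocks:
        x_sorted[s:e + 1] = max(t / (e - s + 1), 0.0)
    x = np.empty_like(w)
    x[order] = x_sorted
    return sign * x


def lasso_pen(beta, lam):
    return lam * float(np.abs(beta).sum())


def slope_pen(beta, lams):
    # Largest |beta_j| is paired with the largest weight.
    return float(np.sort(np.abs(beta))[::-1] @ lams)


def fit_poisson(X, y, prox, lam, n_iter=500):
    """Proximal gradient with backtracking for poisson_nll + penalty."""
    beta = np.zeros(X.shape[1])
    t = 1.0  # step size, halved until the quadratic upper bound holds
    for _ in range(n_iter):
        f, g = poisson_nll(beta, X, y), poisson_grad(beta, X, y)
        while True:
            z = prox(beta - t * g, t * lam)
            d = z - beta
            if poisson_nll(z, X, y) <= f + g @ d + d @ d / (2 * t):
                break
            t *= 0.5
        beta = z
    return beta


# Usage on simulated data (hypothetical sizes and tuning values).
rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:3] = [0.5, -0.5, 0.5]
y = rng.poisson(np.exp(X @ beta_true))
lam = math.sqrt(2 * math.log(d) / n)
lams = np.array([math.sqrt(math.log(2 * d / j) / n) for j in range(1, d + 1)])
b_lasso = fit_poisson(X, y, prox_lasso, lam)
b_slope = fit_poisson(X, y, prox_slope, lams)
print("LASSO support:", np.flatnonzero(b_lasso))
print("SLOPE support:", np.flatnonzero(b_slope))
```

The only difference between the two fits is the proximal operator, which is what makes SLOPE a drop-in alternative to LASSO here. Under the same assumptions, a negative binomial model with known dispersion would only require replacing `poisson_nll` and `poisson_grad`.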
In summary, the main contribution of this paper is a method for effective model selection in high-dimensional regression with count data, supported by theoretical analysis and empirical studies.