Penalized robust estimators in logistic regression with applications to sparse models

Ana M. Bianco, Graciela Boente, Gonzalo Chebi
DOI: https://doi.org/10.48550/arXiv.1911.00554
2020-02-13
Abstract: Sparse covariates are frequent in classification and regression problems and in these settings the task of variable selection is usually of interest. As is well known, sparse statistical models correspond to situations where there are only a small number of non-zero parameters and, for that reason, they are much easier to interpret than dense ones. In this paper, we focus on the logistic regression model and our aim is to address robust and penalized estimation for the regression parameter. We introduce a family of penalized weighted $M$-type estimators for the logistic regression parameter that are stable against atypical data. We explore different penalization functions and we introduce the so-called Sign penalization. This new penalty has the advantage that it depends only on one penalty parameter, avoiding arbitrary tuning constants. We discuss the variable selection capability of the given proposals as well as their asymptotic behaviour. Through a numerical study, we compare the finite sample performance of the proposal corresponding to different penalized estimators, either robust or classical, under different scenarios. A robust cross-validation criterion is also presented. The analysis of two real data sets makes it possible to investigate the stability of the penalized estimators to the presence of outliers.
Methodology
What problem does this paper attempt to address?
The main problem this paper addresses is how to obtain sparse and robust estimates in the logistic regression model. Specifically, the authors focus on selecting variables through penalty terms while keeping the estimator robust in the presence of outliers. To this end, they propose a new class of penalized weighted M-type estimators. These estimators remain stable under atypical data and perform automatic variable selection through different penalty functions, such as the Sign penalty.

### Main research questions:

1. **Variable selection in sparse models**: In sparse regression models, how to select the important predictor variables while excluding irrelevant ones.
2. **Robustness**: When the data contain outliers, how to ensure that the estimator is robust, that is, that the estimates are not unduly affected by outliers.
3. **Selection of penalty functions**: How to choose penalty functions that balance sparsity and robustness.

### Specific methods:

- **Penalized weighted M-type estimators**: The authors introduce a new class of estimators that combine a loss function with a weight function to control influential points, and that produce sparse estimates through a penalty term.
- **Sign penalty**: A new penalty function, the Sign penalty, is proposed. It depends on only one penalty parameter, which avoids arbitrary tuning constants. It resembles the LASSO penalty, but it does not shrink the estimated coefficients toward zero.
- **Theoretical analysis**: The authors prove the consistency of the proposed estimators, derive their convergence rates, and discuss their variable selection properties.

### Research background:

- **Sparse models**: In many classification and regression problems, the number of useful predictor variables is often far smaller than the number of measured covariates, and sparse models are therefore easier to interpret.
- **Limitations of traditional methods**: Classical maximum likelihood estimation is prone to overfitting under multicollinearity or in sparse models, and it is sensitive to outliers.
- **Existing methods**: Methods such as Ridge regression, LASSO, and Elastic Net already address these problems, but they still fall short in robustness and variable selection.

### Contributions of the paper:

- **New penalty function**: The Sign penalty is proposed, a new penalty function that is simple and effective.
- **Robustness**: Combining suitable loss functions and weight functions keeps the estimator robust in the presence of outliers.
- **Theoretical basis**: A rigorous theoretical basis is provided, establishing the consistency and asymptotic properties of the estimators, in particular when the covariate dimension is fixed.

### Conclusion:

This paper addresses the key problems of sparsity and robustness in the logistic regression model by introducing new penalized weighted M-type estimators, providing a practical tool for applications.
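
The Sign penalty and the penalized weighted objective described under "Specific methods" can be illustrated with a minimal Python sketch. It rests on assumptions that the summary does not state and that should be checked against the paper: the Sign penalty is taken to be $\lambda \|\beta\|_1 / \|\beta\|_2$ (the $\ell_1$ norm of the direction of $\beta$), and the objective is taken to be a weighted average of per-observation losses plus the penalty. The function names, the `weights` argument, and the use of the classical deviance as the loss are illustrative choices only; a robust version would replace the deviance with a loss that limits the effect of badly fitted observations.

```python
import numpy as np


def lasso_penalty(beta, lam):
    """LASSO penalty: lam * ||beta||_1 (grows with the size of beta)."""
    return lam * np.sum(np.abs(beta))


def sign_penalty(beta, lam):
    """ASSUMED form of the Sign penalty: lam * ||beta||_1 / ||beta||_2.

    It depends only on the direction beta / ||beta||_2, so rescaling beta
    leaves the penalty unchanged. Defined as 0 at beta = 0.
    """
    norm2 = np.linalg.norm(beta)
    if norm2 == 0.0:
        return 0.0
    return lam * np.sum(np.abs(beta)) / norm2


def logistic_deviance(y, eta):
    """Classical (non-robust) logistic loss: log(1 + exp(eta)) - y * eta."""
    return np.logaddexp(0.0, eta) - y * eta


def penalized_weighted_objective(beta, X, y, weights, lam,
                                 loss=logistic_deviance,
                                 penalty=sign_penalty):
    """Hypothetical objective: mean of weights * loss, plus the penalty.

    `weights` is meant to downweight observations with outlying covariates;
    how the weights are computed is left open here.
    """
    eta = X @ beta
    return np.mean(weights * loss(y, eta)) + penalty(beta, lam)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4))
    y = (rng.uniform(size=50) < 0.5).astype(float)
    weights = np.ones(50)  # no downweighting in this toy example

    sparse_beta = np.array([3.0, 0.0, 0.0, 0.0])
    dense_beta = np.array([1.0, 1.0, 1.0, 1.0])

    # Under the assumed form, the sparse direction is penalized less.
    print(sign_penalty(sparse_beta, 1.0), sign_penalty(dense_beta, 1.0))
    # Rescaling changes the LASSO penalty but not the Sign penalty.
    print(lasso_penalty(10 * sparse_beta, 1.0), sign_penalty(10 * sparse_beta, 1.0))
    print(penalized_weighted_objective(sparse_beta, X, y, weights, lam=0.1))
```

Under this assumed form, the penalty is smallest for vectors with a single non-zero coordinate and is unchanged when $\beta$ is rescaled, which is consistent with the summary's remark that the Sign penalty does not shrink the estimated coefficients.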