Abstract: Sparse covariates are frequent in classification and regression problems, and in these settings the task of variable selection is usually of interest. As is well known, sparse statistical models correspond to situations where there are only a small number of non-zero parameters and, for that reason, they are much easier to interpret than dense ones. In this paper, we focus on the logistic regression model and our aim is to address robust and penalized estimation for the regression parameter. We introduce a family of penalized weighted $M$-type estimators for the logistic regression parameter that are stable against atypical data. We explore different penalization functions and we introduce the so-called Sign penalization. This new penalty has the advantage that it depends only on one penalty parameter, avoiding arbitrary tuning constants. We discuss the variable selection capability of the given proposals as well as their asymptotic behaviour. Through a numerical study, we compare the finite-sample performance of different penalized estimators, both robust and classical, under different scenarios. A robust cross-validation criterion is also presented. The analysis of two real data sets allows us to investigate the stability of the penalized estimators in the presence of outliers.
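
As a sketch of what a penalized weighted $M$-type estimator can look like (the notation below is an assumption for illustration and is not taken from the abstract): with responses $y_i \in \{0,1\}$, covariates $x_i \in \mathbb{R}^p$, a loss function $\rho$, a weight function $w$ that downweights outlying covariates, and a penalty $I_\lambda$ with penalty parameter $\lambda$, such an estimator could be written as

$$
\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} \; \frac{1}{n} \sum_{i=1}^{n} \rho\left(y_i, x_i^{\top}\beta\right) w(x_i) + I_{\lambda}(\beta)
$$

In this notation, taking $\rho$ as the logistic deviance, $w \equiv 1$ and $I_\lambda(\beta) = \lambda \|\beta\|_1$ gives the classical LASSO-penalized logistic regression estimator.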

What problem does this paper attempt to address?

The main problem this paper addresses is how to achieve sparse and robust estimation in the logistic regression model. Specifically, the authors focus on selecting variables through penalty terms while keeping the estimator robust in the presence of outliers. To this end, they propose a new class of penalized weighted M-type estimators that remain stable under atypical data and perform automatic variable selection through different penalty functions (such as the Sign penalty).

### Main research questions:

1. **Variable selection in sparse models**: In sparse regression models, how to select the important predictor variables while excluding irrelevant ones.
2. **Robustness**: In data with outliers, how to ensure that the estimator is robust, that is, that the estimates are not unduly affected by outliers.
3. **Selection of penalty functions**: How to choose penalty functions that balance sparsity and robustness.

### Specific methods:

- **Penalized weighted M-type estimators**: The authors introduce a new class of estimators that combine loss functions and weight functions to control influential points, and that generate sparse estimates through penalty terms.
- **Sign penalty**: A new penalty function, the Sign penalty, is proposed. It depends on only one penalty parameter, which avoids arbitrary tuning constants. The Sign penalty resembles the LASSO penalty applied to the direction of the coefficient vector, but does not shrink the estimated coefficients toward zero.
- **Theoretical analysis**: The authors prove the consistency and convergence rate of the proposed estimators and discuss their variable selection properties.

### Research background:

- **Sparse models**: In many classification and regression problems, the number of actually useful predictor variables is often far smaller than the number of measured covariates. Sparse models are therefore easier to interpret.
- **Limitations of traditional methods**: Traditional maximum likelihood estimation is prone to overfitting under multicollinearity or in sparse models, and it is sensitive to outliers.
- **Existing methods**: Methods such as Ridge regression, LASSO, and Elastic Net already address some of these problems, but they still have deficiencies in robustness and variable selection.

### Contributions of the paper:

- **New penalty function**: The Sign penalty is proposed, a new penalty function that is simple and effective.
- **Robustness**: Through the choice of loss functions and weight functions, the estimator remains robust in the presence of outliers.
- **Theoretical basis**: A rigorous theoretical basis is provided, establishing the consistency and asymptotic properties of the estimator, in particular when the covariate dimension is fixed.

### Conclusion:

This paper addresses the key problems of sparsity and robustness in the logistic regression model by introducing new penalized weighted M-type estimators, providing a useful tool for practical applications.
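
To make the contrast between the Sign and LASSO penalties concrete, here is a minimal Python sketch. It rests on assumptions that the summary does not state: the Sign penalty is taken to be `lam * ||beta||_1 / ||beta||_2` (the LASSO penalty applied to the direction `beta / ||beta||_2`), the per-observation loss is the plain logistic deviance instead of a bounded robust loss, and the function names and the `weights` argument are hypothetical.

```python
import numpy as np


def lasso_penalty(beta, lam):
    """LASSO penalty: lam * ||beta||_1."""
    return lam * np.sum(np.abs(beta))


def sign_penalty(beta, lam):
    """Assumed form of the Sign penalty: lam * ||beta||_1 / ||beta||_2.

    Defined as 0 at beta = 0. This formula is an assumption made for
    illustration; the summary does not spell it out.
    """
    norm2 = np.linalg.norm(beta)
    if norm2 == 0.0:
        return 0.0
    return lam * np.sum(np.abs(beta)) / norm2


def penalized_weighted_objective(beta, X, y, weights, lam, penalty):
    """Weighted mean of a per-observation loss plus a penalty term.

    The loss here is the logistic negative log-likelihood,
    log(1 + exp(s)) - y * s with s = x' beta. A robust variant would
    replace it with a bounded loss (not implemented in this sketch).
    `weights` are hypothetical per-observation weights, for example
    small values for observations with outlying covariates.
    """
    scores = X @ beta
    # np.logaddexp(0, s) computes log(1 + exp(s)) in a numerically stable way.
    loss = np.logaddexp(0.0, scores) - y * scores
    return np.mean(weights * loss) + penalty(beta, lam)


if __name__ == "__main__":
    beta = np.array([3.0, 4.0, 0.0])
    for scale in (1.0, 10.0):
        print(scale,
              lasso_penalty(scale * beta, 1.0),
              sign_penalty(scale * beta, 1.0))
```

Under the assumed form, rescaling `beta` changes the LASSO penalty but leaves the Sign penalty unchanged, which is one way to read the statement that it does not shrink the estimated coefficients.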

Penalized robust estimators in logistic regression with applications to sparse models

Robust adaptive LASSO in high-dimensional logistic regression

Robust and sparse estimation methods for high dimensional linear and logistic regression

Robust variable selection for partially linear additive models

Weighted Lasso Estimates for Sparse Logistic Regression: Non-Asymptotic Properties with Measurement Errors

High-dimensional classification by sparse logistic regression

Variable Selection with Exponential Weights and $l_0$-Penalization

Robust estimation for functional logistic regression models

Penalized polytomous ordinal logistic regression using cumulative logits. Application to network inference of zero-inflated variables

The Trimmed Lasso: Sparsity and Robustness

Robust Estimation and Outlier Detection for Varying-Coefficient Models Via Penalized Regression

Robust and sparse estimators for linear regression models

Robust Bayesian nonparametric variable selection for linear regression

Robust exponential squared loss-based variable selection for high-dimensional single-index varying-coefficient model

Penalized Sparse Covariance Regression with High Dimensional Covariates

Robust Variable Selection Via Nonconcave Penalties with an Upgraded Parsimonious Dynamic Covariance Modeling

Sparse inference in Poisson Log-Normal model by approximating the L0-norm

Sparse Poisson Regression with Penalized Weighted Score Function

On Regularized Sparse Logistic Regression

A robust and efficient estimation and variable selection method for partially linear models with large-dimensional covariates