A modern maximum-likelihood theory for high-dimensional logistic regression

Pragya Sur, Emmanuel J. Candès
DOI: https://doi.org/10.1073/pnas.1810420116
IF: 11.1
2019-07-01
Proceedings of the National Academy of Sciences
Abstract: Significance. Logistic regression is a popular model in statistics and machine learning to fit binary outcomes and assess the statistical significance of explanatory variables. Here, the classical theory of maximum-likelihood (ML) estimation is used by most software packages to produce inference. In the now common setting where the number of explanatory variables is not negligible compared with the sample size, we show that classical theory leads to inferential conclusions that cannot be trusted. We develop a theory that provides expressions for the bias and variance of the ML estimate and characterizes the asymptotic distribution of the likelihood-ratio statistic under some assumptions regarding the distribution of the explanatory variables. This theory can be used to provide valid inference.
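The bias described in the abstract can be seen in a small simulation. The sketch below is illustrative only and is not the authors' code: the sample size, dimension, signal strength, coefficient pattern, Gaussian design, and the helper name fit_logistic_mle are all assumptions chosen for the example. It fits an unpenalized logistic regression by Newton's method and compares the average fitted coefficient with the true value on the nonzero coordinates. Classical theory would suggest a ratio close to 1 here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed simulation settings (illustrative, not taken from the abstract).
n, p = 2000, 400   # p/n = 0.2, so p is not negligible relative to n
gamma2 = 5.0       # assumed variance of the linear predictor x_i' beta

# Half of the coefficients are nonzero and equal; the rest are null.
beta = np.zeros(p)
beta[: p // 2] = np.sqrt(gamma2 / (p // 2))

# Independent standard Gaussian covariates and Bernoulli responses.
X = rng.standard_normal((n, p))
prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
y = (rng.random(n) < prob).astype(float)


def fit_logistic_mle(X, y, n_iter=50, tol=1e-6):
    """Unpenalized logistic MLE (no intercept) via Newton's method
    with a backtracking line search. Hypothetical helper for this sketch."""
    b = np.zeros(X.shape[1])

    def nll(v):
        # Negative log-likelihood, computed stably with logaddexp.
        eta = X @ v
        return np.sum(np.logaddexp(0.0, eta) - y * eta)

    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ b)))
        grad = X.T @ (mu - y)
        if np.linalg.norm(grad) < tol:
            break
        w = mu * (1.0 - mu)
        hess = X.T @ (X * w[:, None])
        step = np.linalg.solve(hess, grad)
        # Halve the step until the Armijo decrease condition holds.
        t, f0, slope = 1.0, nll(b), grad @ step
        while nll(b - t * step) > f0 - 1e-4 * t * slope and t > 1e-8:
            t *= 0.5
        b = b - t * step
    return b


beta_hat = fit_logistic_mle(X, y)

# Ratio of the average fitted coefficient to the true coefficient
# over the nonzero coordinates.
inflation = beta_hat[: p // 2].mean() / beta[0]
print(f"average inflation of nonzero coefficients: {inflation:.2f}")
```

Under these assumed settings the ratio should come out noticeably above 1, which is the kind of systematic overestimation the abstract says classical theory misses. Lowering p while keeping n fixed should bring the ratio back toward 1.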