Consistency of $\ell _{1}$ Penalized Negative Binomial Regressions

Fang Xie,Zhijie Xiao

DOI: https://doi.org/10.48550/arXiv.2002.07441

2020-02-18

Statistics Theory

Abstract:We prove the consistency of the $\ell_1$ penalized negative binomial regression (NBR). A real data application about German health care demand shows that the $\ell_1$ penalized NBR produces a more concise but more accurate model, compared to the classical NBR.
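
To make the estimator in this abstract concrete, here is a minimal sketch of an $\ell_1$ penalized NBR fit by proximal gradient descent (ISTA). It is not the authors' algorithm. It assumes a log link, a known dispersion parameter `theta`, a fixed step size, and a penalty applied to every coefficient, including any intercept column. The names `soft_threshold`, `nb_grad`, and `fit_l1_nbr` are hypothetical.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def nb_grad(X, y, beta, theta):
    """Gradient of the average negative NB2 log-likelihood (log link).

    With mu_i = exp(x_i^T beta), the per-observation derivative with
    respect to the linear predictor is theta * (mu_i - y_i) / (mu_i + theta).
    """
    n, p = len(X), len(beta)
    grad = [0.0] * p
    for xi, yi in zip(X, y):
        eta = sum(xj * bj for xj, bj in zip(xi, beta))
        mu = math.exp(eta)
        w = theta * (mu - yi) / (mu + theta)
        for j in range(p):
            grad[j] += xi[j] * w / n
    return grad


def fit_l1_nbr(X, y, lam, theta=1.0, step=0.1, n_iter=500):
    """ISTA: a gradient step on the loss, then soft-thresholding for the l1 term.

    Assumption: theta is known and the step size is small enough for the data.
    """
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        g = nb_grad(X, y, beta, theta)
        beta = [soft_threshold(b - step * gj, step * lam)
                for b, gj in zip(beta, g)]
    return beta


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    X = [[1.0, 0.5], [1.0, -0.3], [1.0, 0.8], [1.0, 0.1]]
    y = [2, 0, 3, 1]
    print(fit_l1_nbr(X, y, lam=0.05))
```

A larger `lam` drives more coefficients exactly to zero, which is the mechanism behind the "more concise" model mentioned in the abstract. In practice the dispersion parameter would be estimated as well, and the intercept is usually left unpenalized.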


ELASTIC-NET REGULARIZED HIGH-DIMENSIONAL NEGATIVE BINOMIAL REGRESSION: CONSISTENCY AND WEAK SIGNAL DETECTION

Huiming Zhang,Jinzhu Jia

DOI: https://doi.org/10.5705/ss.202019.0315

IF: 1.4

2020-01-01

Statistica Sinica

Abstract:We study sparse negative binomial regression (NBR) for count data by showing non-asymptotic merits of the Elastic-net estimator. Two types of oracle inequalities are derived for the Elastic-net estimates of NBR by utilizing the Compatibility Factor Condition or the Stabil Condition. The second-type oracle inequality is for random design and can be extended to many $\ell_1 + \ell_2$ regularized M-estimation problems whose corresponding empirical processes have stochastic Lipschitz properties. To show some high-probability events, we derive a concentration inequality for the suprema of empirical processes for the weighted sum of negative binomial variables. For applications, we first show sign consistency, provided that the non-zero components of the sparse true vector are larger than a proper choice of the weakest signal detection threshold. Second, we show the grouping effect inequality with high probability. Third, under some assumptions on the design matrix, we can recover the true variable set with high probability if the weakest signal detection threshold is larger than the tuning parameter up to a known constant. Lastly, we briefly discuss the de-biased Elastic-net estimator, and numerical studies are given to support the proposal.
Elastic-net Regularized High-dimensional Negative Binomial Regression: Consistency and Weak Signals Detection

Huiming Zhang,Jinzhu Jia

DOI: https://doi.org/10.48550/arXiv.1712.03412

IF: 5.414

2017-12-09

Machine Learning

Abstract:We study a sparse negative binomial regression (NBR) for count data by showing the non-asymptotic advantages of using the elastic-net estimator. Two types of oracle inequalities are derived for the NBR's elastic-net estimates by using the Compatibility Factor Condition and the Stabil Condition. The second type of oracle inequality is for the random design and can be extended to many $\ell_1 + \ell_2$ regularized M-estimations, with the corresponding empirical process having stochastic Lipschitz properties. We derive the concentration inequality for the suprema empirical processes for the weighted sum of negative binomial variables to show some high-probability events. We apply the method by showing the sign consistency, provided that the nonzero components in the true sparse vector are larger than a proper choice of the weakest signal detection threshold. In the second application, we show the grouping effect inequality with high probability. Third, under some assumptions for a design matrix, we can recover the true variable set with a high probability if the weakest signal detection threshold is larger than the tuning parameter up to a known constant. Lastly, we briefly discuss the de-biased elastic-net estimator, and numerical studies are given to support the proposal.
Empirical Likelihood for Single-Index Regression Models under Negatively Associated Errors

Zheng-Yan Lin,Ran Wang

DOI: https://doi.org/10.1080/03610926.2012.758746

2015-01-01

Abstract:In this article, we use the blockwise empirical likelihood technique to construct confidence regions for the parameter of single-index models under negatively associated errors. It is shown that the blockwise empirical likelihood ratio statistic for the parameter of interest is asymptotically $\chi^2$-type distributed. The result can be used to obtain confidence regions for the parameter of interest.
Consistent Specification Testing Via Nonparametric Series Regression

YM HONG,H WHITE

DOI: https://doi.org/10.2307/2171724

IF: 6.1

1995-01-01

Econometrica

Abstract:This paper proposes two consistent one-sided specification tests for parametric regression models, one based on the sample covariance between the residual from the parametric model and the discrepancy between the parametric and nonparametric fitted values; the other based on the difference in sums of squared residuals between the parametric and nonparametric models. We estimate the nonparametric model by series regression. The new test statistics converge in distribution to a unit normal under correct specification and grow to infinity faster than the parametric rate ($n^{-1/2}$) under misspecification, while avoiding weighting, sample splitting, and non-nested testing procedures used elsewhere in the literature. Asymptotically, our tests can be viewed as a test of the joint hypothesis that the true parameters of a series regression model are zero, where the dependent variable is the residual from the parametric model, and the series terms are functions of the explanatory variables, chosen so as to support nonparametric estimation of a conditional expectation. We specifically consider Fourier series and regression splines, and present a Monte Carlo study of the finite sample performance of the new tests in comparison to consistent tests of Bierens (1990), Eubank and Spiegelman (1990), Jayasuriya (1990), Wooldridge (1992), and Yatchew (1992); the results show the new tests have good power, performing quite well in some situations. We suggest a joint Bonferroni procedure that combines a new test with those of Bierens and Wooldridge to capture the best features of the three approaches.
Consistency of Neural Networks with Regularization

Xiaoxi Shen,Jinghang Lin

DOI: https://doi.org/10.48550/arXiv.2207.01538

2022-06-23

Abstract:Neural networks have attracted a lot of attention due to their success in applications such as natural language processing and computer vision. For large scale data, due to the tremendous number of parameters in neural networks, overfitting is an issue in training neural networks. To avoid overfitting, one common approach is to penalize the parameters, especially the weights, in neural networks. Although neural networks have demonstrated their advantages in many applications, the theoretical foundation of penalized neural networks has not been well-established. The goal of this paper is to propose a general framework of neural networks with regularization and prove its consistency. Under certain conditions, the estimated neural network will converge to the true underlying function as the sample size increases. The method of sieves and the theory on minimal neural networks are used to overcome the issue of unidentifiability for the parameters. Two types of activation functions, the hyperbolic tangent function (Tanh) and the rectified linear unit (ReLU), have been taken into consideration. Simulations have been conducted to verify the validity of the consistency theorem.

Machine Learning,Methodology
$\sqrt{n}$-Consistent Density Estimation in Semiparametric Regression Models

Shuo Li,Yundong Tu

DOI: https://doi.org/10.1016/j.csda.2016.06.013

IF: 2.035

2016-01-01

Computational Statistics & Data Analysis

Abstract:The authors propose an estimator for the density of the response variable in the parametric mean regression model where the error density is left unspecified. With the application of empirical process theory, they derive its $\sqrt{n}$-consistency and asymptotic normality. This result is further extended to models which allow possible parametric misspecification of the regression function and to a special location–scale model. However, it is found that $\sqrt{n}$-consistency breaks down in the presence of endogeneity. Monte Carlo simulations show that the proposed estimators have superior finite-sample performance compared to other density estimators available in the literature. Two real data illustrations reveal the advantage of the proposed density estimator over the Rosenblatt–Parzen kernel density estimator.
Estimation of Partially Linear Regression Model under Partial Consistency Property

Xia Cui,Ying Lu,Heng Peng

DOI: https://doi.org/10.48550/arXiv.1401.2163

2014-01-10

Abstract:In this paper, utilizing recent theoretical results in high dimensional statistical modeling, we propose a model-free yet computationally simple approach to estimate the partially linear model $Y=X\beta+g(Z)+\varepsilon$. Motivated by the partial consistency phenomenon, we propose to model $g(Z)$ via incidental parameters. Based on partitioning the support of $Z$, a simple local average is used to estimate the response surface. The proposed method seeks to strike a balance between computational burden and efficiency of the estimators while minimizing model bias. Computationally this approach only involves least squares. We show that given the inconsistent estimator of $g(Z)$, a root-$n$ consistent estimator of the parametric component $\beta$ of the partially linear model can be obtained with little cost in efficiency. Moreover, conditional on the $\beta$ estimates, an optimal estimator of $g(Z)$ can then be obtained using classic nonparametric methods. The statistical inference problem regarding $\beta$ and a two-population nonparametric testing problem regarding $g(Z)$ are considered. Our results show that the behavior of the test statistics is satisfactory. To assess the performance of our method in comparison with other methods, three simulation studies are conducted and a real dataset about risk factors of birth weights is analyzed.

Methodology,Computation
$\ell_1$-Regression with Heavy-tailed Distributions

Lijun Zhang,Zhi-Hua Zhou

2018-01-01

Abstract:In this paper, we consider the problem of linear regression with heavy-tailed distributions. Different from previous studies that use the squared loss to measure the performance, we choose the absolute loss, which is capable of estimating the conditional median. To address the challenge that both the input and output could be heavy-tailed, we propose a truncated minimization problem, and demonstrate that it enjoys an $\tilde{O}(\sqrt{d/n})$ excess risk, where $d$ is the dimensionality and $n$ is the number of samples. Compared with traditional work on $\ell_1$-regression, the main advantage of our result is that we achieve a high-probability risk bound without exponential moment conditions on the input and output. Furthermore, if the input is bounded, we show that the classical empirical risk minimization is competent for $\ell_1$-regression even when the output is heavy-tailed.
Liu-type Negative Binomial Regression: A Comparison of Recent Estimators and Applications

Yasin Asar

DOI: https://doi.org/10.48550/arXiv.1604.02335

2016-04-08

Abstract:This paper introduces a new biased estimator for the negative binomial regression model that is a generalization of the Liu-type estimator proposed for the linear model in [12]. Since the variance of the maximum likelihood estimator (MLE) is inflated when there is multicollinearity between the explanatory variables, a new biased estimator is proposed to solve the problem and decrease the variance of the MLE in order to make stable inferences. Moreover, we obtain some theoretical comparisons between the new estimator and some others via the matrix mean squared error (MMSE) criterion. Furthermore, a Monte Carlo simulation study is designed to evaluate the performance of the estimators in the sense of mean squared error. Finally, a real data application is used to illustrate the benefits of the new estimator.

Methodology
Iterative Reweighted Framework Based Algorithms for Sparse Linear Regression with Generalized Elastic Net Penalty

Yanyun Ding,Zhenghua Yao,Peili Li,Yunhai Xiao

2024-11-22

Abstract:The elastic net penalty is frequently employed in high-dimensional statistics for parameter regression and variable selection. It is particularly beneficial compared to lasso when the number of predictors greatly surpasses the number of observations. However, empirical evidence has shown that the $\ell_q$-norm penalty (where $0 < q < 1$) often provides better regression compared to the $\ell_1$-norm penalty, demonstrating enhanced robustness in various scenarios. In this paper, we explore a generalized elastic net model that employs an $\ell_r$-norm (where $r \geq 1$) in the loss function to accommodate various types of noise, and employs an $\ell_q$-norm (where $0 < q < 1$) to replace the $\ell_1$-norm in the elastic net penalty. Theoretically, we establish computable lower bounds for the nonzero entries of the generalized first-order stationary points of the proposed generalized elastic net model. For implementation, we develop two efficient algorithms based on the locally Lipschitz continuous $\epsilon$-approximation to the $\ell_q$-norm. The first algorithm employs an alternating direction method of multipliers (ADMM), while the second utilizes a proximal majorization-minimization method (PMM), where the subproblems are addressed using the semismooth Newton method (SSN). We also perform extensive numerical experiments with both simulated and real data, showing that both algorithms demonstrate superior performance. Notably, the PMM-SSN is more efficient than ADMM, even though the latter provides a simpler implementation.

Machine Learning,Statistics Theory
A Penalized Empirical Likelihood Approach for Estimating Population Sizes under the Negative Binomial Regression Model

Yulu Ji,Yang Liu

DOI: https://doi.org/10.3390/math12172674

IF: 2.4

2024-08-31

Mathematics

Abstract:In capture–recapture experiments, the presence of overdispersion and heterogeneity necessitates the use of the negative binomial regression model for inferring population sizes. However, within this model, existing methods based on likelihood and ratio regression for estimating the dispersion parameter often face boundary and nonidentifiability issues. These problems can result in nonsensically large point estimates and unbounded upper limits of confidence intervals for the population size. We present a penalized empirical likelihood technique for solving these two problems by imposing a half-normal prior on the population size. Based on the proposed approach, a maximum penalized empirical likelihood estimator with asymptotic normality and a penalized empirical likelihood ratio statistic with asymptotic chi-square distribution are derived. To improve numerical performance, we present an effective expectation-maximization (EM) algorithm. In the M-step, optimization for the model parameters could be achieved by fitting a standard negative binomial regression model via the R basic function glm.nb(). This approach ensures the convergence and reliability of the numerical algorithm. Using simulations, we analyze several synthetic datasets to illustrate three advantages of our methods in finite-sample cases: complete mitigation of the boundary problem, more efficient maximum penalized empirical likelihood estimates, and more precise penalized empirical likelihood ratio interval estimates compared to the estimates obtained without penalty. These advantages are further demonstrated in a case study estimating the abundance of black bears (Ursus americanus) at the U.S. Army's Fort Drum Military Installation in northern New York.

mathematics
Nonnegative Elastic Net and application in index tracking

Lan Wu,Yuehan Yang

DOI: https://doi.org/10.1016/j.amc.2013.11.049

IF: 4.397

2014-01-01

Applied Mathematics and Computation

Abstract:This paper deals with the model selection consistency of the Nonnegative Elastic Net (proposed by imposing a nonnegativity constraint on the regression parameters) in a general setting where p (the number of predictors), q (the number of predictors with non-zero coefficients in the true linear model) and n (sample size) all go to infinity. We prove that this method has the property of variable selection consistency under the NEIC condition. Compared with the Nonnegative-lasso, the Nonnegative Elastic Net can select the true variables even when the Nonnegative-lasso cannot. In the empirical part, this method is applied to the constrained index tracking problem in a stock market without short sales, i.e. tracking the CSI 300 Index (a capitalization-weighted stock market index designed to replicate the performance of 300 stocks traded in the Shanghai and Shenzhen stock exchanges) and the SSE 180 Index (a benchmark index reflecting the Shanghai market, whose constituents are selected for best representation through a scientific and objective method, and which serves as a performance benchmark for investment and a basis for financial innovation) by selecting about 30 stocks. The results indicate that the Nonnegative Elastic Net outperforms the Nonnegative-lasso in asset selection. A two-step method, the Nonnegative Elastic Net combined with OLS, produces better results than the simple Nonnegative Elastic Net method.
The consistency of the estimators in semiparametric regression model based on m-asymptotic negatively associated errors

Jiayi Feng,Aiting Shen,Dantong Wang,Xuejun Wang

DOI: https://doi.org/10.1080/15326349.2024.2325449

2024-03-17

Stochastic Models

Abstract:In this article, we analyze the strong consistency and $r$-th ($r > 1$) mean consistency for the estimators $\beta_{u,n}$ and $g_{u,n}$ of $\beta$ and $g$, respectively, based on $m$-asymptotic negatively associated errors. Furthermore, the convergence rate for the strong consistency of the estimators is also considered. The results obtained in this article extend the corresponding ones for $\rho^{*}$-mixing random variables and other dependent sequences.

statistics & probability
Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression

Mo Zhou,Rong Ge

DOI: https://doi.org/10.48550/arXiv.2302.00257

2023-05-26

Abstract:In deep learning, often the training process finds an interpolator (a solution with 0 training loss), but the test loss is still low. This phenomenon, known as benign overfitting, is a major mystery that has received a lot of recent attention. One common mechanism for benign overfitting is implicit regularization, where the training process leads to additional properties for the interpolator, often characterized by minimizing certain norms. However, even for a simple sparse linear regression problem $y = \beta^{*\top} x +\xi$ with sparse $\beta^*$, neither the minimum $\ell_1$ nor the minimum $\ell_2$ norm interpolator gives the optimal test loss. In this work, we give a different parametrization of the model which leads to a new implicit regularization effect that combines the benefit of $\ell_1$ and $\ell_2$ interpolators. We show that training our new model via gradient descent leads to an interpolator with near-optimal test loss. Our result is based on careful analysis of the training dynamics and provides another example of an implicit regularization effect that goes beyond norm minimization.

Machine Learning
On Model Selection Consistency of the Elastic Net when P >> N

Jinzhu Jia,Bin Yu

DOI: https://doi.org/10.21236/ada485557

IF: 1.4

2010-01-01

Statistica Sinica

Abstract:We study the model selection property of the Elastic Net. In the classical settings when p (the number of predictors) and q (the number of predictors with non-zero coefficients in the true linear model) are fixed, Yuan and Lin (2007) give a necessary and sufficient condition for the Elastic Net to consistently select the true model. They showed that it consistently selects the true model if and only if there exist suitable sequences $\lambda_1(n)$ and $\lambda_2(n)$ that satisfy EIC (which is defined later in the paper). Here we study the general case when p, q, and n all go to infinity. For general scalings of p, q, and n, when Gaussian noise is assumed, sufficient conditions are given such that EIC guarantees the Elastic Net's model selection consistency. We show that to make these conditions hold, n should grow at a rate faster than $q \log(p - q)$. We compare the variable selection performance of the Elastic Net with that of the Lasso. Through theoretical results and simulation studies, we provide insights into when the Elastic Net can consistently select the true model even when the Lasso cannot. We also point out through examples that when the Lasso cannot select the true model, it is very likely that the Elastic Net cannot select the true model either.
Strong Consistency of the Internal Estimator of Nonparametric Regression with Dependent Data

Jia Shen,Yuan Xie

DOI: https://doi.org/10.1016/j.spl.2013.04.027

IF: 0.718

2013-01-01

Statistics & Probability Letters

Abstract:In this paper, the strong consistency of the multivariate internal nonparametric estimator is investigated under a strong mixing dependence assumption. This estimator is particularly easy to use when we model the regression function by an additive nonparametric structure. The pointwise strong consistency and its rate are given, as well as those over a compact set, under suitable conditions.
Robust Negative Binomial Regression via the Kibria–Lukman Strategy: Methodology and Application

Adewale F. Lukman,Olayan Albalawi,Mohammad Arashi,Jeza Allohibi,Abdulmajeed Atiah Alharbi,Rasha A. Farghali

DOI: https://doi.org/10.3390/math12182929

IF: 2.4

2024-09-21

Mathematics

Abstract:Count regression models, particularly negative binomial regression (NBR), are widely used in various fields, including biometrics, ecology, and insurance. Over-dispersion is likely when dealing with count data, and NBR has gained attention as an effective tool to address this challenge. However, multicollinearity among covariates and the presence of outliers can lead to inflated confidence intervals and inaccurate predictions in the model. This study proposes a comprehensive approach integrating robust and regularization techniques to handle the simultaneous impact of multicollinearity and outliers in the negative binomial regression model (NBRM). We investigate the estimators' performance through extensive simulation studies and provide analytical comparisons. The simulation results and the theoretical comparisons demonstrate the superiority of the proposed robust hybrid KL estimator (M-NBKLE) with predictive accuracy and stability when multicollinearity and outliers exist. We illustrate the application of our methodology by analyzing a forestry dataset. Our findings complement and reinforce the simulation and theoretical results.

mathematics
Accurate inference in negative binomial regression

Euloge Clovis Kenne Pagui,Alessandra Salvan,Nicola Sartori

DOI: https://doi.org/10.48550/arXiv.2011.02784

2020-11-05

Abstract:Negative binomial regression is commonly employed to analyze overdispersed count data. With small to moderate sample sizes, the maximum likelihood estimator of the dispersion parameter may be subject to a significant bias, that in turn affects inference on mean parameters. This paper proposes inference for negative binomial regression based on adjustments of the score function aimed at mean and median bias reduction. The resulting estimating equations are similar to those available for improved inference in generalized linear models and, in particular, can be solved using a suitable extension of iterative weighted least squares. Simulation studies show a remarkable performance of the new methods, which are also found to solve in many cases numerical problems of maximum likelihood estimates. The methods are illustrated and evaluated using two case studies: an Ames salmonella assay data set and data on epileptic seizures. Inference based on adjusted scores turns out to be generally preferable to explicit bias correction.

Methodology
Consistency Matters: Explore LLMs Consistency From a Black-Box Perspective

Fufangchen Zhao,Guoqiang Jin,Jiaheng Huang,Rui Zhao,Fei Tan

2024-02-27

Abstract:Nowadays both commercial and open-source academic LLMs have become the mainstream models of NLP. However, there is still a lack of research on LLM consistency, meaning that throughout the various stages of LLM research and deployment, a model's internal parameters and capabilities should remain unchanged. This issue exists in both the industrial and academic sectors. The solution to this problem is often time-consuming and labor-intensive, and there is also an additional cost of secondary deployment, resulting in economic and time losses. To fill this gap, we build an LLM consistency task dataset and design several baselines. Additionally, we choose models of diverse scales for the main experiments. Specifically, in the LightGBM experiment, we used traditional NLG metrics (i.e., ROUGE, BLEU, METEOR) as the features needed for model training. The final result exceeds the manual evaluation and GPT3.5 as well as other models in the main experiment, achieving the best performance. In the end, we use the best performing LightGBM model as the base model to build the evaluation tool, which can effectively assist in the deployment of business models. Our code and tool demo are available at https://github.com/heavenhellchen/Consistency.git

Computation and Language
A General Framework of the Consistency for Large Neural Networks

Haoran Zhan,Yingcun Xia

2024-10-03

Abstract:Neural networks have shown remarkable success, especially in overparameterized or "large" models. Despite increasing empirical evidence and intuitive understanding, a formal mathematical justification for the behavior of such models, particularly regarding overfitting, remains incomplete. In this paper, we propose a general regularization framework to study the Mean Integrated Squared Error (MISE) of neural networks. This framework includes many commonly used neural networks and penalties, such as ReLU and Sigmoid activations and $L^1$, $L^2$ penalties. Based on our framework, we find that the MISE curve has two possible shapes, namely double descent and monotone decreasing. The latter phenomenon is new in the literature, and the causes of these two phenomena are also studied in theory. These studies challenge conventional statistical modeling frameworks and broaden recent findings on the double descent phenomenon in neural networks.

Machine Learning,Statistics Theory
