Conditional nonparametric variable screening by neural factor regression

Jianqing Fan, Weining Wang, Yue Zhao
2024-08-20
Abstract: High-dimensional covariates often admit linear factor structure. To effectively screen correlated covariates in high dimensions, we propose a conditional variable screening test based on non-parametric regression using neural networks due to their representation power. We ask the question whether individual covariates have additional contributions given the latent factors or more generally a set of variables. Our test statistics are based on the estimated partial derivative of the regression function of the candidate variable for screening and an observable proxy for the latent factors. Hence, our test reveals how much predictors contribute additionally to the non-parametric regression after accounting for the latent factors. Our derivative estimator is the convolution of a deep neural network regression estimator and a smoothing kernel. We demonstrate that when the neural network size diverges with the sample size, unlike estimating the regression function itself, it is necessary to smooth the partial derivative of the neural network estimator to recover the desired convergence rate for the derivative. Moreover, our screening test achieves asymptotic normality under the null after finely centering our test statistics, which makes the biases negligible, as well as consistency for local alternatives under mild conditions. We demonstrate the performance of our test in a simulation study and two real-world applications.
Econometrics, Statistics Theory
What problem does this paper attempt to address?
The paper addresses variable screening in nonparametric regression with high-dimensional covariates, particularly when those covariates are driven by latent factors. The authors propose a conditional variable screening test based on neural networks, whose representation power lets the regression estimator adapt to possible low-dimensional structure. The main challenges are:

1. **High-dimensional covariates**: Screening effectively when the number of covariates is large, in order to reduce computational burden and improve predictive performance.
2. **Presence of latent factors**: Covariates may follow a linear factor structure and are therefore correlated, so each covariate must be screened conditionally on these factors.
3. **Variable screening in nonparametric regression**: Screening within a nonparametric regression framework is harder than in parametric models, because nonparametric methods often suffer from the "curse of dimensionality."

To address these challenges, the authors make the following key contributions:

- **Neural network-based nonparametric regression estimation**: Using the flexibility of neural networks to approximate complex nonparametric regression functions, with an observable proxy standing in for the latent factors.
- **Smoothing techniques**: Convolving the neural network estimator with a smoothing kernel before differentiating. When the network size grows with the sample size, this smoothing is needed to recover the desired convergence rate for the derivative.
- **Estimation of partial derivatives**: Measuring the additional contribution of a candidate variable to the response, beyond the latent factors, through the estimated partial derivative of the regression function.
- **Statistical tests**: Constructing test statistics from the partial derivative estimates. After centering, the statistics are asymptotically normal under the null hypothesis, and the test is consistent against local alternatives under mild conditions.
In summary, this study develops a conditional variable screening test that uses neural networks for nonparametric regression with high-dimensional inputs, and that decides on this basis whether a candidate covariate contributes beyond the latent factors.
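The smoothing step can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes a Gaussian smoothing kernel applied only along the candidate coordinate, uses Gauss-Hermite quadrature to evaluate the convolution, and replaces the fitted neural network with a hypothetical closed-form function `toy_regression`. The names `smoothed_partial_derivative`, `h`, and `n_nodes` are likewise illustrative. The sketch relies on the identity that, for a standard normal Z, the derivative of E[f(x + hZ)] with respect to x equals E[f(x + hZ) Z] / h, so the smoothed derivative needs only evaluations of f, never its raw derivative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def smoothed_partial_derivative(f, x, j, h, n_nodes=40):
    """Derivative w.r.t. coordinate j of f convolved with a Gaussian kernel.

    f       : callable mapping an (n, d) array to an (n,) array; a stand-in
              for a fitted regression estimator (assumption of this sketch)
    x       : (n, d) array of evaluation points
    j       : index of the candidate coordinate
    h       : bandwidth of the Gaussian kernel (smoothing along coordinate j only)
    n_nodes : number of Gauss-Hermite quadrature nodes
    """
    x = np.asarray(x, dtype=float)
    # Nodes and weights for the weight function exp(-z^2 / 2);
    # the weights sum to sqrt(2 * pi).
    nodes, weights = hermegauss(n_nodes)
    total = np.zeros(x.shape[0])
    for z, w in zip(nodes, weights):
        shifted = x.copy()
        shifted[:, j] += h * z
        # Accumulate the quadrature approximation of E[f(x + h Z e_j) Z].
        total += w * z * f(shifted)
    return total / (np.sqrt(2.0 * np.pi) * h)


def toy_regression(x):
    """Hypothetical regression function; coordinate 2 is irrelevant."""
    return x[:, 0] ** 2 + np.sin(x[:, 1])


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
h = 0.3

d0 = smoothed_partial_derivative(toy_regression, x, j=0, h=h)
d1 = smoothed_partial_derivative(toy_regression, x, j=1, h=h)
d2 = smoothed_partial_derivative(toy_regression, x, j=2, h=h)
```

In this toy setting the smoothed derivative with respect to the irrelevant coordinate is zero up to rounding, while the derivatives for the two relevant coordinates are not. A screening statistic would aggregate such derivative estimates over the sample; the paper's exact statistic and its centering are not reproduced here.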