Optimal Estimation of Large-Dimensional Nonlinear Factor Models

Yingjie Feng
2023-11-13
Abstract: This paper studies optimal estimation of large-dimensional nonlinear factor models. The key challenge is that the observed variables are possibly nonlinear functions of some latent variables where the functional forms are left unspecified. A local principal component analysis method is proposed to estimate the factor structure and recover information on latent variables and latent functions, which combines $K$-nearest neighbors matching and principal component analysis. Large-sample properties are established, including a sharp bound on the matching discrepancy of nearest neighbors, sup-norm error bounds for estimated local factors and factor loadings, and the uniform convergence rate of the factor structure estimator. Under mild conditions our estimator of the latent factor structure can achieve the optimal rate of uniform convergence for nonparametric regression. The method is illustrated with a Monte Carlo experiment and an empirical application studying the effect of tax cuts on economic growth.
Statistics Theory, Econometrics, Methodology
What problem does this paper attempt to address?
This paper addresses the estimation of large-dimensional nonlinear factor models. Specifically, it focuses on the scenario where the observed variables may be nonlinear functions of some latent variables, and the forms of these functions are left unspecified. This is challenging because traditional linear factor models may not capture such relationships. To tackle this issue, the author proposes a method called Local Principal Component Analysis (LPCA), which combines $K$-nearest neighbors matching with principal component analysis to estimate the factor structure and recover information about the latent variables and latent functions.

### Main Contributions:

1. **Sharp Bound on Nearest Neighbor Matching Discrepancy**: The author derives a sharp bound on the implicit matching discrepancy of nearest neighbors, measured in terms of the latent variables. This result applies to general choices of distance functions, extending previous research based on specific metric matching techniques.
2. **Error Bounds for Local Factors and Factor Loadings**: For the estimated local factors and factor loadings obtained by applying PCA to the nearest neighbors, the author derives sup-norm error bounds. These target quantities describe the latent functions and latent variables, respectively.
3. **Convergence Rates for Consistent Estimation**: Building on the first two results, the author shows that local PCA consistently estimates the nonlinear factor structure and derives a convergence rate that is uniform over individuals and features. Under fairly general conditions, the local PCA estimator can achieve the same optimal convergence rate as the infeasible cross-sectional nonparametric estimator.
4. **Extended Models**: The paper also extends the basic nonlinear factor model to include observed regressors with high-rank variation, providing new tools for studying, for example, linear regression models with nonlinear fixed effects.
5. **Application to Matrix Completion Problems**: The author applies local PCA to matrix completion problems with a small number of missing entries and demonstrates the potential usefulness of the method in policy evaluation settings through an empirical application studying the impact of tax cuts on economic growth.

### Method Overview:

- **K-Nearest Neighbors Matching**: First, define a "distance" between units in the sample based on the observed variables, then find the $K$ nearest neighbors of each unit.
- **Local Principal Component Analysis**: Apply PCA within the local neighborhood of each unit to extract the local factor structure. If the matched neighbors are indeed close in terms of the latent variables, a low-rank structure can locally approximate the potentially full-rank matrix.

### Theoretical Foundation:

- **Denoising**: An appropriately chosen distance function can "denoise" the data, revealing the latent factor structure.
- **Information Transmission**: Distances in the noise-free structure need to reflect distances in the unobserved variables. This is a key condition for nearest neighbor matching to be effective.

### Empirical Application:

- **Synthetic Control Method**: An empirical application studying the impact of tax cuts on economic growth demonstrates the potential value of the method in policy evaluation.

In summary, the paper provides a new method for estimating nonlinear factor structures in large-dimensional data, supports it with large-sample theory, and illustrates it with a Monte Carlo experiment and an empirical application.
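The two steps in the method overview (K-nearest neighbors matching, then PCA within each neighborhood) can be illustrated with a minimal NumPy sketch. This is not the author's implementation. Everything specific here is an assumption made for illustration: the function names `knn_indices`, `local_pca`, and `simulate`, the use of plain Euclidean distance as the matching distance, the choice to match on one half of the features and run PCA on the other half, the simulated data-generating process, and the values of `K` and `r`.

```python
import numpy as np


def knn_indices(X_match, K):
    """Indices of the K nearest rows for each row, with the row itself listed first."""
    sq = np.sum(X_match ** 2, axis=1)
    # Squared Euclidean distances between all pairs of units (assumed distance).
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X_match @ X_match.T)
    np.fill_diagonal(d2, -np.inf)  # guarantee each unit leads its own neighborhood
    return np.argsort(d2, axis=1)[:, :K]


def local_pca(X_match, X_est, K, r):
    """Rank-r (uncentered) PCA fit within each unit's K-nearest-neighbor block."""
    n, p = X_est.shape
    nbrs = knn_indices(X_match, K)
    fitted = np.empty((n, p))
    for i in range(n):
        block = X_est[nbrs[i]]  # K x p block of matched units
        U, s, Vt = np.linalg.svd(block, full_matrices=False)
        low_rank = (U[:, :r] * s[:r]) @ Vt[:r]  # best rank-r approximation
        fitted[i] = low_rank[0]  # row 0 of the block is unit i
    return fitted


def simulate(n=300, p=200, noise_sd=0.3, seed=0):
    """Hypothetical design: one scalar latent variable per unit, a Gaussian bump per feature."""
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(0.0, 1.0, size=n)   # latent variables
    centers = np.linspace(0.0, 1.0, p)      # one latent function per feature
    signal = np.exp(-8.0 * (alpha[:, None] - centers[None, :]) ** 2)
    X = signal + noise_sd * rng.standard_normal((n, p))
    return X, signal


X, signal = simulate()
match_cols = np.arange(0, X.shape[1], 2)  # features used only for matching
est_cols = np.arange(1, X.shape[1], 2)    # features used only for local PCA
fitted = local_pca(X[:, match_cols], X[:, est_cols], K=30, r=2)

mse_raw = np.mean((X[:, est_cols] - signal[:, est_cols]) ** 2)
mse_fit = np.mean((fitted - signal[:, est_cols]) ** 2)
print(f"MSE of raw data: {mse_raw:.4f}; MSE of local PCA fit: {mse_fit:.4f}")
```

In this simulated design the signal matrix is nonlinear in the latent variable, yet each neighborhood block is well approximated by a rank-2 matrix, which is the intuition behind applying PCA locally rather than globally. How `K`, `r`, and the distance should be chosen in practice is governed by the paper's theory and is not addressed by this sketch.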