Principal Components and Regularized Estimation of Factor Models

Jushan Bai, Serena Ng
DOI: https://doi.org/10.48550/arXiv.1708.08137
2017-11-13
Abstract: It is known that the common factors in a large panel of data can be consistently estimated by the method of principal components, and principal components can be constructed by iterative least squares regressions. Replacing least squares with ridge regressions turns out to have the effect of shrinking the singular values of the common component and possibly reducing its rank. The method is used in the machine learning literature to recover low-rank matrices. We study the procedure from the perspective of estimating a minimum-rank approximate factor model. We show that the constrained factor estimates are biased but can be more efficient in terms of mean-squared errors. Rank consideration suggests a data-dependent penalty for selecting the number of factors. The new criterion is more conservative in cases when the nominal number of factors is inflated by the presence of weak factors or large measurement noise. The framework is extended to incorporate a priori linear constraints on the loadings. We provide asymptotic results that can be used to test economic hypotheses.
Methodology, Econometrics
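The abstract's mechanism can be illustrated numerically. Principal components can be built from alternating regressions, and ridge penalties shrink the singular values of the common component. This is a minimal sketch under stated assumptions, not the paper's estimator. It assumes the objective \( \frac{1}{2}\|Z - F\Lambda'\|_F^2 + \frac{\lambda}{2}(\|F\|_F^2 + \|\Lambda\|_F^2) \), a common formulation in the matrix-completion literature whose two ridge steps have closed forms. The helper `ridge_als` and the simulated panel with singular values 10, 6, 3, 1, 0.5 are hypothetical.

```python
import numpy as np


def ridge_als(Z, r, lam, n_iter=300, seed=0):
    """Alternating ridge regressions for Z ~ F @ L.T (hypothetical helper).

    F holds T x r factors, L holds N x r loadings. Each half-step is a
    closed-form ridge regression under the assumed objective:
      loadings: L = Z' F (F'F + lam I)^{-1}
      factors:  F = Z L (L'L + lam I)^{-1}
    With lam = 0 both steps are ordinary least squares.
    """
    T, _ = Z.shape
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((T, r))  # random starting factors
    I = np.eye(r)
    for _ in range(n_iter):
        # Regress each series (column of Z) on F: r x N coefficients -> N x r.
        L = np.linalg.solve(F.T @ F + lam * I, F.T @ Z).T
        # Regress each period (row of Z) on L: r x T coefficients -> T x r.
        F = np.linalg.solve(L.T @ L + lam * I, L.T @ Z.T).T
    return F, L


# Synthetic T x N panel with known singular values (assumed for illustration).
rng = np.random.default_rng(1)
T, N = 50, 40
d = np.array([10.0, 6.0, 3.0, 1.0, 0.5])
U, _ = np.linalg.qr(rng.standard_normal((T, d.size)))
V, _ = np.linalg.qr(rng.standard_normal((N, d.size)))
Z = U @ np.diag(d) @ V.T

lam = 2.0
F, L = ridge_als(Z, r=4, lam=lam)
sv_ridge = np.linalg.svd(F @ L.T, compute_uv=False)[:4]
F0, L0 = ridge_als(Z, r=4, lam=0.0)
sv_ls = np.linalg.svd(F0 @ L0.T, compute_uv=False)[:4]

print("least squares:", np.round(sv_ls, 4))     # approaches d[:4]
print("ridge:        ", np.round(sv_ridge, 4))  # approaches max(d[:4] - lam, 0)
```

Under this objective, the fitted common component \(F\Lambda'\) should have singular values \(\max(d_k - \lambda, 0)\). With `lam=2.0` the fourth value (1.0) is driven to zero, so the rank drops from 4 to 3. With `lam=0.0`, the same loop reproduces the unshrunk principal-components fit.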
What problem does this paper attempt to address?
The main problem this paper addresses is how to improve the estimation of factor models, especially in the context of big data, by combining principal component analysis (PCA) with regularized estimation. Specifically, the paper focuses on the following points:

1. **Low-rank matrix recovery**: The paper explores how the singular value thresholding (SVT) technique can recover low-rank matrices from limited or noisy data. The method is particularly useful for large-scale data sets, such as movie-rating prediction in the Netflix Prize, as well as for problems in compressed sensing and face recognition.
2. **Minimum-rank factor model estimation**: The paper studies SVT as a way to estimate a minimum-rank approximate factor model. Traditional factor analysis often has difficulty determining the number of factors, especially when the data contain weak factors or large measurement noise. Rank considerations lead to a data-dependent penalty that makes the selected number of factors more conservative in such cases, which makes the selection more robust.
3. **Statistical properties of factor estimation**: Taking the perspective of a parameterized factor model, the paper derives the asymptotic properties of the SVT estimator. These results can be used to test economic hypotheses and complement the algorithmic properties emphasized in the machine learning literature.
4. **Regularized factor analysis framework**: The paper develops a frequentist framework for regularized factor analysis that allows a priori linear constraints on the loadings to be imposed during estimation. This provides a theoretical basis for testing economic hypotheses.
5. **Selection of the number of factors**: The paper proposes a new criterion for selecting the number of factors, which implicitly adds a data-dependent penalty term in pursuit of the common component with minimum rank.
This criterion can give a more conservative estimate of the number of factors when the data contain large measurement noise or when some factors contribute little to the common component.

### Formula Explanation

- **Singular Value Decomposition (SVD)**: \[ Z = UDV' \] where \(U\) and \(V\) are the matrices of left and right singular vectors, respectively, and \(D\) is the diagonal matrix of singular values.
- **Nuclear Norm**: \[ \|Z\|_*=\sum_{i = 1}^{\min(m,n)}d_i(Z) \] where \(d_i(Z)\) is the \(i\)-th singular value of the \(m \times n\) matrix \(Z\).
- **Singular Value Thresholding (SVT)**: \[ D_\gamma^r=\max(D_r-\gamma I_r,0) \] where \(D_r\) is the diagonal matrix of the \(r\) largest singular values, \(\gamma\) is the threshold parameter, and the maximum is taken element by element. Singular values at or below \(\gamma\) are set to zero, which is how thresholding can reduce the rank.
- **Optimization problem of SVT**: \[ U_rD_\gamma^rV_r'=\arg\min_L \gamma\|L\|_*+\frac{1}{2}\|Z - L\|_F^2 \] where \(U_r\) and \(V_r\) contain the first \(r\) columns of \(U\) and \(V\), and \(r\) is at least the number of singular values of \(Z\) that exceed \(\gamma\).

### Conclusion

By studying SVT from the perspective of a minimum-rank approximate factor model, the paper shows that the constrained factor estimates are biased but can have smaller mean-squared errors. It also derives a data-dependent criterion for selecting the number of factors, which is more conservative when weak factors or large measurement noise inflate the nominal count. The accompanying asymptotic results, which extend to a priori linear constraints on the loadings, can be used to test economic hypotheses.
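The SVT formulas can be sketched directly in NumPy. This sketch assumes the objective \( \gamma\|L\|_* + \frac{1}{2}\|Z - L\|_F^2 \), the standard scaling whose minimizer is the soft-thresholded SVD \( U\max(D-\gamma I,0)V' \). The paper may normalize differently. The helper names `nuclear_norm`, `svt` and `objective`, the threshold `gamma = 4.0`, and the simulated Gaussian matrix are illustrative assumptions.

```python
import numpy as np


def nuclear_norm(A):
    """||A||_*: the sum of the singular values of A."""
    return np.linalg.svd(A, compute_uv=False).sum()


def svt(Z, gamma):
    """Singular value thresholding: U max(D - gamma I, 0) V'."""
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)
    d_shrunk = np.maximum(d - gamma, 0.0)  # soft-threshold each singular value
    return (U * d_shrunk) @ Vt, d_shrunk    # U * d_shrunk scales column k by d_shrunk[k]


def objective(L, Z, gamma):
    """gamma * ||L||_* + 0.5 * ||Z - L||_F^2 (assumed scaling)."""
    return gamma * nuclear_norm(L) + 0.5 * np.linalg.norm(Z - L, "fro") ** 2


rng = np.random.default_rng(0)
Z = rng.standard_normal((30, 20))
gamma = 4.0
L_hat, d_shrunk = svt(Z, gamma)
rank_after = int(np.count_nonzero(d_shrunk))
print("rank of Z:", np.linalg.matrix_rank(Z), "rank after SVT:", rank_after)

# The objective is 1-strongly convex, so random perturbations of the SVT
# solution should always raise it.
best = objective(L_hat, Z, gamma)
worse = [objective(L_hat + 0.01 * rng.standard_normal(Z.shape), Z, gamma) for _ in range(50)]
print("SVT objective:", round(best, 4), "smallest perturbed:", round(min(worse), 4))
```

The perturbation check is a sanity test rather than a proof. Because the quadratic term makes the objective strongly convex, any perturbation \(E\) raises it by at least \(\frac{1}{2}\|E\|_F^2\). The rank count shows the rank-reducing effect: every singular value at or below \(\gamma\) is dropped.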