A note on the variance in principal component regression

Bert van der Veen
DOI: https://doi.org/10.48550/arXiv.2301.01543
2023-01-04
Methodology
Abstract: Principal component regression is a popular method to use when the predictor matrix in a regression is of reduced column rank. It has been proposed to stabilize computation under such conditions, and to improve prediction accuracy by reducing the variance of the least squares estimator for the regression slopes. However, it presents the added difficulty of having to determine which principal components to include in the regression. I provide arguments against selecting the principal components by the magnitude of their associated eigenvalues, by examining the estimator for the residual variance and the contribution of the residual variance to the variance of the estimator for the regression slopes. I show that when a principal component that is important in explaining the response variable is omitted from the regression, the residual variance is overestimated, so that the variance of the estimator for the regression slopes can be higher than that of the ordinary least squares estimator.
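The effect described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not taken from the paper: two correlated predictors, a response driven entirely by the second principal component (the one with the smaller eigenvalue), a hypothetical coefficient of 5.0, unit noise variance, and the sum of the estimated slope variances (the trace of the estimated covariance matrix) as the summary measure. The helper `pcr_fit` is an illustrative name, not the author's code.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, assumption for reproducibility
n = 200

# Two correlated predictors, centred so that no intercept is needed.
X = rng.normal(size=(n, 2)) @ np.array([[1.0, 0.6], [0.0, 0.8]])
X -= X.mean(axis=0)

# Principal components via the SVD: scores Z = U * d, loadings in the rows of Vt.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
Z = U * d

# Hypothetical response that depends only on the second (smaller-eigenvalue) component.
y = 5.0 * Z[:, 1] + rng.normal(scale=1.0, size=n)
y -= y.mean()


def pcr_fit(keep):
    """Regress y on the kept components.

    Returns the residual variance estimate and the sum of the estimated
    variances of the slopes on the original predictor scale.
    """
    Zk, dk, Vk = Z[:, keep], d[keep], Vt[keep].T
    gamma = (Zk.T @ y) / dk**2               # scores are orthogonal, so this is least squares
    resid = y - Zk @ gamma
    s2 = resid @ resid / (n - len(keep) - 1)  # one extra degree of freedom lost to centring
    cov_beta = s2 * (Vk / dk**2) @ Vk.T       # s2 * V_k D_k^{-2} V_k^T
    return s2, np.trace(cov_beta)


s2_ols, var_ols = pcr_fit([0, 1])  # keeping every component reproduces OLS
s2_pcr, var_pcr = pcr_fit([0])     # selection by eigenvalue magnitude keeps only the first

print("residual variance estimate  OLS:", s2_ols, " PCR:", s2_pcr)
print("summed slope variance       OLS:", var_ols, " PCR:", var_pcr)
```

In this setup, dropping the second component removes the term with the largest inverse eigenvalue from the slope covariance, but the omitted signal is absorbed into the residual variance estimate, and the inflation is large enough that the estimated slope variance exceeds the OLS value. With a weaker signal on the omitted component or a larger gap between the eigenvalues, the comparison can go the other way.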