On Distribution-Weighted Partial Least Squares with Diverging Number of Highly Correlated Predictors

Li-Ping Zhu, Li-Xing Zhu
DOI: https://doi.org/10.1111/j.1467-9868.2008.00697.x
2009-01-01
Abstract: Because highly correlated data arise in many scientific fields, we investigate parameter estimation in a semiparametric regression model with a diverging number of highly correlated predictors. We first develop a distribution‐weighted least squares estimator that can recover directions in the central subspace. We then use this estimator as a seed vector and project it onto a Krylov space by partial least squares, which avoids computing the inverse of the covariance of the predictors. Distribution‐weighted partial least squares can therefore handle high dimensional and highly correlated predictors. We also suggest an iterative algorithm for obtaining a better initial value before implementing partial least squares. On the theoretical side, we obtain strong consistency when the dimension $p$ of the predictors grows at rate $O\{n^{1/2}/\log(n)\}$ and asymptotic normality when it grows at rate $o(n^{1/3})$, where $n$ is the sample size. When there are no other constraints on the covariance of the predictors, the rates $n^{1/2}$ and $n^{1/3}$ are optimal. We also propose a Bayesian information criterion type of criterion to estimate the dimension of the Krylov space in the partial least squares procedure. Illustrative examples with a real data set and comprehensive simulations demonstrate that the method is robust to non‐ellipticity and works well even in ‘small n–large p’ problems.
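The two-step idea in the abstract (a distribution-weighted seed, then a Krylov-space projection that needs no inverse of the full covariance) can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the seed is taken to be the sample covariance between the predictors and the empirical distribution function of the response, and the projection uses the standard Krylov form of partial least squares. The function name `krylov_pls_direction` and the parameter `q` (the Krylov dimension) are hypothetical, and neither the iterative initial-value algorithm nor the BIC-type choice of `q` is covered.

```python
import numpy as np

def krylov_pls_direction(X, y, q):
    """Sketch of a distribution-weighted PLS direction (assumed form, not the authors' code)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                      # centre the predictors
    # Empirical distribution function of y at each observation: rank / n.
    # Assumes a continuous response (ties are broken arbitrarily).
    Fy = (np.argsort(np.argsort(y)) + 1) / n
    Sigma = Xc.T @ Xc / n                        # sample covariance of X (p x p)
    s = Xc.T @ (Fy - Fy.mean()) / n              # assumed seed: cov(X, F_n(Y))
    # Krylov basis: s, Sigma s, ..., Sigma^(q-1) s
    cols = [s]
    for _ in range(q - 1):
        cols.append(Sigma @ cols[-1])
    R = np.column_stack(cols)
    # Orthonormalise for numerical stability; the spanned space is unchanged.
    Q, _ = np.linalg.qr(R)
    # Only a q x q system is solved, never the p x p inverse of Sigma.
    return Q @ np.linalg.solve(Q.T @ Sigma @ Q, Q.T @ s)
```

In this sketch only a q-by-q linear system is solved, which is how the Krylov projection sidesteps inverting the p-by-p covariance when predictors are highly correlated or p is large relative to n. The returned vector estimates a direction only, so it is meaningful up to scale.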