Computing Robust Leverage Diagnostics when the Design Matrix Contains Coded Categorical Variables

Kjell Konis
DOI: https://doi.org/10.48550/arXiv.1301.5035
2013-01-22
Abstract: For a robust leverage diagnostic in linear regression, Rousseeuw and van Zomeren [1990] proposed using robust distance (Mahalanobis distance computed using robust estimates of location and covariance). However, a design matrix X that contains coded categorical predictor variables is often sufficiently sparse that robust estimates of location and covariance cannot be computed. Specifically, matrices formed by taking subsets of the rows of X are likely to be singular, causing algorithms that rely on subsampling to fail. Following the spirit of Maronna and Yohai [2000], we observe that extreme leverage points are extreme in the continuous predictor variables. We therefore propose a robust leverage diagnostic that combines a robust analysis of the continuous predictor variables and the classical definition of leverage.
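The singularity problem described in the abstract can be illustrated with a small Python sketch. It is not taken from the paper: the sample size, the level counts, the treatment coding, and the helper names `hat_diagonal` and `singular_subsample_fraction` are all assumptions made for illustration. The sketch builds a design matrix with one continuous predictor and one dummy-coded factor that has a rare level, computes the classical leverages (the diagonal of the hat matrix), and estimates how often a random subsample of p rows is rank deficient.

```python
import numpy as np


def hat_diagonal(X):
    """Classical leverage: diagonal of H = X (X^T X)^{-1} X^T, via a thin QR."""
    Q, _ = np.linalg.qr(X)          # Q has orthonormal columns spanning col(X)
    return np.sum(Q ** 2, axis=1)   # h_i = squared norm of the i-th row of Q


def singular_subsample_fraction(X, size, n_draws, rng):
    """Fraction of random row subsamples of X that are rank deficient."""
    n, p = X.shape
    singular = 0
    for _ in range(n_draws):
        rows = rng.choice(n, size=size, replace=False)
        if np.linalg.matrix_rank(X[rows]) < p:
            singular += 1
    return singular / n_draws


# Hypothetical data: 60 observations, a 3-level factor with a rare level.
rng = np.random.default_rng(0)
n = 60
levels = np.array([0] * 50 + [1] * 7 + [2] * 3)
dummies = np.eye(3)[levels][:, 1:]          # treatment coding, first level dropped
x_cont = rng.normal(size=n)                 # one continuous predictor
X = np.column_stack([np.ones(n), x_cont, dummies])

h = hat_diagonal(X)
frac = singular_subsample_fraction(X, size=X.shape[1], n_draws=2000, rng=rng)

print("sum of leverages (equals p for full-rank X):", h.sum())
print("fraction of singular p-row subsamples:", frac)
```

In this setup a subsample of p = 4 rows is full rank only if it contains all three factor levels, which is uncommon when one level has just 3 of the 60 rows. Most subsamples are therefore singular, which is the failure mode for subsampling-based robust estimators that the abstract describes.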
Computation