SELECTING THE “BEST” REGRESSION WHEN FACED WITH MISSING OBSERVATIONS

Donald B. Rubin
DOI: https://doi.org/10.1002/j.2333-8504.1975.tb01049.x
1975-01-01
Research Bulletin
Abstract: The sample multiple correlation coefficient is often used to select a subset of independent variables that “best” predicts a dependent variable, Y. If the data are partially missing, the choice of best predictors should often reflect not only how highly the predictors are correlated with Y but also how likely they are to be observed. Thus, an independent variable that is highly correlated with Y but difficult to record (i.e., often missing) may be a less useful predictor of Y than a less correlated but easily recorded independent variable. A generalization of the multiple correlation coefficient is defined that is appropriate when there are missing values and is identical to the multiple correlation coefficient when there are none. An example of its use is presented.
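To make the trade-off concrete, here is a minimal Python sketch. It assumes the correlations among the predictors (`rxx`) and between each predictor and Y (`rxy`) are known. `r_squared` computes the squared multiple correlation R² = r_xy' R_xx⁻¹ r_xy for a chosen subset, which is the complete-data criterion the abstract describes. The missing-data score `expected_r_squared` is a hypothetical heuristic for illustration only and is NOT the generalization defined in this paper. It assumes each predictor is observed independently with probability `p_obs[j]`, and that a unit missing some predictors is predicted from the regression on whichever predictors in the subset were observed. Like the paper's measure, it reduces to R² when nothing is missing. All function names and example numbers are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(rxx, rxy, subset):
    """Squared multiple correlation of Y on the predictors in `subset`."""
    if not subset:
        return 0.0  # no predictors: nothing is explained
    a = [[rxx[i][j] for j in subset] for i in subset]
    b = [rxy[i] for i in subset]
    beta = solve(a, b)  # standardized regression weights
    return sum(w * r for w, r in zip(beta, b))


def expected_r_squared(rxx, rxy, subset, p_obs):
    """HYPOTHETICAL heuristic (not the paper's measure): average R^2 over every
    observed/missing pattern of `subset`, assuming independent missingness."""
    total = 0.0
    for mask in range(1 << len(subset)):
        observed = [v for bit, v in enumerate(subset) if (mask >> bit) & 1]
        prob = 1.0
        for bit, v in enumerate(subset):
            prob *= p_obs[v] if (mask >> bit) & 1 else 1.0 - p_obs[v]
        total += prob * r_squared(rxx, rxy, observed)
    return total


def best_subset(rxx, rxy, k, p_obs=None):
    """Pick the size-k subset maximizing R^2, or the hypothetical score if p_obs is given."""
    if p_obs is None:
        score = lambda s: r_squared(rxx, rxy, s)
    else:
        score = lambda s: expected_r_squared(rxx, rxy, s, p_obs)
    return max(combinations(range(len(rxy)), k), key=score)


# Hypothetical example: two uncorrelated predictors.
rxx = [[1.0, 0.0], [0.0, 1.0]]  # correlations among X0 and X1
rxy = [0.8, 0.5]                # correlations of X0 and X1 with Y
p_obs = [0.3, 1.0]              # X0 is hard to record; X1 is always observed

print(best_subset(rxx, rxy, 1))         # (0,): X0 wins on complete data
print(best_subset(rxx, rxy, 1, p_obs))  # (1,): X1 wins once missingness counts
```

With these made-up numbers, X0 (r = 0.8) is the best single predictor when everything is observed, but once X0 is recorded only 30% of the time, the always-recorded X1 (r = 0.5) scores higher under the heuristic (0.25 versus 0.192), which is the kind of reversal the abstract describes.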
What problem does this paper attempt to address?