Comparing Regressions when Some Predictor Values Are Missing

D. B. Rubin
DOI: https://doi.org/10.1080/00401706.1976.10489425
1976-01-01
Technometrics
Abstract: The sample multiple correlation coefficient is often used to compare sets of independent variables with respect to how well they predict the future values of a dependent variable, Y. If the data are partially missing, the comparison often should reflect not only how correlated the predictors are with Y but also how likely they are to be observed. Thus, an independent variable that is highly correlated with Y but is often missing is not as useful a predictor of future Y values as a less correlated but always observed independent variable. A generalization of the multiple correlation coefficient is defined which is appropriate when there are missing values but is identical to the multiple correlation coefficient when there are no missing values. An example of its use is presented.
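The abstract does not give the formula for the generalized coefficient, so the sketch below covers only the complete-data case it is said to reduce to: the ordinary sample multiple correlation coefficient R of Y on a set of fully observed predictors. R is computed here as sqrt(1 - SS_res / SS_tot) from a least-squares fit with an intercept. The function names (`multiple_correlation`, `_solve`) are illustrative assumptions, not taken from the paper, and the code does not attempt to handle missing values.

```python
import math
from typing import List, Sequence


def _mean(v: Sequence[float]) -> float:
    return sum(v) / len(v)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copies rows so the caller's data is untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def multiple_correlation(xs: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Sample multiple correlation R of y on the predictor columns in xs.

    Assumes every predictor xs[j] is observed on every unit (no missing values).
    R equals the correlation between y and its least-squares fitted values,
    i.e. sqrt(1 - SS_res / SS_tot) for a regression that includes an intercept.
    """
    n, p = len(y), len(xs)
    # Center every variable so the intercept drops out of the normal equations.
    xc = [[v - _mean(col) for v in col] for col in xs]
    y_bar = _mean(y)
    yc = [v - y_bar for v in y]
    ss_tot = sum(v * v for v in yc)
    if ss_tot == 0:
        raise ValueError("y is constant; R is undefined")
    # Normal equations (Xc' Xc) beta = Xc' yc.
    xtx = [[sum(xc[i][k] * xc[j][k] for k in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(xc[i][k] * yc[k] for k in range(n)) for i in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(beta[j] * xc[j][k] for j in range(p)) for k in range(n)]
    ss_res = sum((yc[k] - fitted[k]) ** 2 for k in range(n))
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))
```

With a single predictor, R reduces to the absolute Pearson correlation: for x = [1, 2, 3, 4, 5] and y = [2, 1, 4, 3, 5], `multiple_correlation([x], y)` is approximately 0.8. Adding predictors can never lower R on the same complete data, which is one reason the abstract argues that R alone can overstate the value of a predictor that is frequently unobserved.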