19 Incomplete Data in Epidemiology and Medical Statistics
Susanne Raessler, Donald B. Rubin, Elizabeth R. Zell
DOI: https://doi.org/10.1016/s0169-7161(07)27019-1
2008-01-01
Abstract: Missing data are a common problem in most epidemiological and medical studies, including surveys and clinical trials. Imputation, or filling in the missing values, is an intuitive and flexible way to handle the incomplete data sets that arise because of such missing data. Here, in addition to imputation, including multiple imputation (MI), we discuss several other strategies and their theoretical background, as well as present some examples and advice on computation. Our focus is on MI, which is a statistically valid strategy for handling missing data, although we review other less sound methods, as well as direct maximum likelihood and Bayesian methods for estimating parameters, which are also valid approaches. The analysis of a multiply-imputed data set is now relatively standard using readily available statistical software. The creation of multiply-imputed data sets is more challenging than their analysis but still straightforward relative to other valid methods of handling missing data, and we discuss available software for doing so. Ad hoc methods, including using singly-imputed data sets, almost always lead to invalid inferences and should be eschewed, especially when the focus is on valid interval estimation or testing hypotheses.
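To illustrate why the analysis of a multiply-imputed data set is considered relatively standard, the sketch below pools results from m completed-data analyses using the widely used combining rules usually attributed to Rubin. It is a minimal sketch for a scalar estimand, not the chapter's own code: the function name `pool_rubin` and the input numbers are hypothetical, and the degrees-of-freedom formula is the classical large-sample version without small-sample adjustments.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data results for a scalar estimand.

    estimates: point estimates, one per imputed data set
    variances: their squared standard errors (within-imputation variances)
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching inputs")

    q_bar = mean(estimates)   # pooled point estimate
    u_bar = mean(variances)   # average within-imputation variance
    b = variance(estimates)   # between-imputation variance (divisor m - 1)

    # Total variance: within + between, inflated for the finite number of imputations.
    t = u_bar + (1 + 1 / m) * b

    # Classical reference t-distribution degrees of freedom.
    if b == 0:
        df = math.inf         # no between-imputation variability
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df


# Hypothetical results from m = 3 imputed data sets.
q, t, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q, t, df)
```

The between-imputation term is what a singly-imputed analysis omits, which is one way to see why single imputation tends to understate uncertainty.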