Imputing Missing Data by Fully Conditional Models: Some Cautionary Examples and Guidelines

Fan Li, Yaming Yu, Donald B. Rubin
2012-01-01
Abstract: Missing data are pervasive in large public-use databases. Multiple imputation (MI) is an effective methodology to handle the problem. Current state-of-the-art procedures of MI often fit fully Bayesian models assuming some joint probability distribution for the underlying complete data. Though theoretically valid, joint modeling may not accurately capture the important relations between the variables that are outside that theoretical structure. Alternatively, a widely used strategy, multiple imputation using chained equations (MICE), first specifies a set of univariate conditional models and then iteratively imputes the missing data based on these conditional models. Though practically flexible, MICE defines a possibly incompatible Gibbs sampler (PIGS) when there is no joint distribution corresponding to the specified conditional distributions. We construct several examples to reveal some of the undesirable theoretical and algorithmic properties of a PIGS. We then propose a spectrum of imputation strategies, imputation by monotone blocks (IMB), which combines (1) sequential imputation for monotone missing data and (2) a fully conditional strategy like MICE when (1) cannot be applied. The key is to partition an arbitrary missing data pattern into a series of monotone patterns. We further provide some general guidelines for choosing strategies within this spectrum in practice.
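To make the notion of a monotone missing data pattern concrete, the following is a minimal sketch, not the authors' procedure. It assumes the usual definition: a pattern is monotone if the variables can be ordered so that whenever a variable is missing for a unit, all later variables are missing for that unit too. The function name `is_monotone` and the boolean-mask input format are illustrative assumptions, and the input is assumed to be a non-empty list of equal-length rows.

```python
def is_monotone(missing):
    """Check whether a missingness mask has a monotone pattern.

    `missing` is a list of rows (units); missing[i][j] is True when
    variable j is unobserved for unit i. (Hypothetical input format.)
    """
    n_vars = len(missing[0])
    # For each variable, collect the set of units where it is missing.
    miss_sets = [
        {i for i, row in enumerate(missing) if row[j]}
        for j in range(n_vars)
    ]
    # Nested sets must be ordered by size, so sort by size and
    # check that each set is contained in the next one.
    miss_sets.sort(key=len)
    return all(a <= b for a, b in zip(miss_sets, miss_sets[1:]))


# Monotone: the missing units of each variable are nested.
monotone_mask = [
    [False, False, False],
    [False, False, True],
    [False, True, True],
]
# Not monotone: each variable is missing where the other is observed.
non_monotone_mask = [
    [False, True],
    [True, False],
]
print(is_monotone(monotone_mask))
print(is_monotone(non_monotone_mask))
```

When the check fails, no single ordering of the variables supports purely sequential imputation, which is the situation in which the abstract says the pattern must be partitioned into several monotone blocks.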