Imputing missing data by fully conditional models: Some cautionary examples and guidelines

Fan Li, Yaming Yu, Donald B Rubin
2012-02-02
Abstract: Missing data are pervasive in large public-use databases. Multiple imputation (MI) is an effective methodology for handling the problem. Current state-of-the-art MI procedures often fit fully Bayesian models that assume some joint probability distribution for the underlying complete data. Though theoretically valid, joint modeling may not accurately capture important relations between the variables that lie outside that theoretical structure. Alternatively, a widely used strategy, multiple imputation using chained equations (MICE), first specifies a set of univariate conditional models and then iteratively imputes the missing data based on these conditional models. Though practically flexible, MICE defines a possibly incompatible Gibbs sampler (PIGS) when there is no joint distribution corresponding to the specified conditional distributions. We construct several examples to reveal some of the undesirable theoretical and algorithmic properties of a PIGS. We then propose a spectrum of imputation strategies, imputation by monotone blocks (IMB), which combines (1) sequential imputation for monotone missing data and (2) a fully conditional strategy like MICE when (1) cannot be applied. The key is to partition an arbitrary missing data pattern into a series of monotone patterns. We further provide some general guidelines for choosing strategies within this spectrum in practice.
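To make the notion of a monotone missing data pattern concrete, the following minimal Python sketch checks whether a pattern is monotone, that is, whether the variables can be ordered so that a unit missing one variable is also missing every later variable. The function name `is_monotone` and the boolean-mask representation are assumptions made for illustration; they are not taken from the paper, and the sketch does not implement IMB itself.

```python
import numpy as np


def is_monotone(missing):
    """Return True if the missingness pattern is monotone under some column order.

    `missing` is a 2-D boolean array (hypothetical representation):
    missing[i, j] is True when variable j is unobserved for unit i.
    """
    missing = np.asarray(missing, dtype=bool)
    # Order variables from least to most missing. If the pattern is monotone,
    # the sets of units missing each variable are nested in this order.
    order = np.argsort(missing.sum(axis=0), kind="stable")
    ordered = missing[:, order]
    # Monotone: once a unit is missing a variable, it is missing all later ones,
    # so each row of indicators must be non-decreasing from left to right.
    return bool(np.all(ordered[:, :-1] <= ordered[:, 1:]))


# A pattern that is monotone after reordering the columns.
nested = np.array([[0, 0, 0],
                   [1, 0, 0],
                   [1, 0, 1]], dtype=bool)

# A pattern that no column ordering can make monotone.
crossed = np.array([[0, 1],
                    [1, 0]], dtype=bool)

print(is_monotone(nested), is_monotone(crossed))
```

A pattern such as `crossed` is the kind of case in which sequential imputation cannot be applied directly, which is where the abstract's partition into a series of monotone patterns comes in.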