Maximum likelihood estimation of missing data probability for nonmonotone missing at random data

Yang Zhao
DOI: https://doi.org/10.1007/s10260-022-00650-5
2022-06-18
Abstract: In general, statistical analysis with missing data requires specification of a model for the missing data probability and/or the covariate distribution. For nonmonotone missing data patterns, modeling and practical estimation of the missing data probability are very challenging. Recently, a semiparametric likelihood model was developed to estimate parametric regression models for the missing data mechanism based on all the observed data, which can deal with arbitrary nonmonotone missing data patterns. However, due to the curse of dimensionality in likelihood-based models, this method becomes impractical as the number of variables increases. This research generalizes the semiparametric likelihood model such that it can deal with any number of variables with arbitrary nonmonotone missing data patterns. It further introduces a semiparametric estimator of the missing data probability for the partially observed data, which can be used to assess the model fit. An EM algorithm with closed-form expressions at each step is used to compute the estimates. Simulation studies in various settings indicate that the performance of the new method is acceptable for practical implementation. The missing data mechanism of a case-control study of hip fractures among male veterans is analyzed to illustrate the method.
statistics & probability