Systematically missing data in distributed data networks: multiple imputation when data cannot be pooled

Robert Thiesmeier, Matteo Bottai, Nicola Orsini
DOI: https://doi.org/10.1080/00949655.2024.2404220
IF: 1.225
2024-09-21
Journal of Statistical Computation and Simulation
Abstract: Systematically missing data in distributed data networks present practical and methodological challenges. Failure to handle them appropriately can bias statistical inference. Multiple imputation can be used to address systematic missingness. However, when data from different study sites cannot be pooled into a single file, conventional imputation approaches cannot be applied, because there is no basis for imputation. To address this challenge, we introduce conditional quantile imputation (CQI), an imputation method based on conditional quantiles, which involves four steps: (i) estimating 99 quantiles for the systematically missing variable in studies with observed data; (ii) deriving a weighted average of regression coefficients across studies and transmitting it to sites with systematically missing data; (iii) imputing the systematically missing values based on observed data and the set of regression coefficients from step ii; and (iv) combining estimates of the substantive outcome model across imputations using Rubin's rules. We evaluate CQI in different simulation scenarios and illustrate it with an applied data example. We conclude that CQI can be a suitable approach for the imputation of systematically missing data when data from multiple studies cannot be pooled.
statistics & probability; computer science, interdisciplinary applications
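
The four steps described in the abstract can be illustrated with a minimal Python sketch of steps ii to iv. This is not the authors' implementation, and several things are assumptions made only for illustration: step i (fitting the 99 quantile regressions at each site with observed data) is assumed to have been done already; the function names `pool_coefficients`, `impute`, and `rubin` are hypothetical; the site weights are passed in by the caller because the abstract does not state how they are chosen; each coefficient vector is assumed to hold an intercept followed by slopes for a linear predictor; and drawing the quantile index uniformly at random for each record is an assumed imputation rule.

```python
import random
from statistics import mean, variance


def pool_coefficients(site_coefs, site_weights):
    """Step ii (sketch): weighted average of coefficients across sites.

    site_coefs[s][q] is the coefficient list (intercept first) for
    quantile index q estimated at site s. Weights are an input here
    because the weighting scheme is an assumption of this sketch.
    """
    total = sum(site_weights)
    n_quantiles = len(site_coefs[0])
    n_betas = len(site_coefs[0][0])
    return [
        [
            sum(w * coefs[q][b] for coefs, w in zip(site_coefs, site_weights)) / total
            for b in range(n_betas)
        ]
        for q in range(n_quantiles)
    ]


def impute(rows, pooled, rng):
    """Step iii (sketch): impute one value per record.

    For each record, pick one of the pooled quantile models at random
    (assumed rule) and evaluate its linear predictor on the observed
    covariates in that record.
    """
    imputed = []
    for covariates in rows:
        beta = pooled[rng.randrange(len(pooled))]
        imputed.append(beta[0] + sum(b * x for b, x in zip(beta[1:], covariates)))
    return imputed


def rubin(estimates, variances):
    """Step iv: combine m point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance


if __name__ == "__main__":
    # Toy inputs: two sites, two quantile models each, one covariate.
    site_a = [[0.0, 1.0], [1.0, 1.0]]
    site_b = [[4.0, 5.0], [5.0, 5.0]]
    pooled = pool_coefficients([site_a, site_b], site_weights=[1, 3])
    values = impute([[1.0], [2.0]], pooled, random.Random(2024))
    print(pooled, values)
    print(rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

Only the pooled coefficients cross site boundaries in this sketch, which mirrors the constraint that individual-level data cannot be pooled. In practice the imputation step would be repeated to create several completed datasets, the substantive outcome model would be fitted to each, and `rubin` would then be applied to the resulting estimates and variances.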