Invited Commentary: Estimation and Bounds under Data Fusion

Miao Wang, Li Wei, Wenjie Hu, Ruoyu Wang, Zhi Geng
DOI: https://doi.org/10.1093/aje/kwab194
2021-01-01
American Journal of Epidemiology
Abstract: In their recent article, Ogburn et al. (Am J Epidemiol. 2021;190(6):1142-1147) raised a cautionary tale for epidemiologic data fusion: Bias may occur if a variable that is completely missing in the primary data set is imputed according to a regression model estimated from an auxiliary data set. However, in some specific settings, a solution may exist. Focusing on a linear outcome regression model with a missing covariate, we show that the bias can be eliminated if the underlying imputation model for the missing covariate is nonlinear in the common variables measured in both data sets. Otherwise, we describe 2 alternative approaches existing in the data fusion literature that could partially resolve this issue: One fits the outcome model by leveraging an additional validation data set containing joint observations of the outcome and the missing covariate, and the other offers informative bounds for the outcome regression coefficients without using validation data. We justify these 3 methods in a linear outcome model and briefly discuss their extension to general settings.
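The first claim in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator or code. It is a minimal illustration under stated assumptions: a single common variable X, a linear outcome model in X and the missing covariate Z, the same conditional distribution of Z given X in both data sets, and a correctly specified imputation model. The function name `simulate_fusion`, the coefficient values, and the quadratic imputation basis are all hypothetical. When the mean of Z given X is nonlinear in X, regressing the outcome on X and the imputed Z recovers the coefficients; when it is linear, the imputed Z is nearly collinear with X and the design becomes ill-conditioned.

```python
import numpy as np


def simulate_fusion(nonlinear, n=20000, seed=0):
    """Impute a fully missing covariate Z from auxiliary data, then fit Y ~ X + Z_hat.

    All numeric values here are hypothetical and chosen only for illustration.
    Returns the fitted outcome coefficients and the condition number of the
    outcome design matrix.
    """
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 2.0, -1.5])  # assumed true intercept, X and Z effects

    def draw_z(x):
        # E[Z | X] is quadratic in the "nonlinear" scenario, linear otherwise.
        mean = x**2 if nonlinear else 0.5 * x
        return mean + rng.normal(size=x.shape)

    def basis(x):
        # Imputation basis (1, X, X^2): an assumption that covers both scenarios.
        return np.column_stack([np.ones_like(x), x, x**2])

    # Primary data set: Y and X observed, Z generated but never used in fitting.
    x_p = rng.normal(size=n)
    z_p = draw_z(x_p)
    y_p = beta[0] + beta[1] * x_p + beta[2] * z_p + rng.normal(size=n)

    # Auxiliary data set: Z and X observed, no outcome.
    x_a = rng.normal(size=n)
    z_a = draw_z(x_a)

    # Fit the imputation model in the auxiliary data, then impute in the primary data.
    gamma, *_ = np.linalg.lstsq(basis(x_a), z_a, rcond=None)
    z_hat = basis(x_p) @ gamma

    # Outcome regression on the common variable and the imputed covariate.
    design = np.column_stack([np.ones(n), x_p, z_hat])
    beta_hat, *_ = np.linalg.lstsq(design, y_p, rcond=None)
    return beta_hat, np.linalg.cond(design)


if __name__ == "__main__":
    for flag in (True, False):
        est, cond = simulate_fusion(nonlinear=flag)
        print(f"nonlinear={flag}: beta_hat={np.round(est, 3)}, cond={cond:.1f}")
```

In the linear scenario the printed coefficients should not be trusted: the large condition number signals that X and the imputed Z carry almost the same information, which is the setting in which the abstract points to validation data or bounds instead.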