Graphical Models for Processing Missing Data

Karthika Mohan, Judea Pearl
DOI: https://doi.org/10.48550/arXiv.1801.03583
2019-11-14
Abstract: This paper reviews recent advances in missing data research using graphical models to represent multivariate dependencies. We first examine the limitations of traditional frameworks from three different perspectives: *transparency, estimability and testability*. We then show how procedures based on graphical models can overcome these limitations and provide meaningful performance guarantees even when data are Missing Not At Random (MNAR). In particular, we identify conditions that guarantee consistent estimation in broad categories of missing data problems, and derive procedures for implementing this estimation. Finally we derive testable implications for missing data models in both MAR (Missing At Random) and MNAR categories.
Methodology
What problem does this paper attempt to address?
The paper uses graphical models to address three shortcomings of traditional missing-data frameworks: transparency, estimability and testability.

1. **Transparency**: Traditional methods usually assume a particular missingness mechanism, such as MCAR (Missing Completely At Random) or MAR, but these assumptions are difficult to verify in practical applications. A graphical model displays the causal relationships and conditional independencies between variables in an intuitive graph structure, which makes the assumptions easier for researchers to understand and verify.
2. **Estimability**: The paper studies which parameters can be consistently estimated from incomplete data, given a graphical model. When data are missing not at random (MNAR), traditional statistical methods often cannot provide consistent estimates. The paper gives conditions and procedures that achieve consistent estimation in a wider range of missing-data problems.
3. **Testability**: The paper discusses how to check whether the assumptions of a model are compatible with the observed data. Traditional testing methods are very limited, especially in the MNAR case. The paper derives testable implications, in the form of conditional independence claims, that can be checked against the data to assess the validity of the model.

### Main contributions

- **Transparency**: With a graphical model, researchers can read the missingness mechanism directly from the graph and determine whether a problem falls into the MCAR, MAR or MNAR category.
- **Estimability**: The paper provides conditions and procedures that enable consistent estimation even in the MNAR case.
- **Testability**: The paper proposes new tests of the model assumptions that remain applicable in the MNAR case.
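The testability point can be illustrated with the simplest kind of testable implication: if a model claims that missingness is independent of a fully observed variable, that claim can be checked directly, because both the variable and the missingness indicator are always recorded. The following is a minimal sketch on simulated data, not the paper's own procedure. The function names, the three age groups and the missingness rates are assumptions made for illustration, and a standard Pearson chi-square statistic stands in for whatever independence test one prefers.

```python
import random

def chi_square_independence(pairs):
    """Pearson chi-square statistic for independence of two discrete variables."""
    n = len(pairs)
    row_counts, col_counts, cell_counts = {}, {}, {}
    for a, r in pairs:
        row_counts[a] = row_counts.get(a, 0) + 1
        col_counts[r] = col_counts.get(r, 0) + 1
        cell_counts[(a, r)] = cell_counts.get((a, r), 0) + 1
    stat = 0.0
    for a, row_total in row_counts.items():
        for r, col_total in col_counts.items():
            expected = row_total * col_total / n
            observed = cell_counts.get((a, r), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

def simulate(n, p_missing_by_age, seed):
    """Draw (age_group, r) pairs; r = 1 means the outcome value is missing."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        age = rng.randrange(3)  # three hypothetical age groups
        r = 1 if rng.random() < p_missing_by_age[age] else 0
        data.append((age, r))
    return data

# Missingness rate identical across age groups (compatible with R independent of A).
mcar_like = simulate(20000, [0.3, 0.3, 0.3], seed=0)
# Missingness rate varies with age (violates R independent of A).
age_dependent = simulate(20000, [0.1, 0.3, 0.6], seed=0)

CRITICAL_5PCT_DF2 = 5.991  # 0.95 quantile of chi-square with 2 degrees of freedom
stat_mcar = chi_square_independence(mcar_like)
stat_age = chi_square_independence(age_dependent)
print(f"equal rates:     chi2 = {stat_mcar:.1f}")
print(f"age-dependent:   chi2 = {stat_age:.1f} (critical value {CRITICAL_5PCT_DF2})")
```

A statistic far above the critical value is evidence against the claimed independence. A small statistic does not prove the model correct; it only means this particular implication was not refuted.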
### Specific examples

The paper illustrates the use of graphical models for missing data with concrete examples. In one of them, a school study measures three variables: age (A), gender (G) and obesity (O), and some of the obesity values are missing. A graphical model represents the missingness mechanism explicitly, and the data analysis proceeds from that graph.

### Mathematical formulas

- **Missing-data distribution**: The observed data follow \(P(G, O^*, A, R_O)\), where \(R_O\) is the missingness indicator for obesity (\(R_O = 0\) when obesity is recorded) and \(O^*\) is the proxy variable that equals \(O\) whenever \(R_O = 0\). On the observed cases, \[ P(G, O^* = o, A, R_O = 0) = P(G, O = o \mid A, R_O = 0)\, P(A \mid R_O = 0)\, P(R_O = 0) \] where \(P(A \mid R_O = 0)\) reduces to \(P(A)\) when \(R_O\) is independent of \(A\).
- **Recovery of joint distribution**: Provided that \((G, O) \perp R_O \mid A\), for instance when missingness depends only on age, \[ P(G, O, A) = P(G, O \mid A, R_O = 0)\, P(A) = P(G, O^* \mid A, R_O = 0)\, P(A) \] Every term on the right-hand side can be estimated from the observed data.

### Conclusion

The paper provides a graphical-model framework that addresses the low transparency, limited estimability and limited testability of traditional methods for handling missing data. The framework applies to the MAR case and also yields estimation and testing procedures for the MNAR case.
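To make the recovery formula from the school example concrete, the sketch below simulates data in which obesity is missing with a probability that depends only on age, then estimates \(P(O = 1)\) in two ways: by averaging over complete cases, and by the stratified formula \(\sum_a P(O^* = 1 \mid A = a, R_O = 0)\, P(A = a)\), which is the recovery formula with gender summed out. This is an illustrative sketch, not the paper's code. The function names, the two age groups and all probabilities are assumptions, and gender is omitted for brevity.

```python
import random

def simulate_school(n, seed):
    """Return (observed, truth). Observed rows are (age, proxy); proxy is None when missing."""
    rng = random.Random(seed)
    observed, truth = [], []
    for _ in range(n):
        age = rng.randrange(2)                        # A: 0 = younger, 1 = older (hypothetical)
        obese = int(rng.random() < (0.2, 0.5)[age])   # O depends on A
        missing = rng.random() < (0.1, 0.7)[age]      # R_O = 1 depends on A only
        observed.append((age, None if missing else obese))
        truth.append(obese)
    return observed, truth

def complete_case_estimate(observed):
    """Naive estimate of P(O = 1): average over rows where obesity is recorded."""
    seen = [proxy for _, proxy in observed if proxy is not None]
    return sum(seen) / len(seen)

def recovered_estimate(observed):
    """Estimate P(O = 1) as sum over a of P(O* = 1 | A = a, R_O = 0) * P(A = a)."""
    n = len(observed)
    total = 0.0
    for a in sorted({age for age, _ in observed}):
        stratum = [proxy for age, proxy in observed if age == a]
        seen = [p for p in stratum if p is not None]
        total += (sum(seen) / len(seen)) * (len(stratum) / n)
    return total

observed, truth = simulate_school(50000, seed=1)
full_data = sum(truth) / len(truth)   # not available in practice; kept for comparison
naive = complete_case_estimate(observed)
recovered = recovered_estimate(observed)
print(f"full data {full_data:.3f}  complete case {naive:.3f}  recovered {recovered:.3f}")
```

By construction the true value of \(P(O = 1)\) is 0.35, while the complete-case average converges to 0.275 because older students, who are more often obese in this simulation, are also more often missing. The stratified estimate removes that bias only because the simulated missingness satisfies the conditional independence the formula requires.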