Abstract: This paper reviews recent advances in missing data research using graphical models to represent multivariate dependencies. We first examine the limitations of traditional frameworks from three different perspectives: *transparency, estimability and testability*. We then show how procedures based on graphical models can overcome these limitations and provide meaningful performance guarantees even when data are Missing Not At Random (MNAR). In particular, we identify conditions that guarantee consistent estimation in broad categories of missing data problems, and derive procedures for implementing this estimation. Finally, we derive testable implications for missing data models in both MAR (Missing At Random) and MNAR categories.

What problem does this paper attempt to address?

The paper asks how graphical models can address three weaknesses of traditional missing-data analysis: transparency, estimability and testability. Specifically:

1. **Transparency**: The paper explores how graphical models make the missing-data mechanism more transparent. Traditional methods for handling missing data often assume that the data conform to a specific missingness mechanism (such as MCAR or MAR), but these assumptions are difficult to verify in practical applications. Graphical models show the causal relationships and conditional independencies between variables through an intuitive graph structure, making it easier for researchers to understand and verify these assumptions.
2. **Estimability**: The paper studies which parameters can be consistently estimated from incomplete data given a graphical model. When data are missing not at random (MNAR), traditional statistical methods often cannot provide consistent estimates. The paper proposes conditions and methods that achieve consistent estimation in a wider range of missing-data problems.
3. **Testability**: The paper discusses how graphical models can be used to test whether model assumptions are compatible with the observed data. In the MNAR case in particular, traditional testing methods are very limited. The paper proposes new methods for testing conditional independence assumptions and thus checking the validity of the model.

### Main contributions

- **Transparency**: Through graphical models, researchers can intuitively understand the missing-data mechanism and determine whether the data conform to MCAR, MAR or MNAR.
- **Estimability**: The paper provides conditions and methods that enable consistent estimation even in the MNAR case.
- **Testability**: New testing methods are proposed that can test the validity of model assumptions in the MNAR case.
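The transparency point can be illustrated with a minimal sketch of reading the missingness category off a graph. The encoding is a simplification and is not taken from the paper: each missingness indicator is mapped to the set of substantive variables that point into it, and the sketch assumes no latent variables and no edges between indicators. The function name `classify_mechanism` and the variable names `X` and `Y` are hypothetical.

```python
def classify_mechanism(r_parents, partially_observed):
    """Classify a missingness graph as 'MCAR', 'MAR' or 'MNAR'.

    r_parents: dict mapping each missingness indicator (e.g. 'R_Y')
        to the set of substantive variables with an edge into it.
    partially_observed: set of variables that have missing values.

    Assumptions (illustrative only): no latent variables, and the
    indicators have no edges among themselves.
    """
    # Collect every variable that directly causes some missingness.
    all_parents = set().union(*r_parents.values())

    if not all_parents:
        # No variable influences missingness at all.
        return "MCAR"
    if all_parents & set(partially_observed):
        # Missingness is caused by a variable that is itself incomplete.
        return "MNAR"
    # Missingness is caused only by fully observed variables.
    return "MAR"


# Y has missing values; X is fully observed.
print(classify_mechanism({"R_Y": set()}, {"Y"}))
print(classify_mechanism({"R_Y": {"X"}}, {"Y"}))
print(classify_mechanism({"R_Y": {"Y"}}, {"Y"}))
```

Under these assumptions the label follows directly from the parents of the indicator nodes. As the abstract stresses, an MNAR label does not by itself rule out consistent estimation.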
### Specific examples

The paper shows how to use graphical models to deal with the problem of missing data through specific examples. For example, in a school study, three variables were measured: age (A), gender (G) and obesity (O). Among them, the obesity data are partially missing. By constructing a graphical model, the missing mechanism can be clearly represented and data analysis can be carried out accordingly.

### Mathematical formulas

- **Missing-data distribution**:
  \[ P(G, O^*, A, R_O) = P(G, O \mid A, R_O = 0)\,P(A)\,P(R_O) \]
  where \(O^*\) is the proxy variable for obesity and \(R_O\) represents the missing mechanism of obesity data.
- **Recovery of joint distribution**:
  \[ P(G, O, A) = P(G, O^* \mid A, R_O = 0)\,P(A) \]

### Conclusion

The paper provides a new framework through graphical models, which solves the problems of low transparency, poor estimability and insufficient testability in traditional methods for handling missing data. This framework is not only applicable to the MAR case, but also can provide effective solutions in the MNAR case.
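The recovery formula \(P(G, O, A) = P(G, O^* \mid A, R_O = 0)\,P(A)\) can be checked numerically with a small simulation. This is a sketch under stated assumptions, not the paper's procedure: it assumes a graph in which the missingness of obesity depends only on age, and every probability in the data-generating code is invented for illustration. All function names are hypothetical.

```python
import random
from collections import Counter


def simulate(n=100_000, seed=0):
    """Draw (a, g, o, r) rows; r == 1 means obesity is missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = int(rng.random() < 0.5)                      # age group
        g = int(rng.random() < 0.5)                      # gender
        o = int(rng.random() < 0.1 + 0.5 * a + 0.1 * g)  # obesity
        r = int(rng.random() < (0.7 if a else 0.1))      # depends on A only
        rows.append((a, g, o, r))
    return rows


def true_joint(rows):
    """Joint P(G, O, A) from the full data (known only in simulation)."""
    full = Counter((g, o, a) for a, g, o, r in rows)
    return {k: c / len(rows) for k, c in full.items()}


def complete_case_joint(rows):
    """Naive estimate: drop rows with missing obesity, then normalize."""
    obs = Counter((g, o, a) for a, g, o, r in rows if r == 0)
    total = sum(obs.values())
    return {k: c / total for k, c in obs.items()}


def recovered_joint(rows):
    """Estimate P(G, O | A, R_O = 0) * P(A), with P(A) from all rows."""
    n = len(rows)
    count_a = Counter(a for a, g, o, r in rows)          # A is fully observed
    obs = Counter((g, o, a) for a, g, o, r in rows if r == 0)
    obs_a = Counter(a for a, g, o, r in rows if r == 0)
    return {
        (g, o, a): (c / obs_a[a]) * (count_a[a] / n)
        for (g, o, a), c in obs.items()
    }


def max_error(estimate, truth):
    """Largest absolute cell-wise gap between an estimate and the truth."""
    return max(abs(estimate.get(k, 0.0) - p) for k, p in truth.items())


rows = simulate()
truth = true_joint(rows)
print("complete-case error:", max_error(complete_case_joint(rows), truth))
print("recovered error:    ", max_error(recovered_joint(rows), truth))
```

In this setup the complete-case estimate is biased because older subjects are dropped more often, while the recovered estimate reweights each age stratum by \(P(A)\) computed from all rows, so its error should reflect sampling noise only.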

Graphical Models for Processing Missing Data
