Data Imputation with Iterative Graph Reconstruction

Jiajun Zhong, Weiwei Ye, Ning Gui

DOI: https://doi.org/10.1609/aaai.v37i9.26348

2024-04-15

Abstract: Effective data imputation demands the ability to discover rich latent "structure" in "plain" tabular data. Recent graph neural network-based imputation solutions show strong structure-learning potential by directly translating tabular data into bipartite graphs. However, because they lack relations between samples, those solutions treat all samples equally, which contradicts an important observation: "similar samples should give more information about missing values." This paper presents a novel Iterative graph Generation and Reconstruction framework for Missing data imputation (IGRM). Instead of treating all samples equally, we introduce the concept of "friend networks" to represent different relations among samples. To generate an accurate friend network despite missing data, an end-to-end friend network reconstruction solution is designed to allow continuous friend network optimization during imputation learning. The representation of the optimized friend network, in turn, is used to further optimize the data imputation process with differentiated message passing. Experimental results on eight benchmark datasets show that IGRM yields 39.13% lower mean absolute error compared with nine baselines and 9.04% lower than the second-best. Our code is available at https://github.com/G-AILab/IGRM.
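The bipartite translation mentioned in the abstract can be pictured with a small sketch. Assuming, for illustration only, that the table is a list of rows with `None` or NaN marking missing cells, each sample and each feature becomes a node and every observed cell becomes a value-carrying edge; imputation then amounts to predicting the absent edges, scored here by mean absolute error over the held-out cells. The names `table_to_bipartite` and `mae_on_missing` are hypothetical and are not taken from the IGRM code base.

```python
import math
from typing import Dict, List, Optional, Tuple

# (sample node, feature node, observed value); feature j gets node id n_samples + j.
Edge = Tuple[int, int, float]


def table_to_bipartite(table: List[List[Optional[float]]]) -> Tuple[int, int, List[Edge]]:
    """Turn a table with missing cells (None or NaN) into a bipartite edge list."""
    n_samples = len(table)
    n_features = len(table[0]) if table else 0
    edges: List[Edge] = []
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            if value is None or math.isnan(value):
                continue  # missing cell: no edge; imputation must predict it
            edges.append((i, n_samples + j, float(value)))
    return n_samples, n_features, edges


def mae_on_missing(predicted: Dict[Tuple[int, int], float],
                   truth: Dict[Tuple[int, int], float]) -> float:
    """Mean absolute error over the held-out (row, column) cells listed in `truth`."""
    errors = [abs(predicted[cell] - truth[cell]) for cell in truth]
    return sum(errors) / len(errors)


table = [
    [1.0, None, 3.0],
    [4.0, 5.0, float("nan")],
]
n, d, edges = table_to_bipartite(table)
print(n, d, edges)  # 2 3 [(0, 2, 1.0), (0, 4, 3.0), (1, 2, 4.0), (1, 3, 5.0)]

# Hypothetical ground truth and predictions for the two missing cells.
truth = {(0, 1): 2.0, (1, 2): 6.0}
predicted = {(0, 1): 2.5, (1, 2): 5.0}
print(mae_on_missing(predicted, truth))  # 0.75
```

Numbering feature nodes after sample nodes keeps a single integer id space, which is a common convention for bipartite graphs but only one of several reasonable choices.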

Machine Learning


### Problems the Paper Aims to Solve

This paper addresses data imputation, in particular how to impute more effectively when the missing data rate is high.

#### Main Contributions

1. **Introduction of the "Friend Network" Concept**: The paper proposes the "friend network" to augment the traditional bipartite graph structure with differentiated connections between samples.
2. **End-to-End Trainable Framework**: To generate accurate friend networks despite large amounts of missing data, the paper proposes IGRM (Iterative graph Generation and Reconstruction framework for Missing data imputation), an end-to-end trainable framework. IGRM continuously optimizes the friend network and uses it to refine bipartite graph learning through differentiated message passing.
3. **Node Embedding to Reduce Bias**: Rather than relying on raw attribute similarity, IGRM uses node embeddings to mitigate the impact of large amounts of missing data and to handle the differing distributions of different attributes.
4. **Experimental Results**: Experiments on eight benchmark datasets against nine state-of-the-art baselines show that IGRM reduces mean absolute error (MAE) by 9.04% relative to the second-best baseline at a 30% missing rate, and performs even better at higher missing rates.

Through these contributions, the paper addresses two limitations of existing imputation methods: they cannot capture complex relationships between features and between samples at the same time, and they struggle to obtain accurate sample relationships when the missing data rate is high.
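The friend network and differentiated message passing can be illustrated with a simplified stand-in. This sketch is not IGRM's learned, end-to-end reconstruction: it assumes sample embeddings are already given, links each sample to its `k` most cosine-similar samples, and weights the friends' embeddings with a softmax over those similarities so that more similar friends contribute more. The embeddings, the value of `k`, and the names `build_friend_network` and `weighted_friend_mean` are hypothetical choices for illustration.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_friend_network(embeddings: List[List[float]], k: int) -> Dict[int, List[int]]:
    """Hypothetical kNN friend network: link each sample to its k most similar other samples."""
    friends: Dict[int, List[int]] = {}
    for i, e_i in enumerate(embeddings):
        scored = [(cosine(e_i, e_j), j) for j, e_j in enumerate(embeddings) if j != i]
        # Highest similarity first; ties broken by the smaller sample index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        friends[i] = [j for _, j in scored[:k]]
    return friends


def weighted_friend_mean(embeddings: List[List[float]],
                         friends: Dict[int, List[int]], i: int) -> List[float]:
    """Softmax(similarity)-weighted mean of sample i's friend embeddings."""
    if not friends[i]:
        return list(embeddings[i])  # no friends: fall back to the sample's own embedding
    sims = [cosine(embeddings[i], embeddings[j]) for j in friends[i]]
    top = max(sims)
    weights = [math.exp(s - top) for s in sims]  # subtract max for numerical stability
    total = sum(weights)
    out = [0.0] * len(embeddings[i])
    for w, j in zip(weights, friends[i]):
        for t, value in enumerate(embeddings[j]):
            out[t] += (w / total) * value
    return out


# Toy embeddings: samples 0/1 point along the first axis, samples 2/3 along the second.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
friends = build_friend_network(emb, k=1)
print(friends)  # {0: [1], 1: [0], 2: [3], 3: [2]}
print(weighted_friend_mean(emb, friends, 0))  # [0.9, 0.1]
```

Comparing embeddings rather than raw attribute values mirrors the summary's point that pure attribute similarity is unreliable when many cells are missing; in IGRM the embeddings and the friend network are refined jointly during training, which this fixed kNN sketch does not attempt.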
