Data Imputation with Iterative Graph Reconstruction

Jiajun Zhong, Weiwei Ye, Ning Gui
DOI: https://doi.org/10.1609/aaai.v37i9.26348
2024-04-15
Abstract: Effective data imputation demands the ability to discover rich latent "structure" in "plain" tabular data. Recent advances in graph neural network (GNN)-based data imputation show strong structure-learning potential by translating tabular data directly into bipartite graphs. However, because they lack relations between samples, these solutions treat all samples equally, which contradicts an important observation: "similar samples should give more information about missing values." This paper presents a novel Iterative graph Generation and Reconstruction framework for Missing data imputation (IGRM). Instead of treating all samples equally, we introduce the concept of "friend networks" to represent different relations among samples. To generate an accurate friend network despite missing data, we design an end-to-end friend network reconstruction solution that allows the friend network to be continuously optimized during imputation learning. The representation of the optimized friend network, in turn, is used to further optimize the imputation process through differentiated message passing. Experimental results on eight benchmark datasets show that IGRM yields 39.13% lower mean absolute error than the nine baselines and 9.04% lower than the second-best. Our code is available at https://github.com/G-AILab/IGRM.
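The bipartite formulation the abstract refers to can be sketched as follows. This is a minimal illustration under stated assumptions, not IGRM's code: the helper `table_to_bipartite` is hypothetical, samples and features form the two node sets, each observed cell becomes an edge carrying its value, and each missing cell becomes a query edge whose value the model must predict.

```python
import numpy as np


def table_to_bipartite(x: np.ndarray):
    """Hypothetical helper: convert an n x d table with NaNs into a bipartite graph.

    Sample i becomes node i and feature j becomes node n + j.
    Each observed cell (i, j) becomes an edge carrying x[i, j];
    each missing cell becomes a query edge whose value must be imputed.
    """
    n = x.shape[0]
    observed = ~np.isnan(x)

    # Observed cells -> sample-feature edges, cell value stored as edge attribute.
    rows, cols = np.nonzero(observed)
    edge_index = np.stack([rows, n + cols])  # shape (2, num_observed)
    edge_value = x[rows, cols]

    # Missing cells -> query edges; imputation means predicting their values.
    miss_rows, miss_cols = np.nonzero(~observed)
    query_index = np.stack([miss_rows, n + miss_cols])
    return edge_index, edge_value, query_index


if __name__ == "__main__":
    x = np.array([[1.0, np.nan, 3.0],
                  [np.nan, 5.0, 6.0]])
    edge_index, edge_value, query_index = table_to_bipartite(x)
    print(edge_index.tolist())   # [[0, 0, 1, 1], [2, 4, 3, 4]]
    print(edge_value.tolist())   # [1.0, 3.0, 5.0, 6.0]
    print(query_index.tolist())  # [[0, 1], [3, 2]]
```

GNN libraries usually expect edges in both directions for message passing, so a real pipeline would likely also add the reversed edges; the sketch keeps one direction for readability.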
Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses data imputation, in particular how to impute missing values effectively when the missing rate is high.

#### Main Contributions

1. **The "Friend Network" Concept**: The paper proposes "friend networks," which augment the conventional bipartite graph structure with differentiated connections between samples.
2. **End-to-End Trainable Framework**: To build accurate friend networks despite large amounts of missing data, the paper proposes IGRM (Iterative graph Generation and Reconstruction framework for Missing data imputation), an end-to-end trainable framework. IGRM continuously optimizes the friend network and uses it to refine bipartite graph learning through differentiated message passing.
3. **Node Embeddings to Reduce Bias**: Rather than relying on raw attribute similarity, the framework relates samples through node embeddings, which mitigates the effect of heavy missingness and accounts for the differing distributions of different attributes.
4. **Experimental Results**: IGRM was compared with nine state-of-the-art baselines on eight benchmark datasets. At a 30% missing rate, it reduces mean absolute error (MAE) by 9.04% relative to the second-best method, and it performs even better at higher missing rates.

Through these contributions, the paper targets two limitations of existing imputation methods: they cannot capture relationships among features and among samples at the same time, and they struggle to obtain accurate sample relationships when the missing rate is high.
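The contributions above describe building friend networks from node embeddings rather than raw attribute similarity, then using them for differentiated message passing. The summary does not give the exact procedure, so the following is only a minimal sketch under stated assumptions: sample similarity is cosine similarity between embeddings, each sample keeps its top-`k` most similar samples as friends, and "differentiated" aggregation is approximated by a similarity-weighted mean. The helpers `build_friend_network` and `friend_aggregate` and the parameter `k` are hypothetical.

```python
import numpy as np


def build_friend_network(emb: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: link each sample to its k most similar samples.

    Assumption: similarity is cosine similarity between sample embeddings
    (e.g. produced by a GNN over the bipartite graph). Returns a dense (n, n)
    matrix holding the similarity for each sample -> friend pair, 0 elsewhere.
    Requires k < n.
    """
    n = emb.shape[0]
    unit = emb / np.clip(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own friend

    friends = np.argsort(-sim, axis=1)[:, :k]  # top-k most similar per row
    rows = np.repeat(np.arange(n), k)
    cols = friends.ravel()
    adj = np.zeros((n, n))
    adj[rows, cols] = sim[rows, cols]
    return adj


def friend_aggregate(emb: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """Similarity-weighted mean of each sample's friend embeddings.

    Assumption: a simple stand-in for differentiated message passing.
    Closer friends get larger weights; negative similarities are dropped,
    and samples with no positively weighted friend get a zero vector.
    """
    w = np.clip(adj, 0.0, None)
    denom = w.sum(axis=1, keepdims=True)
    out = np.zeros((emb.shape[0], emb.shape[1]))
    return np.divide(w @ emb, denom, out=out, where=denom > 0)


if __name__ == "__main__":
    emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    adj = build_friend_network(emb, k=1)
    print((adj > 0).astype(int).tolist())
    # [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
    agg = friend_aggregate(emb, adj)  # each row equals its single friend's embedding
```

The hard top-`k` selection is not differentiable, so this sketch does not reproduce the end-to-end trainable reconstruction the abstract describes. It also computes the friend network once from fixed embeddings, whereas IGRM continuously re-optimizes the network as imputation learning proceeds.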