Abstract: Data imputation is a crucial task due to the widespread occurrence of missing data. Many methods adopt a two-step approach: first crafting a preliminary imputation (the "draft") and then refining it to produce the final imputation result, a strategy commonly referred to as "draft-then-refine". In our study, we examine this prevalent strategy through the lens of graph Dirichlet energy. We observe that a basic "draft" imputation tends to decrease the Dirichlet energy; therefore, a subsequent "refine" step is necessary to restore the overall energy balance. However, existing refinement techniques, such as the Graph Convolutional Network (GCN), often reduce the energy further. To address this, we introduce a new framework, the Graph Laplacian Pyramid Network (GLPN). GLPN incorporates a U-shaped autoencoder and residual networks to capture both global and local details effectively. Through extensive experiments on multiple real-world datasets, GLPN consistently outperforms state-of-the-art methods across three different missing data mechanisms. The code is available at https://github.com/liguanlue/GLPN.
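The draft-then-refine energy pattern can be reproduced on a toy example. The following is a minimal pure-Python sketch under our own assumptions, not the authors' implementation: a hypothetical 4-node path graph with a single feature column, column-mean imputation as the "draft", and one weight-free GCN propagation $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}\mathbf{X}$ (no learned weights, no activation) as the "refine" step. The energy is computed in the normalized pairwise form $\frac{1}{2}\sum_{i,j}A_{ij}\|\mathbf{X}_{i,:}/\sqrt{1+D_{ii}}-\mathbf{X}_{j,:}/\sqrt{1+D_{jj}}\|^2$. The helper names are illustrative only.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def degrees(adj: Matrix) -> List[float]:
    """Node degrees D_ii of the adjacency matrix A (no self-loops)."""
    return [sum(row) for row in adj]


def dirichlet_energy(adj: Matrix, x: Matrix) -> float:
    """E_D(X) = 1/2 * sum_ij A_ij * || X_i/sqrt(1+D_ii) - X_j/sqrt(1+D_jj) ||^2."""
    deg = degrees(adj)
    energy = 0.0
    for i in range(len(x)):
        for j in range(len(x)):
            if adj[i][j] == 0:
                continue
            for k in range(len(x[0])):
                diff = x[i][k] / math.sqrt(1 + deg[i]) - x[j][k] / math.sqrt(1 + deg[j])
                energy += 0.5 * adj[i][j] * diff * diff
    return energy


def mean_impute(x: List[List[Optional[float]]]) -> Matrix:
    """Hypothetical 'draft' step: fill each missing entry with its column's observed mean."""
    n_features = len(x[0])
    means = []
    for k in range(n_features):
        observed = [row[k] for row in x if row[k] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[k] if row[k] is None else row[k] for k in range(n_features)] for row in x]


def gcn_propagate(adj: Matrix, x: Matrix) -> Matrix:
    """One linear GCN step D~^{-1/2} (A + I) D~^{-1/2} X, without weights or activation."""
    deg = degrees(adj)
    out = [[0.0] * len(x[0]) for _ in range(len(x))]
    for i in range(len(x)):
        for j in range(len(x)):
            a = adj[i][j] + (1 if i == j else 0)  # self-loop added on the diagonal
            if a == 0:
                continue
            w = a / math.sqrt((1 + deg[i]) * (1 + deg[j]))
            for k in range(len(x[0])):
                out[i][k] += w * x[j][k]
    return out


# Hypothetical toy data: path graph 0-1-2-3 with one feature column.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
X_true = [[0.0], [6.0], [0.0], [6.0]]
X_obs = [[0.0], [None], [0.0], [6.0]]  # node 1's feature is missing

X_draft = mean_impute(X_obs)
X_gcn = gcn_propagate(A, X_draft)

e_true = dirichlet_energy(A, X_true)
e_draft = dirichlet_energy(A, X_draft)
e_gcn = dirichlet_energy(A, X_gcn)
print(f"true={e_true:.3f} draft={e_draft:.3f} gcn={e_gcn:.3f}")
```

On this input the energies come out to roughly 42.0 for the ground truth, 20.7 for the mean-imputed draft and 2.1 after the GCN step. The GCN step cannot raise the energy on any input: the eigenvalues $\lambda$ of $\tilde{\Delta}$ lie in $[0, 2]$, and propagation by $I-\tilde{\Delta}$ scales each spectral component of the energy by $(1-\lambda)^2 \le 1$. The drop caused by the draft, by contrast, is a tendency rather than a guarantee and depends on the data and the missingness pattern.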

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to improve the existing "draft-then-refine" paradigm for missing data imputation, viewed from the perspective of graph Dirichlet energy. Specifically, the authors observe that simple "draft" steps (such as mean imputation or KNN imputation) significantly reduce the graph Dirichlet energy, so a "refine" step is needed to restore the overall energy balance. However, existing refinement techniques (such as the Graph Convolutional Network, GCN) tend to reduce the Dirichlet energy even further, which over-smooths the final imputation and degrades its quality. To solve this problem, the authors propose a new framework, the Graph Laplacian Pyramid Network (GLPN). GLPN combines a U-shaped autoencoder with residual networks to capture both global and local details, thereby keeping the graph Dirichlet energy stable while imputing missing data. In extensive experiments on multiple real-world datasets, GLPN outperforms existing methods under three different missing-data mechanisms: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR).

### Formula Summary

1. **Definition of graph Dirichlet energy**:
   \[
   E_D(\mathbf{X})=\text{tr}(\mathbf{X}^T\tilde{\Delta}\mathbf{X}) = \frac{1}{2}\sum_{i,j = 1}^{n}A_{ij}\left\|\frac{\mathbf{X}_{i,:}}{\sqrt{1 + D_{ii}}}-\frac{\mathbf{X}_{j,:}}{\sqrt{1 + D_{jj}}}\right\|^2
   \]
   where $\tilde{\Delta}=I_n-\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}$ is the augmented normalized Laplacian, and $\tilde{A}=A + I_n$ and $\tilde{D}=D + I_n$ are the adjacency and degree matrices with self-loops added, respectively.
2. **Output formula of GLPN**:
   \[
   \hat{\mathbf{X}}=P_l\mathbf{X}_d+\alpha S S^T\mathbf{X}_d
   \]
   where $\mathbf{X}_d$ is the preliminarily imputed ("draft") feature matrix, $\hat{\mathbf{X}}$ is the refined feature matrix, $S$ is the assignment matrix, and $P_l = I+\tilde{\Delta}$ is the high-pass filter from the residual network.
3. **Energy-preservation analysis**:
   \[
   (1 + C_{\min})^2E_D(\mathbf{X}_d)\leq E_D(\hat{\mathbf{X}})
   \]
   where $C_{\min}$ is the minimum eigenvalue of the matrix $\tilde{\Delta}+\alpha S S^T$. For $\alpha \ge 0$ this matrix is positive semidefinite, so $C_{\min} \ge 0$ and the refined output never has lower energy than the draft.

### Main Contributions

1. Analyzed existing "draft-then-refine" imputation methods from the perspective of graph Dirichlet energy and revealed their shortcomings.
2. Proposed the GLPN framework, which combines a U-shaped autoencoder with residual networks to preserve graph energy and improve imputation performance.
3. Conducted extensive experiments across multiple datasets and missing-data mechanisms to verify the effectiveness and robustness of GLPN.

In summary, this paper proposes GLPN, a new imputation framework motivated by graph Dirichlet energy, to address the energy loss that existing methods incur when imputing missing data and thereby improve imputation quality.
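The GLPN output formula and its energy behavior can be checked numerically with a small sketch. This is a hedged illustration, not the GLPN code from the linked repository. The learned U-shaped autoencoder and residual networks are replaced by the fixed linear map $\hat{\mathbf{X}}=(I+\tilde{\Delta})\mathbf{X}_d+\alpha SS^T\mathbf{X}_d$. The assignment matrix $S$ is a hypothetical hard clustering of a 4-node path graph into two clusters (columns scaled to unit norm), $\alpha=0.5$ is an arbitrary choice, and the draft $\mathbf{X}_d$ is a made-up mean-imputed feature column. For contrast, it also applies a low-pass, GCN-style refinement $(I-\tilde{\Delta})\mathbf{X}_d$ and computes the energy in the trace form $\text{tr}(\mathbf{X}^T\tilde{\Delta}\mathbf{X})$.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product a @ b for lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def add(a: Matrix, b: Matrix, scale: float = 1.0) -> Matrix:
    """Element-wise a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def augmented_laplacian(adj: Matrix) -> Matrix:
    """Delta~ = I - D~^{-1/2} A~ D~^{-1/2}, with A~ = A + I and D~ = D + I."""
    n = len(adj)
    deg = [1 + sum(row) for row in adj]  # diagonal of D~
    return [[(1.0 if i == j else 0.0)
             - (adj[i][j] + (1 if i == j else 0)) / math.sqrt(deg[i] * deg[j])
             for j in range(n)]
            for i in range(n)]


def dirichlet_energy(lap: Matrix, x: Matrix) -> float:
    """E_D(X) = tr(X^T Delta~ X), summed over all feature columns."""
    lx = matmul(lap, x)
    return sum(x[i][k] * lx[i][k] for i in range(len(x)) for k in range(len(x[0])))


def gcn_style_refine(lap: Matrix, x_d: Matrix) -> Matrix:
    """Low-pass baseline: (I - Delta~) X_d, one weight-free GCN propagation."""
    return add(x_d, matmul(lap, x_d), -1.0)


def glpn_style_refine(lap: Matrix, s: Matrix, x_d: Matrix, alpha: float) -> Matrix:
    """X_hat = P_l X_d + alpha * S S^T X_d with P_l = I + Delta~ (fixed here, not learned)."""
    high_pass = add(x_d, matmul(lap, x_d))          # (I + Delta~) X_d
    coarse = matmul(s, matmul(transpose(s), x_d))   # S S^T X_d: pool to clusters, then unpool
    return add(high_pass, coarse, alpha)


# Hypothetical toy data: path graph 0-1-2-3, one feature, mean-imputed draft.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
X_d = [[0.0], [2.0], [0.0], [6.0]]
# Hypothetical hard assignment of nodes {0,1} and {2,3} to two clusters; columns have
# unit norm, so S S^T replaces each feature by its cluster average.
r = 1.0 / math.sqrt(2.0)
S = [[r, 0.0], [r, 0.0], [0.0, r], [0.0, r]]

lap = augmented_laplacian(A)
e_draft = dirichlet_energy(lap, X_d)
e_gcn = dirichlet_energy(lap, gcn_style_refine(lap, X_d))
e_glpn = dirichlet_energy(lap, glpn_style_refine(lap, S, X_d, alpha=0.5))
print(f"draft={e_draft:.3f} gcn-style={e_gcn:.3f} glpn-style={e_glpn:.3f}")
```

Here the GCN-style refinement lowers the draft's energy (about 20.7 down to about 2.1), whereas the GLPN-style output raises it (to about 85.7). On its own, the high-pass term $P_l$ scales each spectral component of the energy by $(1+\lambda)^2 \ge 1$, the mirror image of the GCN factor $(1-\lambda)^2 \le 1$. In the actual model the refinement is learned, so these toy numbers say nothing about imputation accuracy.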

Data Imputation from the Perspective of Graph Dirichlet Energy

Data Imputation with Iterative Graph Reconstruction

A Bipartite Graph Based Method for Traffic Continuous Data Imputation

Missing data imputation with adversarially-trained graph convolutional networks

DPGAN: A Dual-Path Generative Adversarial Network for Missing Data Imputation in Graphs

Revisiting Initializing Then Refining: An Incomplete and Missing Graph Imputation Network

GIG: Graph Data Imputation With Graph Differential Dependencies

Enhancing Missing Data Imputation through Combined Bipartite Graph and Complete Directed Graph

Multiple Imputation with Denoising Autoencoder using Metamorphic Truth and Imputation Feedback

GAGIN: generative adversarial guider imputation network for missing data

Mixed Graphical Models with Missing Data and the Partial Imputation EM Algorithm

Efficient and effective data imputation with influence functions

Attribute imputation autoencoders for attribute-missing graphs

An Experimental Survey of Missing Data Imputation Algorithms

Bidirectional Spatial-Temporal Traffic Data Imputation via Graph Attention Recurrent Neural Network

DiffImpute: Tabular Data Imputation With Denoising Diffusion Probabilistic Model

Impact Of Missing Data Imputation On The Fairness And Accuracy Of Graph Node Classifiers

Improved generative adversarial imputation networks for missing data

Data Imputation in Electricity Consumption Profiles through Shape Modeling with Autoencoders

FGATT: A Robust Framework for Wireless Data Imputation Using Fuzzy Graph Attention Networks and Transformer Encoders

Handling Missing Data with Graph Representation Learning