Abstract: Data imputation is a crucial task due to the widespread occurrence of missing data. Many methods adopt a two-step approach: initially crafting a preliminary imputation (the "draft") and then refining it to produce the final missing data imputation result, commonly referred to as "draft-then-refine". In our study, we examine this prevalent strategy through the lens of graph Dirichlet energy. We observe that a basic "draft" imputation tends to decrease the Dirichlet energy. Therefore, a subsequent "refine" step is necessary to restore the overall energy balance. Existing refinement techniques, such as the Graph Convolutional Network (GCN), often result in further energy reduction. To address this, we introduce a new framework, the Graph Laplacian Pyramid Network (GLPN). GLPN incorporates a U-shaped autoencoder and residual networks to capture both global and local details effectively. Through extensive experiments on multiple real-world datasets, GLPN consistently outperforms state-of-the-art methods across three different missing data mechanisms. The code is available at https://github.com/liguanlue/GLPN.
What problem does this paper attempt to address?
The paper asks how the existing "draft-then-refine" imputation paradigm can be improved when it is analyzed through graph Dirichlet energy. The authors observe that a simple "draft" step (such as mean imputation or KNN imputation) significantly reduces the graph Dirichlet energy, so the "refine" step needs to restore the overall energy balance. Existing refinement techniques such as the Graph Convolutional Network (GCN) instead tend to reduce the Dirichlet energy further, which over-smooths the final result and degrades imputation quality.
To address this, the authors propose a new framework, the Graph Laplacian Pyramid Network (GLPN). GLPN combines a U-shaped autoencoder with residual networks to capture both global and local details, keeping the graph Dirichlet energy stable while imputing missing data. In extensive experiments on multiple real-world datasets, GLPN outperforms existing methods under three missing-data mechanisms: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR).
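The three mechanisms differ in what the probability of an entry being missing depends on. The sketch below is only an illustration of the textbook definitions, not the paper's experimental protocol: the function names, the logistic dependence, and the choice of `driver_col` are all assumptions made here for the example.

```python
import numpy as np

def mcar_mask(X, rate, rng):
    """MCAR: every entry is dropped with the same probability."""
    return rng.random(X.shape) < rate

def mar_mask(X, rate, rng, driver_col=0):
    """MAR: drop probability depends only on a column that stays observed."""
    col = X[:, driver_col]
    z = (col - col.mean()) / (col.std() + 1e-12)
    # Logistic dependence (an assumption); averages to roughly `rate`.
    p = np.clip(2.0 * rate / (1.0 + np.exp(-z)), 0.0, 1.0)
    mask = rng.random(X.shape) < p[:, None]
    mask[:, driver_col] = False  # the driver column is never masked
    return mask

def mnar_mask(X, rate, rng):
    """MNAR: drop probability depends on the value that goes missing."""
    z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    p = np.clip(2.0 * rate / (1.0 + np.exp(-z)), 0.0, 1.0)
    return rng.random(X.shape) < p

rng = np.random.default_rng(0)
X_full = rng.normal(size=(100, 5))           # hypothetical node-feature matrix
mask = mnar_mask(X_full, rate=0.3, rng=rng)  # True marks a missing entry
X_observed = np.where(mask, np.nan, X_full)
```

Here `True` in a mask marks a missing entry, and `rate` should stay at or below 0.5 so that the logistic probabilities remain valid without clipping.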
### Formula Summary
1. **Definition of Graph Dirichlet Energy**:
\[
E_D(\mathbf{X})=\text{tr}(\mathbf{X}^T\tilde{\Delta}\mathbf{X}) = \frac{1}{2}\sum_{i,j = 1}^{n}A_{ij}\left\|\frac{\mathbf{X}_{i,:}}{\sqrt{1 + D_{ii}}}-\frac{\mathbf{X}_{j,:}}{\sqrt{1 + D_{jj}}}\right\|^2
\]
where $\tilde{\Delta}=I_n-\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}$ is the augmented normalized Laplacian matrix, and $\tilde{A}=A + I_n$ and $\tilde{D}=D + I_n$ are the adjacency and degree matrices with self-loops added.
2. **Output formula of GLPN**:
\[
\hat{\mathbf{X}}=P_l\mathbf{X}_d+\alpha S S^T\mathbf{X}_d
\]
where $\mathbf{X}_d$ is the draft (preliminarily imputed) feature matrix, $\hat{\mathbf{X}}$ is the refined feature matrix, $S$ is the assignment matrix, $\alpha$ weights the $S S^T\mathbf{X}_d$ term, and $P_l = I+\tilde{\Delta}$ is the high-pass filter from the residual network.
3. **Energy-preservation analysis**:
\[
(1 + C_{\min})^2E_D(\mathbf{X}_d)\leq E_D(\hat{\mathbf{X}})
\]
where $C_{\min}$ is the minimum eigenvalue of the matrix $\tilde{\Delta}+\alpha S S^T$.
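The formulas above can be checked numerically on a small graph. The following is a minimal sketch, not the authors' implementation: it uses a random toy graph, omits the learned $\alpha S S^T$ term, and stands in for a GCN layer with a single weight-free, activation-free propagation step $\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}\mathbf{X} = (I-\tilde{\Delta})\mathbf{X}$. All function and variable names are assumptions made for illustration.

```python
import numpy as np

def augmented_normalized_laplacian(A):
    """Return I - D~^{-1/2} A~ D~^{-1/2} with A~ = A + I and D~ = D + I."""
    n = A.shape[0]
    d_inv_sqrt = 1.0 / np.sqrt(1.0 + A.sum(axis=1))
    A_tilde = A + np.eye(n)
    return np.eye(n) - d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def dirichlet_energy_trace(X, L):
    """Trace form: tr(X^T L X)."""
    return float(np.trace(X.T @ L @ X))

def dirichlet_energy_pairwise(X, A):
    """Pairwise form: 0.5 * sum_ij A_ij ||x_i/sqrt(1+d_i) - x_j/sqrt(1+d_j)||^2."""
    Z = X / np.sqrt(1.0 + A.sum(axis=1))[:, None]
    diff = Z[:, None, :] - Z[None, :, :]          # shape (n, n, d)
    return float(0.5 * np.sum(A * np.sum(diff ** 2, axis=-1)))

rng = np.random.default_rng(0)
n, d = 8, 3
upper = np.triu((rng.random((n, n)) < 0.4).astype(float), k=1)
A = upper + upper.T                               # symmetric, no self-loops
X_d = rng.normal(size=(n, d))                     # stand-in for a draft imputation

L = augmented_normalized_laplacian(A)
e_draft = dirichlet_energy_trace(X_d, L)
e_pairwise = dirichlet_energy_pairwise(X_d, A)    # should match e_draft

# Low-pass, GCN-style propagation (assumption: no weights, no nonlinearity).
e_low = dirichlet_energy_trace((np.eye(n) - L) @ X_d, L)
# High-pass residual filter P_l = I + L, without the alpha * S S^T term.
e_high = dirichlet_energy_trace((np.eye(n) + L) @ X_d, L)

print(e_low, e_draft, e_high)
```

Because the eigenvalues $\lambda$ of $\tilde{\Delta}$ lie in $[0, 2)$, each spectral component of the energy is scaled by $(1-\lambda)^2 \le 1$ under the low-pass step and by $(1+\lambda)^2 \ge 1$ under $P_l$, so in this simplified setting the printed values should be non-decreasing from left to right.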
### Main Contributions
1. Analyzed existing "draft-then-refine" imputation methods from the perspective of graph Dirichlet energy and revealed their shortcomings.
2. Proposed the GLPN framework, which combines a U-shaped autoencoder with residual networks to maintain the graph energy and improve imputation performance.
3. Conducted extensive experiments on multiple datasets and under three missing-data mechanisms to verify the effectiveness and robustness of GLPN.
In summary, the paper proposes GLPN, an imputation framework motivated by graph Dirichlet energy, to counter the energy loss that existing methods incur when imputing missing data and thereby improve imputation quality.