Abstract: A fundamental challenge confronting supervised graph outlier detection algorithms is the prevalent problem of class imbalance, where the scarcity of outlier instances compared to normal instances often results in suboptimal performance. Recently, generative models, especially diffusion models, have demonstrated their efficacy in synthesizing high-fidelity images. Despite their extraordinary generation quality, their potential in data augmentation for supervised graph outlier detection remains largely underexplored. To bridge this gap, we introduce GODM, a novel data augmentation method for mitigating class imbalance in supervised Graph Outlier detection via latent Diffusion Models. Extensive experiments conducted on multiple datasets substantiate the effectiveness and efficiency of GODM. A case study further demonstrates the generation quality of our synthetic data. To foster accessibility and reproducibility, we encapsulate GODM into a plug-and-play package and release it on PyPI: https://pypi.org/project/godm/.

What problem does this paper attempt to address?

The paper addresses the class imbalance problem in supervised graph outlier detection. Specifically:

- **Class Imbalance Problem**: In supervised graph outlier detection, normal instances (inliers) far outnumber abnormal instances (outliers). This imbalance biases the model towards normal instances during training, which reduces detection performance on abnormal instances. For example, in the DGraph dataset, the ratio of positive (outlier) to negative (inlier) samples is only 1:85, which reflects the extreme ratios seen in real-world scenarios such as financial fraud detection.
- **Limitations of Existing Methods**:
  - **Upsampling and Downsampling**: These methods relieve the imbalance by replicating the minority class or reducing the majority class, respectively, but they risk overfitting or losing valuable training data.
  - **Instance Reweighting in the Loss Function**: This adjusts the loss function by giving abnormal instances greater weights but, like upsampling and downsampling, it does not fully resolve the problem.

To address these limitations, the paper introduces GODM (Graph Outlier Detection via Latent Diffusion Models), a data augmentation method based on latent diffusion models (LDMs). The method generates synthetic abnormal instances to balance the class distribution in the training data, thereby improving the performance of graph outlier detectors.

### Main Contributions of GODM

1. **Generate High-Quality Synthetic Data**: Perform data augmentation in the latent space through the diffusion model to generate realistic abnormal nodes.
2. **Handle Heterogeneous Graph Data**: Given the complexity and heterogeneity of graph data, the paper proposes a variational encoder that maps different types of graph information into a unified latent space.
3. **Improve Computational Efficiency**: Adopt negative sampling and graph clustering techniques to reduce computational costs, enabling GODM to run efficiently on large-scale graph data.
4. **Conditional Generation**: Generate only abnormal nodes, so that the generated data meets the requirements of the task.

### Experimental Results

The paper verifies the effectiveness and efficiency of GODM through experiments on multiple datasets, reporting improvements in metrics such as AUC, AP, and Recall for graph outlier detection.

In conclusion, GODM provides an effective way to alleviate the class imbalance problem in supervised graph outlier detection, thereby improving detection performance.
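
To make the two baseline strategies discussed above concrete (minority upsampling and instance reweighting), here is a minimal sketch in plain Python. It is not GODM and not code from the paper: the helper names `oversample_minority` and `weighted_bce` are hypothetical, the label vector is a toy example built to match the 1:85 ratio quoted for DGraph, and setting the outlier weight to the inlier-to-outlier count ratio is a common heuristic assumed here for illustration.

```python
import math
import random


def oversample_minority(labels, seed=0):
    """Return sample indices with outliers (label 1) replicated until classes balance.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    inliers = [i for i, y in enumerate(labels) if y == 0]
    outliers = [i for i, y in enumerate(labels) if y == 1]
    if not outliers:
        raise ValueError("need at least one outlier to replicate")
    # Draw extra outlier indices with replacement to match the inlier count.
    n_extra = max(0, len(inliers) - len(outliers))
    extra = [rng.choice(outliers) for _ in range(n_extra)]
    return inliers + outliers + extra


def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy with the outlier terms scaled by pos_weight."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Toy labels with one outlier per 85 inliers (the ratio quoted for DGraph).
labels = [1] + [0] * 85

# Upsampling: the balanced index list repeats the single outlier many times.
balanced = oversample_minority(labels)

# Reweighting: weight outliers by the inlier-to-outlier count ratio (assumed heuristic).
pos_weight = labels.count(0) / labels.count(1)
```

Note that `balanced` contains the same outlier index over and over, which is the overfitting risk mentioned above: no new outlier information is added. GODM's stated aim is to replace these replicated samples with newly synthesized outliers.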

Data Augmentation for Supervised Graph Outlier Detection via Latent Diffusion Models

Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation from Scratch

AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model

Improving Generalizability of Graph Anomaly Detection Models via Data Augmentation

DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN

Unsupervised Graph Outlier Detection: Problem Revisit, New Insight, and Superior Method

Ensemble Data Augmentation for Imbalanced Fault Diagnosis

GraphDE: A Generative Framework for Debiased Learning and Out-of-Distribution Detection on Graphs

DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector

HGOE: Hybrid External and Internal Graph Outlier Exposure for Graph Out-of-Distribution Detection

GOOD-D: On Unsupervised Graph Out-Of-Distribution Detection

Graphusion: Latent Diffusion for Graph Generation

PyGOD: A Python Library for Graph Outlier Detection

DAGAD: Data Augmentation for Graph Anomaly Detection

Toward Understanding Generative Data Augmentation

Generative adversarial nets for unsupervised outlier detection

LatentAugment: Data Augmentation via Guided Manipulation of GAN's Latent Space

Latent Conditional Diffusion-based Data Augmentation for Continuous-Time Dynamic Graph Model

A Novel Data Augmentation Method Based on Denoising Diffusion Probabilistic Model for Fault Diagnosis Under Imbalanced Data

Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark