A Sampling-Based Framework for Hypothesis Testing on Large Attributed Graphs

Yun Wang, Chrysanthi Kosyfaki, Sihem Amer-Yahia, Reynold Cheng
DOI: https://doi.org/10.14778/3681954.3681993
IF: 2.5
2024-07-01
Proceedings of the VLDB Endowment
Abstract: Hypothesis testing is a statistical method used to draw conclusions about populations from sample data, typically represented in tables. With the prevalence of graph representations in real-life applications, hypothesis testing on graphs is gaining importance. In this work, we formalize node, edge, and path hypotheses on attributed graphs. We develop a sampling-based hypothesis testing framework, which can accommodate existing hypothesis-agnostic graph sampling methods. To achieve accurate and time-efficient sampling, we then propose a Path-Hypothesis-Aware SamplEr, PHASE, an m-dimensional random walk that accounts for the paths specified in the hypothesis. We further optimize its time efficiency and propose PHASE_opt. Experiments on three real datasets demonstrate the ability of our framework to leverage common graph sampling methods for hypothesis testing, and the superiority of hypothesis-aware sampling methods in terms of accuracy and time efficiency.
Keywords: computer science, information systems, theory & methods