Just SLaQ When You Approximate: Accurate Spectral Distances for Web-Scale Graphs

Anton Tsitsulin, Marina Munkhoeva, Bryan Perozzi
DOI: https://doi.org/10.1145/3366423.3380026
2020-03-03
Abstract: Graph comparison is a fundamental operation in data mining and information retrieval. Due to the combinatorial nature of graphs, it is hard to balance the expressiveness of the similarity measure and its scalability. Spectral analysis provides quintessential tools for studying the multi-scale structure of graphs and is a well-suited foundation for reasoning about differences between graphs. However, computing full spectrum of large graphs is computationally prohibitive; thus, spectral graph comparison methods often rely on rough approximation techniques with weak error guarantees. In this work, we propose SLaQ, an efficient and effective approximation technique for computing spectral distances between graphs with billions of nodes and edges. We derive the corresponding error bounds and demonstrate that accurate computation is possible in time linear in the number of graph edges. In a thorough experimental evaluation, we show that SLaQ outperforms existing methods, oftentimes by several orders of magnitude in approximation accuracy, and maintains comparable performance, allowing to compare million-scale graphs in a matter of minutes on a single machine.
Social and Information Networks, Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of computing spectral distances between very large (web-scale) graphs both efficiently and accurately. Spectral analysis captures the multi-scale structure of graphs, but computing the full spectrum of a large graph is prohibitively expensive, so existing spectral comparison methods fall back on rough approximations with weak error guarantees. The authors propose SLaQ, which uses Stochastic Lanczos Quadrature (SLQ) to approximate spectral distances in time linear in the number of graph edges, and they derive the corresponding error bounds. In the experimental evaluation, SLaQ outperforms existing methods, often by several orders of magnitude in approximation accuracy, while keeping comparable running time. The technique targets graphs with billions of nodes and edges, and it compares million-scale graphs in a matter of minutes on a single machine.
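To make the idea behind Stochastic Lanczos Quadrature more concrete, the following is a minimal sketch of the general SLQ recipe for estimating tr f(L) of a symmetric matrix L. It is not the authors' implementation. The function names (`lanczos_tridiag`, `slq_trace`, `normalized_laplacian`, `cycle_adjacency`), the choice of the heat-kernel function exp(-x), the normalized Laplacian, and all parameter defaults are assumptions made for illustration. The sketch uses dense NumPy arrays and full reorthogonalization for clarity; with a sparse matrix each matrix-vector product costs time proportional to the number of edges, which is where the linear running time comes from.

```python
import numpy as np


def lanczos_tridiag(matvec, v0, num_steps):
    """Run up to num_steps Lanczos iterations and return the tridiagonal matrix T."""
    n = v0.shape[0]
    Q = np.zeros((n, num_steps))
    alpha, beta = [], []
    q = v0 / np.linalg.norm(v0)
    for j in range(num_steps):
        Q[:, j] = q
        w = matvec(q)
        alpha.append(q @ w)
        # Full reorthogonalization against all previous Lanczos vectors
        # (simple and stable for a small illustration).
        basis = Q[:, : j + 1]
        w = w - basis @ (basis.T @ w)
        b = np.linalg.norm(w)
        if j == num_steps - 1 or b < 1e-10:  # done, or Krylov space exhausted
            break
        beta.append(b)
        q = w / b
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)


def slq_trace(matvec, n, f, num_probes=20, num_steps=10, seed=0):
    """Estimate tr f(A) for a symmetric n x n matrix A given only via matvec."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe, ||v||^2 = n
        T = lanczos_tridiag(matvec, v, num_steps)
        theta, U = np.linalg.eigh(T)   # quadrature nodes: eigenvalues of T
        weights = U[0, :] ** 2         # quadrature weights: squared first components
        total += weights @ f(theta)    # approximates v^T f(A) v / ||v||^2
    return n * total / num_probes


def normalized_laplacian(adj):
    """Dense normalized Laplacian I - D^{-1/2} A D^{-1/2}; assumes no isolated nodes."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def cycle_adjacency(n):
    """Adjacency matrix of an n-node cycle, used here only as a toy graph."""
    adj = np.zeros((n, n))
    idx = np.arange(n)
    adj[idx, (idx + 1) % n] = 1.0
    adj[(idx + 1) % n, idx] = 1.0
    return adj


# Toy comparison of the SLQ estimate with the exact heat trace tr exp(-L).
L = normalized_laplacian(cycle_adjacency(50))
heat = lambda x: np.exp(-x)
estimate = slq_trace(lambda x: L @ x, 50, heat, num_probes=200, num_steps=10)
exact = heat(np.linalg.eigvalsh(L)).sum()
print(estimate, exact)
```

A spectral distance between two graphs can then be formed by comparing such estimates, for example evaluated at several time scales, for each graph. Which spectral functions and distances are used is left open here, since the summary above does not specify them.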