Abstract: We present the first linear time complexity randomized algorithms for unbiased approximation of the celebrated family of general random walk kernels (RWKs) for sparse graphs. This includes both labelled and unlabelled instances. The previous fastest methods for general RWKs were of cubic time complexity and not applicable to labelled graphs. Our method samples dependent random walks to compute novel graph embeddings in $\mathbb{R}^d$ whose dot product is equal to the true RWK in expectation. It does so without instantiating the direct product graph in memory, meaning we can scale to massive datasets that cannot be stored on a single machine. We derive exponential concentration bounds to prove that our estimator is sharp, and show that the ability to approximate general RWKs (rather than just special cases) unlocks efficient implicit graph kernel learning. Our method is up to $\mathbf{27\times}$ faster than its counterparts for efficient computation on large graphs and scales to graphs $\mathbf{128 \times}$ bigger than the largest examples amenable to brute-force computation.
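To make the phrase "without instantiating the direct product graph" concrete, here is a minimal deterministic sketch. It is not the paper's randomized estimator. It assumes an unlabelled, truncated kernel of the form $K(G_1, G_2) = \sum_{k} \mu_k \, \mathbf{1}^\top A_\times^k \mathbf{1}$ with uniform start and stop weights and nonnegative coefficients $\mu_k$, where $A_\times = A_1 \otimes A_2$ is the adjacency matrix of the direct product graph. Under those assumptions the kernel factorizes into a dot product of per-graph walk-count features, because $(A_1 \otimes A_2)^k = A_1^k \otimes A_2^k$. The function names `rwk_product_graph` and `walk_count_embedding` are hypothetical.

```python
import numpy as np

def rwk_product_graph(A1, A2, mu):
    """Brute force: build the direct product graph, then sum weighted walk counts."""
    Ax = np.kron(A1, A2)            # (n1*n2) x (n1*n2) adjacency matrix
    v = np.ones(Ax.shape[0])        # v holds Ax^k 1, starting at k = 0
    total = 0.0
    for mu_k in mu:
        total += mu_k * v.sum()     # mu_k * 1^T Ax^k 1
        v = Ax @ v
    return total

def walk_count_embedding(A, mu):
    """phi(G)_k = sqrt(mu_k) * 1^T A^k 1, using only mat-vec products on G itself.

    Assumes every mu_k is nonnegative so that the square root is real.
    """
    v = np.ones(A.shape[0])
    feats = []
    for mu_k in mu:
        feats.append(np.sqrt(mu_k) * v.sum())
        v = A @ v
    return np.array(feats)

# Toy graphs: a triangle and a 3-node path.
A_triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
A_path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
mu = [1.0, 0.5, 0.25]               # illustrative coefficients for k = 0, 1, 2

k_brute = rwk_product_graph(A_triangle, A_path, mu)
k_embed = walk_count_embedding(A_triangle, mu) @ walk_count_embedding(A_path, mu)
print(np.isclose(k_brute, k_embed))
```

The brute-force routine touches a matrix with $n_1 n_2$ rows, while the embedding route only ever multiplies by each graph's own adjacency matrix. Labelled graphs and general start or stop distributions do not factorize this simply, which is where the sampled embeddings described in the abstract come in.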

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses the problem of efficiently computing general Random Walk Kernels (RWKs) on sparse graphs. Specifically, the authors propose the first linear-time randomized algorithms for unbiased estimation of RWKs on sparse graphs, covering both labelled and unlabelled instances.

### Background and Challenges

- **Limitations of Existing Methods**: The fastest previous methods for computing general RWKs have cubic time complexity ($O(N^3)$) and are not applicable to labelled graphs, which creates a significant computational bottleneck on large datasets.
- **Applications of Random Walk Kernels**: RWKs are an important class of graph kernel functions used in machine learning tasks such as Gaussian processes, clustering, 3D reconstruction, and linear attention transformers. Their high computational cost has limited the practical use of graph kernel methods.

### Solution

- **Algorithmic Innovation**: The authors propose a new algorithm called Graph Voyagers (GVoys), which computes graph embeddings in $\mathbb{R}^d$ by sampling dependent random walks. The dot product of two such embeddings equals the true RWK in expectation.
- **Efficiency Improvement**: GVoys runs in linear time ($O(N)$) on sparse graphs. It is up to 27 times faster than its counterparts on large graphs and scales to graphs 128 times bigger than the largest examples amenable to brute-force computation.
- **Theoretical Guarantee**: The authors derive exponential concentration bounds showing that the estimator is sharp.

### Main Contributions

1. **Linear Time Complexity**: The first unbiased estimator of general RWKs on sparse graphs with linear time complexity.
2. **Wide Applicability**: The algorithm applies to both unlabelled and labelled graphs.
3. **Efficient Scalability**: The direct product graph is never instantiated in memory, so the method scales to massive datasets that cannot be stored on a single machine.
4. **Theoretical Support**: Exponential concentration bounds back the accuracy of the estimator.

### Conclusion

By proposing the GVoys algorithm, this paper addresses the challenge of efficiently computing general RWKs on sparse graphs, opening up new possibilities for applying graph kernel methods to large-scale datasets.
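The idea of an estimator that "equals the true RWK in expectation" can be illustrated with a much simpler scheme than GVoys. The sketch below uses independent (not dependent) random walks with importance weights to estimate the number of length-$k$ walks in an unlabelled graph, then combines two such estimates into a truncated kernel $\sum_k \mu_k \, w_k(G_1) \, w_k(G_2)$, where $w_k(G) = \mathbf{1}^\top A^k \mathbf{1}$. This kernel form, the adjacency-list input format, and the function names `estimate_walk_count` and `estimate_rwk` are assumptions made for illustration. They are not taken from the paper.

```python
import random

def estimate_walk_count(adj, k, n_samples, rng):
    """Unbiased Monte Carlo estimate of the number of length-k walks, 1^T A^k 1.

    adj maps each node to a list of its neighbours. A walk starts at a uniformly
    random node, and its load is multiplied by the current degree at every step,
    which cancels the probability of the path that was sampled.
    """
    nodes = list(adj)
    n = len(nodes)
    total = 0.0
    for _ in range(n_samples):
        node = rng.choice(nodes)
        load = float(n)                 # 1 / P(start node)
        for _ in range(k):
            nbrs = adj[node]
            if not nbrs:                # dead end: no length-k walk from here
                load = 0.0
                break
            load *= len(nbrs)           # 1 / P(chosen neighbour)
            node = rng.choice(nbrs)
        total += load
    return total / n_samples

def estimate_rwk(adj1, adj2, mu, n_samples, rng):
    """Estimate sum_k mu[k] * w_k(G1) * w_k(G2) from independent walks on each graph."""
    return sum(
        mu_k
        * estimate_walk_count(adj1, k, n_samples, rng)
        * estimate_walk_count(adj2, k, n_samples, rng)
        for k, mu_k in enumerate(mu)
    )

# Toy graphs as adjacency lists: a 3-node path and a 4-node cycle.
path3 = {0: [1], 1: [0, 2], 2: [1]}
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

rng = random.Random(0)
print(estimate_walk_count(path3, 3, 20000, rng))   # true value is 8
print(estimate_rwk(path3, cycle4, [1.0, 0.5, 0.25], 20000, rng))
```

Because the two graphs are sampled independently, the product of the two walk-count estimates is unbiased for the product of the true counts, and neither graph ever needs to see the other. The sketch makes no attempt to control variance, which is the role of the dependent walks and concentration bounds described in the summary.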

Optimal Time Complexity Algorithms for Computing General Random Walk Graph Kernels on Sparse Graphs
