The All-Paths and Cycles Graph Kernel

P.-L. Giscard, R. C. Wilson
DOI: https://doi.org/10.48550/arXiv.1708.01410
2017-08-04
Abstract: With the recent rise in the amount of structured data available, there has been considerable interest in methods for machine learning with graphs. Many of these approaches have been kernel methods, which focus on measuring the similarity between graphs. These generally involve measuring the similarity of structural elements such as walks or paths. Borgwardt and Kriegel proposed the all-paths kernel but emphasized that it is NP-hard to compute and infeasible in practice, favouring instead the shortest-path kernel. In this paper, we introduce a new algorithm for computing the all-paths kernel which is very efficient and enrich it further by including the simple cycles as well. We demonstrate how it is feasible even on large datasets to compute all the paths and simple cycles up to a moderate length. We show how to count labelled paths/simple cycles between vertices of a graph and evaluate a labelled path and simple cycles kernel. Extensive evaluations on a variety of graph datasets demonstrate that the all-paths and cycles kernel has superior performance to the shortest-path kernel and state-of-the-art performance overall.
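The abstract's counting claim rests on an inclusion-exclusion identity over connected induced subgraphs. This identity is the pair of \(P_{uv}(\ell)\) and \(P_{uu}(\ell)\) formulas in the formula summary later on this page.

The sketch below is a minimal, deliberately naive Python check of that identity. It is not the authors' algorithm. It enumerates every vertex subset \(H\) by brute force, which is exponential in the number of vertices, so it only illustrates what the formula computes.

The following choices are hypothetical and made only for this example:
- the function names `count_by_formula` and `count_by_dfs`;
- the `{vertex: set(neighbours)}` graph representation.

The DFS count serves as an independent reference.

```python
from itertools import combinations
from math import comb

# Hypothetical representation: an undirected simple graph as {vertex: set(neighbours)}.


def mat_pow(X, p):
    """Return X**p for a small square integer matrix given as a list of lists."""
    n = len(X)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(p):
        result = [[sum(result[i][k] * X[k][j] for k in range(n)) for j in range(n)]
                  for i in range(n)]
    return result


def is_connected(nodes, adj):
    """Check that the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x] & nodes:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == nodes


def count_by_formula(adj, u, v, length):
    """Hypothetical helper: inclusion-exclusion over connected induced subgraphs H.

    u != v: number of simple paths with `length` edges from u to v (P_uv).
    u == v: closed walks from u tracing a simple cycle of `length` edges (P_uu).
    """
    cap = length if u == v else length + 1   # max |H|: l for cycles, l + 1 for paths
    others = [x for x in adj if x not in (u, v)]
    total = 0
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            H = list({u, v, *extra})
            if len(H) > cap or not is_connected(H, adj):
                continue
            boundary = set().union(*(adj[x] for x in H)) - set(H)   # N(H)
            index = {x: i for i, x in enumerate(H)}
            A_H = [[int(y in adj[x]) for y in H] for x in H]         # adjacency of H
            walks = mat_pow(A_H, length)[index[u]][index[v]]       # (A_H^l)_uv
            total += comb(len(boundary), cap - len(H)) * (-1) ** len(H) * walks
    return (-1) ** cap * total   # (-1)^(l+1) for paths, (-1)^l for cycles


def count_by_dfs(adj, u, v, length):
    """Hypothetical reference count by explicit enumeration: no repeated vertices,
    except that a closed walk may return to u on its last step."""
    def extend(path):
        if len(path) == length + 1:
            return int(path[-1] == v)
        return sum(extend(path + [y]) for y in adj[path[-1]]
                   if y not in path or (y == u == v and len(path) == length))
    return extend([u])


if __name__ == "__main__":
    # Example: the complete graph K4.
    k4 = {x: {0, 1, 2, 3} - {x} for x in range(4)}
    print(count_by_formula(k4, 0, 1, 3), count_by_dfs(k4, 0, 1, 3))  # 2 2
    print(count_by_formula(k4, 0, 0, 3), count_by_dfs(k4, 0, 0, 3))  # 6 6
```

On small graphs, both counts agree for every vertex pair and length. Note that the cycle variant counts each simple cycle through \(u\) once per direction of traversal; this is why \(K_4\) gives 6 rather than 3 for triangles through vertex 0. What makes the paper's approach practical is organizing the enumeration of connected subgraphs efficiently, which this sketch does not attempt.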
Machine Learning
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses machine learning on graph-structured data, in particular how to efficiently compute a graph kernel based on all paths and simple cycles. Specifically:

1. **Background problems**:
   - With the growing amount of structured data, machine-learning methods for graph data have received extensive attention.
   - Many of these methods are kernel methods, which perform classification or clustering by measuring the similarity between graphs.
   - The previously proposed all-paths kernel is hard to apply in practice because computing it is NP-hard, so the shortest-path kernel is usually used instead.
2. **Research objectives**:
   - Propose a new algorithm that efficiently computes a kernel over all paths and simple cycles.
   - Demonstrate experimentally that the new method outperforms the shortest-path kernel and remains feasible on large datasets.
3. **Specific problems**:
   - How to efficiently count all paths and simple cycles in a graph.
   - How to incorporate label information (such as node labels) into path counting to make the kernel more expressive.
   - How to match or exceed the performance of existing kernels (such as the shortest-path kernel and graph substructure kernels) while keeping the computation efficient.

### Main contributions of the paper

- **Proposed an efficient path and cycle counting algorithm**: Building on recent results in algebraic combinatorics, the authors give a method that efficiently counts all paths and simple cycles, up to a moderate length, in real-world graphs.
- **Introduced a counting method for labeled paths**: By encoding node labels, the path-counting algorithm handles labeled paths without increasing the computational burden.
- **Experimental verification of the effectiveness of the new method**: Extensive tests on several standard graph datasets show that the proposed All-Paths and Cycles kernel (APC kernel) significantly outperforms the shortest-path kernel and is competitive in computation time.

### Formula summary

- **All-Paths kernel (AP kernel)**:
  \[ K_{AP}(G, H)=\sum_{p_{i}\in P(G)}\sum_{p_{j}\in P(H)}K_{B}(p_{i}, p_{j}) \]
  where \(P(G)\) is the set of all paths in graph \(G\), and \(K_{B}(\cdot,\cdot)\) is a base kernel between paths (usually the delta kernel).
- **All-Paths and Cycles kernel (APC kernel)**:
  \[ K_{APC}(G, H)=\sum_{\gamma_{i}\in PC(G)}\sum_{\gamma_{j}\in PC(H)}K_{B}(\gamma_{i}, \gamma_{j}) \]
  where \(PC(G)\) is the set of all paths and simple cycles in graph \(G\).
- **Path and cycle counting formulas**: the number of simple paths of length \(\ell\) from \(u\) to \(v\) is
  \[ P_{uv}(\ell)=(-1)^{\ell+1}\sum_{\substack{H\prec_{\text{conn}}G\\ |H|\leq \ell+1,\; u,v\in H}}\binom{|N(H)|}{\ell+1-|H|}(-1)^{|H|}(A_{H}^{\ell})_{uv} \]
  and the number of simple cycles of length \(\ell\) through \(u\), counted once per direction of traversal, is
  \[ P_{uu}(\ell)=(-1)^{\ell}\sum_{\substack{H\prec_{\text{conn}}G\\ |H|\leq \ell,\; u\in H}}\binom{|N(H)|}{\ell-|H|}(-1)^{|H|}(A_{H}^{\ell})_{uu} \]
  Here \(H\) ranges over the connected induced subgraphs of \(G\), \(N(H)\) is the set of vertices of \(G\) outside \(H\) that are adjacent to \(H\), and \(A_{H}\) is the adjacency matrix of \(H\).

Through these improvements,