Abstract: Local clustering aims to find a compact cluster near the given starting instances. This work focuses on graph local clustering, which has broad applications beyond graphs because of the internal connectivities within various modalities. While most existing studies on local graph clustering adopt the discrete graph setting (i.e., unweighted graphs without self-loops), real-world graphs can be more complex. In this paper, we extend the non-approximating Andersen-Chung-Lang ("ACL") algorithm beyond discrete graphs and generalize its quadratic optimality to a wider range of graphs, including weighted, directed, and self-looped graphs and hypergraphs. Specifically, leveraging PageRank, we propose two algorithms: GeneralACL for graphs and HyperACL for hypergraphs. We theoretically prove that, under two mild conditions, both algorithms can identify a quadratically optimal local cluster in terms of conductance with at least 1/2 probability. On the property of hypergraphs, we address a fundamental gap in the literature by defining conductance for hypergraphs from the perspective of hypergraph random walks. Additionally, we provide experiments to validate our theoretical findings.
What problem does this paper attempt to address?
The paper extends the classic PageRank-based local clustering algorithm of Andersen, Chung, and Lang (ACL) to more complex graph structures: weighted, directed, and self-looped graphs, as well as hypergraphs. Specifically:
1. **Existing Limitations**: Most existing research on local graph clustering assumes the discrete graph setting, that is, unweighted, undirected graphs without self-loops. Real-world graphs are usually more complex and may have edge weights, node weights, directionality, and self-loops.
2. **Problem Definition**: To model and analyze local clustering on these more complex structures, the paper proposes two new algorithms, GeneralACL (for graphs) and HyperACL (for hypergraphs), and proves that, under two mild conditions, each finds a quadratically optimal local cluster in terms of conductance with probability at least 1/2.
3. **Main Contributions**:
- **Extension of ACL Algorithm**: The ACL algorithm is extended from simple undirected, unweighted graphs to weighted, directed, and self-looped graphs, and further generalized to hypergraphs.
- **Quadratic Optimality**: Using personalized PageRank and random walks, the paper proves that, under two mild conditions, the new algorithms find a local cluster with quadratically optimal conductance with probability at least 1/2.
- **Definition of Hypergraph Conductance**: For hypergraphs, the paper defines conductance from the perspective of hypergraph random walks, which it presents as filling a fundamental gap in the literature. The optimality guarantee for HyperACL is stated in terms of this conductance.
4. **Theoretical and Experimental Verification**: The paper provides rigorous theoretical proofs and validates them with experiments on real datasets, which show the advantages of the proposed algorithms on complex graph structures.
### Formula Summary
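- **Conductance Formula**:
\[
\Phi_G(S)=\frac{\sum_{u \in S, v \in \bar{S}} \phi(u) P_{u, v}}{\min\left(\sum_{u \in S} \phi(u), 1-\sum_{u \in S} \phi(u)\right)}
\]
\[
\Phi_H(S)=\frac{\sum_{u \in S, v \in \bar{S}} \phi(u) P_{u, v}}{\min\left(\sum_{u \in S} \phi(u), 1-\sum_{u \in S} \phi(u)\right)}
\]
where \(P\) is the transition matrix of the random walk (on the graph for \(\Phi_G\), on the hypergraph for \(\Phi_H\)), \(\phi\) is its stationary distribution, and \(\bar{S}\) is the complement of \(S\).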
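The conductance formula can be evaluated directly once \(P\) and \(\phi\) are available. The following is a minimal sketch, not the paper's implementation: the helper names `row_normalize`, `stationary_distribution`, and `conductance` are hypothetical, and building \(P\) by row-normalizing a weight matrix is an assumption about how the walk is defined. It also assumes the walk is irreducible, so that \(\phi\) is unique, and that \(S\) is a non-empty proper subset.

```python
def row_normalize(W):
    """Assumed construction: P[u][v] = W[u][v] / (total outgoing weight of u)."""
    return [[w / sum(row) for w in row] for row in W]


def stationary_distribution(P, tol=1e-13, max_iters=100_000):
    """Approximate phi with phi = phi P by iterating the lazy walk (I + P) / 2.

    The lazy walk has the same stationary distribution as P and avoids
    periodicity problems. P is assumed irreducible so that phi is unique.
    """
    n = len(P)
    phi = [1.0 / n] * n
    for _ in range(max_iters):
        step = [sum(phi[u] * P[u][v] for u in range(n)) for v in range(n)]
        nxt = [0.5 * (phi[v] + step[v]) for v in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, phi)) < tol:
            return nxt
        phi = nxt
    return phi


def conductance(P, phi, S):
    """Stationary flow leaving S divided by min(phi(S), 1 - phi(S))."""
    S = set(S)
    n = len(P)
    flow_out = sum(phi[u] * P[u][v] for u in S for v in range(n) if v not in S)
    vol = sum(phi[u] for u in S)
    return flow_out / min(vol, 1.0 - vol)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
P = row_normalize(W)
phi = stationary_distribution(P)
print(conductance(P, phi, {0, 1, 2}))  # analytically 1/7 for this graph
```

On this undirected example \(\phi\) is proportional to node degree, so the value can be checked by hand: the only flow out of \(\{0, 1, 2\}\) is \(\phi(2) P_{2,3} = \tfrac{3}{14} \cdot \tfrac{1}{3}\), and \(\phi(S) = \tfrac{1}{2}\). The same functions accept asymmetric weights and non-zero diagonal entries (self-loops) without modification.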
- **Lazy Personalized PageRank Vector**:
\[
\mathrm{pr}(\alpha, s)=\alpha s+(1 - \alpha) \mathrm{pr}(\alpha, s) M
\]
where \(M = \frac{1}{2}(I + P)\) is the lazy random-walk transition matrix, \(s\) is the starting (seed) distribution of the random walk, and \(\alpha\) is the restart probability.
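Because the lazy PageRank vector is defined as a fixed point, one simple way to approximate it is to iterate the right-hand side until it stops changing. The error shrinks by a factor of \(1 - \alpha\) per step. The sketch below is illustrative only: `lazy_ppr` is a hypothetical name, vectors are treated as row vectors, and the small weighted, directed graph with a self-loop is made up for the example.

```python
def lazy_ppr(P, s, alpha, tol=1e-12, max_iters=10_000):
    """Iterate pr <- alpha * s + (1 - alpha) * pr M with M = (I + P) / 2."""
    n = len(P)
    # Lazy transition matrix: stay put with probability 1/2, else follow P.
    M = [[0.5 * ((1.0 if u == v else 0.0) + P[u][v]) for v in range(n)]
         for u in range(n)]
    pr = list(s)
    for _ in range(max_iters):
        nxt = [alpha * s[v]
               + (1.0 - alpha) * sum(pr[u] * M[u][v] for u in range(n))
               for v in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, pr)) < tol:
            return nxt
        pr = nxt
    return pr


# Hypothetical weighted, directed graph; node 0 has a self-loop of weight 1.
W = [
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
    [4.0, 0.0, 0.0],
]
P = [[w / sum(row) for w in row] for row in W]  # assumed row normalization
s = [1.0, 0.0, 0.0]                             # all seed mass on node 0
alpha = 0.15
pr = lazy_ppr(P, s, alpha)
print(pr)
```

If \(s\) is a probability distribution, every iterate is one as well, so the result can be sanity-checked by confirming that it sums to 1 and satisfies the defining equation up to the tolerance.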
Through these extensions and improvements, the paper provides new tools and methods for dealing with local clustering problems in complex graph structures, filling the gaps in existing research.
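This summary does not spell out the steps of GeneralACL or HyperACL. As a rough illustration of how the two formulas above can be combined, the sketch below follows the classical ACL-style sweep: compute a lazy personalized PageRank vector from a seed, order nodes by \(\mathrm{pr}(u)/\phi(u)\), and keep the prefix set with the lowest conductance. Normalizing by \(\phi\), all function names, and the toy graph are assumptions made for this example, not details confirmed by the source.

```python
def lazy_step(x, P):
    """One step of the lazy walk: x -> x (I + P) / 2, with x a row vector."""
    n = len(P)
    return [0.5 * (x[v] + sum(x[u] * P[u][v] for u in range(n)))
            for v in range(n)]


def stationary_distribution(P, iters=5_000):
    """Stationary distribution of P via the lazy walk (P assumed irreducible)."""
    phi = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        phi = lazy_step(phi, P)
    return phi


def lazy_ppr(P, s, alpha, iters=1_000):
    """Fixed-point iteration for pr = alpha * s + (1 - alpha) * pr M."""
    pr = list(s)
    for _ in range(iters):
        walked = lazy_step(pr, P)
        pr = [alpha * s[v] + (1.0 - alpha) * walked[v] for v in range(len(P))]
    return pr


def conductance(P, phi, S):
    """Stationary flow leaving S divided by min(phi(S), 1 - phi(S))."""
    S = set(S)
    out = sum(phi[u] * P[u][v] for u in S for v in range(len(P)) if v not in S)
    vol = sum(phi[u] for u in S)
    return out / min(vol, 1.0 - vol)


def sweep_cut(P, phi, pr):
    """Return the best prefix set of the ordering by pr(u) / phi(u)."""
    n = len(P)
    order = sorted((u for u in range(n) if pr[u] > 0),
                   key=lambda u: pr[u] / phi[u], reverse=True)
    best_set, best_value = None, float("inf")
    # Stop at n - 1 nodes so the candidate is always a proper subset.
    for k in range(1, min(len(order), n - 1) + 1):
        value = conductance(P, phi, order[:k])
        if value < best_value:
            best_set, best_value = set(order[:k]), value
    return best_set, best_value


# Two triangles joined by one edge; the seed is node 0 in the first triangle.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
P = [[w / sum(row) for w in row] for row in W]
phi = stationary_distribution(P)
pr = lazy_ppr(P, [1.0, 0, 0, 0, 0, 0], alpha=0.1)
cluster, value = sweep_cut(P, phi, pr)
print(cluster, value)
```

On this toy graph the sweep recovers the seed's triangle \(\{0, 1, 2\}\), whose conductance is \(1/7\). The sketch runs the iterations for a fixed number of steps for brevity and makes no claim about the probability guarantee or the conditions proved in the paper.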