Hypergraphs as Weighted Directed Self-Looped Graphs: Spectral Properties, Clustering, Cheeger Inequality

Zihao Li, Dongqi Fu, Hengyu Liu, Jingrui He
2024-10-23
Abstract: Hypergraphs naturally arise when studying group relations and have been widely used in the field of machine learning. There has not been a unified formulation of hypergraphs, yet the recently proposed edge-dependent vertex weights (EDVW) modeling is one of the most generalized modeling methods of hypergraphs, i.e., most existing hypergraphs can be formulated as EDVW hypergraphs without any information loss to the best of our knowledge. However, the relevant algorithmic developments on EDVW hypergraphs remain nascent: compared to spectral graph theories, the formulations are incomplete, the spectral clustering algorithms are not well-developed, and one result regarding hypergraph Cheeger Inequality is even incorrect. To this end, deriving a unified random walk-based formulation, we propose our definitions of hypergraph Rayleigh Quotient, NCut, boundary/cut, volume, and conductance, which are consistent with the corresponding definitions on graphs. Then, we prove that the normalized hypergraph Laplacian is associated with the NCut value, which inspires our HyperClus-G algorithm for spectral clustering on EDVW hypergraphs. Finally, we prove that HyperClus-G can always find an approximately linearly optimal partitioning in terms of both NCut and conductance. Additionally, we provide extensive experiments to validate our theoretical findings from an empirical perspective.
Social and Information Networks, Discrete Mathematics, Data Structures and Algorithms, Machine Learning
What problem does this paper attempt to address?
The paper addresses two gaps: the spectral theory of edge-dependent vertex-weighted (EDVW) hypergraphs is incomplete, and spectral clustering algorithms for them are immature. Specifically, the objectives of the paper are:

1. **Develop the spectral theory of EDVW hypergraphs**: Existing spectral graph theory does not carry over cleanly to EDVW hypergraphs; in particular, the definitions of normalized cut (NCut), boundary/cut, volume, and conductance are incomplete. The paper fills these gaps by introducing new definitions.
2. **Propose the spectral clustering algorithm HyperClus-G**: Based on the new definitions, the paper develops a spectral clustering algorithm, HyperClus-G, which performs global partitioning on EDVW hypergraphs and is approximately linearly optimal in terms of normalized cut (NCut) and conductance.
3. **Prove the hypergraph Cheeger inequality**: The paper gives the first complete proof of the Cheeger inequality for EDVW hypergraphs and corrects an incorrect result in previous literature.

### Main contributions

1. **Algebraic connection**:
   - **Theorem 1**: Establishes the algebraic connection between the normalized cut (NCut), the Rayleigh quotient, and the Laplacian matrix of hypergraphs. Specifically, for any EDVW hypergraph \( H=(V, E, \omega, \gamma) \), define the normalized cut \( \text{NCut}(S, \overline{S}) \), volume \( \text{vol}(S) \), Rayleigh quotient \( R(x) \), Laplacian matrix \( L \), and stationary distribution matrix \( \Pi \). For any vertex set \( S\subseteq V \), define a \( |V| \)-dimensional vector \( x \):

     \[ x(u)=\sqrt{\frac{\text{vol}(\overline{S})}{\text{vol}(S)}}, \quad \forall u\in S, \]

     \[ x(\overline{u}) = -\sqrt{\frac{\text{vol}(S)}{\text{vol}(\overline{S})}}, \quad \forall \overline{u}\in \overline{S}. \]

     Then:

     \[ \text{NCut}(S, \overline{S})=\frac{1}{2}R(x)=\frac{x^{T}Lx}{x^{T}\Pi x}. \]

2. **Spectral clustering algorithm**:
   - **Theorem 2**: Proposes HyperClus-G, a spectral clustering algorithm for EDVW hypergraphs that always returns an approximately linearly optimal clustering in terms of normalized cut and conductance. Specifically, the normalized cut \( N \) of the returned clustering and the optimal normalized cut \( N^{*} \) satisfy \( N\leq O(N^{*}) \).

3. **Hypergraph Cheeger inequality**:
   - **Theorem 3**: Gives the first complete proof of the Cheeger inequality for EDVW hypergraphs. Specifically, for any EDVW hypergraph \( H = (V, E, \omega, \gamma) \), let \( \Phi(H) \) be its conductance and \( \lambda \) the second smallest eigenvalue of the normalized Laplacian matrix \( \Pi^{-1/2}L\Pi^{-1/2} \). Then:

     \[ \frac{\Phi(H)^{2}}{2}\leq\lambda\leq2\Phi(H). \]

### Technical overview

The paper first re-analyzes the random walk on EDVW hypergraphs, then proposes the HyperClus-G algorithm for hypergraph partitioning. Finally, it proves the approximation guarantee for the normalized cut and the upper bounds on the normalized cut and conductance.

### Experimental verification

The paper also...
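The identity in Theorem 1 can be checked numerically on a toy random walk. The sketch below is not the authors' code, and it rests on assumptions that this summary does not spell out: the volume is taken as \( \text{vol}(S)=\sum_{u\in S}\phi(u) \), the cut as \( \sum_{u\in S, v\in\overline{S}}\phi(u)P(u,v) \), NCut as the cut times \( 1/\text{vol}(S)+1/\text{vol}(\overline{S}) \), and the Laplacian as \( L=\Pi-(\Pi P+P^{T}\Pi)/2 \), where \( P \) is the transition matrix and \( \phi \) its stationary distribution. The 4-state matrix `P` is a hypothetical example, not derived from an actual EDVW hypergraph, and all function names are illustrative.

```python
from math import sqrt, isclose

def stationary_distribution(P, iters=2000):
    """Power iteration for phi with phi = phi P (assumes P is positive, row-stochastic)."""
    n = len(P)
    phi = [1.0 / n] * n
    for _ in range(iters):
        phi = [sum(phi[u] * P[u][v] for u in range(n)) for v in range(n)]
    return phi

def ncut(P, phi, S):
    """Assumed definition: cut(S, S_bar) * (1/vol(S) + 1/vol(S_bar))."""
    n = len(P)
    S_bar = [v for v in range(n) if v not in S]
    vol_S = sum(phi[u] for u in S)
    vol_S_bar = sum(phi[v] for v in S_bar)
    cut = sum(phi[u] * P[u][v] for u in S for v in S_bar)
    return cut * (1.0 / vol_S + 1.0 / vol_S_bar)

def laplacian_ratio(P, phi, S):
    """x^T L x / x^T Pi x for the vector x of Theorem 1 (L as assumed above)."""
    n = len(P)
    vol_S = sum(phi[u] for u in S)
    vol_S_bar = sum(phi[v] for v in range(n) if v not in S)
    x = [sqrt(vol_S_bar / vol_S) if u in S else -sqrt(vol_S / vol_S_bar)
         for u in range(n)]
    # L = Pi - (Pi P + P^T Pi) / 2, built entry by entry.
    L = [[(phi[u] if u == v else 0.0)
          - (phi[u] * P[u][v] + phi[v] * P[v][u]) / 2.0
          for v in range(n)] for u in range(n)]
    xLx = sum(x[u] * L[u][v] * x[v] for u in range(n) for v in range(n))
    xPix = sum(phi[u] * x[u] ** 2 for u in range(n))
    return xLx / xPix

# Hypothetical asymmetric, strictly positive transition matrix (rows sum to 1).
P = [[0.1, 0.6, 0.2, 0.1],
     [0.5, 0.1, 0.1, 0.3],
     [0.2, 0.1, 0.1, 0.6],
     [0.1, 0.2, 0.6, 0.1]]
phi = stationary_distribution(P)
S = {0, 1}
print(ncut(P, phi, S), laplacian_ratio(P, phi, S))
```

Under these assumptions the two printed values agree up to floating-point error for any nonempty proper subset `S`. The agreement relies on stationarity of `phi`, which makes the flow from `S` to its complement equal to the reverse flow.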