Abstract: We prove strong consistency of spectral clustering under the degree-corrected hypergraph stochastic block model in the sparse regime where the maximum expected hyperdegree is as small as $\Omega(\log n)$ with $n$ denoting the number of nodes. We show that the basic spectral clustering without preprocessing or postprocessing is strongly consistent in an even wider range of the model parameters, in contrast to previous studies that either trim high-degree nodes or perform local refinement. At the heart of our analysis is the entry-wise eigenvector perturbation bound derived by the leave-one-out technique. To the best of our knowledge, this is the first entry-wise error bound for degree-corrected hypergraph models, resulting in the strong consistency for clustering non-uniform hypergraphs with heterogeneous hyperdegrees.
What problem does this paper attempt to address?
The paper aims to prove that the basic spectral clustering algorithm achieves strong consistency under the sparse degree-corrected hypergraph stochastic block model (DCHSBM). Specifically, it asks when spectral clustering can accurately recover the community structure of a hypergraph whose maximum expected hyperdegree is only $\Omega(\log n)$.
### Core problems of the paper
1. **Research background**:
- Community detection is a core problem in modern network science. The goal is to divide the nodes in the network into several communities so that the nodes within the same community are more similar than those in different communities.
- The stochastic block model (SBM) is a commonly used generative model for random graphs with community structure. However, the standard SBM does not capture the degree heterogeneity seen in real networks.
- To address this, the degree-corrected stochastic block model (DCSBM) was proposed, which lets nodes in the same community have different expected degrees.
2. **Research objectives**:
- Determine whether the spectral clustering algorithm achieves strong consistency under the DCHSBM in the sparse regime.
- Specifically, the goal of the paper is to prove that under certain conditions, the basic spectral clustering algorithm based on the weighted adjacency matrix can correctly recover the community membership of all nodes with high probability.
3. **Main challenges**:
- A hyperedge can connect any number of nodes, which makes the model harder to analyze than an ordinary graph model.
- In sparse hypergraphs, the maximum expected hyperdegree may be only $\Omega(\log n)$, which makes the analysis more difficult.
- A fine-grained entry-wise analysis of eigenvector perturbations is required to control the error for each individual node.
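
The "basic spectral clustering based on the weighted adjacency matrix" mentioned above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: each hyperedge adds weight 1 to every pair of its nodes (the paper's exact weighting is not stated in this summary), the embedding uses the $K$ eigenvectors whose eigenvalues are largest in absolute value, and the clustering step is a plain Lloyd-style $k$-means with a deterministic farthest-point initialization. All function names are hypothetical. The toy hypergraph is non-uniform (hyperedges of size 2 and 3) with two planted communities.

```python
import numpy as np

def weighted_adjacency(n, hyperedges):
    """Assumed weighting: A[i, j] counts hyperedges containing both i and j."""
    A = np.zeros((n, n))
    for e in hyperedges:
        nodes = sorted(set(e))
        for a in nodes:
            for b in nodes:
                if a != b:
                    A[a, b] += 1.0
    return A

def spectral_embedding(A, K):
    """Eigenvectors of symmetric A for the K eigenvalues largest in magnitude."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(-np.abs(vals))[:K]
    return vecs[:, top]

def kmeans(X, K, n_iter=50):
    """Plain Lloyd iterations with farthest-point initialization (deterministic)."""
    centers = [X[0]]
    for _ in range(1, K):
        # squared distance from each point to its nearest chosen center
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

# Toy example: communities {0,1,2,3} and {4,5,6,7}, one cross-community edge.
hyperedges = [
    (0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3),
    (4, 5, 6), (4, 5, 7), (4, 6, 7), (5, 6, 7),
    (3, 4),
]
A = weighted_adjacency(8, hyperedges)
U_hat = spectral_embedding(A, K=2)
labels = kmeans(U_hat, K=2)
print(labels)
```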
---
### Solutions
The paper solves the above problems through the following methods:
1. **Model definition**:
- Define the degree-corrected hypergraph stochastic block model (DCHSBM), where each node has a parameter $\theta_i>0$ that controls its expected degree.
- Introduce the weighted adjacency matrix $A$ and its expectation $P = \mathbb{E}[A]$, and analyze the deviation between them.
2. **Theoretical results**:
- Establish a tight upper bound on $\|A - P\|$. Using a combinatorial technique (the Kahn-Szemerédi method), prove that $\|A - P\| \leq C\sqrt{d}$, where $d$ is an upper bound on the node hyperdegrees.
- Use the leave-one-out technique to obtain a two-to-infinity norm bound on the eigenvector perturbation, $\| \hat{U}\hat{O} - U \|_{2,\infty}$, which is the first entry-wise error bound for non-uniform hypergraph models.
3. **Algorithm performance**:
- Prove that two spectral clustering algorithms (the k-means-based and the threshold-based methods) achieve strong consistency under appropriate conditions.
- Give specific conditions, for example:
$$
\frac{\gamma K^{3/2} \sqrt{d \log n}}{|\lambda_K|} \leq C_1
$$
or
$$
\frac{\gamma \sqrt{d \log n}}{|\lambda_K|} \leq C_2
$$
where $\gamma$ measures the heterogeneity of node degrees, and $|\lambda_K|$ is the absolute value of the smallest non-zero eigenvalue.
4. **Special case analysis**:
- For the $m$-uniform hypergraph planted partition model ($m$-uniform HPPM), give more explicit conditions, for example:
$$
\alpha_m \geq C_1 K^{2m + 1} \frac{\log n}{n^{m - 1}}