Abstract: We prove strong consistency of spectral clustering under the degree-corrected hypergraph stochastic block model in the sparse regime where the maximum expected hyperdegree is as small as $\Omega(\log n)$ with $n$ denoting the number of nodes. We show that the basic spectral clustering without preprocessing or postprocessing is strongly consistent in an even wider range of the model parameters, in contrast to previous studies that either trim high-degree nodes or perform local refinement. At the heart of our analysis is the entry-wise eigenvector perturbation bound derived by the leave-one-out technique. To the best of our knowledge, this is the first entry-wise error bound for degree-corrected hypergraph models, resulting in the strong consistency for clustering non-uniform hypergraphs with heterogeneous hyperdegrees.
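To make "basic spectral clustering without preprocessing or postprocessing" concrete, here is a minimal, dependency-free Python sketch. It is not the authors' implementation, and several choices are hypothetical: the weighted adjacency matrix is taken to be the clique expansion with unit weight per hyperedge, so `A[i][j]` counts the hyperedges containing both `i` and `j` (the paper's weighting may differ); the $K$ eigenvectors with the largest absolute eigenvalues are kept; rows are normalized to cancel the degree parameters $\theta_i$; and a greedy threshold rule with a hypothetical tolerance `tau` forms the clusters. The cyclic Jacobi eigensolver is there only to avoid external libraries. No high-degree trimming or local refinement is performed.

```python
import math
from itertools import combinations


def weighted_adjacency(n, hyperedges):
    """Clique expansion: A[i][j] = number of hyperedges containing both i and j.

    Hypothetical unit weight per hyperedge; the paper's weighting may differ.
    """
    A = [[0.0] * n for _ in range(n)]
    for e in hyperedges:
        for i, j in combinations(sorted(set(e)), 2):
            A[i][j] += 1.0
            A[j][i] += 1.0
    return A


def jacobi_eigh(A, max_sweeps=100, tol=1e-18):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix via cyclic Jacobi."""
    n = len(A)
    a = [row[:] for row in A]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def spectral_clustering(n, hyperedges, K, tau=0.5):
    A = weighted_adjacency(n, hyperedges)
    vals, vecs = jacobi_eigh(A)
    # Keep the K eigenvectors whose eigenvalues are largest in absolute value.
    top = sorted(range(n), key=lambda j: -abs(vals[j]))[:K]
    rows = [[vecs[i][j] for j in top] for i in range(n)]
    # Normalize each row to unit length to cancel the node-specific degree scale
    # (assumes no all-zero rows, i.e. no isolated nodes).
    normed = [[x / math.hypot(*r) for x in r] for r in rows]
    # Hypothetical greedy threshold rule: the first unlabeled node opens a new
    # cluster and absorbs every unlabeled node whose normalized row is within tau.
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        for j in range(n):
            if labels[j] == -1 and math.dist(normed[i], normed[j]) < tau:
                labels[j] = next_label
        next_label += 1
    return labels


# Non-uniform toy hypergraph: communities {0, 1, 2, 3} and {4, 5, 6, 7}.
edges = [(0, 1, 2, 3), (0, 1), (4, 5, 6, 7), (5, 6, 7)]
print(spectral_clustering(8, edges, K=2))  # -> [0, 0, 0, 0, 1, 1, 1, 1]
```

In the toy example, nodes 0 and 1 (and nodes 5 to 7) lie in more hyperedges than their community peers, so their raw eigenvector rows have different lengths; row normalization removes this degree effect before the threshold step.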

What problem does this paper attempt to address?

The paper aims to prove that the basic spectral clustering algorithm is strongly consistent under the sparse degree-corrected hypergraph stochastic block model (DCHSBM). Specifically, it asks how to guarantee that spectral clustering accurately recovers the community structure of a hypergraph when the maximum expected hyperdegree is only $\Omega(\log n)$.

### Core problems of the paper

1. **Research background**:
   - Community detection is a core problem in modern network science. The goal is to partition the nodes of a network into communities so that nodes in the same community are more similar to each other than to nodes in different communities.
   - The stochastic block model (SBM) is a widely used generative model for random graphs with community structure. However, the standard SBM cannot capture the degree heterogeneity observed in real networks.
   - To address this, the degree-corrected stochastic block model (DCSBM) was proposed; it gives each node its own degree parameter, so expected degrees can vary across nodes.
2. **Research objectives**:
   - Determine whether spectral clustering achieves strong consistency under the DCHSBM in the sparse regime.
   - Specifically, prove that, under certain conditions, the basic spectral clustering algorithm based on the weighted adjacency matrix recovers the community memberships of all nodes with high probability.
3. **Main challenges**:
   - A hyperedge can join any number of nodes, which makes the model considerably more complex than a graph model.
   - In sparse hypergraphs, the maximum expected hyperdegree may be only $\Omega(\log n)$, which makes the analysis harder.
   - Guaranteeing an error bound for every individual node requires a fine-grained, entry-wise analysis of eigenvector perturbations.

---

### Solutions

The paper addresses these problems as follows:

1. **Model definition**:
   - Define the degree-corrected hypergraph stochastic block model (DCHSBM), in which each node $i$ has a parameter $\theta_i > 0$ that controls its expected degree.
   - Introduce the weighted adjacency matrix $A$ and its expectation $P = \mathbb{E}[A]$, and analyze the deviation between them.
2. **Theoretical results**:
   - Establish a tight upper bound on $\|A - P\|$: using a combinatorial argument (the Kahn-Szemerédi method), show that $\|A - P\| \leq C\sqrt{d}$ with high probability, where $d$ is an upper bound on the node hyperdegrees.
   - Use the leave-one-out technique to bound the eigenvector perturbation in the two-to-infinity norm, $\|\hat{U}\hat{O} - U\|_{2,\infty}$, where $\hat{U}$ and $U$ contain the leading $K$ eigenvectors of $A$ and $P$ and $\hat{O}$ is an orthogonal matrix aligning them. This is the first entry-wise error bound for degree-corrected hypergraph models.
3. **Algorithm performance**:
   - Prove that two spectral clustering algorithms, one based on k-means and one based on thresholding, achieve strong consistency under appropriate conditions.
   - The conditions take forms such as

     $$
     \frac{\gamma K^{3/2} \sqrt{d \log n}}{|\lambda_K|} \leq C_1
     $$

     or

     $$
     \frac{\gamma \sqrt{d \log n}}{|\lambda_K|} \leq C_2,
     $$

     where $\gamma$ measures the heterogeneity of node degrees and $|\lambda_K|$ is the smallest absolute value among the nonzero eigenvalues of $P$.
4. **Special case analysis**:
   - For the $m$-uniform hypergraph planted partition model ($m$-uniform HPPM), give more explicit conditions, for example:

     $$
     \alpha_m \geq C_1 K^{2m + 1} \frac{\log n}{n^{m - 1}}
     $$
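Why an entry-wise (two-to-infinity) bound, rather than only a bound on $\|A - P\|$, yields strong consistency can be seen from a short triangle-inequality argument. The sketch below is a generic illustration, not the paper's proof. It assumes that the rows of $U$ take a single value per community (the situation after row normalization when the degree parameters only rescale rows), treats $\hat{U}\hat{O}$ as already aligned, and uses made-up perturbation values. If every row error is below half the minimum separation between distinct community rows, assigning each node to its nearest community row labels every node correctly.

```python
import math


def two_to_infinity(M):
    """||M||_{2,inf}: the largest Euclidean norm among the rows of M."""
    return max(math.sqrt(sum(x * x for x in row)) for row in M)


def nearest_center_labels(rows, centers):
    """Assign each row to the index of the closest center."""
    return [min(range(len(centers)), key=lambda k: math.dist(r, centers[k])) for r in rows]


# Hypothetical population rows: nodes 0-2 share one center, nodes 3-5 another.
centers = [[1.0, 0.0], [0.0, 1.0]]
truth = [0, 0, 0, 1, 1, 1]
U = [centers[z] for z in truth]

# Hypothetical perturbation E = U_hat O_hat - U, with every row small.
E = [[0.1, -0.2], [-0.3, 0.1], [0.2, 0.2], [0.25, -0.1], [-0.1, 0.3], [0.0, -0.35]]
U_hat = [[u + e for u, e in zip(ur, er)] for ur, er in zip(U, E)]

# Minimum distance between distinct community rows.
sep = min(math.dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:])
err = two_to_infinity(E)

assert err < sep / 2  # every row error is below half the separation ...
assert nearest_center_labels(U_hat, centers) == truth  # ... so every node is recovered
```

In practice the community rows are unknown and must be estimated, for example by k-means on the rows of $\hat{U}$; this sketch only covers the final assignment step, which is where the row-wise nature of the bound matters.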

Strong Consistency of Spectral Clustering for the Sparse Degree-Corrected Hypergraph Stochastic Block Model
