Abstract: For the degree-corrected stochastic block model in the presence of arbitrary or even adversarial outliers, we develop a convex-optimization-based clustering algorithm that includes a penalization term depending on the positive deviation of a node from the expected number of edges to other inliers. We prove that under mild conditions, this method achieves exact recovery of the underlying clusters. Our synthetic experiments show that our algorithm performs well on heterogeneous networks, and in particular those with Pareto degree distributions, for which outliers have a broad range of possible degrees that may enhance their adversarial power. We also demonstrate that our method allows for recovery with significantly lower error rates compared to existing algorithms.

What problem does this paper attempt to address?

This paper addresses the problem of clustering under the Degree-Corrected Stochastic Block Model (DCSBM) with heterogeneous degree distributions in the presence of outliers. Specifically, it focuses on identifying and grouping nodes in complex networks so that the clustering stays accurate even when the network contains arbitrary or adversarial outliers.

### Main problem description

1. **Heterogeneous degree distribution**: In real-world networks, node degrees are usually heterogeneous, that is, the number of connections varies greatly from node to node. For example, in social networks, some users may have thousands of followers, while others have only a few. This heterogeneity poses challenges to clustering algorithms.
2. **Existence of outliers**: The network may contain nodes that do not belong to any cluster (outliers). Their connection patterns may be arbitrary or even deliberately designed to confuse clustering algorithms, and they can significantly degrade the quality of the clustering.
3. **Exact recovery of clustering structure**: The goal of the paper is to develop an algorithm that exactly recovers the true cluster structure of the network under the above challenges, with theoretical guarantees.

### Solution

To address these problems, the paper proposes a clustering algorithm based on convex optimization. The algorithm handles outliers by introducing a regularization term that penalizes nodes deviating from the expected connection patterns. Specifically:

- **Convex optimization framework**: The algorithm is based on a semidefinite programming (SDP) relaxation of modularity maximization.
- **Regularization term**: A regularization term \(\alpha\cdot\text{diag}(d^*)\) that depends on the node degrees is introduced, where \(d^*_i=\max(d_i, H^+)\), \(d_i\) is the degree of node \(i\), and \(H^+\) is the maximum expected number of connections of a node. This term penalizes nodes that exhibit abnormal connection patterns.

### Theoretical guarantees

The paper provides a rigorous theoretical analysis and proves that, under certain conditions, the algorithm exactly recovers the true cluster structure of the network with high probability. The key conditions include:

- **Density gap**: The gap between the density of intra-cluster edges and the density of inter-cluster edges must be large enough.
- **Parameter selection**: The regularization parameter \(\alpha\) needs to be large enough. Specifically, \(\alpha\geq c_1\frac{m}{H^-}\), where \(H^-\) is the minimum expected number of connections of an intra-cluster node.

### Experimental verification

Through experiments on synthetic data, the paper shows that the algorithm outperforms existing algorithms on networks with heterogeneous degree distributions and a large number of outliers. The results indicate that the algorithm maintains high clustering accuracy even when the network is very sparse or highly heterogeneous.

In summary, this paper aims to cluster the degree-corrected stochastic block model with heterogeneous degree distributions exactly in the presence of outliers, and it supports the proposed algorithm with both theoretical analysis and synthetic experiments.
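The degree-dependent regularization term described above, \(\alpha\cdot\text{diag}(d^*)\) with \(d^*_i=\max(d_i, H^+)\), can be illustrated with a minimal sketch. The summary gives only the form of the term, not how it enters the SDP objective, so the code below computes the term and nothing more. The function names, the toy graph, and the values chosen for `h_plus` and `alpha` are illustrative assumptions, not taken from the paper.

```python
def penalized_degrees(adjacency, h_plus):
    """Return d* with d*_i = max(d_i, H+), where d_i is the observed degree of node i."""
    degrees = [sum(row) for row in adjacency]
    return [max(d, h_plus) for d in degrees]


def penalty_matrix(adjacency, h_plus, alpha):
    """Build alpha * diag(d*) as a dense list-of-lists matrix."""
    d_star = penalized_degrees(adjacency, h_plus)
    n = len(d_star)
    return [[alpha * d_star[i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]


# Hypothetical toy graph: nodes 0-1 and 2-3 form two small clusters,
# and node 4 is connected to every other node (an outlier-like hub).
A = [
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [1, 1, 1, 1, 0],
]

H_PLUS = 2.5  # assumed value for the maximum expected number of connections
ALPHA = 2.0   # assumed regularization weight

d_star = penalized_degrees(A, H_PLUS)
P = penalty_matrix(A, H_PLUS, ALPHA)
print(d_star)
print([P[i][i] for i in range(len(A))])
```

In this toy case every node whose degree is at most `h_plus` receives the same baseline weight, while the hub's weight grows with its degree, which matches the summary's description of penalizing nodes with abnormally many connections.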

Clustering Degree-Corrected Stochastic Block Model with Outliers
