Clustering by Mining Density Distributions and Splitting Manifold Structure

Zhichang Xu, Zhiguo Long, Hua Meng

2024-08-20

Abstract: Spectral clustering requires the time-consuming decomposition of the Laplacian matrix of the similarity graph, thus limiting its applicability to large datasets. To improve the efficiency of spectral clustering, a top-down approach was recently proposed, which first divides the data into several micro-clusters (granular-balls), then splits these micro-clusters when they are not "compact", and finally uses these micro-clusters as nodes to construct a similarity graph for more efficient spectral clustering. However, this top-down approach is challenging to adapt to unevenly distributed or structurally complex data. This is because constructing micro-clusters as a rough ball struggles to capture the shape and structure of data in a local range, and the simplistic splitting rule that solely targets "compactness" is susceptible to noise and variations in data density and leads to micro-clusters with varying shapes, making it challenging to accurately measure the similarity between them. To resolve these issues, this paper first proposes to start from local structures to obtain micro-clusters, such that the complex structural information inside local neighborhoods is well captured by them. Moreover, by noting that Euclidean distance is more suitable for convex sets, this paper further proposes a data splitting rule that couples local density and data manifold structures, so that the similarities of the obtained micro-clusters can be easily characterized. A novel similarity measure between micro-clusters is then proposed for the final spectral clustering. A series of experiments based on synthetic and real-world datasets demonstrate that the proposed method has better adaptability to structurally complex data than granular-ball based methods.

Machine Learning

What problem does this paper attempt to address?

This paper addresses the inefficiency of spectral clustering on large-scale datasets, and in particular the limitations of existing accelerated methods on data with complex structure or uneven distribution. Specifically:

1. **High computational complexity of spectral clustering**: Traditional spectral clustering requires an eigendecomposition of the Laplacian matrix of the similarity graph, with time complexity \(O(n^3)\) in the number of data points \(n\), which makes it very time-consuming on large-scale datasets.
2. **Deficiencies of the granular-ball-based method**: A recently proposed top-down method makes spectral clustering more efficient by dividing the data into multiple micro-clusters ("granular-balls") and then constructing a similarity graph over these micro-clusters. However, this method performs poorly on data with complex structure or uneven density, mainly because:
   - Constructing granular-balls as rough spheres makes it difficult to capture the shape and structure of the data within a local range.
   - The simple splitting rule targets only "compactness" and is easily affected by noise and changes in data density, producing micro-clusters with differing shapes and making it difficult to measure the similarity between them accurately.

To overcome these problems, the paper improves on the existing approach in the following ways:

- **Obtaining micro-clusters from local structures**: This ensures that complex structural information within local neighborhoods is well captured.
- **Introducing a splitting rule that combines local density and data manifold structure**: This makes the similarity between micro-clusters easier to characterize.
- **Proposing a new similarity measure between micro-clusters**: This measure is used for the final spectral clustering.

In a series of experiments on synthetic and real-world datasets, the proposed method adapts better to structurally complex data than granular-ball-based methods.
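The general idea of running spectral clustering on micro-clusters instead of individual points can be sketched as follows. This is only a minimal illustration under loud assumptions, not the paper's algorithm: plain k-means stands in for the paper's local-structure-based micro-clusters and splitting rule, a Gaussian kernel between micro-cluster centers stands in for the proposed similarity measure, and the function names `kmeans` and `spectral_on_micro_clusters` are hypothetical. The point it shows is that the eigendecomposition is performed on an m x m matrix (m micro-clusters) rather than an n x n one.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means with farthest-first initialisation (stand-in only)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        diff = X[:, None, :] - np.asarray(centers)[None, :, :]
        nearest_d2 = (diff ** 2).sum(-1).min(axis=1)
        centers.append(X[np.argmax(nearest_d2)])  # farthest point from chosen centers
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
    return centers, labels


def spectral_on_micro_clusters(X, n_micro, n_clusters, seed=0):
    # Step 1: compress n points into n_micro micro-clusters (assumption: k-means).
    centers, micro_of_point = kmeans(X, n_micro, seed=seed)

    # Step 2: similarity graph over micro-clusters (assumption: Gaussian kernel
    # on center distances, scaled by the median 3rd-nearest-center distance).
    d = np.sqrt(((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1))
    sigma = max(np.median(np.sort(d, axis=1)[:, min(3, n_micro - 1)]), 1e-12)
    W = np.exp(-(d ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Step 3: symmetric normalized Laplacian; eigendecomposition is only m x m.
    inv_sqrt_deg = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    L = np.eye(n_micro) - inv_sqrt_deg[:, None] * W * inv_sqrt_deg[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    U = vecs[:, :n_clusters]
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)

    # Step 4: cluster the micro-clusters, then propagate labels to the points.
    _, micro_label = kmeans(U, n_clusters, seed=seed)
    return micro_label[micro_of_point]


# Toy data: two well-separated Gaussian blobs of 200 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (200, 2)), rng.normal(10.0, 0.5, (200, 2))])
labels = spectral_on_micro_clusters(X, n_micro=20, n_clusters=2)
```

In this sketch the cubic-cost eigendecomposition runs on 20 nodes instead of 400 points. Convex blobs are the easy case; the paper's contribution concerns how micro-clusters are formed and compared when the data are unevenly distributed or structurally complex, which this stand-in does not attempt to reproduce.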

Spectral Clustering on Multiple Manifolds

Locally discriminative spectral clustering with composite manifold

A Convex Formulation for Spectral Shrunk Clustering

A Robust Density-Based Clustering Algorithm for Multi-Manifold Structure

Spectral Clustering with Smooth Tiny Clusters

Spectral Clustering for Discrete Distributions

A Spectral Clustering Method Combining Path with Density

A Clustering Method Based on Multi-Positive–negative Granularity and Attenuation-Diffusion Pattern

Clustering based on local density peaks and graph cut

Local and Structural Consistency for Multi-Manifold Clustering

Survey of Spectral Clustering Based on Graph Theory

Multiclass Spectral Clustering Based on Discriminant Analysis

Spectral clustering with linear embedding: A discrete clustering method for large-scale data

Spectral Clustering on Large Datasets: When Does it Work? Theory from Continuous Clustering and Density Cheeger-Buser

Improved manifold clustering algorithm based on density peaks search

Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity

Discrete and Balanced Spectral Clustering with Scalability

Clustering Method Based on Structural Similarity and Compressive Transformation

A Spectral Coarse Graining Algorithm Based on Relative Distance

Unified Spectral Clustering with Optimal Graph