Abstract: Recent spectral clustering methods are a popular and powerful technique for data clustering. These methods need to solve an eigenproblem whose computational complexity is $O(n^3)$, where $n$ is the number of data samples. In this paper, a non-eigenproblem-based clustering method is proposed to deal with the clustering problem. Its performance is comparable to that of the spectral clustering algorithms, but it is more efficient, with computational complexity $O(n^2)$. We show that with a transitive distance and an observed property, called K-means duality, our algorithm can be used to handle data sets with complex cluster shapes, multi-scale clusters, and noise. Moreover, no parameters except the number of clusters need to be set in our algorithm.

What problem does this paper attempt to address?

The main problems this paper attempts to solve are the efficiency and performance limitations of existing clustering algorithms on datasets with complex cluster shapes, multiple scales, and noise. Specifically:

1. **High computational complexity**: Existing spectral clustering methods need to solve an eigenproblem, whose computational complexity is $O(n^3)$. For large datasets, this cost becomes prohibitive.
2. **Sensitivity to parameters**: Many clustering algorithms (such as K-means and EM) assume that the data has a certain underlying structure (for example, hyper-ellipsoidal clusters or a Gaussian distribution) and need parameter tuning to obtain good results.
3. **Difficulty in handling clusters of complex shapes**: Traditional methods perform poorly on clusters of complex shapes or on multi-scale clusters.

To overcome these problems, the authors propose a clustering method based on transitive distance and K-means duality that does not require solving an eigenproblem. The main contributions of this method include:

- **Introduction of transitive distance**: The transitive distance better reflects the actual relationship between samples, so that clusters of complex shapes become more compact in the new space.
- **K-means duality**: Using K-means duality, clustering can be performed directly on the distance matrix without relying on coordinates.
- **Low computational complexity**: The computational complexity of the new algorithm is $O(n^2)$, which is more efficient than the $O(n^3)$ of spectral clustering methods.
- **No need to adjust parameters**: Apart from the number of clusters, no other parameters need to be set, which simplifies use.

In summary, this paper aims to propose an efficient and robust clustering method that significantly reduces computational complexity while maintaining performance comparable to spectral clustering algorithms, and that can handle datasets with complex cluster shapes, multiple scales, and noise.
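The two ingredients named above can be illustrated with a minimal sketch. It rests on assumptions that the summary does not spell out: the transitive distance is taken to be the minimax path distance (over all paths joining two samples, the smallest achievable largest hop), and K-means duality is taken to mean running K-means on the rows of the resulting distance matrix. The function names `transitive_distance` and `kmeans_rows`, the Prim-style $O(n^2)$ update, and the farthest-first initialization are illustrative choices, not the paper's stated implementation.

```python
import numpy as np

def transitive_distance(X):
    """Assumed definition: minimax path distance between all pairs.

    Grows a minimum spanning tree with Prim's algorithm; the largest edge on
    the tree path between two points equals their minimax path distance.
    """
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # Euclidean
    T = np.zeros((n, n))
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()               # cheapest edge from the tree to each point
    parent = np.zeros(n, dtype=int)  # tree endpoint of that cheapest edge
    for _ in range(n - 1):
        v = int(np.argmin(np.where(in_tree, np.inf, best)))
        p, w = parent[v], best[v]
        members = np.flatnonzero(in_tree)
        # Path from v to any tree member goes through p, then along the tree.
        T[v, members] = np.maximum(T[p, members], w)
        T[members, v] = T[v, members]
        in_tree[v] = True
        closer = (D[v] < best) & ~in_tree
        best[closer] = D[v, closer]
        parent[closer] = v
    return T

def kmeans_rows(M, k, n_iter=100):
    """Plain Lloyd's K-means on the rows of M, farthest-first initialization."""
    centers = [M[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(M - c, axis=1) for c in centers], axis=0)
        centers.append(M[int(np.argmax(d))])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(M[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        new = np.array([M[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Toy data (hypothetical): two parallel chains, each longer than the gap
# between them, so each cluster is elongated rather than compact.
X = np.array([[0.1 * i, y] for y in (0.0, 1.0) for i in range(20)])
T = transitive_distance(X)
labels = kmeans_rows(T, k=2)
print(T[0, 19], T[0, 20])  # end-to-end within a chain vs. across chains
print(labels)
```

In this toy case the two ends of one chain are 1.9 apart in Euclidean distance but only about 0.1 apart in the assumed transitive distance, while points on different chains stay 1.0 apart, which is why the rows of `T` separate cleanly. The only user-supplied parameter is `k`, in line with the claim that nothing but the number of clusters needs to be set.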

Clustering with Transitive Distance and K-Means Duality

Parallel spectral clustering algorithm

Subspace Clustering by Directly Solving Discriminative K-means

K-Means Clustering with Distributed Dimensions.

Distributed Information Theoretic Clustering

Centerless Clustering: An Efficient Variant of K-means Based on K-NN Graph

Spectral Clustering for Discrete Distributions

An Improved K-Means Clustering Algorithm Based on Spectral Method

Unified Spectral Clustering with Optimal Graph

A Novel K '-Means Algorithm For Clustering Analysis

Clustering Stable Instances of Euclidean k-means

Efficient Clustering with Limited Distance Information

Sub-One Quasi-Norm-Based k-Means Clustering Algorithm and Analyses

Clustering algorithm based on symmetry distance with direction constraint

A new distance measurement and its application in K-Means Algorithm

When Do Birds of a Feather Flock Together? K-Means, Proximity, and Conic Programming.

A new Kmeans clustering model and its generalization achieved by joint spectral embedding and rotation

A Tighter Analysis of Spectral Clustering, and Beyond

Multiclass Spectral Clustering Based on Discriminant Analysis

When do birds of a feather flock together?

Clustering by Mining Density Distributions and Splitting Manifold Structure