Abstract: The development of algorithms for unsupervised pattern recognition by nonlinear clustering is a notable problem in data science. Markov clustering (MCL) is a renowned algorithm that simulates stochastic flows on a network of sample similarities to detect the structural organization of clusters in the data, but it has never been generalized to deal with data nonlinearity. Minimum Curvilinearity (MC) is a principle that approximates nonlinear sample distances in the high-dimensional feature space by curvilinear distances, which are computed as transversal paths over their minimum spanning tree, and then stored in a kernel. Here we propose MC-MCL, which is the first nonlinear kernel extension of MCL and exploits Minimum Curvilinearity to enhance the performance of MCL in real and synthetic data with underlying nonlinear patterns. MC-MCL is compared with baseline clustering methods, including DBSCAN, K-means and affinity propagation. We find that Minimum Curvilinearity provides a valuable framework to estimate nonlinear distances also when its kernel is applied in combination with MCL. Indeed, MC-MCL overcomes classical MCL and even baseline clustering algorithms in different nonlinear datasets.

What problem does this paper attempt to address?

The main problem this paper addresses is the limitation of the existing Markov clustering (MCL) algorithm on nonlinear data. Although classical MCL detects the structural organization of clusters by simulating stochastic flows on a network of sample similarities, it has never been generalized to deal with data nonlinearity. To close this gap, the authors combine the Minimum Curvilinearity (MC) principle with MCL and propose the MC-MCL algorithm.

### Main problems:

1. **Nonlinear data processing**: Classical MCL cannot effectively process data with nonlinear patterns.
2. **Improving clustering performance**: A new method is needed to enhance the performance of MCL on real and synthetic datasets whose underlying patterns are nonlinear.

### Solutions:

- **Minimum Curvilinearity (MC) principle**: MC approximates nonlinear sample distances in the high-dimensional feature space by curvilinear distances, computed as transversal paths over the minimum spanning tree (MST) of the samples. These curvilinear distances are stored in a kernel.
- **MC-MCL algorithm**: Applying the MC kernel to MCL yields the first nonlinear kernel extension of MCL, called MC-MCL, which exploits MC to enhance the performance of MCL on nonlinear data.

### Experimental verification:

To verify the effectiveness of MC-MCL, the authors compared it with baseline clustering methods, including DBSCAN, K-means and affinity propagation. The results show that MC-MCL outperforms classical MCL, and even the baseline clustering algorithms, on different nonlinear datasets.
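The MC distance described under Solutions can be illustrated with a minimal Python sketch. It assumes Euclidean distances between samples as the starting point and uses SciPy's graph routines; the function name `mc_distance_matrix` is hypothetical and this is not the authors' implementation.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from scipy.spatial.distance import pdist, squareform


def mc_distance_matrix(X):
    """Sum of MST edge weights along the unique tree path between each pair.

    Assumption: the base metric is Euclidean. Duplicate samples (zero
    distance) are not handled, because SciPy reads a zero entry as "no edge".
    """
    D = squareform(pdist(X, metric="euclidean"))  # dense pairwise distances
    mst = minimum_spanning_tree(D)                # sparse tree with n - 1 edges
    # Paths in a tree are unique, so the shortest path is the transversal path.
    return shortest_path(mst, method="D", directed=False)


if __name__ == "__main__":
    # Four points on three sides of a rectangle: the tree path between the
    # two far corners is longer than their straight-line distance.
    X = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 1.0], [0.0, 1.5]])
    print(mc_distance_matrix(X))
```

Because distances are accumulated along the tree rather than measured straight across, two samples that are close in Euclidean terms but lie on different arms of a curved structure end up far apart in the MC distance matrix.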
### Formula representation:

- **MC distance matrix**: \[ D_{MC} \]
- **Sparse MC similarity kernel**: \[ f(x)=\max[0,(1 - x - t)] \] where \( x \) is the original MC distance and \( t \) is the automatically detected threshold.
- **Converting Euclidean distance to similarity**: \[ f(x)=\max\left[0,\left(1-\frac{x}{\max(x)}-t\right)\right] \]

Through these improvements, MC-MCL can provide more accurate and effective clustering results when dealing with nonlinear data.
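As a rough illustration, the thresholded similarity above can be fed into a textbook MCL loop (expansion followed by inflation). Everything named here is an assumption made for the sketch: the function names `sparse_similarity` and `markov_cluster`, the default expansion and inflation values, and the hand-picked threshold `t` (the summary says the threshold is detected automatically but does not say how).

```python
import numpy as np


def sparse_similarity(D, t, normalize=True):
    """f(x) = max[0, 1 - x - t]; with normalize=True, x is first divided by max(x)."""
    x = np.asarray(D, dtype=float)
    if normalize:
        x = x / x.max()
    return np.maximum(0.0, 1.0 - x - t)


def markov_cluster(S, expansion=2, inflation=2.0, max_iter=100, tol=1e-8):
    """Generic MCL loop (not the authors' code); returns one label per sample.

    Assumes t < 1 so that the diagonal of S is positive and no column is empty.
    """
    M = S / S.sum(axis=0, keepdims=True)          # column-stochastic matrix
    for _ in range(max_iter):
        prev = M
        M = np.linalg.matrix_power(M, expansion)  # expansion: flow spreads out
        M = M ** inflation                        # inflation: strong flow is boosted
        M = M / M.sum(axis=0, keepdims=True)      # renormalize each column
        if np.abs(M - prev).max() < tol:
            break
    # Each column is labeled with the row (attractor) that holds most of its flow.
    return M.argmax(axis=0)


if __name__ == "__main__":
    pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
    D = np.abs(pts[:, None] - pts[None, :])       # toy 1-D distance matrix
    labels = markov_cluster(sparse_similarity(D, t=0.5))
    print(labels)
```

The threshold is what makes the kernel sparse: pairs whose rescaled distance exceeds `1 - t` get similarity zero, so no flow passes between them during the MCL iterations.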

Nonlinear Markov Clustering by Minimum Curvilinear Sparse Similarity

Nonlinear clustering: methods and applications

Nonlinear dimension reduction and clustering by Minimum Curvilinearity unfold neuropathic pain and tissue embryological classes

Clustering populations by mixed linear models

A Nonlinear Orthogonal Non-Negative Matrix Factorization Approach to Subspace Clustering

Sparse Kernel Maximum Margin Clustering

Linearithmic Time Sparse and Convex Maximum Margin Clustering

Kernel Spectral Curvature Clustering (KSCC)

KLNCC: A new nonlinear correlation clustering algorithm based on KL-divergence

Unsupervised Manifold Linearizing and Clustering

Multi-view Clustering Via Multi-manifold Regularized Nonnegative Matrix Factorization

Robust Kernelized Multiview Clustering Based on High-Order Similarity Learning

Robust multi-view subspace clustering with missing data by aligning nonlinear manifolds

A robust clustering method with noise identification based on directed K-nearest neighbor graph

Multi-view clustering indicator learning with scaled similarity

Deep Learning with Nonparametric Clustering

Projective Multiple Kernel Subspace Clustering

Nonlinear subspace clustering by functional link neural networks

Simultaneous Global and Local Graph Structure Preserving for Multiple Kernel Clustering

Probabilistic K-means Clustering via Nonlinear Programming