SPINEX-Clustering: Similarity-based Predictions with Explainable Neighbors Exploration for Clustering Problems

MZ Naser, Ahmed Naser
2024-07-10
Abstract: This paper presents a novel clustering algorithm from the SPINEX (Similarity-based Predictions with Explainable Neighbors Exploration) algorithmic family. The newly proposed clustering variant leverages the concept of similarity and higher-order interactions across multiple subspaces to group data into clusters. To showcase the merit of SPINEX, a thorough set of benchmarking experiments was carried out against 13 algorithms, namely, Affinity Propagation, Agglomerative, Birch, DBSCAN, Gaussian Mixture, HDBSCAN, K-Means, KMedoids, Mean Shift, MiniBatch K-Means, OPTICS, Spectral Clustering, and Ward Hierarchical. Then, the performance of all algorithms was examined across 51 synthetic and real datasets from various domains, dimensions, and complexities. Furthermore, we present a companion complexity analysis to compare the complexity of SPINEX to that of the aforementioned algorithms. Our results demonstrate that SPINEX can outperform commonly adopted clustering algorithms by ranking within the top-5 best performing algorithms and has moderate complexity. Finally, a demonstration of the explainability capabilities of SPINEX, along with future research needs, is presented.
Machine Learning
What problem does this paper attempt to address?
The paper introduces a novel clustering algorithm called SPINEX (Similarity-based Predictions with Explainable Neighbors Exploration). Traditional clustering algorithms face challenges on complex datasets, such as parameter selection, high-dimensional data processing, and a lack of interpretability. SPINEX uses similarity and higher-order interactions across multiple subspaces to group data, aiming to overcome these challenges. The paper benchmarks SPINEX against 13 other clustering algorithms on 51 synthetic and real-world datasets and includes a companion complexity analysis. The results show that SPINEX ranks among the top five algorithms, has moderate complexity, and provides interpretability, which matters for applications that require results to be explained.

The features of SPINEX include:
1. Using multiple similarity measures (such as correlation, Spearman's rank correlation, kernel similarity, and cosine similarity) to adapt to different types and distributions of data.
2. Employing an adaptive method that adjusts parameters dynamically and can handle both predefined and undefined numbers of clusters.
3. Using a multi-level clustering approach to identify hierarchy within the data.
4. Emphasizing interpretability by providing insight into how clusters are formed, which helps in understanding and interpreting results.
5. Handling noise and outliers, with scalability to large datasets.

The algorithm workflow includes initialization, similarity matrix calculation, threshold setting, similarity-based clustering, performance evaluation, and selection of the best clustering. In addition, SPINEX provides functionality such as feature contribution analysis and nearest neighbor analysis to enhance its interpretability. Through these characteristics, SPINEX aims to improve clustering performance, especially on complex and high-dimensional datasets.
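
The workflow steps named above (similarity matrix calculation, threshold setting, similarity-based clustering, performance evaluation, and selection of the best clustering) can be illustrated with a small sketch. This is not the SPINEX implementation and makes several assumptions the summary does not confirm: it uses cosine similarity only, forms clusters as connected components of the thresholded similarity graph, and scores each candidate with a mean silhouette value computed on Euclidean distances. All function names and the toy data are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return Xn @ Xn.T


def threshold_clusters(S, threshold):
    """Label connected components of the graph with edges where S >= threshold.

    Assumption: this grouping rule is illustrative, not SPINEX's own.
    """
    n = S.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            # Unlabelled points similar enough to point i join its cluster.
            for j in np.flatnonzero((S[i] >= threshold) & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


def mean_silhouette(X, labels):
    """Mean silhouette (Euclidean); -1.0 for degenerate partitions."""
    unique = np.unique(labels)
    if len(unique) < 2 or len(unique) == len(X):
        return -1.0
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # a singleton contributes a silhouette of 0
        a = D[i, same].sum() / (n_same - 1)  # mean intra-cluster distance
        b = min(D[i, labels == c].mean() for c in unique if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())


def similarity_threshold_clustering(X, thresholds):
    """Try each threshold and keep the clustering with the best score."""
    S = cosine_similarity_matrix(X)
    best_score, best_threshold, best_labels = -np.inf, None, None
    for t in thresholds:
        labels = threshold_clusters(S, t)
        score = mean_silhouette(X, labels)
        if score > best_score:
            best_score, best_threshold, best_labels = score, t, labels
    return best_score, best_threshold, best_labels


# Toy data (hypothetical): two tight blobs pointing in different directions.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[5.0, 0.0], scale=0.2, size=(20, 2))
blob_b = rng.normal(loc=[0.0, 5.0], scale=0.2, size=(20, 2))
X = np.vstack([blob_a, blob_b])

best_score, best_threshold, best_labels = similarity_threshold_clustering(
    X, thresholds=(0.5, 0.8, 0.95)
)
print(best_threshold, len(np.unique(best_labels)), round(best_score, 3))
```

Scanning a small grid of thresholds and keeping the best-scoring partition is one simple way to cope with an undefined number of clusters. A closer approximation of the described algorithm would presumably repeat the loop over several similarity measures and subspaces, which this sketch omits.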