A maximal-clique-based clustering approach for multi-observer multi-view data by using k-nearest neighbor with S-pseudo-ultrametric induced by a fuzzy similarity
Azadeh Zahedi Khameneh, Mehrdad Ghaznavi, Adem Kilicman, Zahari Mahad, Abbas Mardani
DOI: https://doi.org/10.1007/s00521-024-09560-x
2024-03-07
Neural Computing and Applications
Abstract: Partitioning multi-view data is a recent challenge for clustering methods, which have traditionally considered single-view data. In clustering, measuring the similarity or distance between objects, typically with metrics on $\mathbb{R}^{n}$, plays a central role in community detection. Within this framework, various algorithms have been developed whose output depends on an exact distance computed from the objects' features. Because feature information may be qualitative and defined in an ambiguous environment, this study proposes a new class of metrics, called S-distance, defined as the dual of a fuzzy T-similarity. It produces a collective distance over all views/observers and offers a more flexible way to define distance under uncertainty. Moreover, most existing approaches handle multi-view clustering either by aggregating the clusters of each view or by iterative optimization, and both are time-consuming. Here, by transforming the multi-view clustering problem into a node clustering problem, we propose a non-iterative approach for multi-view and multi-observer data. The proposed method, GMSkNN, uses an attribute-structural similarity relation between nodes to obtain more coherent clusters. To this end, we first build a directed k-nearest neighbor (kNN) graph using the proposed S-distance. We then transform it into an undirected graph based on the nodes' neighborhood information, so that the resulting graph reflects both the interactions between nodes and their initial feature information. Next, a new maximal-clique-based clustering procedure completes the node partitioning. The algorithm was implemented in R and tested on synthetic data and four real-world datasets, and the clustering results were evaluated with several indexes. This analysis shows the efficiency of the proposed algorithm compared with traditional clustering methods.
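The abstract does not give GMSkNN's formulas, so the following is only a minimal Python sketch of the pipeline it outlines. The authors report an R implementation; Python is used here purely for illustration. The sketch relies on one standard fact. If E is a fuzzy T-similarity, then d = 1 - E satisfies d(x, z) <= S(d(x, y), d(y, z)), where S(a, b) = 1 - T(1 - a, 1 - b) is the dual t-conorm.

Everything else is an assumption, not the authors' method:

* the Łukasiewicz t-norm;
* a per-view similarity 1 - max_i |u_i - v_i| on features scaled to [0, 1];
* min-aggregation across views/observers;
* mutual kNN as the directed-to-undirected rule;
* a greedy "largest maximal clique first" partition.

All function names are hypothetical.

```python
# Sketch of a GMSkNN-style pipeline. The t-norm, per-view similarity,
# aggregation, symmetrization and clique rule are hypothetical stand-ins.

def t_norm(a, b):
    """Lukasiewicz t-norm T_L(a, b) = max(0, a + b - 1) (assumed choice)."""
    return max(0.0, a + b - 1.0)

def s_conorm(a, b):
    """Dual t-conorm S(a, b) = 1 - T(1 - a, 1 - b); for T_L this is min(1, a + b)."""
    return 1.0 - t_norm(1.0 - a, 1.0 - b)

def view_similarity(u, v):
    """Hypothetical per-view similarity, T_L-transitive for features in [0, 1]."""
    return 1.0 - max(abs(a - b) for a, b in zip(u, v))

def s_distance(x, y):
    """Collective distance: the min of T-similarities is again a T-similarity,
    so d = 1 - E obeys d(x, z) <= S(d(x, y), d(y, z))."""
    sim = min(view_similarity(xv, yv) for xv, yv in zip(x, y))
    return 1.0 - sim

def knn_digraph(objects, k):
    """Directed kNN graph: edge i -> j for the k objects nearest to i."""
    out = {}
    for i, xi in enumerate(objects):
        ranked = sorted((s_distance(xi, xj), j)
                        for j, xj in enumerate(objects) if j != i)
        out[i] = {j for _, j in ranked[:k]}
    return out

def mutual_graph(out):
    """Undirected graph: keep i - j only if each is in the other's kNN list."""
    adj = {i: set() for i in out}
    for i, nbrs in out.items():
        for j in nbrs:
            if i in out[j]:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def maximal_cliques(adj):
    """All maximal cliques via Bron-Kerbosch with pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def clique_partition(adj):
    """Greedy rule: repeatedly remove the largest maximal clique as a cluster."""
    remaining = set(adj)
    clusters = []
    while remaining:
        sub = {v: adj[v] & remaining for v in remaining}
        best = max(maximal_cliques(sub), key=lambda c: (len(c), -min(c)))
        clusters.append(sorted(best))
        remaining -= best
    return clusters

# Toy data: each object has two views (e.g. two observers), values in [0, 1].
data = [
    [[0.10, 0.20], [0.90]],
    [[0.12, 0.22], [0.88]],
    [[0.11, 0.18], [0.91]],
    [[0.80, 0.85], [0.10]],
    [[0.82, 0.83], [0.12]],
    [[0.79, 0.86], [0.09]],
]
print(clique_partition(mutual_graph(knn_digraph(data, k=2))))
# → [[0, 1, 2], [3, 4, 5]]
```

With T = min, whose dual is S = max, the same inequality reduces to the classical pseudo-ultrametric condition d(x, z) <= max(d(x, y), d(y, z)). The per-view similarity above is only Łukasiewicz-transitive, so that case would also need a min-transitive similarity. Mutual kNN is simply the most basic symmetrization that uses only neighborhood information; the rule in the paper may differ.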
computer science, artificial intelligence