Abstract: Cluster structure ensemble focuses on integrating multiple cluster structures extracted from different datasets into a unified cluster structure, rather than aligning the individual labels of the clustering solutions derived from multiple homogeneous datasets, as is done in the cluster ensemble framework. In this article, we design a novel probabilistic cluster structure ensemble framework, referred to as the Gaussian mixture model based cluster structure ensemble framework (GMMSE), to identify the most representative cluster structure from the dataset. Specifically, GMMSE first applies the bagging approach to produce a set of variant datasets. Then, a set of Gaussian mixture models is used to capture the underlying cluster structures of these datasets. GMMSE applies K-means to initialize the parameters of each Gaussian mixture model, and adopts the Expectation Maximization (EM) approach to estimate the parameter values of the model. Next, the components of the Gaussian mixture models are viewed as new data samples, which are used to construct the representative matrix capturing the relationships among components. The similarity between two components, each corresponding to its own Gaussian distribution, is measured by the Bhattacharyya distance function. Afterwards, GMMSE constructs a graph based on the new data samples and the representative matrix, and searches for the most representative cluster structure. Finally, we design four criteria to assign the data samples to their corresponding clusters based on the unified cluster structure. The experimental results show that (i) GMMSE works well on synthetic datasets and on real datasets from the UCI machine learning repository, and (ii) GMMSE outperforms most of the previous cluster ensemble approaches.
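The component-comparison step described above can be illustrated with a minimal sketch, assuming the standard closed form of the Bhattacharyya distance between two multivariate Gaussians, D_B = (1/8)(mu1 - mu2)^T S^{-1} (mu1 - mu2) + (1/2) ln(det S / sqrt(det S1 * det S2)) with S = (S1 + S2) / 2. The function names `bhattacharyya_distance` and `pairwise_distances` are hypothetical and chosen for illustration. The abstract does not state how GMMSE turns these distances into entries of the representative matrix, so the sketch stops at the distance matrix.

```python
import numpy as np


def bhattacharyya_distance(mean1, cov1, mean2, cov2):
    """Bhattacharyya distance between two Gaussian components.

    Assumes both covariance matrices are symmetric positive definite.
    """
    mean1 = np.atleast_1d(np.asarray(mean1, dtype=float))
    mean2 = np.atleast_1d(np.asarray(mean2, dtype=float))
    cov1 = np.atleast_2d(np.asarray(cov1, dtype=float))
    cov2 = np.atleast_2d(np.asarray(cov2, dtype=float))

    # Average covariance S = (S1 + S2) / 2.
    cov_avg = 0.5 * (cov1 + cov2)
    diff = mean1 - mean2

    # First term: (1/8) * diff^T S^{-1} diff, via a linear solve
    # instead of an explicit matrix inverse.
    mean_term = 0.125 * float(diff @ np.linalg.solve(cov_avg, diff))

    # Second term: (1/2) * ln(det S / sqrt(det S1 * det S2)),
    # computed from log-determinants for numerical stability.
    _, logdet_avg = np.linalg.slogdet(cov_avg)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    cov_term = 0.5 * (logdet_avg - 0.5 * (logdet1 + logdet2))

    return mean_term + float(cov_term)


def pairwise_distances(means, covs):
    """Symmetric matrix of Bhattacharyya distances among components."""
    n = len(means)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = bhattacharyya_distance(means[i], covs[i], means[j], covs[j])
            dist[i, j] = dist[j, i] = d
    return dist
```

In this sketch, `means` and `covs` would hold the means and covariance matrices of all components pooled from the mixture models fitted on the bagged datasets. The distance is zero for identical components and grows as the means separate or the covariances diverge.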

A Probabilistic Model Based on Uncertainty for Data Clustering

Document Clustering Based on Probabilistic Topic Model

Incorporating Probabilistic Knowledge into Topic Models

A Novel Probabilistic Clustering Model for Heterogeneous Networks

Efficient Probabilistic Latent Semantic Analysis with Sparsity Control

Optimal Clustering under Uncertainty

Uncertain Data Clustering Based on Probability Distribution in Obstacle Space

A Probabilistic Approach to Latent Cluster Analysis

Enhancement of the Classification Performance of Fuzzy C-Means through Uncertainty Reduction with Cloud Model Interpolation

PCM and APCM Revisited: An Uncertainty Perspective

A generalized Bayes framework for probabilistic clustering

Interval-valued possibilistic fuzzy C-means clustering algorithm

A Comparative Study of A Practical Stochastic Clustering Method with Traditional Methods

Tracking High Quality Clusters over Uncertain Data Streams

A Sparse Framework for Robust Possibilistic K-Subspace Clustering

Probabilistic Topic Modeling for Comparative Analysis of Document Collections

Probabilistic Cluster Structure Ensemble

Topic-based mixture language modelling

Clustering Uncertain Data via Representative Possible Worlds with Consistency Learning

Challenges in model‐based clustering

Knowledge discovery through directed probabilistic topic models: a survey