Abstract: The field of deep clustering combines deep learning and clustering to jointly improve the learned representation and the performance of the considered clustering method. Most existing deep clustering methods are designed for a single clustering method, e.g., k-means, spectral clustering, or Gaussian mixture models, but it is well known that no clustering algorithm works best in all circumstances. Consensus clustering tries to alleviate the individual weaknesses of clustering algorithms by building a consensus between the members of a clustering ensemble. Currently, there is no deep clustering method that can include multiple heterogeneous clustering algorithms in an ensemble to update representations and clusterings together. To close this gap, we introduce the idea of a consensus representation that maximizes the agreement between ensemble members. Further, we propose DECCS (Deep Embedded Clustering with Consensus representationS), a deep consensus clustering method that learns a consensus representation by enhancing the embedded space to such a degree that all ensemble members agree on a common clustering result. Our contributions are the following: (1) We introduce the idea of learning consensus representations for heterogeneous clusterings, a novel approach to consensus clustering. (2) We propose DECCS, the first deep clustering method that jointly improves the representation and clustering results of multiple heterogeneous clustering algorithms. (3) We show in experiments that learning a consensus representation with DECCS outperforms several relevant baselines from deep clustering and consensus clustering. Our code is available at https://gitlab.cs.univie.ac.at/lukas/deccs
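To make "agreement between ensemble members" and "consensus" concrete, here is a minimal stdlib sketch. It is NOT DECCS, which learns a deep embedding (see the linked repository); it assumes each ensemble member has already produced one hard cluster label per point. It scores agreement with the pairwise Rand index and builds a classical co-association consensus. The function names, the toy labelings, and the 0.5 threshold are hypothetical choices for illustration.

```python
from itertools import combinations
from typing import List, Sequence


def rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of point pairs on which two labelings agree
    (both put the pair together, or both put it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def mean_pairwise_agreement(ensemble: List[Sequence[int]]) -> float:
    """Average Rand index over all pairs of ensemble members
    (a simple agreement score, assumed here for illustration)."""
    scores = [rand_index(p, q) for p, q in combinations(ensemble, 2)]
    return sum(scores) / len(scores)


def co_association(ensemble: List[Sequence[int]]) -> List[List[float]]:
    """Entry (i, j) is the fraction of members that put points i and j
    in the same cluster."""
    n, m = len(ensemble[0]), len(ensemble)
    return [[sum(lab[i] == lab[j] for lab in ensemble) / m for j in range(n)]
            for i in range(n)]


def consensus_labels(ensemble: List[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Consensus = connected components of the graph that links i and j
    when their co-association exceeds the (hypothetical) threshold."""
    c = co_association(ensemble)
    n = len(c)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over strongly co-associated pairs
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and c[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    # Toy outputs of three heterogeneous members on six points.
    kmeans_like = [0, 0, 0, 1, 1, 1]
    spectral_like = [1, 1, 1, 0, 0, 0]  # same partition, different label ids
    gmm_like = [0, 0, 1, 1, 2, 2]
    ensemble = [kmeans_like, spectral_like, gmm_like]
    print(round(mean_pairwise_agreement(ensemble), 3))  # 0.778
    print(consensus_labels(ensemble))                   # [0, 0, 0, 1, 1, 1]
```

Because both measures compare point pairs rather than label ids, members that produce the same partition under permuted labels count as full agreement. A method like DECCS instead changes the representation so that such agreement rises; this sketch only measures it on fixed labelings.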

Consensus Clustering on Big Data

Combining multiple clusterings via k-modes algorithm

K-Means-Based Consensus Clustering: A Unified View

Distributed Affinity Propagation Clustering Based on MapReduce

Fuzzy Consensus Clustering with Applications on Big Data

A Novel Kernel Possibilistic Fuzzy C-Means Clustering Algorithm For Large Scale Data Sets

Cludoop: an efficient distributed density-based clustering for big data using hadoop

A survey of data partitioning and sampling methods to support big data analysis

Hyperplane Division in Fuzzy C-Means: Clustering Big Data

Spectral Ensemble Clustering Via Weighted K-Means: Theoretical and Practical Evidence

Self-paced Adaptive Bipartite Graph Learning for Consensus Clustering

Distributed Bayesian Matrix Decomposition for Big Data Mining and Clustering

Fuzzy Weighted Clustering Method for Numerical Attributes of Communication Big Data Based on Cloud Computing

Deep Clustering With Consensus Representations

Consensus Clustering With Co-Association Matrix Optimization

Consensus-based clustering and data aggregation in decentralized network of multi-agent systems

Writing summary for the state-of-the-art methods for big data clustering in distributed environment

Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging

Superior Parallel Big Data Clustering through Competitive Stochastic Sample Size Optimization in Big-means

Imbalanced Data Clustering using Equilibrium K-Means

K-Means Clustering With Incomplete Data