Abstract: A mixture model (MM) is a probabilistic framework that allows us to model a dataset containing $K$ different modes. When each mode is associated with a Gaussian distribution, we refer to it as a Gaussian MM, or GMM. Given a data point $x$, a GMM may assume the existence of a random index $k \in \{1, \dots, K\}$ identifying which Gaussian that data point is associated with. In the traditional GMM paradigm, it is straightforward to compute in closed form both the conditional likelihood $p(x \mid k, \theta)$ and the responsibility probability $p(k \mid x, \theta)$, which describes the distribution of component weights for each data point. Computing the responsibilities allows us to retrieve many important statistics of the overall dataset, including the weights of each of the modes/clusters. Modern large datasets often contain multiple unlabelled modes: for example, a paintings dataset may contain several styles, and a fashion image dataset may contain several unlabelled categories. In their raw representation, the Euclidean distances between data points (e.g., images) do not allow them to form mixtures naturally, nor is it feasible to compute the responsibility distribution analytically, which makes GMM inapplicable. In this paper, we utilize the generative adversarial network (GAN) framework to obtain a plausible alternative method for computing these probabilities. The key insight is that we compute them in the data's latent space $z$ instead of in $x$. However, the mapping $z \to x$ is irreversible under a GAN, which renders the computation of the responsibility $p(k \mid x, \theta)$ infeasible. Our paper proposes a novel method to solve this problem using a so-called posterior consistency module (PCM). The PCM acts like a GAN, except that its generator $C_{\mathrm{PCM}}$ does not output data but instead outputs a distribution that approximates $p(k \mid x, \theta)$. The entire network is trained in an “end-to-end” fashion.
Through these techniques, we can model datasets of very complex structure using a GMM and subsequently discover interesting properties of an unsupervised dataset, including its segments, as well as generate new “out-distribution” data by smooth linear interpolation across any combination of the modes in a completely unsupervised manner.
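
For reference, the closed-form responsibility in the traditional GMM paradigm mentioned in the abstract follows from Bayes' rule: $p(k \mid x, \theta) = \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) / \sum_{j} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)$. The sketch below illustrates only this classical computation, not the GAN-based PCM the paper proposes. The symbols $\pi_k$, $\mu_k$, $\Sigma_k$ and the function names `log_gaussian` and `responsibilities` are assumptions introduced for illustration and do not come from the paper.

```python
import numpy as np


def log_gaussian(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x (shape (N, D))."""
    d = x.shape[1]
    diff = x - mean
    # slogdet avoids overflow/underflow when computing log|cov|.
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of every row, without forming cov^{-1}.
    maha = np.einsum("ni,ni->n", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def responsibilities(x, weights, means, covs):
    """Return an (N, K) array whose row n is p(k | x_n, theta)."""
    # log pi_k + log N(x | mu_k, Sigma_k) for every component k.
    log_joint = np.stack(
        [np.log(w) + log_gaussian(x, m, c)
         for w, m, c in zip(weights, means, covs)],
        axis=1,
    )
    # Subtract the row-wise maximum before exponentiating for stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    joint = np.exp(log_joint)
    # Normalize over components (Bayes' rule).
    return joint / joint.sum(axis=1, keepdims=True)
```

Averaging the returned array over the data points, for example `responsibilities(x, weights, means, covs).mean(axis=0)`, gives the usual estimate of the mode weights, which is one of the dataset statistics the abstract refers to.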

Mixture of GANs for Clustering

GAT-GMM: Generative Adversarial Training for Gaussian Mixture Models

A General Transfer Learning-based Gaussian Mixture Model for Clustering

EGMM: an Evidential Version of the Gaussian Mixture Model for Clustering

GAN-based Gaussian Mixture Model Responsibility Learning

Mixed data Deep Gaussian Mixture Model: A clustering model for mixed datasets

Manifold Regularized Gaussian Mixture Model for Semi-supervised Clustering

Laplacian Regularized Gaussian Mixture Model for Data Clustering

Gaussian Mixture Model with Local Consistency

Multi-distribution Mixture Generative Adversarial Networks for Fitting Diverse Data Sets

Gaussian Mixture Model Clustering with Incomplete Data

A Nonparametric Model for Multi-Manifold Clustering with Mixture of Gaussians and Graph Consistency

Probabilistic Cluster Structure Ensemble

Hierarchical Mixtures of Generators for Adversarial Learning

Balanced Self-Paced Learning for Generative Adversarial Clustering Network

A Novel Gaussian Mixture Model for Classification

GAN-based Clustering Solution Generation and Fusion of Diffusion

A Greedy Merge Learning Algorithm for Gaussian Mixture Model

HGMVAE: hierarchical disentanglement in Gaussian mixture variational autoencoder

An Entropy Weighting Mixture Model for Subspace Clustering of High-Dimensional Data

Deep Clustering by Gaussian Mixture Variational Autoencoders With Graph Embedding