Finite Mixtures of Multivariate Poisson-Log Normal Factor Analyzers for Clustering Count Data

Andrea Payne, Anjali Silva, Steven J. Rothstein, Paul D. McNicholas, Sanjeena Subedi

2023-11-14

Abstract: A mixture of multivariate Poisson-log normal factor analyzers is introduced by imposing constraints on the covariance matrix, which results in flexible models for clustering purposes. In particular, a class of eight parsimonious mixture models based on the mixtures of factor analyzers model is introduced. Variational Gaussian approximation is used for parameter estimation, and information criteria are used for model selection. The proposed models are explored in the context of clustering discrete data arising from RNA sequencing studies. Using real and simulated data, the models are shown to give favourable clustering performance. The GitHub R package for this work is available at
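For orientation, the model family named in the abstract can be written out as below. This is a sketch in the standard notation for the multivariate Poisson-log normal distribution and for mixtures of factor analyzers. The symbols (G components, p variables, q latent factors, \pi_g, \mu_g, \Lambda_g, \Psi_g) are assumptions introduced here, not taken from the abstract, and any normalization offset for library size is omitted. The three constraints listed at the end are the ones conventionally used for parsimonious factor analyzer families and are assumed, not confirmed, to be those behind the eight models.

```latex
% Component g of the mixture, for a count vector Y = (Y_1, ..., Y_p):
\begin{align*}
  Y_j \mid \theta_j &\sim \mathrm{Poisson}\bigl(\exp(\theta_j)\bigr), \qquad j = 1, \dots, p, \\
  \theta &= \mu_g + \Lambda_g u + \varepsilon, \qquad
  u \sim N_q(0, I_q), \quad \varepsilon \sim N_p(0, \Psi_g), \\
  \Sigma_g &= \Lambda_g \Lambda_g^{\top} + \Psi_g, \qquad \Psi_g \text{ diagonal}.
\end{align*}
% Mixture density with mixing proportions \pi_g > 0, \sum_g \pi_g = 1:
\[
  f(y) = \sum_{g=1}^{G} \pi_g \, f_{\mathrm{MPLN}}(y \mid \mu_g, \Sigma_g).
\]
% Assumed constraints, each either imposed or not, giving 2^3 = 8 models:
\[
  \Lambda_g = \Lambda, \qquad \Psi_g = \Psi, \qquad \Psi_g = \psi_g I_p .
\]
```

With q much smaller than p, the factor structure replaces the p(p+1)/2 free covariance parameters of each component with roughly pq loadings plus the diagonal of \Psi_g, which is where the parsimony comes from.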

Machine Learning, Computation

What problem does this paper attempt to address?

This paper develops a mixture of factor analyzers model based on the multivariate Poisson-log normal (MPLN) distribution for clustering count data. Specifically, the authors introduce a class of eight parsimonious mixtures of MPLN factor analyzers. Imposing constraints on the covariance matrix reduces the number of parameters while keeping the model structure flexible. The models are intended for clustering discrete data from RNA sequencing (RNA-seq) studies: they handle over-dispersion and accommodate both positive and negative correlations.

The approach aims to overcome the limitations of univariate count models, such as those based on the negative binomial distribution, when they are applied to multivariate RNA-seq data. Those models assume that the variables are independent of each other, so they cannot capture correlations between variables. The MPLN distribution models this correlation structure directly, which is expected to give more accurate clustering results.

The paper uses a variational Gaussian approximation for parameter estimation and information criteria for model selection, and it reports favourable clustering performance on both real and simulated data. The results are relevant to data analysis in bioinformatics, particularly the analysis of RNA-seq data.
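The over-dispersion and signed correlations mentioned above can be checked numerically with a small simulation. The sketch below draws counts from a single Poisson-log normal component whose latent covariance has the factor analyzer form Lambda Lambda^T + diag(psi). The function name `simulate_mpln_fa` and all parameter values are hypothetical choices made for illustration; this is not the authors' R package, and it performs no clustering or parameter estimation.

```python
import numpy as np


def simulate_mpln_fa(n, mu, Lambda, psi, rng):
    """Draw n count vectors from one Poisson-log normal factor analyzer component.

    Hypothetical helper for illustration only.
    """
    p, q = Lambda.shape
    u = rng.standard_normal((n, q))                    # latent factors, N(0, I_q)
    eps = rng.standard_normal((n, p)) * np.sqrt(psi)   # noise, N(0, diag(psi))
    theta = mu + u @ Lambda.T + eps                    # latent log-rates
    return rng.poisson(np.exp(theta))                  # counts given the log-rates


rng = np.random.default_rng(0)

# Assumed toy parameters: three variables, one latent factor.
mu = np.array([2.0, 2.0, 2.0])
Lambda = np.array([[0.6], [0.6], [-0.6]])   # third variable loads negatively
psi = np.array([0.1, 0.1, 0.1])
Sigma = Lambda @ Lambda.T + np.diag(psi)    # implied latent covariance

Y = simulate_mpln_fa(50_000, mu, Lambda, psi, rng)

# Standard Poisson-log normal moments for comparison:
# E[Y_j] = exp(mu_j + Sigma_jj / 2), Var(Y_j) = E[Y_j] + E[Y_j]^2 (exp(Sigma_jj) - 1)
mean_theory = np.exp(mu + np.diag(Sigma) / 2)
var_theory = mean_theory + mean_theory**2 * (np.exp(np.diag(Sigma)) - 1)

emp_mean = Y.mean(axis=0)
emp_var = Y.var(axis=0)
emp_corr = np.corrcoef(Y, rowvar=False)

print("empirical mean     :", emp_mean)
print("theoretical mean   :", mean_theory)
print("empirical variance :", emp_var)
print("theoretical var    :", var_theory)
print("empirical correlation matrix:")
print(emp_corr)
```

In this setup the variance of each count exceeds its mean (over-dispersion relative to a plain Poisson), variables loading on the factor with the same sign are positively correlated, and variables loading with opposite signs are negatively correlated. A model built from independent univariate count distributions cannot represent either kind of correlation.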

A parsimonious family of multivariate Poisson-lognormal distributions for clustering multivariate count data

Finite mixtures of matrix-variate Poisson-log normal distributions for three-way count data

Bayesian mixtures of common factor analyzers: Model, variational inference, and applications

Logistic Normal Multinomial Factor Analyzers for Clustering Microbiome Data

Infinite Mixtures of Infinite Factor Analysers

Clustering Multivariate Data using Factor Analytic Bayesian Mixtures with an Unknown Number of Components

Overfitting Bayesian Mixtures of Factor Analyzers with an Unknown Number of Components

Simultaneous Bayesian Clustering and Model Selection with Mixture of Robust Factor Analyzers

A sparse factor model for clustering high‐dimensional longitudinal data

Model-based clustering based on sparse finite Gaussian mixtures

The parsimonious Gaussian mixture models with partitioned parameters and their application in clustering

Model based clustering of multinomial count data

A Hierarchical Finite Mixture Model That Accommodates Zero-Inflated Counts, Non-Independence, and Heterogeneity

Factor Adjusted Spectral Clustering for Mixture Models

Nonparametric Bayesian Negative Binomial Factor Analysis

A sparse negative binomial mixture model for clustering RNA-seq count data

Parsimonious Mixtures of Matrix Variate Bilinear Factor Analyzers

A nonparametric variable clustering model

Infinite mixtures of multivariate normal-inverse Gaussian distributions for clustering of skewed data

Robust estimation for mixtures of Gaussian factor analyzers, based on trimming and constraints