Abstract: Non-Gaussian mixture models are gaining increasing attention for mixture model-based clustering particularly when dealing with data that exhibit features such as skewness and heavy tails. Here, such a mixture distribution is presented, based on the multivariate normal inverse Gaussian (MNIG) distribution. For parameter estimation of the mixture, a Bayesian approach via Gibbs sampler is used; for this, a novel approach to simulate univariate generalized inverse Gaussian random variables and matrix generalized inverse Gaussian random matrices is provided. The proposed algorithm will be applied to both simulated and real data. Through simulation studies and real data analysis, we show parameter recovery and that our approach provides competitive clustering results compared to other clustering approaches.

What problem does this paper attempt to address?

The paper addresses how to cluster data effectively when the data are skewed and heavy-tailed. Specifically, it proposes a mixture model based on the multivariate normal-inverse Gaussian (MNIG) distribution and estimates its parameters with a Bayesian approach via Gibbs sampling. A traditional Gaussian mixture model can only represent symmetric, elliptical components, whereas the proposed MNIG mixture can represent both skewed and symmetric components, which can yield more accurate clustering results for such data.

### Main Contributions

1. **Proposing a Mixture Model Based on MNIG Distribution**: The paper uses the multivariate normal-inverse Gaussian (MNIG) distribution as the component distribution of the mixture model in order to handle skewed and heavy-tailed data.
2. **Bayesian Parameter Estimation**: Parameters are estimated with a Bayesian approach via Gibbs sampling, offered as an alternative to the traditional EM algorithm, which can converge slowly and give unstable results.
3. **Novel Random Variable Generation Method**: New methods are provided for simulating univariate generalized inverse Gaussian (GIG) random variables and matrix generalized inverse Gaussian (MGIG) random matrices; these methods are well suited to use within an MCMC framework.
4. **Performance Evaluation**: Simulation studies and real-data analysis are used to demonstrate parameter recovery and the competitiveness of the proposed method.

### Key Technologies

- **Multivariate Normal-Inverse Gaussian (MNIG) Distribution**: A mean-variance mixture distribution that combines the multivariate normal distribution with an inverse Gaussian mixing distribution, which makes it suitable for modeling skewed and heavy-tailed data.
- **Bayesian Method**: Prior distributions are placed on the model parameters, and inference is based on their posterior distributions.
- **Gibbs Sampling**: A Markov chain Monte Carlo (MCMC) method used to sample from complex posterior distributions.
- **Model Selection**: The Bayesian Information Criterion (BIC) is used to select the number of clusters.

### Simulation Studies and Real-Data Analysis

- **Simulation Studies**: Two-dimensional and four-dimensional data sets with skewness and heavy tails are generated to assess the clustering performance of the proposed method. The results show that the method recovers the parameters accurately and attains a relatively high Adjusted Rand Index (ARI).
- **Real-Data Analysis**: The proposed method is applied to the Old Faithful data set and the Fish Catch data set, demonstrating its effectiveness and competitiveness on practical problems.

### Conclusion

The paper proposes a mixture model based on the multivariate normal-inverse Gaussian distribution, together with a Bayesian estimation method for it, that gives effective clustering results on skewed and heavy-tailed data. Simulation studies and real-data analysis demonstrate the effectiveness and competitiveness of the proposed method.
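The mean-variance mixture construction behind the MNIG distribution can be illustrated with a short simulation. The sketch below is not the paper's code: it assumes one common parameterization, in which a latent variable W follows an inverse Gaussian distribution with mean `delta / gamma` and shape `delta**2`, and the observation given W = w is normal with mean `mu + w * Delta @ beta` and covariance `w * Delta`. The function name `sample_mnig` and all parameter values are hypothetical, chosen only for illustration, and `gamma` is treated as a free positive parameter.

```python
import numpy as np


def sample_mnig(n, mu, beta, delta, gamma, Delta, rng):
    """Draw n samples from an MNIG distribution via its mean-variance mixture form.

    Assumed parameterization (not taken from the source text):
        W ~ inverse Gaussian with mean delta / gamma and shape delta**2
        X | W = w ~ Normal(mu + w * Delta @ beta, w * Delta)
    """
    mu = np.asarray(mu, dtype=float)
    beta = np.asarray(beta, dtype=float)
    Delta = np.asarray(Delta, dtype=float)

    # Latent mixing variable: numpy's wald(mean, scale) is the inverse Gaussian
    # with the given mean and shape ("scale") parameter.
    w = rng.wald(delta / gamma, delta**2, size=n)

    # Gaussian part with covariance Delta, built from a Cholesky factor.
    chol = np.linalg.cholesky(Delta)
    z = rng.standard_normal((n, mu.size)) @ chol.T

    # The mean shifts with w (source of skewness) and the covariance
    # scales with w (source of heavier tails).
    return mu + w[:, None] * (Delta @ beta) + np.sqrt(w)[:, None] * z


rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0])
beta = np.array([1.0, -0.5])
Delta = np.eye(2)
x = sample_mnig(100_000, mu, beta, delta=1.0, gamma=2.0, Delta=Delta, rng=rng)

# Under this parameterization the theoretical mean is mu + (delta / gamma) * Delta @ beta.
print("sample mean:     ", x.mean(axis=0))
print("theoretical mean:", mu + 0.5 * Delta @ beta)
```

Setting `beta` to zero removes the shift in the mean and gives a symmetric component, which is how a single family can cover both the skewed and the symmetric cases mentioned above.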

A Bayesian approach for clustering skewed data using mixtures of multivariate normal-inverse Gaussian distributions

Infinite mixtures of multivariate normal-inverse Gaussian distributions for clustering of skewed data

Clustering with the multivariate normal inverse Gaussian distribution

Model-based clustering and classification using mixtures of multivariate skewed power exponential distributions

Variational Bayes Approximations for Clustering via Mixtures of Normal Inverse Gaussian Distributions

Clustering of non-Gaussian data by variational Bayes for normal inverse Gaussian mixture models

Bayesian mixtures of common factor analyzers: Model, variational inference, and applications

Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions

Model-based clustering based on sparse finite Gaussian mixtures

Clustering using skewed multivariate heavy tailed distributions with flexible tail behaviour

Model-based clustering via skewed matrix-variate cluster-weighted models

Revisiting k-means: New Algorithms via Bayesian Nonparametrics

Bayesian nonparametric location-scale-shape mixtures

Mixtures of skewed matrix variate bilinear factor analyzers

A Bayesian Approach to Clustering Matting Components in Spectral Matting

Modelling Skewed and Heavy-tailed Data Using a Normal Weighted Inverse Gaussian Distribution

Scale mixtures of multivariate centered skew-normal distributions

Simultaneous Bayesian Clustering and Model Selection with Mixture of Robust Factor Analyzers

The parsimonious Gaussian mixture models with partitioned parameters and their application in clustering

Bayesian finite mixtures of Ising models

Mixtures of Variance-Gamma Distributions