Model-based clustering via skewed matrix-variate cluster-weighted models

Michael P.B. Gallaugher, Salvatore D. Tomarchio, Paul D. McNicholas, Antonio Punzo
DOI: https://doi.org/10.48550/arXiv.2111.14952
2021-11-30
Abstract: Cluster-weighted models (CWMs) extend finite mixtures of regressions (FMRs) in order to allow the distribution of covariates to contribute to the clustering process. In a matrix-variate framework, the matrix-variate normal CWM has been recently introduced. However, problems may be encountered when data exhibit skewness or other deviations from normality in the responses, covariates or both. Thus, we introduce a family of 24 matrix-variate CWMs which are obtained by allowing both the responses and covariates to be modelled by using one of four existing skewed matrix-variate distributions or the matrix-variate normal distribution. Endowed with a greater flexibility, our matrix-variate CWMs are able to handle this kind of data in a more suitable manner. As a by-product, the four skewed matrix-variate FMRs are also introduced. Maximum likelihood parameter estimates are derived using an expectation-conditional maximization algorithm. Parameter recovery, classification assessment, and the capability of the Bayesian information criterion to detect the underlying groups are investigated using simulated data. Lastly, our matrix-variate CWMs, along with the matrix-variate normal CWM and matrix-variate FMRs, are applied to two real datasets for illustrative purposes.
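The difference between a CWM and an FMR can be stated compactly. The block below uses the generic CWM factorization of the joint density of covariates X and responses Y over G groups; the symbols (G, π_g, θ_{Y|g}, θ_{X|g}, z_{ig}) are chosen here for illustration, and the paper's own matrix-variate notation and parametrization may differ. In the proposed family, each density f would be either the matrix-variate normal or one of the four skewed matrix-variate distributions.

```latex
\begin{aligned}
\text{CWM:}\quad & p(\mathbf{X}, \mathbf{Y}) = \sum_{g=1}^{G} \pi_g \, f(\mathbf{Y} \mid \mathbf{X}; \boldsymbol{\theta}_{Y|g}) \, f(\mathbf{X}; \boldsymbol{\theta}_{X|g}) \\
\text{FMR:}\quad & p(\mathbf{Y} \mid \mathbf{X}) = \sum_{g=1}^{G} \pi_g \, f(\mathbf{Y} \mid \mathbf{X}; \boldsymbol{\theta}_{Y|g}) \\
\text{CWM posterior:}\quad & z_{ig} = \frac{\pi_g \, f(\mathbf{Y}_i \mid \mathbf{X}_i; \boldsymbol{\theta}_{Y|g}) \, f(\mathbf{X}_i; \boldsymbol{\theta}_{X|g})}{\sum_{h=1}^{G} \pi_h \, f(\mathbf{Y}_i \mid \mathbf{X}_i; \boldsymbol{\theta}_{Y|h}) \, f(\mathbf{X}_i; \boldsymbol{\theta}_{X|h})}
\end{aligned}
```

The covariate density f(X; θ_{X|g}) appears in the posterior membership probability z_{ig}, which is how the covariates contribute to the clustering; in an FMR this factor is absent.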
Methodology, Applications
What problem does this paper attempt to address?
This paper addresses how to cluster matrix-variate data more effectively when the data exhibit skewness or other departures from normality. Existing matrix-variate cluster-weighted models (CWMs) are based on the matrix-variate normal distribution, which can run into problems when the responses, the covariates, or both are skewed or otherwise non-normal. The paper therefore introduces a new family of matrix-variate CWMs in which the responses and the covariates are each modelled by one of four existing skewed matrix-variate distributions or by the matrix-variate normal distribution. This added flexibility lets the models accommodate skewness in the data and thereby improves clustering.

The main contributions of the paper are:

1. **Model extension**: 24 new matrix-variate CWMs that handle cases where the responses, the covariates, or both are skewed. The count follows from five possible distributions for the responses times five for the covariates, minus the normal-normal combination, which is the already existing matrix-variate normal CWM.
2. **Parameter estimation**: maximum likelihood estimates are derived with an expectation-conditional maximization (ECM) algorithm.
3. **Performance evaluation**: simulated data are used to assess parameter recovery, classification performance, and the ability of the Bayesian information criterion (BIC) to detect the underlying group structure.
4. **Practical application**: the proposed models are applied to two real datasets to illustrate their use in practice.

Overall, the paper aims to provide a more flexible tool for clustering and classifying matrix-variate data that depart from normality.
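To show how the covariate density changes the E-step of an EM-type algorithm such as ECM, here is a minimal sketch of the posterior membership computation for a deliberately simplified case: a scalar covariate and a scalar response, with normal densities in every component. This is not the paper's matrix-variate skewed model; the functions `normal_pdf` and `responsibilities` and the parameter keys are hypothetical names chosen for illustration.

```python
import math


def normal_pdf(v, mean, sd):
    """Univariate normal density N(v; mean, sd^2)."""
    z = (v - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def responsibilities(x, y, components, use_covariate_density=True):
    """E-step sketch: posterior probability that (x, y) belongs to each group.

    Each component is a dict with hypothetical keys:
      weight        mixing proportion pi_g
      b0, b1, sd_y  regression model y | x ~ N(b0 + b1 * x, sd_y^2)
      mu_x, sd_x    covariate model x ~ N(mu_x, sd_x^2)
    With use_covariate_density=False the covariate factor is dropped,
    giving the finite mixture of regressions (FMR) E-step instead.
    """
    scores = []
    for c in components:
        s = c["weight"] * normal_pdf(y, c["b0"] + c["b1"] * x, c["sd_y"])
        if use_covariate_density:
            # CWM: the covariate density also weights the group membership.
            s *= normal_pdf(x, c["mu_x"], c["sd_x"])
        scores.append(s)
    total = sum(scores)
    return [s / total for s in scores]


# Two components with the SAME regression line but different covariate means.
components = [
    {"weight": 0.5, "b0": 1.0, "b1": 2.0, "sd_y": 1.0, "mu_x": 0.0, "sd_x": 1.0},
    {"weight": 0.5, "b0": 1.0, "b1": 2.0, "sd_y": 1.0, "mu_x": 5.0, "sd_x": 1.0},
]
x, y = 0.2, 1.4  # y sits on the shared regression line y = 1 + 2x
fmr_probs = responsibilities(x, y, components, use_covariate_density=False)
cwm_probs = responsibilities(x, y, components)
print("FMR:", fmr_probs)
print("CWM:", cwm_probs)
```

Because both components share one regression line, the FMR probabilities stay at 0.5 each, whereas the CWM assigns the point almost entirely to the first component, since x is far more plausible under its covariate model. In the paper's setting the normal densities would be replaced by matrix-variate normal or skewed matrix-variate densities, and the M-step would be carried out as a sequence of conditional maximization steps; neither is shown here.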