Mixture models for simultaneous classification and reduction of three-way data

Roberto Rocci, Maurizio Vichi, Monia Ranalli
DOI: https://doi.org/10.1007/s00180-024-01478-1
IF: 1.4049
2024-05-08
Computational Statistics
Abstract: Finite mixtures of Gaussians are often used to classify two- (units and variables) or three- (units, variables and occasions) way data. However, two issues arise: model complexity and capturing the true cluster structure. Indeed, a large number of variables and/or occasions implies a large number of model parameters, while the existence of noise variables (and/or occasions) could mask the true cluster structure. The approach adopted in the present paper is to reduce the number of model parameters by identifying a sub-space containing the information needed to classify the observations. This should also help in identifying noise variables and/or occasions. The maximum likelihood model estimation is carried out through an EM-like algorithm. The effectiveness of the proposal is assessed through a simulation study and an application to real data.
statistics & probability
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses two main issues in the classification of three-way data:

1. **Model Complexity**: A large number of variables and/or occasions implies a large number of model parameters to estimate.
2. **Capturing True Clustering Structure**: Noise variables (and/or occasions) may obscure the true clustering structure.

To solve these problems, the paper proposes a mixture model approach that reduces the number of model parameters by identifying a subspace containing the information needed for classification. This approach also helps in identifying noise variables and/or occasions. Maximum likelihood estimation of the model is carried out through an EM-like algorithm, and the effectiveness of the method is assessed through a simulation study and an application to real data.
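
To make the model-complexity point concrete, the sketch below counts the free parameters of an unrestricted Gaussian mixture fitted to three-way data unfolded into vectors of length J × K (variables × occasions). The function name `gmm_param_count` and the sizes `J`, `K`, `Q`, `R`, `G` are illustrative assumptions, not taken from the paper. The "reduced" figure simply counts a mixture fitted in a hypothetical Q × R subspace and ignores whatever parameters define that subspace, so it is not the paper's actual parameter count.

```python
def gmm_param_count(n_components: int, dim: int) -> int:
    """Free parameters of a Gaussian mixture with full, component-specific covariances."""
    weights = n_components - 1                          # mixing proportions sum to 1
    means = n_components * dim                          # one mean vector per component
    covariances = n_components * dim * (dim + 1) // 2   # symmetric matrix per component
    return weights + means + covariances


# Illustrative sizes (assumptions): 4 variables, 5 occasions, 3 clusters.
J, K, G = 4, 5, 3
# Hypothetical reduced space with 2 variable factors and 2 occasion factors.
Q, R = 2, 2

full = gmm_param_count(G, J * K)     # mixture on the unfolded J*K-dimensional data
reduced = gmm_param_count(G, Q * R)  # mixture in the Q*R-dimensional subspace only

print(f"full space ({J * K} dims): {full} parameters")
print(f"reduced space ({Q * R} dims): {reduced} parameters")
```

Because the covariance term grows quadratically in the dimension, even modest numbers of variables and occasions inflate the parameter count quickly, which is the motivation for seeking a low-dimensional subspace.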