Model-based multifacet clustering with high-dimensional omics applications

Wei Zong, Danyang Li, Marianne L. Seney, Colleen A. McClung, George C. Tseng
DOI: https://doi.org/10.1093/biostatistics/kxae020
IF: 5.2789
2024-07-13
Biostatistics
Abstract: High-dimensional omics data often contain intricate and multifaceted information, resulting in the coexistence of multiple plausible sample partitions based on different subsets of selected features. Conventional clustering methods typically yield only one clustering solution, limiting their capacity to fully capture all facets of cluster structures in high-dimensional data. To address this challenge, we propose a model-based multifacet clustering (MFClust) method based on a mixture of Gaussian mixture models, where the former (outer) mixture assigns gene features to facets and the latter (inner) mixtures determine the cluster assignment of samples within each facet. We demonstrate superior facet and cluster assignment accuracy of MFClust through simulation studies. The proposed method is applied to three transcriptomic applications from postmortem brain and lung disease studies. The results capture multifacet clustering structures associated with critical clinical variables and provide intriguing biological insights for further hypothesis generation and discovery.
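To make the two-level structure concrete, here is a minimal sketch of the outer-mixture step only. It is not the authors' MFClust implementation, and every modelling choice in it is an assumption made for illustration. Each facet is assumed to come with a candidate sample partition, all facets are assumed to have the same number of clusters, and a gene in facet k is assumed to follow x_gi ~ N(mu_{g,c}, sigma_g^2), where c is sample i's cluster in that facet. Given those partitions, the sketch computes each gene's posterior probability of belonging to each facet. The helper names `gene_facet_loglik` and `facet_responsibilities` are hypothetical.

```python
import numpy as np


def gene_facet_loglik(X, labels):
    """Profile log-likelihood of every gene (row of X) under one facet's sample partition.

    Hypothetical simplification: x_gi ~ N(mu_{g,c}, sigma_g^2) with c = labels[i];
    mu and sigma are replaced by their maximum-likelihood estimates.
    """
    n = X.shape[1]
    resid = np.empty_like(X, dtype=float)
    for c in np.unique(labels):
        idx = labels == c
        # Subtract the gene-specific mean of cluster c (MLE of mu_{g,c}).
        resid[:, idx] = X[:, idx] - X[:, idx].mean(axis=1, keepdims=True)
    sigma2 = (resid ** 2).mean(axis=1)  # MLE of sigma_g^2, one value per gene
    # Gaussian log-likelihood at the MLE: -n/2 * (log(2*pi*sigma2) + 1)
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)


def facet_responsibilities(X, facet_labels, weights):
    """Outer-mixture E-step: posterior probability that each gene belongs to each facet."""
    loglik = np.column_stack([gene_facet_loglik(X, lab) for lab in facet_labels])
    log_post = loglik + np.log(weights)[None, :]
    log_post -= log_post.max(axis=1, keepdims=True)  # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


# Toy data (simulated, not from the paper): two facets with unrelated sample partitions.
rng = np.random.default_rng(0)
n_samples = 60
labels_a = np.repeat([0, 1], 30)   # facet A: first half vs second half of samples
labels_b = np.tile([0, 1], 30)     # facet B: alternating samples
true_facet = np.repeat([0, 1], 20)  # genes 0-19 -> facet A, genes 20-39 -> facet B
X = rng.normal(size=(40, n_samples))
X[:20] += 4.0 * labels_a            # facet-A genes shift with the facet-A partition
X[20:] += 4.0 * labels_b            # facet-B genes shift with the facet-B partition

post = facet_responsibilities(X, [labels_a, labels_b], weights=np.array([0.5, 0.5]))
assigned = post.argmax(axis=1)
print("facet assignment accuracy:", (assigned == true_facet).mean())
```

In a full model-based procedure, this step would presumably alternate with re-estimating each facet's sample partition from the genes currently assigned to it (the inner Gaussian mixtures) and updating the facet weights. The sketch holds those fixed so that the gene-to-facet logic stays visible. Comparing profile likelihoods across facets with different numbers of clusters would favour the larger partitions, which is why equal cluster counts are assumed here.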
statistics & probability, mathematical & computational biology