I-SVVS: Integrative stochastic variational variable selection to explore joint patterns of multi-omics microbiome data
Tung Dang, Yushiro Fuji, Kie Kumaishi, Erika Usui, Shungo Kobori, Takumi Sato, Yusuke Toda, Kengo Sakurai, Yuji Yamasaki, Hisashi Tsujimoto, Masami Yokota Hirai, Yasunori Ichihashi, Hiroyoshi Iwata
DOI: https://doi.org/10.1101/2023.08.18.553796
2024-10-28
Abstract: High-dimensional multi-omics microbiome data play an important role in elucidating the interactions of microbial communities with their hosts and environment in critical diseases and ecological changes. Although Bayesian clustering methods have recently been used for the integrated analysis of multi-omics data, no method designed specifically for multi-omics microbiome data has been proposed. In this study, we propose a novel framework called integrative stochastic variational variable selection (I-SVVS), which is an extension of stochastic variational variable selection for high-dimensional microbiome data. The I-SVVS approach applies a specific Bayesian mixture model to each type of omics data, such as an infinite Dirichlet multinomial mixture model for microbiome data and an infinite Gaussian mixture model for metabolomic data. This approach is expected to reduce the computational time of the clustering process and improve the accuracy of the clustering results. Additionally, I-SVVS identifies a critical set of representative variables in multi-omics microbiome data. Three datasets from soybean, mice, and humans (each integrating microbiome and metabolome data) were used to demonstrate the potential of I-SVVS. The results indicate that I-SVVS achieved improved accuracy and faster computation compared to existing methods across all test datasets. It effectively identified key microbiome species and metabolites characterizing each cluster. For instance, the analysis of the soybean dataset, comprising 377 samples with 16,943 microbiome species and 265 metabolome features, was completed in 2.18 hours using I-SVVS, compared to 2.35 days with Clusternomics and 1.12 days with iClusterPlus. The software for this analysis, written in Python, is freely available at https://github.com/tungtokyo1108/I-SVVS.
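The runtimes quoted for the soybean dataset mix hours and days, so the comparison is easier to read once converted to a common unit. The short sketch below only restates the three figures given in the abstract and derives the approximate speed-up ratios; the names `runtimes_hours` and `speedup_over` are illustrative and are not part of the I-SVVS software.

```python
HOURS_PER_DAY = 24

# Runtimes reported in the abstract for the soybean dataset, converted to hours.
runtimes_hours = {
    "I-SVVS": 2.18,                           # reported in hours
    "Clusternomics": 2.35 * HOURS_PER_DAY,    # reported as 2.35 days
    "iClusterPlus": 1.12 * HOURS_PER_DAY,     # reported as 1.12 days
}


def speedup_over(baseline, method="I-SVVS", runtimes=runtimes_hours):
    """Return how many times longer `baseline` ran than `method`."""
    return runtimes[baseline] / runtimes[method]


for name in ("Clusternomics", "iClusterPlus"):
    print(
        f"{name}: {runtimes_hours[name]:.2f} h, "
        f"about {speedup_over(name):.1f}x the I-SVVS runtime"
    )
```

Under these figures, Clusternomics took roughly 26 times and iClusterPlus roughly 12 times as long as I-SVVS on this dataset. These ratios are derived from the rounded values in the abstract and apply to the soybean dataset only.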
Bioinformatics