MicroPredict: predicting species-level taxonomic abundance of whole-shotgun metagenomic data using only 16S amplicon sequencing data

Chloe Soohyun Jang, Hakin Kim, Donghyun Kim, Buhm Han
DOI: https://doi.org/10.1007/s13258-024-01514-w
IF: 2.164
2024-05-08
Genes & Genomics
Abstract:
Background: The importance of the human microbiome in the analysis of various diseases is emerging. The two main methods used to profile the human microbiome are 16S rRNA gene sequencing (16S sequencing) and whole-genome shotgun sequencing (WGS). Owing to the full coverage of the genome in sequencing, WGS has multiple advantages over 16S sequencing, including higher taxonomic profiling resolution at the species level and functional profiling analysis. However, 16S sequencing remains widely used because of its relatively low cost. Although WGS is the standard method for obtaining accurate species-level data, we found that 16S sequencing data contained rich information to predict high-resolution species-level abundances with reasonable accuracy.
Objective: In this study, we proposed MicroPredict, a method for accurately predicting WGS-comparable species-level abundance data using 16S taxonomic profile data.
Methods: We employed a mixed model using two key strategies: (1) modeling both sample- and species-specific information for predicting WGS abundances, and (2) accounting for the possible correlations among different species.
Results: We found that MicroPredict outperformed the other machine learning methods.
Conclusion: We expect that our approach will help researchers accurately approximate the species-level abundances of microbiome profiles in datasets for which only cost-effective 16S sequencing has been applied.
biochemistry & molecular biology, biotechnology & applied microbiology, genetics & heredity
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to address the following issues:

1. **Improving the Accuracy of Species-Level Abundance Prediction**:
   - In microbiome research, 16S rRNA gene sequencing (16S sequencing) is cost-effective and widely used, but its main drawbacks are low resolution at the species level and technical biases. In contrast, whole-genome shotgun sequencing (WGS) provides higher species-level resolution and supports functional analysis, but is more expensive.
   - The researchers aim to use 16S sequencing data to predict high-resolution species-level abundances comparable to those obtained from WGS.
2. **Proposing a New Prediction Method, MicroPredict**:
   - The paper proposes a method called MicroPredict, which uses 16S sequencing data to accurately predict species-level abundance data comparable to WGS.
   - The goal of MicroPredict is to correct potential biases in 16S sequencing data while predicting unknown WGS information.
3. **Validating the Effectiveness of the New Method**:
   - By comparing MicroPredict with other machine learning methods (such as linear regression, autoencoders, and convolutional neural networks), the paper validates its superior performance in predicting species-level abundance.
   - Benchmarking is conducted using three different cohort datasets (RESONANCE cohort, URC cohort, and Crohn's disease cohort), with performance evaluated on independent test sets.

In summary, the core objective of this paper is to develop an efficient method based on 16S sequencing data to accurately predict species-level abundance, thereby helping researchers obtain high-resolution microbiome information at a lower cost.
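
To make the prediction task concrete, the following is a minimal sketch of the kind of linear-regression baseline the summary mentions, not MicroPredict itself (the mixed model is not specified in enough detail here to reproduce). It assumes a ridge-regularized linear map from 16S taxon abundances to WGS species abundances, fitted on paired samples and scored by per-species Pearson correlation on held-out samples. All data are synthetic, and the names `fit_ridge`, `predict`, and `per_species_correlation` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def fit_ridge(X, Y, alpha=0.1):
    """Closed-form ridge regression from 16S features X to WGS abundances Y."""
    # Prepend an intercept column; it is left unpenalized.
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(X1.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X1.T @ X1 + penalty, X1.T @ Y)


def predict(W, X):
    """Apply fitted weights W (intercept in row 0) to new 16S profiles."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    return X1 @ W


def per_species_correlation(Y_true, Y_pred):
    """Pearson correlation between true and predicted values, per species."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / denom


# Synthetic stand-in for paired 16S / WGS profiles (assumption: no real data).
rng = np.random.default_rng(0)
n_samples, n_taxa_16s, n_species_wgs = 120, 8, 5
X = rng.dirichlet(np.ones(n_taxa_16s), size=n_samples)  # relative abundances
B = rng.random((n_taxa_16s, n_species_wgs))              # hidden linear map
Y = X @ B + rng.normal(scale=0.01, size=(n_samples, n_species_wgs))

# Hold out the last 30 samples as an independent test set.
X_train, X_test = X[:90], X[90:]
Y_train, Y_test = Y[:90], Y[90:]

W = fit_ridge(X_train, Y_train)
corr = per_species_correlation(Y_test, predict(W, X_test))
print("mean per-species correlation:", float(corr.mean()))
```

Because relative abundances sum to one, the 16S columns are collinear with the intercept; the ridge penalty keeps the system solvable. A baseline of this kind treats each species independently, which is the limitation the paper's second strategy (accounting for correlations among species) is meant to address.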