The Influences of Bioinformatics Tools and Reference Databases in Analyzing the Human Oral Microbial Community

Maria A. Sierra, Qianhao Li, Smruti Pushalkar, Bidisha Paul, Tito A. Sandoval, Angela R. Kamer, Patricia Corby, Yuqi Guo, Ryan Richard Ruff, Alexander V. Alekseyenko, Xin Li, Deepak Saxena

DOI: https://doi.org/10.3390/genes11080878

IF: 4.141

2020-08-03

Genes

Abstract: There is currently no criterion to select appropriate bioinformatics tools and reference databases for analysis of 16S rRNA amplicon data in the human oral microbiome. Our study aims to determine the influence of multiple tools and reference databases on α-diversity measurements and β-diversity comparisons analyzing the human oral microbiome. We compared the results of taxonomical classification by Greengenes, the Human Oral Microbiome Database (HOMD), National Center for Biotechnology Information (NCBI) 16S, SILVA, and the Ribosomal Database Project (RDP) using Quantitative Insights Into Microbial Ecology (QIIME) and the Divisive Amplicon Denoising Algorithm (DADA2). There were 15 phyla present in all of the analyses, four phyla exclusive to certain databases, and different numbers of genera were identified in each database. Common genera found in the oral microbiome, such as Veillonella, Rothia, and Prevotella, are annotated by all databases; however, less common genera, such as Bulleidia and Paludibacter, are only annotated by large databases, such as Greengenes. Our results indicate that using different reference databases in 16S rRNA amplicon data analysis could lead to different taxonomic compositions, especially at genus level. There are a variety of databases available, but there are no defined criteria for data curation and validation of annotations, which can affect the accuracy and reproducibility of results, making it difficult to compare data across studies.
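The overlap described in the abstract (taxa present in every analysis versus taxa exclusive to certain databases) amounts to a set comparison. A minimal Python sketch of that comparison follows. The function name `shared_and_exclusive` and the database labels `db_A`, `db_B`, and `db_C` are hypothetical, and the toy input reuses only the genus names mentioned in the abstract; it is not the study's data.

```python
def shared_and_exclusive(annotations):
    """Split taxa into those shared by all databases and those unique to one.

    `annotations` maps a database label to the set of taxon names it reported.
    Returns (shared, exclusive), where `exclusive` maps each label to the taxa
    that no other database reported.
    """
    shared = set.intersection(*annotations.values())
    exclusive = {}
    for name, taxa in annotations.items():
        # Union of everything reported by the other databases.
        others = set().union(
            *(t for other, t in annotations.items() if other != name)
        )
        exclusive[name] = taxa - others
    return shared, exclusive


# Toy input (illustrative only, not results from the paper).
toy = {
    "db_A": {"Veillonella", "Rothia", "Prevotella", "Bulleidia", "Paludibacter"},
    "db_B": {"Veillonella", "Rothia", "Prevotella"},
    "db_C": {"Veillonella", "Rothia", "Prevotella"},
}

shared, exclusive = shared_and_exclusive(toy)
print(sorted(shared))
print({name: sorted(taxa) for name, taxa in exclusive.items()})
```

Comparing names as plain strings assumes the databases spell taxa identically; in practice, naming differences between databases would need to be reconciled first.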

genetics & heredity

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper addresses the problem of selecting appropriate bioinformatics tools and reference databases when analyzing human oral microbial communities. Specifically, it evaluates how different tools and databases affect α-diversity measurements and β-diversity comparisons. By comparing the results of multiple tools (Quantitative Insights Into Microbial Ecology (QIIME) and the Divisive Amplicon Denoising Algorithm (DADA2)) and multiple reference databases (Greengenes, the Human Oral Microbiome Database (HOMD), National Center for Biotechnology Information (NCBI) 16S, SILVA, and the Ribosomal Database Project (RDP)), the study seeks to reveal how these choices affect taxonomic composition and diversity estimates in 16S rRNA amplicon data.

### Main Findings

1. **Differences in Taxonomic Composition**:
   - Using different reference databases leads to different taxonomic compositions, especially at the genus level. For example, some less common genera (such as Bulleidia and Paludibacter) are annotated only in larger databases like Greengenes.
   - Common oral microbiome genera (such as Veillonella, Rothia, and Prevotella) are annotated in all databases, but the number of genera identified varies among databases.
2. **Differences in Diversity Measurements**:
   - Different databases show significant differences in α-diversity measurements (such as the number of observed OTUs, Shannon index, Chao1, and ACE).
   - In β-diversity comparisons, the similarities and differences between samples vary depending on the database used for annotation. Notably, with the QIIME pipeline, sample annotations differ significantly between the SILVA and Greengenes databases, while with the DADA2 pipeline, annotations from the RDP database differ more from those of the other databases.
3. **Methodological Differences**:
   - Different bioinformatics tools and databases process sequence data in different ways, which may lead to inconsistent results. For example, QIIME uses Operational Taxonomic Units (OTUs), while DADA2 uses Amplicon Sequence Variants (ASVs).
   - The lack of defined criteria for data curation and validation may affect the accuracy and reproducibility of results, making cross-study data comparisons difficult.

### Conclusion

The study emphasizes the importance of selecting appropriate bioinformatics tools and reference databases when analyzing 16S rRNA amplicon data. Different tools and databases may lead to different taxonomic compositions and diversity estimates, highlighting the need for standardized methods and criteria to improve the accuracy and reproducibility of results. This is crucial for better understanding the role of the oral microbiome in human health and disease.
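To make the diversity measures named in the findings concrete, here is a minimal Python sketch of three α-diversity indices (observed richness, Shannon index, Chao1) and one common β-diversity dissimilarity (Bray-Curtis). It assumes each sample is a plain list of per-taxon read counts. The function names and the two count vectors are hypothetical, Chao1 is written in its bias-corrected form, and Bray-Curtis is chosen here only for illustration; nothing in the source says which β-diversity metric the authors used.

```python
import math


def observed_richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of singleton and doubleton taxa.
    """
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed_richness(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    return numerator / (sum(a) + sum(b))


# Hypothetical counts for one sample annotated against two databases
# (illustrative numbers only, aligned to the same five taxa).
annotated_with_db_a = [10, 10, 1, 1, 2]
annotated_with_db_b = [12, 10, 0, 0, 2]

for label, counts in [("db_A", annotated_with_db_a), ("db_B", annotated_with_db_b)]:
    print(label, observed_richness(counts), round(shannon_index(counts), 3), chao1(counts))

print("Bray-Curtis:", bray_curtis(annotated_with_db_a, annotated_with_db_b))
```

Because rare taxa drive the singleton and doubleton counts, a database that fails to annotate rare genera can shift Chao1 and richness even when the dominant taxa agree, which is consistent with the kind of database effect the summary describes.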

Influence of 16S rRNA reference databases in amplicon-based environmental microbiome research

A comparison between Greengenes, SILVA, RDP, and NCBI reference databases in four published microbiota datasets

Impact of DNA extraction method and targeted 16S-rRNA hypervariable region on oral microbiota profiling

Evaluating the Impact of DNA Extraction Method on the Representation of Human Oral Bacterial and Fungal Communities

Improved High-Throughput Sequencing of the Human Oral Microbiome: from Illumina to PacBio

Taxonomic profiling and functional characterization of the healthy human oral bacterial microbiome from the north Indian urban sub-population

Influence of DNA extraction on oral microbial profiles obtained via 16S rRNA gene sequencing

Analysis of the microbial community diversity in various regions of the healthy oral cavity

Pyrosequencing analysis of the human microbiota of healthy Chinese undergraduates

Evaluation of computational methods for human microbiome analysis using simulated data

Getting to Know "The Known Unknowns": Heterogeneity in the Oral Microbiome

Comparison of Mothur and QIIME for the Analysis of Rumen Microbiota Composition Based on 16S rRNA Amplicon Sequences

Optimising high-throughput sequencing data analysis, from gene database selection to the analysis of compositional data: a case study on tropical soil nematodes

Improving Species Level‐taxonomic Assignment from 16S rRNA Sequencing Technologies

Evaluating the accuracy of amplicon-based microbiome computational pipelines on simulated human gut microbial communities

Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline

Impact of training sets on classification of high-throughput bacterial 16s rRNA gene surveys

Can metagenomics unravel the impact of oral bacteriome in human diseases?

Interpreting 16S metagenomic data without clustering to achieve sub-OTU resolution

Analysis of the Human Oral Microbiome of Smokers and Non-Smokers Using PCR-RFLP and Ribotyping