To compare the performance of prokaryotic taxonomy classifiers using curated 16S full-length rRNA sequences
Yuan-Mao Hung, Wei-Ni Lyu, Ming-Lin Tsai, Chiang-Lin Liu, Liang-Chuan Lai, Mong-Hsun Tsai, Eric Y. Chuang
DOI: https://doi.org/10.1016/j.compbiomed.2022.105416
IF: 7.7
2022-06-01
Computers in Biology and Medicine
Abstract: Background: Taxonomic assignment is a vital step in the analytic pipeline of bacterial 16S ribosomal RNA (rRNA) sequencing. Over the past decade, most research in this field used next-generation sequencing technology to target the V3-V4 regions to analyze bacterial composition. However, focusing on only one or two hypervariable regions limited the taxonomic resolution at the species level. In recent years, third-generation sequencing technology has allowed researchers to easily access full-length prokaryotic 16S sequences and presented an opportunity to attain greater taxonomic depth. However, the accuracy of current taxonomic classifiers in analyzing full-length 16S sequences remains unclear. Objective: The purpose of this study is to compare the accuracy of several widely used 16S sequence classifiers and to identify the most suitable 16S training dataset for each classifier. Methods: Both curated full-length 16S sequences and cross-validation datasets were used to validate the performance of seven classifiers: QIIME2, mothur, SINTAX, SPINGO, Ribosomal Database Project (RDP), IDTAXA, and Kraken2. Different sequence training datasets, such as SILVA, Greengenes, and RDP, were used to train the classification models. Results: The accuracy of each classifier down to the species level was reported. According to the experimental results, SINTAX and SPINGO, trained on RDP sequences, provided the highest accuracy and are recommended for classifying full-length prokaryotic 16S rRNA sequences. Conclusion: The performance of the classifiers was affected by the sequence training datasets. Therefore, each classifier should be paired with its most suitable 16S training data to improve the accuracy and taxonomic resolution of taxonomic assignment.
engineering, biomedical; computer science, interdisciplinary applications; mathematical & computational biology; biology