The Naive Bayes Classifier++ for Metagenomic Taxonomic Classification -- Query Evaluation

Haozhe Neil Duan, Gavin Hearne, Robi Polikar, Gail L Rosen
DOI: https://doi.org/10.1101/2024.06.25.600711
2024-06-29
Abstract: This study examines the query performance of the NBC++ (Incremental Naive Bayes Classifier) program under variations in canonicality, $k$-mer size, databases, and input sample data size. NBC++ can successfully assess a wide range of superkingdoms using a small training database. We demonstrate that NBC++ and Kraken2 are both affected by database depth, with macro measures increasing with depth, but that the full diversity of life, especially viruses, is still a challenge for these classifiers. NBC++ spends less time training, but at the cost of long query times. The major enhancements are canonical $k$-mer storage (with major storage savings); adaptable and optimized memory allocation, which speeds up query analysis and allows the classifier to be run on almost any system; and output of the log-likelihood values against each training genome, which provides users with valuable confidence information.
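To make two ideas from the abstract concrete, canonical $k$-mer storage and per-genome log-likelihood output, the following is a minimal Python sketch. It is not the NBC++ implementation (which is a separate program); the function names `canonical`, `count_kmers`, `log_likelihood`, and `classify` are hypothetical, and add-one (Laplace) smoothing with a uniform prior over genomes is an assumption made only for illustration.

```python
import math
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(seq, k, use_canonical=True):
    """Count k-mers in seq, skipping windows that contain non-ACGT characters."""
    counts = Counter()
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer) if use_canonical else kmer] += 1
    return counts


def vocabulary_size(k, use_canonical=True):
    """Number of distinct features: all 4^k k-mers, or only the canonical ones."""
    if not use_canonical:
        return 4 ** k
    # Reverse-complement palindromes exist only for even k (4^(k/2) of them);
    # every other k-mer pairs with a distinct reverse complement.
    palindromes = 4 ** (k // 2) if k % 2 == 0 else 0
    return (4 ** k + palindromes) // 2


def log_likelihood(read_counts, genome_counts, k, use_canonical=True):
    """Log-likelihood of a read's k-mers under one genome's k-mer profile.

    Assumption: add-one smoothing, chosen here for illustration only.
    """
    total = sum(genome_counts.values()) + vocabulary_size(k, use_canonical)
    return sum(
        n * math.log((genome_counts.get(kmer, 0) + 1) / total)
        for kmer, n in read_counts.items()
    )


def classify(read, genomes, k, use_canonical=True):
    """Score a read against every training genome; return (best_label, all_scores)."""
    read_counts = count_kmers(read, k, use_canonical)
    scores = {
        label: log_likelihood(
            read_counts, count_kmers(seq, k, use_canonical), k, use_canonical
        )
        for label, seq in genomes.items()
    }
    return max(scores, key=scores.get), scores
```

Returning the full `scores` dictionary rather than only the best label mirrors the idea of exposing log-likelihood values against each training genome, so a user can judge how decisively the top genome wins. Storing only canonical $k$-mers roughly halves the number of features, which is where the storage saving comes from.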
Bioinformatics