CONSULT-II: Accurate taxonomic identification and profiling using locality-sensitive hashing

Ali Osman Berk Şapcı, Eleonora Rachtman, Siavash Mirarab
DOI: https://doi.org/10.1101/2023.11.07.566115
2024-01-08
Abstract: Taxonomic classification of short reads and taxonomic profiling of metagenomic samples are well-studied yet challenging problems. The presence of species belonging to ranks without close representation in a reference dataset is particularly challenging. While k-mer-based methods have performed well in terms of running time and accuracy, they tend to have reduced accuracy for such novel species. Here, we show that using locality-sensitive hashing (LSH) can increase the sensitivity of the k-mer-based search. Our method, which combines LSH with several heuristic techniques, including soft LCA labeling and voting, is more accurate than alternatives in both taxonomic classification of individual reads and abundance profiling.
Bioinformatics