PanTax: Strain-level taxonomic classification of metagenomic data using pangenome graphs

Wenhai Zhang, Yuansheng Liu, Jialu Xu, Enlian Chen, Alexander Schonhuth, Xiao Luo
DOI: https://doi.org/10.1101/2024.11.15.623887
2024-11-17
Abstract: Microbes are omnipresent, thriving in habitats ranging from oceans to soils and even our gastrointestinal tracts. They play a vital role in maintaining ecological equilibrium and promoting the health of their hosts. Consequently, understanding the strain diversity within microbial communities is crucial, as variations between strains can lead to distinct phenotypes or diverse biological functions. However, current methods for taxonomic classification from metagenomic sequencing data have several limitations: they are restricted to species-level resolution, support only short or only long reads, or are confined to a single given species. Most notably, the majority of existing taxonomic classifiers rely on a single linear representative genome as a reference, which fails to capture strain diversity and thereby introduces single-reference biases. Here, we present PanTax, a pangenome graph-based taxonomic classification method that overcomes the shortcomings of single-reference genome-based approaches, because pangenome graphs can represent the genetic variability present across multiple evolutionarily or environmentally related genomes. PanTax provides a comprehensive solution to taxonomic classification: it offers strain-level resolution, works with both short and long reads, and handles single or multiple species. Extensive benchmarking results demonstrate that PanTax drastically outperforms state-of-the-art approaches, primarily evidenced by its significantly higher precision or recall (at both species and strain levels), while maintaining comparable or better performance in other aspects across various datasets. PanTax is a user-friendly open-source tool that is publicly accessible at https://github.com/LuoGroup2023/PanTax.
Bioinformatics