A Time Machine for Taxonomy

Austin Davis-Richardson, Timothy Reynolds
DOI: https://doi.org/10.1101/2024.12.11.627987
2024-12-12
Abstract: The NCBI Taxonomy Database is the primary resource for linking genomic information to taxonomic relationships, widely used across scientific disciplines and critically important to bioinformatics. This database is continuously changing as researchers discover and refine taxonomic relationships. Yet, tracking and comparing past taxonomic states is challenging due to frequent changes and the need to sift through numerous historical snapshots. To address this, we developed the Taxonomy Time Machine: a database for storing many snapshots of a taxonomic tree in a space-efficient manner. We have also created a web-based and programmatic (API) interface to make this data more accessible. This tool is capable of accurately reconstructing taxonomic lineages at any point in the history of the NCBI Taxonomy Database. We demonstrate that this tool is both perfectly accurate and significantly more efficient than loading and querying individual taxonomy snapshots, enabling its use on desktop computers as well as commodity web servers. We have made this tool available on the web (https://taxonomy.onecodex.com) as well as open source under the MIT license (https://github.com/onecodex/taxonomy-time-machine).
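To make the idea of space-efficient snapshot storage concrete, here is a minimal Python sketch of one possible approach: store a new record for a taxon only when its parent or name changes between snapshots, then rebuild a lineage by looking up the record in effect on the requested date. This is an illustration under stated assumptions, not the Taxonomy Time Machine's actual schema or API. The class `VersionedTaxonomy`, its methods, and all IDs and names in the usage note are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


class VersionedTaxonomy:
    """Hypothetical change-log store: one record per taxon per change, not per snapshot."""

    def __init__(self):
        # tax_id -> sorted list of dates at which the taxon's record changed
        self._dates = defaultdict(list)
        # tax_id -> list of (parent_id, name) records, or None if the taxon was removed
        self._records = defaultdict(list)

    def add_snapshot(self, date, nodes):
        """Ingest a full snapshot {tax_id: (parent_id, name)}; call in date order."""
        for tax_id, record in nodes.items():
            recs = self._records[tax_id]
            if not recs or recs[-1] != record:  # store only new or changed taxa
                self._dates[tax_id].append(date)
                recs.append(record)
        for tax_id, recs in self._records.items():
            if tax_id not in nodes and recs[-1] is not None:  # taxon disappeared
                self._dates[tax_id].append(date)
                recs.append(None)

    def lookup(self, tax_id, date):
        """Return the (parent_id, name) record in effect on `date`, or None."""
        dates = self._dates.get(tax_id, [])
        i = bisect_right(dates, date)
        return self._records[tax_id][i - 1] if i else None

    def lineage(self, tax_id, date):
        """Return names from the root down to `tax_id` as of `date`."""
        path, seen, current = [], set(), tax_id
        while current not in seen:  # stops at a self-parented root or on a cycle
            record = self.lookup(current, date)
            if record is None:
                break
            seen.add(current)
            parent_id, name = record
            path.append(name)
            current = parent_id
        return path[::-1]

    def stored_records(self):
        """Total number of records kept across all taxa."""
        return sum(len(recs) for recs in self._records.values())
```

As a usage note, suppose two invented snapshots are loaded with ISO date strings, `"2020-01-01"` and `"2021-01-01"`, and a species is moved from one genus to another in the second. Calling `lineage(species_id, "2020-06-01")` walks the older parent chain, while `lineage(species_id, "2021-06-01")` walks the newer one. Only the moved species and the newly created genus add records for the second snapshot, which is where the space saving over storing every snapshot in full comes from in this sketch.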
Bioinformatics