gplasCC: classification and reconstruction of plasmids from short-read sequencing data for any bacterial species

Julian A Paganini, Jesse J. Kerkvliet, Oscar Jordan, Gijs Teunis, Nienke L. Plantinga, Rob J.L. Willems, Sergio Arredondo-Alonso, Anita C. Schurch
DOI: https://doi.org/10.1101/2024.11.28.625923
2024-12-03
Abstract: Plasmids play a pivotal role in the spread of antibiotic resistance genes. Accurately reconstructing plasmids often requires long-read sequencing, but bacterial genomic data in publicly accessible repositories has historically been derived from short-read sequencing technology. We recently presented an approach for reconstructing Escherichia coli antimicrobial resistance plasmids using Illumina short reads. This method combined plasmidEC, a robust binary classification tool, with gplas2, a tool that uses features of the assembly graph to bin predicted plasmid contigs into individual plasmids. Here, we developed plasmidCC, a simplified version of plasmidEC that classifies plasmid contigs using Centrifuge databases. We have developed seven plasmidCC databases in addition to the database for E. coli: six species-specific models (Acinetobacter baumannii, Enterococcus faecium, Enterococcus faecalis, Klebsiella pneumoniae, Staphylococcus aureus and Salmonella enterica) and one species-independent model for less frequently studied bacterial species. We combined these models with gplas2 (now gplasCC) to reconstruct plasmids from more than 100 bacterial species. This approach allows comprehensive analysis of the wealth of bacterial short-read sequencing data available in public repositories and advances our understanding of microbial plasmids.
Bioinformatics
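The two-stage idea in the abstract (first label each contig as plasmid or chromosome, then group the plasmid-labelled contigs using the assembly graph) can be illustrated with a toy sketch. This is not the gplasCC algorithm: the abstract only says the tool uses "features of the assembly graph", and the sketch below substitutes plain connected components for the binning step. The function name `bin_plasmid_contigs`, the label strings, and the example contigs and edges are all hypothetical.

```python
from collections import defaultdict


def bin_plasmid_contigs(labels, edges):
    """Group plasmid-labelled contigs into bins by graph connectivity.

    labels: dict mapping contig id -> "plasmid" or "chromosome"
            (stand-in for a binary classifier's output; hypothetical format)
    edges:  iterable of (contig_a, contig_b) links from an assembly graph
    """
    plasmid = {c for c, lab in labels.items() if lab == "plasmid"}

    # Keep only links where both ends were classified as plasmid.
    adj = defaultdict(set)
    for a, b in edges:
        if a in plasmid and b in plasmid:
            adj[a].add(b)
            adj[b].add(a)

    # Depth-first search: each connected component becomes one bin.
    bins, seen = [], set()
    for start in sorted(plasmid):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        bins.append(sorted(component))
    return bins


# Made-up example: five contigs, three of them labelled as plasmid.
labels = {
    "c1": "chromosome",
    "c2": "plasmid",
    "c3": "plasmid",
    "c4": "plasmid",
    "c5": "chromosome",
}
edges = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]
bins = bin_plasmid_contigs(labels, edges)
print(bins)
```

In this made-up input, c2 and c3 are linked and fall into one bin, while c4 touches only a chromosome contig and ends up alone. A real tool would likely also weigh evidence such as coverage, which this sketch ignores.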