GraphPlas: Refined Classification of Plasmid Sequences Using Assembly Graphs

Anuradha Wickramarachchi, Yu Lin
DOI: https://doi.org/10.1109/TCBB.2021.3082915
2022-02-02
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Abstract: Plasmids are extra-chromosomal genetic elements carrying important markers that affect the function and behaviour of microorganisms and support their environmental adaptation. Hence, identifying and recovering plasmid sequences from assemblies is a crucial task in metagenomic analysis. Machine learning approaches have previously been developed to separate chromosomes and plasmids. However, existing classification approaches always involve a compromise between precision and recall. The compositional similarity between chromosomes and their plasmids makes it difficult to separate the two with high accuracy: high-confidence classifications are accurate but come at a significant cost in recall, and vice versa. Hence, more sophisticated approaches are required to separate plasmids and chromosomes accurately while retaining an acceptable trade-off between precision and recall. We present GraphPlas, a novel approach for plasmid recovery that uses coverage, composition and assembly graph topology. We evaluated GraphPlas on simulated and real short-read assemblies with varying compositions of plasmids and chromosomes. Our experiments show that GraphPlas significantly improves accuracy in detecting plasmid and chromosomal contigs on top of popular state-of-the-art plasmid detection tools. The source code is freely available at https://github.com/anuradhawick/GraphPlas.