Flanker: a tool for comparative genomics of gene flanking regions

William Matlock, Samuel Lipworth, Bede Constantinides, Timothy E.A. Peto, A. Sarah Walker, Derrick Crook, Susan Hopkins, Liam P. Shaw, Nicole Stoesser
DOI: https://doi.org/10.1101/2021.02.22.432255
2021-02-22
Abstract: Analysing the flanking sequences surrounding genes of interest is often highly relevant to understanding the role of mobile genetic elements (MGEs) in horizontal gene transfer, particularly for antimicrobial resistance genes. Here, we present Flanker, a Python package which performs alignment-free clustering of gene flanking sequences in a consistent format, allowing investigation of MGEs without prior knowledge of their structure. These clusters, known as ‘flank patterns’, are based on Mash distances, allowing for easy comparison of similarity across sequences. Additionally, Flanker can be flexibly parameterised to fine-tune outputs by characterising upstream and downstream regions separately and investigating variable lengths of flanking sequence. We apply Flanker to two recent datasets describing plasmid-associated carriage of important carbapenemase genes (blaOXA-48 and blaKPC-2/3) and show that it successfully identifies distinct clusters of flank patterns, including both known and previously uncharacterised structural variants. For example, Flanker identified four Tn4401 profiles that could not be sufficiently characterised using TETyper or MobileElementFinder, demonstrating the utility of Flanker for flanking gene characterisation. Similarly, using a large (n=226) European isolate dataset, we confirm findings from a previous smaller study demonstrating an association between Tn1999.2 and blaOXA-48 upregulation, and identify 17 flank patterns (compared to the 5 previously identified). More generally, the demonstration in this study that flank patterns are associated with geographical regions and antibiotic susceptibility phenotypes suggests that they may be useful as epidemiological markers. Flanker is freely available under an MIT license at https://github.com/wtmatlock/flanker.
Data Summary: NCBI accession numbers for all sequencing data used in this study are provided in Supplementary Table 1.
The analysis performed in this manuscript can be reproduced in a Binder environment provided on the Flanker GitHub page (https://github.com/wtmatlock/flanker).
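The approach summarised in the abstract (extract the sequence either side of a gene, compare flanks by Mash distance, group similar flanks into 'flank patterns') can be illustrated with a minimal pure-Python sketch. This is not Flanker's code or API: the function names, the use of BLAKE2b hashing (Mash itself uses MurmurHash3 on canonical k-mers), the small default k-mer and sketch sizes, and the single-linkage grouping rule are all assumptions chosen to keep the example short and self-contained.

```python
import hashlib
import math
from itertools import combinations


def extract_flanks(seq, gene_start, gene_end, window):
    """Return (upstream, downstream) flanks, each up to `window` bases long.

    Coordinates are 0-based and half-open, so the gene is seq[gene_start:gene_end].
    """
    upstream = seq[max(0, gene_start - window):gene_start]
    downstream = seq[gene_end:gene_end + window]
    return upstream, downstream


def minhash_sketch(seq, k=7, size=50):
    """Bottom-`size` MinHash sketch: the smallest `size` distinct k-mer hashes.

    Assumption: BLAKE2b stands in for Mash's hash, and k-mers are not canonicalised.
    """
    hashes = {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(seq) - k + 1)
    }
    return sorted(hashes)[:size]


def mash_distance(sketch_a, sketch_b, k=7, size=50):
    """Estimate the Mash distance D = -ln(2j / (1 + j)) / k from two sketches."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 1.0  # nothing to compare
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    jaccard = shared / len(merged)
    if jaccard == 0:
        return 1.0  # no shared k-mers in the sketch: cap the distance at 1
    return max(0.0, -math.log(2 * jaccard / (1 + jaccard)) / k)


def cluster_flanks(flanks, threshold, k=7, size=50):
    """Single-linkage grouping: flanks within `threshold` of each other share a label."""
    sketches = [minhash_sketch(f, k, size) for f in flanks]
    parent = list(range(len(flanks)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(range(len(flanks)), 2):
        if mash_distance(sketches[a], sketches[b], k, size) <= threshold:
            parent[find(a)] = find(b)
    return [find(i) for i in range(len(flanks))]
```

Calling `extract_flanks` separately for each isolate and passing only the upstream (or only the downstream) flanks to `cluster_flanks` mirrors the idea of characterising the two sides independently, and repeating the call with increasing `window` values mirrors the investigation of variable flank lengths. The labels returned are arbitrary integers, and only equality between them is meaningful.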