Abstract:

Background: Antimicrobial resistance (AMR) in Escherichia coli is a global problem associated with substantial morbidity and mortality. AMR-associated genes are typically annotated based on similarity to variants in a curated reference database, with the implicit assumption that uncatalogued genetic variation within these genes is phenotypically unimportant. In this study, we evaluated the performance of the AMRFinder tool and, subsequently, the potential for discovering new AMR-associated gene families and characterising variation within existing ones to improve genotype-to-susceptibility phenotype predictions in E coli.

Methods: In this cross-sectional study of international genome sequence data, we assembled a global dataset of 9001 E coli sequences from five publicly available data collections, predominantly deriving from human bloodstream infections, from Norway, Oxfordshire (UK), Thailand, the UK, and Sweden. 8555 of these sequences had linked antibiotic susceptibility data. Raw reads were assembled using Shovill, and AMR genes (relevant to amoxicillin-clavulanic acid, ampicillin, ceftriaxone, ciprofloxacin, gentamicin, piperacillin-tazobactam, and trimethoprim) were extracted using the National Center for Biotechnology Information AMRFinder tool (using both default and strict [100%] coverage and identity filters). We assessed how well the presence of these genes predicted resistance or susceptibility, judged against US Food and Drug Administration thresholds for major and very major errors. Mash was used to calculate the similarity between extracted genes using Jaccard distances. We empirically reclustered extracted gene sequences into AMR-associated gene families (≥70% match) and antibiotic-resistance genes (ARGs; 100% match) and categorised these according to their frequency in the dataset. Accumulation curves were simulated, and correlations between gene frequency in the Oxfordshire and other datasets were calculated using the Spearman coefficient.
Firth regression was used to model the association between the presence of blaTEM-1 variants and amoxicillin-clavulanic acid or piperacillin-tazobactam resistance, adjusted for the presence of other relevant ARGs.

Findings: The performance of the AMRFinder database for genotype-to-phenotype predictions using strict 100% identity and coverage thresholds did not meet US Food and Drug Administration thresholds for any of the seven antibiotics evaluated. Relaxing filters to default settings improved sensitivity at a cost to specificity. For all antibiotics, most explainable resistance was associated with the presence of a small number of genes. A proportion of resistance could not be explained by known ARGs, ranging from 75·1% for amoxicillin-clavulanic acid to 3·4% for ciprofloxacin. Only 18 199 (51·5%) of the 35 343 ARGs detected had a 100% identity and coverage match in the AMRFinder database. After empirically reclassifying genes at 100% nucleotide sequence identity, we identified 1042 unique ARGs, of which 126 (12·1%) were present ten times or more, 313 (30·0%) were present between two and nine times, and 603 (57·9%) were present only once. Simulated accumulation curves revealed that discovery of new (100% match) ARGs present more than once in the dataset plateaued relatively quickly, whereas new singleton ARGs were discovered even after many thousands of isolates had been included. We identified a strong correlation (Spearman coefficient 0·76 [95% CI 0·73-0·80], p<0·0001) between the number of times an ARG was observed in Oxfordshire and the number of times it was seen internationally, with ARGs that were observed six times in Oxfordshire always being found elsewhere.
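The accumulation-curve idea described above can be illustrated with a minimal Python sketch. This is not the study's code, and the abstract does not say how its simulation was implemented; the sketch assumes the generic approach of adding isolates in random order and averaging over permutations. The function names and the allele labels in the toy data are hypothetical.

```python
import random
from typing import Hashable, List, Sequence


def accumulation_curve(isolates: Sequence[Sequence[Hashable]],
                       seed: int = 0) -> List[int]:
    """Cumulative number of distinct alleles after adding each isolate,
    for one random ordering of the isolates."""
    order = list(range(len(isolates)))
    random.Random(seed).shuffle(order)  # seeded for reproducibility
    seen = set()
    curve = []
    for i in order:
        seen.update(isolates[i])   # alleles carried by this isolate
        curve.append(len(seen))    # distinct alleles discovered so far
    return curve


def mean_accumulation_curve(isolates: Sequence[Sequence[Hashable]],
                            n_permutations: int = 100) -> List[float]:
    """Average the curve over several random orderings (assumed approach)."""
    curves = [accumulation_curve(isolates, seed=s) for s in range(n_permutations)]
    return [sum(column) / n_permutations for column in zip(*curves)]


# Toy data (hypothetical allele labels): one list of alleles per isolate.
toy_isolates = [
    ["blaTEM-1_a"],
    ["blaTEM-1_a", "sul2_a"],
    ["blaTEM-1_b"],
    ["blaTEM-1_a"],
]
single_curve = accumulation_curve(toy_isolates, seed=1)
mean_curve = mean_accumulation_curve(toy_isolates, n_permutations=50)
```

A curve that flattens early indicates that few new alleles are found as more isolates are added, whereas a curve that keeps rising indicates continued discovery.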
Finally, using the example of blaTEM-1, we showed that uncatalogued variation, including synonymous variation, is associated with potentially important phenotypic differences; for example, two common, uncatalogued blaTEM-1 alleles with only synonymous mutations compared with the known reference were associated with reduced resistance to amoxicillin-clavulanic acid (adjusted odds ratio 0·58 [95% CI 0·35-0·95], p=0·031) and piperacillin-tazobactam (0·50 [95% CI 0·29-0·82], p=0·005).

Interpretation: We highlight substantial uncatalogued genetic variation with respect to known ARGs, although a relatively small proportion of these alleles are repeatedly observed in a large international dataset, suggesting strong selection pressures. The current approach of using fuzzy matching for ARG detection, ignoring the unknown effects of uncatalogued variation, is unlikely to be acceptable for future clinical deployment. The association of synonymous mutations with potentially important phenotypic differences suggests that relying solely on amino acid-based gene detection to predict resistance is unlikely to be sufficient. Finally, the inability to explain all resistance using existing knowledge highlights the importance of new target gene discovery.

Funding: National Institute for Health and Care Research, Wellcome, and UK Medical Research Council.
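The major and very major error rates referred to in the Methods can be computed from paired genotype predictions and phenotypes. The sketch below is a minimal illustration rather than the study's code. It assumes the usual definitions: a very major error is a phenotypically resistant isolate predicted susceptible (as a fraction of resistant isolates), and a major error is a phenotypically susceptible isolate predicted resistant (as a fraction of susceptible isolates). The function name and toy data are hypothetical, and the regulatory thresholds are deliberately left out because the abstract does not state them.

```python
from typing import Dict, Sequence


def error_rates(predicted_resistant: Sequence[bool],
                phenotype_resistant: Sequence[bool]) -> Dict[str, float]:
    """Return major and very major error rates for one antibiotic."""
    if len(predicted_resistant) != len(phenotype_resistant):
        raise ValueError("inputs must have the same length")

    pairs = list(zip(predicted_resistant, phenotype_resistant))
    n_resistant = sum(1 for _, pheno in pairs if pheno)
    n_susceptible = len(pairs) - n_resistant

    # Very major error: resistant by phenotype, predicted susceptible.
    very_major = sum(1 for pred, pheno in pairs if pheno and not pred)
    # Major error: susceptible by phenotype, predicted resistant.
    major = sum(1 for pred, pheno in pairs if pred and not pheno)

    nan = float("nan")  # returned when a denominator is zero
    return {
        "very_major_error_rate": very_major / n_resistant if n_resistant else nan,
        "major_error_rate": major / n_susceptible if n_susceptible else nan,
    }


# Toy data (hypothetical): 4 resistant and 6 susceptible isolates.
predicted = [True, True, True, False, False, False, False, False, False, True]
phenotype = [True, True, True, True, False, False, False, False, False, False]
rates = error_rates(predicted, phenotype)
```

In this toy example one of four resistant isolates is missed and one of six susceptible isolates is flagged, so each rate can be compared against whichever threshold the evaluator applies.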

Exploring uncatalogued genetic variation in antimicrobial resistance gene families in Escherichia coli: an observational analysis
