Deep learning uncovers sequence-specific amplification bias in multi-template PCR

Andreas L Gimpel, Bowen Fan, Dexiong Chen, Laetitia O. D. Wolfle, Max Horn, Laetitia Meng-Papaxanthos, Philipp L. Antkowiak, Wendelin J. Stark, Beat Christen, Karsten Borgwardt, Robert N Grass
DOI: https://doi.org/10.1101/2024.09.20.614030
2024-09-20
Abstract: Multi-template polymerase chain reaction (PCR) is a key step in many amplicon sequencing protocols, enabling parallel amplification of diverse DNA molecules sharing common adapters in applications ranging from quantitative molecular biology to DNA data storage. However, this process results in skewed amplicon abundance due to sequence-specific amplification biases. In this study, one-dimensional convolutional neural networks (1D-CNNs) were trained on synthetic DNA pools to learn the PCR amplification efficiency of individual templates. These 1D-CNN models can predict poorly amplifying templates based solely on sequence information, achieving an AUROC/AUPRC of up to 0.88/0.44 at a highly imbalanced prevalence of 2%, thereby greatly outperforming baseline models relying only on GC content and nucleotide frequency as predictors. A new, general-purpose framework for interpreting deep learning models, termed CluMo, provides mechanistic insights into the amplification biases. Most strikingly, specific amplification reactions were identified as suffering from adaptor-template self-priming, a mechanism previously disregarded in PCR.
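To put the reported metrics in context: at a prevalence of 2%, a random ranking has an expected AUPRC of roughly 0.02, so 0.44 is far above chance even though it looks modest next to the AUROC. The sketch below is illustrative only and is not the authors' code. It computes the baseline predictors named in the abstract (GC content and nucleotide frequency) and the two metrics in plain Python. The function names and toy inputs are assumptions made for this example.

```python
from collections import Counter


def baseline_features(seq):
    """GC content and per-nucleotide frequencies for one DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    freqs = {base: counts.get(base, 0) / n for base in "ACGT"}
    return {"gc": freqs["G"] + freqs["C"], **freqs}


def auroc(labels, scores):
    """AUROC: probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPRC estimated as average precision over the score-ranked list."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each positive's rank
    return total / hits


# Hypothetical toy data: label 1 marks a poorly amplifying template.
labels = [1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.3]
print(baseline_features("GGCCAATT")["gc"])
print(auroc(labels, scores), average_precision(labels, scores))
```

Both metric functions assume at least one positive and one negative label. In practice a library implementation such as scikit-learn's `roc_auc_score` and `average_precision_score` would be the more usual choice.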
Molecular Biology