AptaGPT: Advancing aptamer design with a generative pre-trained language model

Shijian Ding, Xin Yang, Chi Ho Chan, Yuan Ma, Sifan Yu, Luyao Wang, Aiping Lyu, Baoting Zhang, Yuanyuan Yu, Ge Zhang
DOI: https://doi.org/10.1101/2024.05.23.594910
2024-05-28
Abstract: Aptamers, synthetic oligonucleotide ligands, have shown significant promise for therapeutic and diagnostic applications owing to their high specificity and affinity for target molecules. However, the conventional Systematic Evolution of Ligands by Exponential Enrichment (SELEX) for aptamer selection is time-consuming and often yields limited candidates. To address these limitations, we introduce AptaGPT, a novel computational strategy that leverages a Generative Pre-trained Transformer (GPT) model to design and optimize aptamers. By training on SELEX data from early rounds, AptaGPT generated a diverse array of aptamer sequences, which were then computationally screened for binding using molecular docking. The results of this study demonstrated that AptaGPT is an effective tool for generating potential high-affinity aptamer sequences, significantly accelerating the discovery process and expanding the potential for aptamer research. This study showcases the application of generative language models in bioengineering and provides a new avenue for rapid aptamer development.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses two limitations of the traditional SELEX (Systematic Evolution of Ligands by Exponential Enrichment) method for aptamer screening:

1. **Time Consumption**: SELEX requires multiple rounds of iterative screening, and each round is slow, so the overall process is very time-consuming.
2. **Limited Number of Candidates**: Although SELEX can identify high-affinity aptamers, each round yields only a limited number of candidates, which restricts the diversity of sequences available for subsequent research.

To address these problems, the authors propose AptaGPT, a computational strategy based on a Generative Pre-trained Transformer (GPT) model for designing and optimizing aptamers. Trained on SELEX data from early rounds, AptaGPT generates a large and diverse set of aptamer sequences, which are then screened computationally for binding using molecular docking. According to the authors, this significantly accelerates aptamer discovery and expands the potential for aptamer research.
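The workflow summarized above (train on early-round sequences, generate new candidates, screen them computationally) can be illustrated with a minimal sketch. This is not the authors' implementation: the generator here is a first-order Markov chain standing in for the GPT model, and `placeholder_score` is a hypothetical stand-in for a molecular docking score. All function names and the example sequences are assumptions made for illustration only.

```python
import random
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(seqs):
    """Count nucleotide-to-nucleotide transitions; '^' marks the sequence start.
    Assumption: a Markov chain is used here only as a toy stand-in for a GPT."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        prev = "^"
        for ch in s:
            counts[prev][ch] += 1
            prev = ch
    return counts

def sample(counts, length, rng):
    """Sample one sequence of the given length from the transition counts."""
    prev, out = "^", []
    for _ in range(length):
        options = counts.get(prev)
        if not options:
            # No observed transition from this symbol: fall back to uniform.
            ch = rng.choice(ALPHABET)
        else:
            bases = list(options)
            weights = [options[b] for b in bases]
            ch = rng.choices(bases, weights=weights, k=1)[0]
        out.append(ch)
        prev = ch
    return "".join(out)

def generate_candidates(train_seqs, n, length, seed=0):
    """Generate up to n unique sequences that do not appear in the training set."""
    rng = random.Random(seed)
    counts = train_markov(train_seqs)
    seen = set(train_seqs)
    candidates = []
    attempts = 0
    while len(candidates) < n and attempts < 100 * n:
        attempts += 1
        s = sample(counts, length, rng)
        if s not in seen:
            seen.add(s)
            candidates.append(s)
    return candidates

def placeholder_score(seq):
    """Hypothetical scoring function (GC fraction). A real pipeline would
    call a docking tool here; this is NOT a binding-affinity estimate."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def screen(candidates, score_fn, top_k):
    """Rank candidates by score (higher first) and keep the top_k."""
    return sorted(candidates, key=score_fn, reverse=True)[:top_k]

# Made-up example sequences, standing in for early-round SELEX reads.
early_round = [
    "ACGTACGTAGCTAGCTAGCA",
    "GGCATCGATCGGCTAAGCTA",
    "TTGACCGTAGGCATGCATCG",
]
candidates = generate_candidates(early_round, n=5, length=20, seed=0)
shortlist = screen(candidates, placeholder_score, top_k=2)
```

The separation between `generate_candidates` and `screen` mirrors the two stages in the summary, so either the generator or the scoring function could be swapped for a real model or docking tool without changing the other.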