PlasmidGPT: a generative framework for plasmid design and annotation

Bin Shao
DOI: https://doi.org/10.1101/2024.09.30.615762
2024-10-01
Abstract: We introduce PlasmidGPT, a generative language model pretrained on 153k engineered plasmid sequences from Addgene. PlasmidGPT generates de novo sequences that share characteristics with engineered plasmids but show low sequence identity to the training data. We demonstrate that it can generate plasmids in a controlled manner, based on an input sequence or a specific design constraint. Moreover, the model learns informative embeddings of both engineered and natural plasmids, allowing for efficient prediction of a wide range of sequence-related attributes.
Bioinformatics