Mapping AAV capsid sequences to functions through function-guided in silico evolution

Hanyu Zheng, Binjie Guo, Aisheng Mo, Hongyan Wei, Yile Wu, Xurong Lin, Haohan Jiang, Hengguang Li, Yunshuo Zhang, Zhuoyuan Song, Xuebin Ni, Yan Huang, Xiaosong Gu, Bin Yu, Ningtao Cheng, Xuhua Wang
DOI: https://doi.org/10.1101/2024.10.11.617764
2024-10-11
Abstract: Artificial intelligence (AI) has been suggested to facilitate time- and cost-effective functional engineering of adeno-associated virus (AAV) capsid sequences. Nevertheless, an AI-empowered approach to identify AAV capsid sequence-to-multifunction relationships remains elusive. To overcome this challenge, we propose a machine-intelligent design method to map an AAV capsid sequence to multiple functions, thereby enabling direct in silico engineering of AAV capsids. To fuse multiple functions into a single capsid sequence, a heuristic algorithm coupled with contrastive learning and reinforcement learning, named function-guided evolution (FE), was introduced to steer further evolution of the high-performing capsid sequences generated by a naive language model toward functions. We then illustrated the evolutionary mechanism of the FE approach for function-guided generation of capsid sequences. Further optimization steers the evolution toward desired functions within a function-guided landscape. Despite the constraint of datasets of only 129 entries, we successfully constructed a model to map AAV capsid sequences to multiple functions of improved viability coupled with central nervous system (CNS) tropism. In vivo experiments confirmed that two of the top eight engineered variants exhibited enhanced viability and remarkable CNS tropism. This interpretable machine-intelligent design method represents a pioneering effort enabling direct in silico engineering of AAV capsids for effective gene delivery.
Bioengineering
What problem does this paper attempt to address?
The main problem this paper addresses is how to design adeno-associated virus (AAV) capsid protein sequences that combine multiple functions. Specifically, the research aims to overcome the following challenges:

1. **Mapping a single capsid sequence to multiple functions**: There is currently no framework that maps a single AAV capsid sequence to the multiple functions that are crucial for gene delivery.
2. **Explanatory power of the evolutionary mechanism**: It is necessary to understand how capsid sequences evolve into functional entities within computational models, so that the models can be optimized to map sequences to the desired functions.

To address these challenges, the researchers developed a generative artificial intelligence architecture named ALICE (Artificial Intelligence Custom Engineering). ALICE proceeds through the following steps:

1. **Pre-training**: Pre-train the generative language model on a dataset of 2.89 million peptide sequences so that it learns the language features of natural peptides.
2. **Semantic tuning**: Through transfer learning, further tune the model on a dataset of 72,753 capsid sequences so that it learns the biological characteristics of AAV capsids.
3. **Ranking and filtering**: Rank the generated sequences by their multifunctional properties and screen out the best-performing sequences.
4. **Function-guided evolution (FE)**: Combine a heuristic algorithm, contrastive learning and reinforcement learning to guide high-performing sequences to evolve toward the desired functions.
5. **Elite promotion**: Select the best candidate sequences through comprehensive evaluation.
6. **Wet-lab verification**: Verify the effectiveness of the selected AAV variants in in vivo experiments.

Through this series of steps, ALICE generated AAV variants with enhanced viability and central nervous system tropism.
In particular, two newly designed variants, AAV.ALICE-N2 and AAV.ALICE-N6, show significant improvements, with the viability of AAV.ALICE-N2 increased approximately twofold and the transduction efficiency in the central nervous system increased 372-fold. In conclusion, this research provides an innovative method for designing multifunctional AAV capsid proteins directly in silico, offering a new tool for effective gene delivery.
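
As a rough illustration of how ranking-and-filtering and a heuristic evolution loop can fit together, the following minimal Python sketch ranks peptide sequences by a combined score and then repeatedly mutates and re-selects the top candidates. It is not the ALICE implementation: the two scoring functions (`toy_viability`, `toy_tropism`) are hypothetical placeholders with no biological validity, the equal weighting and the 7-residue length are arbitrary assumptions, and the contrastive-learning and reinforcement-learning components described in the paper are omitted entirely.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_viability(seq: str) -> float:
    """Hypothetical placeholder score: fraction of charged residues."""
    return sum(seq.count(a) for a in "DEKR") / len(seq)


def toy_tropism(seq: str) -> float:
    """Hypothetical placeholder score: fraction of aromatic residues."""
    return sum(seq.count(a) for a in "FWY") / len(seq)


def combined_score(seq: str) -> float:
    """Equal-weight sum of the two toy scores (the weighting is an assumption)."""
    return 0.5 * toy_viability(seq) + 0.5 * toy_tropism(seq)


def rank_and_filter(seqs: list[str], keep: int) -> list[str]:
    """Sort by combined score, best first, and keep the top `keep` sequences."""
    return sorted(seqs, key=combined_score, reverse=True)[:keep]


def mutate(seq: str, rng: random.Random) -> str:
    """Replace one randomly chosen position with a random residue."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]


def evolve(seqs: list[str], generations: int, keep: int,
           rng: random.Random) -> list[str]:
    """Elitist mutate-and-select loop: parents survive unless a child outranks them."""
    pool = rank_and_filter(seqs, keep)
    for _ in range(generations):
        children = [mutate(s, rng) for s in pool]
        pool = rank_and_filter(pool + children, keep)
    return pool


rng = random.Random(0)  # fixed seed so the run is reproducible
start = ["".join(rng.choice(AMINO_ACIDS) for _ in range(7)) for _ in range(20)]
final = evolve(start, generations=30, keep=5, rng=rng)

for seq in final:
    print(seq, round(combined_score(seq), 3))
```

Because selection is elitist, the best combined score in the pool can never decrease from one generation to the next. In a realistic setting the toy scorers would be replaced by learned predictors of each function.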