Using Genetic Programming to Predict and Optimize Protein Function

Iliya Miralavy, Alexander Bricco, Assaf Gilad, Wolfgang Banzhaf
DOI: https://doi.org/10.48550/arXiv.2202.04039
2022-02-23
Abstract: Protein engineers conventionally use tools such as Directed Evolution to find new proteins with better functionalities and traits. More recently, computational techniques and especially machine learning approaches have been recruited to assist Directed Evolution, showing promising results. In this paper, we propose POET, a computational Genetic Programming tool based on evolutionary computation methods to enhance screening and mutagenesis in Directed Evolution and help protein engineers to find proteins that have better functionality. As a proof-of-concept we use peptides that generate MRI contrast detected by the Chemical Exchange Saturation Transfer contrast mechanism. The evolutionary methods used in POET are described, and the performance of POET in different epochs of our experiments with Chemical Exchange Saturation Transfer contrast is studied. Our results indicate that a computational modelling tool like POET can help to find peptides with 400% better functionality than used before.
Neural and Evolutionary Computing, Biomolecules
What problem does this paper attempt to address?
This paper addresses the problem of finding new proteins with improved function in protein engineering. Specifically, the authors propose POET, a computational genetic programming tool that uses evolutionary computation methods to enhance the screening and mutagenesis steps of directed evolution and to help protein engineers discover proteins with better functionality. As a proof of concept, the paper applies POET to peptides that generate MRI contrast detected by the chemical exchange saturation transfer (CEST) mechanism, and reports that POET can help find peptides with 400% better functionality than those used before.

### Main Objectives of the Paper

1. **Enhance Directed Evolution**: Support directed evolution with computational models so that experiments involve less blind trial and error and cost less.
2. **Predict and Optimize Protein Function**: Use genetic programming to predict protein function and evolutionary algorithms to optimize protein sequences.
3. **Explore Protein Space Unexplored by Natural Evolution**: Use computational models to search a broader protein sequence space and discover new functional proteins.

### Specific Methods

- **Genetic Programming (GP)**: POET uses genetic programming to learn important motifs in protein sequences and to assign weights to these motifs, which together form a prediction model.
- **Model Training**: POET trains the model on a dataset of protein sequences paired with their measured CEST contrast values.
- **Protein Optimization and Prediction**: The trained model evaluates randomly generated protein sequences, and the sequences with the highest predicted functionality are selected.
- **Wet-Lab Verification**: The predicted sequences are synthesized in the laboratory and their CEST contrast values are measured. The results are fed back into the dataset to further improve the model.
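
The motif-and-weight idea and the screening step described above can be illustrated with a small Python sketch. This is only a minimal illustration under stated assumptions, not POET's actual implementation: the names `EXAMPLE_MODEL`, `predict`, `rmse`, and `screen` are hypothetical, the motifs and weights are made up, the scoring rule (weight times motif occurrence count) is an assumed simplification, and using RMSE against measured values as a training fitness is likewise an assumption.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

# Hypothetical model: (motif, weight) pairs. In a GP setting these would be
# evolved from data; the values here are invented for illustration only.
EXAMPLE_MODEL = [("KK", 2.0), ("RT", 1.5), ("D", -0.5)]


def predict(model, sequence):
    """Assumed scoring rule: sum of weight * non-overlapping motif count."""
    return sum(weight * sequence.count(motif) for motif, weight in model)


def rmse(model, dataset):
    """Assumed fitness: root-mean-square error between predicted and
    measured values over (sequence, measured_value) pairs."""
    errors = [(predict(model, seq) - value) ** 2 for seq, value in dataset]
    return math.sqrt(sum(errors) / len(errors))


def random_peptide(length, rng):
    """Draw a uniformly random peptide of the given length."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def screen(model, n_candidates, length, top_k, seed=0):
    """Generate random peptides and return the top_k by predicted score."""
    rng = random.Random(seed)
    candidates = [random_peptide(length, rng) for _ in range(n_candidates)]
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep order
    return sorted(unique, key=lambda s: predict(model, s), reverse=True)[:top_k]


if __name__ == "__main__":
    for peptide in screen(EXAMPLE_MODEL, n_candidates=1000, length=12, top_k=5):
        print(peptide, predict(EXAMPLE_MODEL, peptide))
```

In this sketch, `predict(EXAMPLE_MODEL, "KKRTD")` evaluates to 3.0 (one `KK`, one `RT`, one `D`). In a closed loop like the one summarized above, the top-ranked candidates would be measured in the laboratory and appended to the dataset before the model is retrained.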
### Results

The results of the paper show that POET can find new peptides with significantly better functionality than existing peptides in a relatively short time, demonstrating the potential of computational models in protein engineering.

### Summary

By proposing the computational tool POET, this paper addresses the challenge of finding more functional proteins in protein engineering and shows the important role of computational methods in accelerating and optimizing the directed evolution process.