Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization

Jiahao Qiu, Hui Yuan, Jinghong Zhang, Wentao Chen, Huazheng Wang, Mengdi Wang
2024-01-08
Abstract: While modern biotechnologies allow synthesizing new proteins and function measurements at scale, efficiently exploring a protein sequence space and engineering it remains a daunting task due to the vast sequence space of any given protein. Protein engineering is typically conducted through an iterative process of adding mutations to the wild-type or lead sequences, recombination of mutations, and running new rounds of screening. To enhance the efficiency of such a process, we propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence with the guidance of a bandit machine learning model. Under simplified assumptions and a Gaussian Process prior, we provide theoretical analysis and a Bayesian regret bound, demonstrating that the combination of local search and bandit learning method can efficiently discover a near-optimal design. The full algorithm is compatible with a suite of randomized tree search heuristics, machine learning models, pre-trained embeddings, and bandit techniques. We test various instances of the algorithm across benchmark protein datasets using simulated screens. Experiment results demonstrate that the algorithm is both sample-efficient and able to find top designs using reasonably small mutation counts.
Biomolecules, Machine Learning
What problem does this paper attempt to address?
This paper addresses how to explore the vast protein sequence space efficiently during protein sequence optimization, in order to find designs with the best functional characteristics. Starting from a given wild-type or lead sequence, the goal is to screen for sequences of progressively higher fitness (such as stability, binding affinity, or catalytic activity) by adding mutations and recombining them over successive rounds. Because the sequence space of any given protein is extremely large, screening without model guidance is inefficient and unlikely to reach the optimal design, so more efficient search algorithms are needed. The paper proposes a tree search-based bandit learning method: a tree is expanded from the initial sequence, and a bandit machine learning model guides which nodes to expand and screen. This explores the local sequence space while keeping the total number of mutations small, so that a near-optimal design can be found quickly. The paper also provides a theoretical analysis: under simplified assumptions and a Gaussian Process prior, it derives a Bayesian regret bound showing that combining local search with bandit learning can efficiently discover a near-optimal design.
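The loop described above (expand a tree from the initial sequence, let a bandit model choose which candidates to screen, update the model, repeat) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the additive UCB-style score stands in for whatever bandit model is used, and the names `AdditiveUCB`, `expand`, `tree_search_bandit`, `toy_oracle`, the 0.3 recombination probability, and the hidden `target` sequence are all hypothetical choices made for illustration.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


class AdditiveUCB:
    """Toy stand-in bandit model (an assumption, not the paper's model):
    per-(position, residue) mean fitness plus a count-based optimism bonus."""

    def __init__(self, bonus=1.0):
        self.bonus = bonus
        self.total = {}  # (position, residue) -> summed observed fitness
        self.count = {}  # (position, residue) -> number of observations

    def update(self, seq, fitness):
        for key in enumerate(seq):
            self.total[key] = self.total.get(key, 0.0) + fitness
            self.count[key] = self.count.get(key, 0) + 1

    def score(self, seq):
        s = 0.0
        for key in enumerate(seq):
            n = self.count.get(key, 0)
            mean = self.total[key] / n if n else 0.0
            s += mean + self.bonus / math.sqrt(n + 1)  # rarely seen features get a larger bonus
        return s / len(seq)


def expand(parents, wild_type, max_mutations, n_children, rng):
    """Grow children by point mutation or one-point recombination of parents,
    keeping only those within max_mutations of the wild type."""
    children = set()
    for _ in range(20 * n_children):  # bounded number of attempts
        if len(children) >= n_children:
            break
        parent = rng.choice(parents)
        if len(parents) > 1 and rng.random() < 0.3:
            other = rng.choice(parents)
            cut = rng.randrange(1, len(parent))
            child = parent[:cut] + other[cut:]
        else:
            pos = rng.randrange(len(parent))
            child = parent[:pos] + rng.choice(ALPHABET) + parent[pos + 1:]
        if child != parent and hamming(child, wild_type) <= max_mutations:
            children.add(child)
    return sorted(children)  # sorted for reproducibility


def tree_search_bandit(wild_type, oracle, rounds=10, batch=8, max_mutations=3, seed=0):
    rng = random.Random(seed)
    model = AdditiveUCB()
    observed = {wild_type: oracle(wild_type)}  # sequence -> screened fitness
    model.update(wild_type, observed[wild_type])
    for _ in range(rounds):
        # Roots for this round: the best sequences screened so far.
        parents = sorted(observed, key=observed.get, reverse=True)[:batch]
        candidates = [c for c in expand(parents, wild_type, max_mutations, 4 * batch, rng)
                      if c not in observed]
        # The bandit score decides which candidates are worth a screen.
        candidates.sort(key=model.score, reverse=True)
        for seq in candidates[:batch]:
            observed[seq] = oracle(seq)
            model.update(seq, observed[seq])
    best = max(observed, key=observed.get)
    return best, observed[best], observed


def toy_oracle(target):
    """Simulated screen: fraction of positions matching a hidden target."""
    return lambda seq: sum(a == b for a, b in zip(seq, target)) / len(target)


wild_type = "ACDEFGHIKL"
target = "ACDWFGHYKL"  # hypothetical optimum, two mutations away from wild_type
best, fitness, observed = tree_search_bandit(wild_type, toy_oracle(target))
print(best, fitness, hamming(best, wild_type))
```

The `max_mutations` filter in `expand` mirrors the emphasis on keeping mutation counts small, and the per-round `batch` limit plays the role of a screening budget. A closer approximation to the setting analyzed in the paper would replace `AdditiveUCB` with a model that has a Gaussian Process prior, which this sketch does not attempt.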