Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization

Jiahao Qiu, Hui Yuan, Jinghong Zhang, Wentao Chen, Huazheng Wang, Mengdi Wang

2024-01-08

Abstract: While modern biotechnologies allow synthesizing new proteins and function measurements at scale, efficiently exploring a protein sequence space and engineering it remains a daunting task due to the vast sequence space of any given protein. Protein engineering is typically conducted through an iterative process of adding mutations to the wild-type or lead sequences, recombination of mutations, and running new rounds of screening. To enhance the efficiency of such a process, we propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence with the guidance of a bandit machine learning model. Under simplified assumptions and a Gaussian Process prior, we provide theoretical analysis and a Bayesian regret bound, demonstrating that the combination of local search and bandit learning method can efficiently discover a near-optimal design. The full algorithm is compatible with a suite of randomized tree search heuristics, machine learning models, pre-trained embeddings, and bandit techniques. We test various instances of the algorithm across benchmark protein datasets using simulated screens. Experiment results demonstrate that the algorithm is both sample-efficient and able to find top designs using reasonably small mutation counts.
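The regret guarantee mentioned in the abstract can be read against the standard definition of Bayesian regret. The notation below is assumed for illustration rather than taken from the paper: $f$ is the unknown fitness function drawn from the Gaussian Process prior, $\mathcal{X}$ is the set of candidate sequences, and $x_t$ is the design screened in round $t$. The second line shows why a small regret implies a near-optimal design: the best screened design is never worse than the average one. The paper's exact formulation (for example, how the local search restricts $\mathcal{X}$, or how batches are counted) may differ.

```latex
\mathrm{BR}(T) = \mathbb{E}_{f \sim \mathcal{GP}}\!\left[\sum_{t=1}^{T} \bigl(f(x^\star) - f(x_t)\bigr)\right],
\qquad x^\star = \arg\max_{x \in \mathcal{X}} f(x)

\mathbb{E}\!\left[f(x^\star) - \max_{1 \le t \le T} f(x_t)\right]
= \mathbb{E}\!\left[\min_{1 \le t \le T} \bigl(f(x^\star) - f(x_t)\bigr)\right]
\le \frac{\mathrm{BR}(T)}{T}
```

So any regret bound that grows sublinearly in $T$ implies that the expected gap between the best screened design and the optimum shrinks to zero as the number of screening rounds grows.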

Biomolecules, Machine Learning

What problem does this paper attempt to address?

The paper addresses how to efficiently explore the vast protein sequence space to find protein designs with optimal functional properties. Specifically, it focuses on iteratively screening for sequences with higher fitness (such as stability, binding affinity, or catalytic activity) by adding mutations to, and recombining mutations of, a given wild-type or lead protein sequence. Because the sequence space is extremely large, traditional screening is inefficient and struggles to find the best designs, which motivates new algorithms that make this process more efficient. The paper proposes a tree search-based multi-armed bandit learning method: it expands a search tree starting from the initial sequence and uses a bandit machine learning model to guide the expansion. This lets the method explore the local sequence space effectively while keeping the total number of mutations small, so that it can quickly find a near-optimal design. The paper also provides a theoretical analysis showing that, under simplified assumptions and a Gaussian Process prior, the method efficiently finds a near-optimal design, and it derives a Bayesian regret bound that supports this claim.
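The loop described above (grow a tree of mutants from the wild type, let a bandit model choose what to screen next, keep the mutation count small) can be sketched in a few dozen lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the exponential Hamming-distance kernel, the GP-UCB acquisition rule, the choice of parents (the best sequences screened so far), random single-point mutations as the expansion rule, and every function name (`tree_search_bandit`, `HammingGP`, `single_mutants`, and so on) are hypothetical choices. The abstract states that the framework accepts many tree search heuristics, models, and bandit techniques; GP-UCB is simply one concrete instance.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def kernel(a, b, length_scale=2.0):
    """Exponential Hamming kernel exp(-d_H(a, b) / length_scale); an assumed choice."""
    return math.exp(-hamming(a, b) / length_scale)


def cholesky(m):
    """Lower-triangular L with L L^T = m, for symmetric positive-definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def solve_upper_t(low, b):
    """Back substitution: solve L^T x = b with the same lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


class HammingGP:
    """GP posterior whose constant prior mean is the mean observed fitness."""

    def __init__(self, xs, ys, noise=1e-2, length_scale=2.0):
        self.xs, self.ls = list(xs), length_scale
        self.mu0 = sum(ys) / len(ys)
        gram = [[kernel(a, b, length_scale) + (noise if i == j else 0.0)
                 for j, b in enumerate(self.xs)] for i, a in enumerate(self.xs)]
        self.low = cholesky(gram)
        resid = [y - self.mu0 for y in ys]
        # alpha = (K + noise * I)^-1 (y - mu0), via two triangular solves
        self.alpha = solve_upper_t(self.low, solve_lower(self.low, resid))

    def posterior(self, x):
        """Posterior mean and standard deviation of the fitness at x."""
        k_star = [kernel(x, b, self.ls) for b in self.xs]
        mean = self.mu0 + sum(k * a for k, a in zip(k_star, self.alpha))
        v = solve_lower(self.low, k_star)
        var = max(kernel(x, x, self.ls) - sum(t * t for t in v), 1e-12)
        return mean, math.sqrt(var)


def single_mutants(seq, rng, n):
    """Sample n random single-point mutants of seq (hypothetical expansion rule)."""
    out = []
    for _ in range(n):
        i = rng.randrange(len(seq))
        aa = rng.choice([c for c in ALPHABET if c != seq[i]])
        out.append(seq[:i] + aa + seq[i + 1:])
    return out


def tree_search_bandit(wild_type, oracle, rounds=5, batch=4, children=30,
                       max_mutations=3, beta=2.0, seed=0):
    """Grow a mutation tree from wild_type; pick each screening batch by GP-UCB."""
    rng = random.Random(seed)
    data = {wild_type: oracle(wild_type)}  # screened sequence -> measured fitness
    for _ in range(rounds):
        gp = HammingGP(list(data), list(data.values()))
        # Parents: the `batch` best sequences screened so far (assumed heuristic).
        parents = sorted(data, key=data.get, reverse=True)[:batch]
        # Children: single mutants that stay within max_mutations of the wild type.
        cands = list(dict.fromkeys(
            c for p in parents for c in single_mutants(p, rng, children)
            if c not in data and hamming(c, wild_type) <= max_mutations))

        def ucb(s):
            mean, sd = gp.posterior(s)
            return mean + beta * sd  # optimism in the face of uncertainty

        for s in sorted(cands, key=ucb, reverse=True)[:batch]:
            data[s] = oracle(s)  # one simulated screen per selected variant
    best = max(data, key=data.get)
    return best, data[best]
```

For example, with a toy oracle such as `lambda s: float(sum(a == b for a, b in zip(s, target)))`, which counts matches to a hypothetical target sequence, `tree_search_bandit("MKTALVGA", oracle)` returns the best screened variant and its measured fitness. The `max_mutations` filter is what keeps every screened design close to the wild type, mirroring the paper's goal of finding good designs with small mutation counts; `beta` trades off exploiting the model's predicted fitness against exploring uncertain regions of the tree.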

Protein Sequence Design with Batch Bayesian Optimisation

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

ODBO: Bayesian Optimization with Search Space Prescreening for Directed Protein Evolution

Protein engineering via Bayesian optimization-guided evolutionary algorithm and robotic experiments

Improving few-shot learning-based protein engineering with evolutionary sampling

Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling

Proximal Exploration for Model-guided Protein Sequence Design

Active Finetuning Protein Language Model: A Budget-Friendly Method for Directed Evolution

Self-play reinforcement learning guides protein engineering

Evolutionary context-integrated deep sequence modeling for protein engineering

Reinforcement Learning for Sequence Design Leveraging Protein Language Models

AdaLead: A simple and robust adaptive greedy search algorithm for sequence design

Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences

Designing diverse and high-performance proteins with a large language model in the loop

Adaptive machine learning for protein engineering

Latent-based Directed Evolution accelerated by Gradient Ascent for Protein Sequence Design

Combining Bayesian optimization with sequence- or structure-based strategies for optimization of protein-peptide binding

Optimistic Games for Combinatorial Bayesian Optimization with Application to Protein Design

Protein Design by Integrating Machine Learning with Quantum Annealing and Quantum-inspired Optimization

Beyond Thermodynamic Constraints: Evolutionary Sampling Generates Realistic Protein Sequence Variation