Abstract: This study introduces a novel Bayesian Optimization (BO) method to support the design and optimization of bioactive peptide sequences in the context of a fully automated closed-loop Design-Make-Test (DMT) pipeline. Using the major histocompatibility complex class I receptor system as a test case, we show that BO can efficiently navigate vast sequence spaces. Starting from a single peptide-lead sequence in the $\mu$M IC50 range, the method optimizes a peptide sequence to its optimal binding affinity in fewer than 5 DMT cycles, with 96 peptide sequences per batch. We extensively evaluated its performance under various conditions and with different parameters, providing valuable insights for peptide optimization tasks in future closed-loop DMT environments. Different sequence- and structure-based initialization strategies for generating the initial batch of peptide sequences were also tested, as were different molecular fingerprints and protein language models. Additionally, the method can natively handle various peptide sequence lengths and scaffolds (e.g. macrocycles) and supports arbitrary non-standard amino acids or residue modifications. The source code of our method, Mobius, is publicly available under the Apache license at https://git.scicore.unibas.ch/schwede/mobius.
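The abstract states that peptides are represented with molecular fingerprints (or protein language models) and that sequences of different lengths, macrocycles, and non-standard residues are handled natively. As a minimal illustration of why a fingerprint-style encoding makes this possible, the sketch below hashes monomer k-mers into a fixed-size count vector. The function name `kmer_fingerprint`, its parameters (`k_max`, `n_bits`, `cyclic`), and the representation of a peptide as a list of monomer names are hypothetical; this is not one of the fingerprints evaluated in the paper.

```python
import hashlib
from typing import List, Sequence


def kmer_fingerprint(monomers: Sequence[str], k_max: int = 2, n_bits: int = 256,
                     cyclic: bool = False) -> List[int]:
    """Hashed count fingerprint over contiguous monomer k-mers, k = 1..k_max.

    Monomers are arbitrary string tokens, so non-standard residues (e.g. "Aib")
    need no special handling, and peptides of any length map to n_bits counts.
    """
    fp = [0] * n_bits
    n = len(monomers)
    for k in range(1, k_max + 1):
        # Linear peptides have n - k + 1 k-mers; cyclic ones wrap around and have n.
        starts = range(n) if cyclic else range(n - k + 1)
        for i in starts:
            kmer = "-".join(monomers[(i + j) % n] for j in range(k))
            # Stable hash: Python's built-in hash() of str is salted per process.
            digest = hashlib.sha1(kmer.encode("utf-8")).digest()
            fp[int.from_bytes(digest[:4], "big") % n_bits] += 1
    return fp


if __name__ == "__main__":
    lead = ["S", "I", "I", "N", "F", "E", "K", "L"]             # 8 monomers
    variant = ["S", "I", "Aib", "N", "F", "E", "K", "L", "V"]   # 9, one non-standard
    fp_lead, fp_var = kmer_fingerprint(lead), kmer_fingerprint(variant)
    print(len(fp_lead), len(fp_var))  # 256 256
    print(sum(fp_lead), sum(fp_var))  # 15 17 (8 + 7 and 9 + 8 k-mers)
```

Hash collisions are accepted in exchange for a fixed vector size; a larger `n_bits` reduces them.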

What problem does this paper attempt to address?

The paper addresses how to efficiently explore and optimize the peptide sequence space when optimizing protein-peptide binding. Specifically, it introduces a Bayesian Optimization (BO) method to support the design and optimization of bioactive peptide sequences within a fully automated, closed-loop Design-Make-Test (DMT) pipeline. Using the major histocompatibility complex class I receptor system as a test case, the authors show that BO can efficiently navigate the vast sequence space: starting from a single peptide lead sequence with an IC50 in the micromolar range, the method optimizes the sequence to its optimal binding affinity in fewer than 5 DMT cycles, with 96 peptide sequences per batch. The method also handles peptides of different lengths and scaffolds (such as macrocycles) and supports arbitrary non-standard amino acids or residue modifications.

### Key Point Summary:

1. **Objective**: Develop an efficient Bayesian optimization method for optimizing protein-peptide binding.
2. **Method**: Combine Bayesian optimization with sequence- or structure-based initialization strategies (used to generate the initial batch of peptides) to design and optimize peptide sequences in a fully automated, closed-loop DMT pipeline.
3. **Test Case**: Use the major histocompatibility complex class I receptor system to verify the method's effectiveness.
4. **Performance Evaluation**: Evaluate the method extensively under different conditions and parameters, providing insights for future peptide optimization tasks in closed-loop DMT environments.
5. **Innovation**: Handle peptide sequences and scaffolds of different lengths and support non-standard amino acids or residue modifications, which makes the method flexible and broadly applicable.
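The closed-loop workflow summarized above can be sketched as a small batched BO loop: fit a Gaussian process on every peptide measured so far, score a pool of mutants of the best sequences with an acquisition function, send the top batch to "make and test", and repeat. This is a minimal sketch, not Mobius's implementation. It assumes a one-hot encoding of fixed-length peptides (at least 3 residues), a Tanimoto kernel, an upper-confidence-bound (UCB) acquisition, random point mutations as the candidate generator, and a hypothetical `toy_assay` in place of a real binding measurement. It is written in pure Python to stay dependency-free, so the default batch size is 8 rather than the 96 used in the paper.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # standard residues only in this sketch


def one_hot(seq: str) -> List[float]:
    """Position-wise one-hot encoding; assumes all peptides share one length."""
    return [1.0 if aa == a else 0.0 for aa in seq for a in AMINO_ACIDS]


def tanimoto(x: Sequence[float], y: Sequence[float]) -> float:
    """Tanimoto kernel with alpha = 1 (one-hot vectors are never all-zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sum(a * a for a in x) + sum(b * b for b in y) - dot)


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with a = L L^T (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L z = b by forward substitution."""
    z: List[float] = []
    for i, bi in enumerate(b):
        z.append((bi - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def fit_predict(train_x, train_y, test_x, noise: float = 0.05):
    """Zero-mean GP on centred targets; returns (mean, variance) per test point."""
    y_mean = sum(train_y) / len(train_y)
    n = len(train_x)
    gram = [[tanimoto(train_x[i], train_x[j]) + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    low = cholesky(gram)
    w = solve_lower(low, [y - y_mean for y in train_y])  # L^{-1} (y - mean)
    out = []
    for x in test_x:
        v = solve_lower(low, [tanimoto(xi, x) for xi in train_x])  # L^{-1} k*
        mean = y_mean + sum(vi * wi for vi, wi in zip(v, w))  # mean + k*^T K^{-1} (y - mean)
        var = max(tanimoto(x, x) - sum(vi * vi for vi in v), 1e-12)
        out.append((mean, var))
    return out


def mutate(seq: str, rng: random.Random, n_mut: int) -> str:
    """Replace n_mut distinct positions with a different standard residue."""
    s = list(seq)
    for pos in rng.sample(range(len(s)), n_mut):
        s[pos] = rng.choice(AMINO_ACIDS.replace(s[pos], ""))
    return "".join(s)


def toy_assay(seq: str, target: str) -> float:
    """Hypothetical stand-in for a measured affinity: positions matching a hidden target."""
    return float(sum(a == b for a, b in zip(seq, target)))


def dmt_loop(lead: str, assay: Callable[[str], float], n_cycles: int = 4,
             batch_size: int = 8, pool_size: int = 200, beta: float = 2.0,
             seed: int = 0) -> Tuple[str, float]:
    rng = random.Random(seed)
    # Initial batch: the lead plus random single/double mutants of it.
    seqs = [lead] + [mutate(lead, rng, rng.randint(1, 2)) for _ in range(batch_size - 1)]
    scores = [assay(s) for s in seqs]  # "Make" and "Test"
    for _ in range(n_cycles):
        # Design: build a candidate pool by mutating the current top sequences.
        parents = [s for _, s in sorted(zip(scores, seqs), reverse=True)[:batch_size]]
        pool = sorted({mutate(rng.choice(parents), rng, rng.randint(1, 3))
                       for _ in range(pool_size)} - set(seqs))
        preds = fit_predict([one_hot(s) for s in seqs], scores, [one_hot(s) for s in pool])
        # Greedy batch: top-k candidates by upper confidence bound (no diversity term).
        ucb = [m + beta * math.sqrt(v) for m, v in preds]
        batch = [s for _, s in sorted(zip(ucb, pool), reverse=True)[:batch_size]]
        seqs += batch
        scores += [assay(s) for s in batch]  # "Make" and "Test"
    best = max(range(len(seqs)), key=lambda i: scores[i])
    return seqs[best], scores[best]


if __name__ == "__main__":
    hidden = "KLMNPQRST"  # arbitrary optimum of the toy assay
    best, score = dmt_loop("AAAAAAAAA", lambda s: toy_assay(s, hidden))
    print(best, score)
```

Taking the top-k candidates by UCB ignores redundancy within a batch; batched BO methods often add a diversity or fantasization step, which this sketch omits for brevity.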
### Formula Explanation:

- **Gaussian Process Regression (GPR) in Bayesian Optimization**:
  - Gaussian process regression is a non-parametric regression method, used here to predict the binding of peptides to specific MHC alleles or other protein targets. A Gaussian process is defined by a mean function $m(x)$ and a positive-definite covariance (kernel) function $k(x, x')$.
  - The mean function is usually set to zero, i.e., $m(x) = 0$.
  - The covariance function $k(x, x')$ controls the shape of the distribution over functions; common choices include the radial basis function (RBF) kernel and the Tanimoto similarity (TS) kernel.
- **RBF Kernel Function**:
  \[ k_{\text{RBF}}(x, x') = \alpha \exp\left(-\frac{\|x - x'\|^{2}}{2l^{2}}\right) \]
  where $\alpha$ is the scaling factor, which sets the overall variance, and $l$ is the length scale, which controls the smoothness of the modeled function.
- **Tanimoto Similarity Kernel Function**:
  \[ k_{\text{TS}}(x, x') = \alpha \frac{\sum_{j=1}^{n} x_j x'_j}{\sum_{j=1}^{n} x_j^{2} + \sum_{j=1}^{n} {x'_j}^{2} - \sum_{j=1}^{n} x_j x'_j} \]
  where $\alpha$ is the scaling factor and $n$ is the size of the input vector.

With these models and kernel functions, the work shows how to achieve efficient and accurate optimization in complex peptide sequence optimization tasks.
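The two kernel formulas translate directly into code. The sketch below implements the RBF and Tanimoto kernels term by term as written above and builds the covariance (Gram) matrix that a GP surrogate would use. The function names `rbf_kernel`, `tanimoto_kernel`, and `gram_matrix`, the default hyperparameters, and the convention for two all-zero vectors are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf_kernel(x: Vector, y: Vector, alpha: float = 1.0, length_scale: float = 1.0) -> float:
    """k_RBF(x, x') = alpha * exp(-||x - x'||^2 / (2 l^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return alpha * math.exp(-sq_dist / (2.0 * length_scale ** 2))


def tanimoto_kernel(x: Vector, y: Vector, alpha: float = 1.0) -> float:
    """k_TS(x, x') = alpha * sum(x_j x'_j) / (sum(x_j^2) + sum(x'_j^2) - sum(x_j x'_j))."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # Assumed convention: two all-zero vectors are treated as identical.
    return alpha * dot / denom if denom > 0 else alpha


def gram_matrix(xs: List[Vector], kernel: Kernel) -> List[List[float]]:
    """Covariance matrix with entries k(x_i, x_j), as used by a GP surrogate."""
    return [[kernel(a, b) for b in xs] for a in xs]


if __name__ == "__main__":
    print(round(rbf_kernel([0.0, 0.0], [1.0, 1.0]), 3))  # 0.368, i.e. exp(-1)
    fps = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 0, 1, 0]]  # toy binary fingerprints
    for row in gram_matrix(fps, tanimoto_kernel):
        print([round(v, 3) for v in row])
    # [1.0, 0.667, 0.0]
    # [0.667, 1.0, 0.0]
    # [0.0, 0.0, 1.0]
```

On binary fingerprints the Tanimoto kernel reduces to the number of shared on-bits divided by the number of bits set in either vector (intersection over union), which is why it suits fingerprint inputs; the RBF kernel instead depends only on Euclidean distance and is more natural for dense embeddings.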

Combining Bayesian optimization with sequence- or structure-based strategies for optimization of protein-peptide binding
