Abstract: Generative models trained on unlabeled protein datasets have demonstrated a remarkable ability to predict some biological functions without any task-specific training data. However, this capability does not extend to all relevant functions and, in many cases, the unsupervised model still underperforms task-specific, supervised baselines. We hypothesize that this is due to a fundamental "alignment gap" in which the rules learned during unsupervised training are not guaranteed to be related to the function of interest. Here, we demonstrate how to provide protein generative models with useful task-specific information without losing the rich, general knowledge learned during pretraining. Using an optimization task called Direct Preference Optimization (DPO), we align a structure-conditioned language model to generate stable protein sequences by encouraging the model to prefer stabilizing over destabilizing variants given a protein backbone structure. Our resulting model, ProteinDPO, is the first structure-conditioned language model preference-optimized to experimental data. ProteinDPO achieves competitive stability prediction and consistently outperforms both unsupervised and finetuned versions of the model. Notably, the aligned model also performs well in domains beyond its training data to enable absolute stability prediction of large proteins and binding affinity prediction of multi-chain complexes, while also enabling single-step stabilization of diverse backbones. These results indicate that ProteinDPO has learned generalizable information from its biophysical alignment data.

What problem does this paper attempt to address?

The paper asks how to improve the ability of protein generative models trained without labels to predict biological function, with a focus on stability. The authors note that although such models can predict some functions without task-specific training data, they often still underperform supervised, task-specific baselines. They attribute this to an "alignment gap": the rules learned during unsupervised training are not guaranteed to be related to the function of interest.

To close this gap, the paper applies Direct Preference Optimization (DPO) to a structure-conditioned language model, so that the model gains task-specific information while retaining the general knowledge learned during pretraining. Given a protein backbone structure, DPO trains the model to prefer stabilizing variants over destabilizing ones. The resulting model, ProteinDPO, is trained on experimental stability data and compared with a supervised fine-tuning (SFT) baseline.

ProteinDPO outperforms both the unsupervised model and the fine-tuned model at stability scoring. It also generalizes better to unseen protein structures and to tasks outside its training data, such as predicting the absolute stability of large proteins, the thermal stability of antibodies, and the binding affinity of protein-protein (multi-chain) complexes. This suggests that ProteinDPO has learned generalizable rules from its experimental biophysical alignment data.

In conclusion, the paper uses DPO to narrow the alignment gap between unsupervised protein generative models and experimentally measured function, improving model performance across several biological prediction tasks.
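
The preference objective described above can be illustrated with a minimal sketch of the standard DPO loss for a single pair of variants. It assumes the generic DPO formulation; the summary does not state which exact loss variant ProteinDPO uses, so this is not the authors' implementation. The function name `dpo_loss`, its arguments, and the default `beta` are hypothetical choices made for illustration. Each argument is the log-likelihood of a sequence given the same backbone structure, scored either by the model being trained (the policy) or by the frozen pretrained model (the reference).

```python
import math


def dpo_loss(policy_logp_stable, policy_logp_unstable,
             ref_logp_stable, ref_logp_unstable, beta=0.1):
    """Standard DPO loss for one (stabilizing, destabilizing) pair.

    All four inputs are sequence log-likelihoods conditioned on the same
    backbone. `beta` (an assumed value here) controls how strongly the
    policy is allowed to move away from the frozen reference model.
    """
    # How much more the policy favors each variant than the reference does.
    stable_shift = policy_logp_stable - ref_logp_stable
    unstable_shift = policy_logp_unstable - ref_logp_unstable
    margin = beta * (stable_shift - unstable_shift)

    # loss = -log(sigmoid(margin)), written as a numerically stable softplus.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss falls when the policy raises the stabilizing variant's likelihood
# relative to the reference, and rises when it favors the destabilizing one.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
print(dpo_loss(-12.0, -8.0, -10.0, -10.0))
```

Because the loss depends only on differences from the reference model, a policy identical to the reference has a margin of zero for every pair. This is one way to see how the objective adds task-specific preferences while keeping the model tied to its pretrained behavior.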

Aligning protein generative models with experimental fitness via Direct Preference Optimization

Global-Context Aware Generative Protein Design

Generative De Novo Protein Design with Global Context

Decomposed Direct Preference Optimization for Structure-Based Drug Design

Improving Inverse Folding for Peptide Design with Diversity-regularized Direct Preference Optimization

Inverse Protein Folding Using Deep Bayesian Optimization

Active Finetuning Protein Language Model: A Budget-Friendly Method for Directed Evolution

DNDesign: Enhancing Physical Understanding of Protein Inverse Folding Model via Denoising

InstructPLM: Aligning Protein Language Models to Follow Protein Structure Instructions

AlphaFold meets de novo drug design: leveraging structural protein information in multi-target molecular generative models

Preference optimization of protein language models as a multi-objective binder design paradigm

Protein Language Model Fitness Is a Matter of Preference

Building Confidence in Deep Generative Protein Design

An integrative approach to protein sequence design through multiobjective optimization

Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds

Protein Fitness Prediction Is Impacted by the Interplay of Language Models, Ensemble Learning, and Sampling Methods

A Novel Multi-objectivisation Approach for Optimising the Protein Inverse Folding Problem

Orthogonal Finetuning for Direct Preference Optimization

Protein Structure Prediction Using A New Optimization-Based Evolutionary and Explainable Artificial Intelligence Approach

Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space