Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models

Ben Fauber

2024-06-27

Abstract: We describe the accurate prediction of ligand-protein interaction (LPI) affinities, also known as drug-target interactions (DTI), with instruction fine-tuned pretrained generative small language models (SLMs). We achieved accurate predictions for a range of affinity values associated with ligand-protein interactions on out-of-sample data in a zero-shot setting. Only the SMILES string of the ligand and the amino acid sequence of the protein were used as the model inputs. Our results demonstrate a clear improvement over machine learning (ML) and free-energy perturbation (FEP+) based methods in accurately predicting a range of ligand-protein interaction affinities, which can be leveraged to further accelerate drug discovery campaigns against challenging therapeutic targets.

Machine Learning, Artificial Intelligence, Computation and Language

What problem does this paper attempt to address?

The paper addresses how to accurately predict the affinities of ligand-protein interactions (LPI), also known as drug-target interactions (DTI). These affinities are central to molecular screening and optimization in drug discovery, yet current methods struggle to predict them. The authors take pretrained small language models (SLMs) and instruction fine-tune them on domain-specific data to predict a range of affinity values, using only the SMILES string of the ligand and the amino acid sequence of the target protein as inputs. Compared with existing machine learning (ML) and free energy perturbation (FEP) methods, the approach shows a clear improvement in predicting LPI affinities on out-of-sample data in a zero-shot setting. The authors suggest that this predictive ability can accelerate drug discovery campaigns against challenging therapeutic targets.

The paper also reviews prior work, including machine learning, deep learning, and physics-based methods such as FEP, and highlights their limitations, especially when the affinity data are continuous rather than binary. It then describes how the dataset was constructed and formatted and how the underlying pretrained language models (such as the OPT series) were fine-tuned. Model performance improves as the number of training instances increases, and prediction accuracy improves across the different affinity values. Overall, the results suggest that SLMs instruction fine-tuned on domain-specific data can effectively predict LPI affinities, offering a useful tool for drug development.
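
To make the input format concrete, the following is a minimal sketch of how a single instruction-style training record could be assembled from a ligand SMILES string and a protein amino acid sequence. The prompt wording, the field names `prompt` and `completion`, the helper `format_lpi_instance`, and the `pKi` label are all assumptions made for illustration; the summary above does not specify the authors' actual template. The protein fragment and the affinity value are toy placeholders, not measured data.

```python
import json


def format_lpi_instance(smiles, protein_seq, affinity_type, value=None):
    """Build one instruction-style record (hypothetical template, not the paper's)."""
    prompt = (
        f"Predict the {affinity_type} of the ligand-protein interaction.\n"
        f"Ligand SMILES: {smiles}\n"
        f"Protein sequence: {protein_seq}\n"
        "Answer:"
    )
    record = {"prompt": prompt}
    # A target is attached only for training records; at inference time the
    # model would be asked to generate the text that follows "Answer:".
    if value is not None:
        record["completion"] = f" {value:.2f}"
    return record


# Aspirin SMILES paired with a toy protein fragment and a placeholder value.
example = format_lpi_instance(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    protein_seq="MKTAYIAKQR",
    affinity_type="pKi",
    value=6.5,
)
print(json.dumps(example, indent=2))
```

Writing the affinity as text at a fixed precision keeps the target a short, regular string, which is one plausible way to pose a continuous prediction as a text-generation task; other encodings are equally possible.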

On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction

BAPULM: Binding Affinity Prediction using Language Models

PLM-interact: extending protein language models to predict protein-protein interactions

Learning Binding Affinities via Fine-tuning of Protein and Ligand Language Models

Improved prediction of ligand-protein binding affinities by meta-modeling

Prediction of protein–ligand binding affinity via deep learning models

Does a More Precise Chemical Description of Protein–Ligand Complexes Lead to More Accurate Prediction of Binding Affinity?

Predicting the Protein-Ligand Affinity from Molecular Dynamics Trajectories

SMPLIP-Score: predicting ligand binding affinity from simple and interpretable on-the-fly interaction fingerprint pattern descriptors

Hybrid protein-ligand binding residue prediction with protein language models: Does the structure matter?

Development and evaluation of a deep learning model for protein-ligand binding affinity prediction

Machine Learning for Sequence and Structure-Based Protein–Ligand Interaction Prediction

Protein-Protein Interaction Prediction is Achievable with Large Language Models

Fusing Sequence and Structural Knowledge by Heterogeneous Models to Accurately and Interpretively Predict Drug–Target Affinity

DTI-LM: Language Model Powered Drug-Target Interaction Prediction

Accurate prediction of protein–ligand interactions by combining physical energy functions and graph-neural networks

DeepLPI: a novel deep learning-based model for protein–ligand interaction prediction for drug repurposing

Binding Affinity Prediction: From Conventional to Machine Learning-Based Approaches

Can molecular dynamics simulations improve predictions of protein-ligand binding affinity with machine learning?

DEELIG: A Deep Learning Approach to Predict Protein-Ligand Binding Affinity