GPCR-BERT: Interpreting Sequential Design of G Protein Coupled Receptors Using Protein Language Models

Seongwon Kim, Parisa Mollaei, Akshay Antony, Rishikesh Magar, Amir Barati Farimani

2023-10-31

Abstract: With the rise of Transformers and Large Language Models (LLMs) in Chemistry and Biology, new avenues for the design and understanding of therapeutics have opened up to the scientific community. Protein sequences can be modeled as language and can take advantage of recent advances in LLMs, especially given the abundance of available protein sequence datasets. In this paper, we developed the GPCR-BERT model for understanding the sequential design of G Protein-Coupled Receptors (GPCRs). GPCRs are the target of over one-third of FDA-approved pharmaceuticals. However, there is a lack of comprehensive understanding regarding the relationship between amino acid sequence, ligand selectivity, and conformational motifs (such as NPxxY, CWxP, E/DRY). By utilizing the pre-trained protein model (Prot-Bert) and fine-tuning it on tasks that predict variations in the motifs, we were able to shed light on several relationships between residues in the binding pocket and some of the conserved motifs. To achieve this, we interpreted the attention weights and hidden states of the model to extract the extent to which individual amino acids contribute to determining the type of the masked ones. The fine-tuned models demonstrated high accuracy in predicting hidden residues within the motifs. In addition, the embeddings were analyzed over 3D structures to elucidate the higher-order interactions within the conformations of the receptors.

Machine Learning, Biomolecules

What problem does this paper attempt to address?

The main objective of this paper is to develop a model named GPCR-BERT to understand the higher-order interactions in the sequence design of G protein-coupled receptors (GPCRs) and to explore the relationship between conserved motifs in these receptors and their functions. Specifically, the paper aims to address the following key issues:

1. **Correlation between conserved region variations and other amino acids**: Investigate the correlation between variations within conserved motifs in GPCRs (such as NPxxY, CWxP, and E/DRY) and amino acids elsewhere in the sequence.
2. **Possibility of predicting the complete sequence from partial sequences**: Explore whether it is possible to predict the entire amino acid sequence based on partially known sequences of GPCRs.
3. **Identification of key amino acids**: Identify which amino acids contribute the most to conformational changes in GPCRs and may play important roles in receptor function.

To achieve these objectives, the researchers adopted a large language model (LLM)-based approach: they took the pre-trained protein language model Prot-BERT and fine-tuned it on GPCR sequences. By analyzing attention weights and hidden states, they were able to reveal the roles of different amino acids in determining the specific amino acid types within conserved motifs. The paper also compared GPCR-BERT with other machine learning models (such as the original BERT and SVM) and reports that GPCR-BERT performed better in the prediction tasks.

Through this series of studies, the paper provides new insights into the sequence design of GPCRs and demonstrates how natural language processing techniques can be used to understand and predict the functional characteristics of biomolecules. This offers a foundation and a set of tools for future drug design and protein engineering.
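The masked-motif prediction task described above can be illustrated with a small input-preparation sketch. It is not the authors' code: the regular expressions, the `mask_motif` helper, and the toy sequence are hypothetical, and it assumes that the variable positions (the `x` residues in NPxxY and CWxP, and the E/D position in E/DRY) are the ones hidden from the model. The space-separated residue format with a `[MASK]` token follows the input convention commonly used with Prot-BERT.

```python
import re

# Hypothetical patterns for the three conserved motifs named in the text.
# The named group "var" marks the variable residues that would be hidden.
MOTIFS = {
    "NPxxY": re.compile(r"NP(?P<var>..)Y"),
    "CWxP": re.compile(r"CW(?P<var>.)P"),
    "E/DRY": re.compile(r"(?P<var>[ED])RY"),
}


def mask_motif(sequence, motif_name, mask_token="[MASK]"):
    """Mask the variable residues of the first motif match.

    Returns (masked_input, hidden_residues), where masked_input is a
    space-separated residue string, or None if the motif is not found.
    Note: the first regex match is used, which in a real receptor may not
    be the structurally relevant occurrence of the motif.
    """
    match = MOTIFS[motif_name].search(sequence)
    if match is None:
        return None
    start, end = match.span("var")
    tokens = list(sequence)          # one token per amino acid
    hidden = sequence[start:end]     # labels the model would have to predict
    for i in range(start, end):
        tokens[i] = mask_token
    return " ".join(tokens), hidden


# Toy sequence (not a real receptor) containing all three motifs.
toy_sequence = "MAACWLPGGDRYLLNPVIYAA"
masked_input, labels = mask_motif(toy_sequence, "NPxxY")
print(masked_input)
print(labels)
```

In a fine-tuning setup, `masked_input` would be tokenized and passed to the model, and `labels` would serve as the prediction target for the masked positions.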

DrugGPT: A GPT-based Strategy for Designing Potential Ligands Targeting Specific Proteins

The Exploration of the Conformational Space of G Protein-Coupled Receptors

Revolutionizing GPCR–ligand predictions: DeepGPCR with experimental validation for high-precision drug discovery

GPCR-ModSim: A comprehensive web based solution for modeling G-protein coupled receptors

PeptideBERT: A Language Model based on Transformers for Peptide Property Prediction

Molecular Recognition of Metabotropic Glutamate Receptor Type 1 (mglur1): Synergistic Understanding with Free Energy Perturbation and Linear Response Modeling

De novo drug design as GPT language modeling: large chemistry models with supervised and reinforcement learning

From PSSM to Pre-Trained Language Models

Exploiting Pretrained Biochemical Language Models for Targeted Drug Design

Study on human GPCR-inhibitor interactions by proteochemometric modeling

Generative design of compounds with desired potency from target protein sequences using a multimodal biochemical language model

GPCR-IPL score: multilevel featurization of GPCR–ligand interaction patterns and prediction of ligand functions from selectivity to biased activation

State-specific Peptide Design Targeting G Protein-coupled Receptors

Fine-Tuned Deep Transfer Learning Models for Large Screenings of Safer Drugs Targeting Class A GPCRs

A Hybrid Approach to Structure and Function Modeling of G Protein-Coupled Receptors

Homologous G Protein-Coupled Receptors Boost the Modeling and Interpretation of Bioactivities of Ligand Molecules

High end GPCR design: crafted ligand design and druggability analysis using protein structure, lipophilic hotspots and explicit water networks

DeepREAL: a deep learning powered multi-scale modeling framework for predicting out-of-distribution ligand-induced GPCR activity