Abstract: Developing new drugs is costly, time-consuming, and risky. Drug-target affinity (DTA), which indicates the binding capability between drugs and target proteins, is a crucial indicator for drug development. Accurately predicting the interaction strength of new drug-target pairs by analyzing previous experiments aids in screening potential drug molecules, repurposing them, and developing safe and effective medicines. Existing computational models for DTA prediction rely on strings or single-graph neural networks and do not consider protein structure or molecular semantic information, which limits their accuracy. Our experiments demonstrate that string-based methods may overlook protein conformations, causing a high root mean square error (RMSE) of 3.584 in predicted affinity due to a lack of spatial context. Single-graph networks also underperform on topology features, with a 6% lower confidence interval (CI) for activity classification. The absence of semantic information further limits generalization across diverse compounds, resulting in an 18% increase in RMSE and a 5% increase in misclassifications in our quantitative study, which restricts potential drug discovery. To address these limitations, we propose G-K BertDTA, a novel framework for accurate DTA prediction that incorporates protein features, molecular semantic features, and molecular structural information. In the proposed model, we represent drugs as graphs and employ a graph isomorphism network (GIN) to learn molecular topological information. To extract protein structural features, we use a DenseNet architecture. A knowledge-based BERT semantic model is incorporated to obtain rich pre-trained semantic embeddings, thereby enhancing the feature information. We extensively evaluated our approach on publicly available benchmark datasets (i.e., KIBA and Davis), and the experimental results demonstrate the promising performance of our method, which consistently outperforms previous state-of-the-art approaches.
Code is available at https://github.com/AmbitYuki/G-K-BertDTA.
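
To make the drug branch of the abstract more concrete, the following is a minimal sketch of the sum aggregation that a GIN layer performs on a molecular graph, followed by a sum readout to a graph-level vector. It is an illustration under stated assumptions, not the authors' implementation: the function names `gin_aggregate` and `sum_readout`, the one-hot atom features, and the toy three-atom graph are all hypothetical, and the learned MLP that a GIN applies after aggregation is omitted.

```python
from typing import Dict, List

Vector = List[float]


def gin_aggregate(
    features: Dict[int, Vector],
    neighbors: Dict[int, List[int]],
    eps: float = 0.0,
) -> Dict[int, Vector]:
    """One GIN aggregation step: (1 + eps) * h_v + sum of neighbor features.

    The learned MLP that normally follows this sum is left out (assumption:
    we only illustrate the aggregation, not a trained layer).
    """
    updated: Dict[int, Vector] = {}
    for node, h_v in features.items():
        combined = [(1.0 + eps) * x for x in h_v]
        for nb in neighbors.get(node, []):
            combined = [c + x for c, x in zip(combined, features[nb])]
        updated[node] = combined
    return updated


def sum_readout(features: Dict[int, Vector]) -> Vector:
    """Sum node vectors into a single graph-level vector."""
    dim = len(next(iter(features.values())))
    return [sum(h[i] for h in features.values()) for i in range(dim)]


# Hypothetical toy graph: heavy atoms of ethanol (C-C-O),
# with one-hot features [is_carbon, is_oxygen].
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}

hidden = gin_aggregate(features, neighbors)
graph_vector = sum_readout(hidden)
print(hidden)        # per-atom vectors after one aggregation step
print(graph_vector)  # graph-level vector after sum readout
```

In a full model along the lines the abstract describes, a graph-level vector like this would be combined with the protein features and the pre-trained semantic embeddings before the affinity is regressed; how those branches are fused is not specified in the abstract.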

G-K BertDTA: A graph representation learning and semantic embedding-based framework for drug-target affinity prediction