Multimodal Protein-Ligand Contrastive Pretraining for Effective and Efficient Drug Discovery
Zhen Wang, Zhanfeng Wang, Maohua Yang, Long Pang, Fangyuan Nie, Siyuan Liu, Zhifeng Gao, Guojiang Zhao, Xiaohong Ji, Dandan Huang, Zhengdan Zhu, Dongdong Li, Yannan Yuan, Hang Zheng, Linfeng Zhang, Guolin Ke, Dongdong Wang, Feng Yu
DOI: https://doi.org/10.1101/2024.08.22.609123
2024-01-01
Abstract: Accurate modeling of protein-ligand interactions (PLIs) is critical for drug discovery. Despite recent advances, most existing PLI modeling methods rely on single-modal data, which restricts their effectiveness and applicability. In this study, we introduce Uni-Clip, a contrastive learning framework that incorporates multiple modalities: structure and residue features of proteins, along with conformation and graph features of ligands. Optimized with a specifically designed CF-InfoNCE loss, Uni-Clip learns comprehensive representations of PLIs. Uni-Clip demonstrates superior performance in benchmark evaluations on two widely used datasets, LIT-PCBA and DUD-E, achieving 147% and 218% improvements in enrichment factor at 1% over baselines. Furthermore, Uni-Clip serves as a practical tool for various applications in drug discovery. We demonstrate this through virtual screening against GPX4, a flat and challenging protein target, where it identified potent inhibitors with an IC50 of 4.17 μM, and through target fishing for benzbromarone, which highlights the potential for repurposing benzbromarone in cancer therapy.

Competing Interest Statement: The authors have declared no competing interest.
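The benchmark results in the abstract are reported as the enrichment factor at 1% (EF1%). As a rough illustration of what this metric measures, the sketch below uses the commonly cited definition: the fraction of actives among the top-ranked 1% of compounds, divided by the fraction of actives in the whole library. The function name `enrichment_factor`, the rounding of the cutoff, and the toy data are assumptions made for illustration; the evaluation protocol actually used for LIT-PCBA and DUD-E may differ in detail.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at a given top fraction (illustrative sketch).

    scores:   predicted binding scores, higher means more likely active
    labels:   1 for a known active, 0 for a decoy/inactive
    fraction: top fraction of the ranked library to inspect (0.01 = 1%)
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")

    # Assumed convention: round the cutoff to the nearest compound, minimum 1.
    n_top = max(1, int(round(n_total * fraction)))

    # Rank compounds from highest to lowest score and count actives in the top slice.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(label for _, label in ranked[:n_top])

    hit_rate_top = hits_in_top / n_top
    hit_rate_all = n_actives / n_total
    return hit_rate_top / hit_rate_all


# Toy library: 200 compounds, 4 actives, 2 of which are ranked at the very top.
toy_scores = [1.0, 0.9] + [0.5] * 196 + [0.1, 0.05]
toy_labels = [1, 1] + [0] * 196 + [1, 1]
ef1 = enrichment_factor(toy_scores, toy_labels, fraction=0.01)
```

Under this definition, a value of 1 corresponds to random ranking, and the maximum attainable value is bounded by the inverse of the library's overall active rate.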