3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information

Taojie Kuang, Yiming Ren, Zhixiang Ren
2024-06-28
Abstract: Molecular property prediction, crucial for early drug candidate screening and optimization, has advanced considerably with deep learning-based methods, yet these methods often fall short in fully leveraging 3D spatial information. Specifically, current molecular encoding techniques tend to extract spatial information inadequately, leading to ambiguous representations in which a single encoding may correspond to multiple distinct molecules. Moreover, existing molecular modeling methods focus predominantly on the most stable 3D conformation, neglecting other viable conformations present in reality. To address these issues, we propose 3D-Mol, a novel approach designed for more accurate spatial structure representation. It deconstructs molecules into three hierarchical graphs to better extract geometric information. Additionally, 3D-Mol leverages contrastive learning for pretraining on 20 million unlabeled molecules, treating conformations with identical topological structures as weighted positive pairs and those with different topological structures as negatives, with the weights based on the similarity of their 3D conformation descriptors and fingerprints. We compare 3D-Mol with various state-of-the-art baselines on 7 benchmarks and demonstrate its strong performance.
Biomolecules, Machine Learning
What problem does this paper attempt to address?
This paper addresses molecular property prediction using 3D structural information. Existing deep learning methods make limited use of 3D spatial information, which leads to ambiguous molecular encodings: a single encoding may correspond to several different molecules. Moreover, most methods consider only the most stable 3D conformation and ignore other conformations that occur in reality. To address these issues, the paper introduces the 3D-Mol framework, which extracts geometric information through a three-level hierarchical graph representation and is pre-trained on 20 million unlabeled molecules using contrastive learning. 3D-Mol treats conformations that share the same topological structure as positive pairs, weighted by the similarity of their 3D conformation descriptors and fingerprints, and treats conformations with different topological structures as negative pairs. According to the reported experiments, 3D-Mol outperforms various state-of-the-art baselines on 7 benchmarks for molecular property prediction.
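The weighted positive-pair idea described above can be illustrated with a small sketch. This is not the paper's loss function: it is a generic weighted InfoNCE-style objective written under stated assumptions. The function names (`cosine`, `weighted_contrastive_loss`), the temperature value, and the precomputed `pair_weights` matrix (assumed to hold similarities in [0, 1] derived from conformation descriptors and fingerprints) are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_contrastive_loss(embeddings, topology_ids, pair_weights,
                              temperature=0.1):
    """Weighted InfoNCE-style loss (illustrative assumption, not the paper's).

    embeddings   : one vector per conformation
    topology_ids : topology_ids[i] identifies the molecule (2D topology)
                   that conformation i belongs to
    pair_weights : pair_weights[i][j] in [0, 1], assumed to be a precomputed
                   similarity between conformations i and j
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        # Scaled similarities between the anchor and every other conformation.
        logits = {k: cosine(embeddings[i], embeddings[k]) / temperature
                  for k in range(n) if k != i}
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits.values())
        log_denom = peak + math.log(
            sum(math.exp(v - peak) for v in logits.values()))
        # Positives: other conformations with the same topology as the anchor.
        positives = [j for j in logits if topology_ids[j] == topology_ids[i]]
        weight_sum = sum(pair_weights[i][j] for j in positives)
        if weight_sum == 0:
            continue  # anchor has no usable positive pair
        # Weighted average of the negative log-probabilities of the positives.
        total += -sum(pair_weights[i][j] * (logits[j] - log_denom)
                      for j in positives) / weight_sum
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two molecules (topologies 0 and 1), two conformations each.
topology_ids = [0, 0, 1, 1]
uniform_weights = [[1.0] * 4 for _ in range(4)]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
scattered = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
loss_clustered = weighted_contrastive_loss(clustered, topology_ids, uniform_weights)
loss_scattered = weighted_contrastive_loss(scattered, topology_ids, uniform_weights)
```

In this toy batch, embeddings that place conformations of the same molecule close together (`clustered`) yield a lower loss than embeddings that separate them (`scattered`). Lowering an entry of `pair_weights` reduces how strongly that particular pair is pulled together, which is the role the similarity-based weighting plays in the description above.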