Abstract: Pretraining molecular representations from large unlabeled data is essential for molecular property prediction due to the high cost of obtaining ground-truth labels. While there exist various 2D graph-based molecular pretraining approaches, these methods struggle to show statistically significant gains in predictive performance. Recent work has thus instead proposed 3D conformer-based pretraining under the task of denoising, which led to promising results. During downstream finetuning, however, models trained with 3D conformers require accurate atom coordinates of previously unseen molecules, which are computationally expensive to acquire at scale. In light of this limitation, we propose D&D, a self-supervised molecular representation learning framework that pretrains a 2D graph encoder by distilling representations from a 3D denoiser. With denoising followed by cross-modal knowledge distillation, our approach enjoys use of knowledge obtained from denoising as well as painless application to downstream tasks with no access to accurate conformers. Experiments on real-world molecular property prediction datasets show that the graph encoder trained via D&D can infer 3D information based on the 2D graph and shows superior performance and label-efficiency against other baselines.

What problem does this paper attempt to address?

The paper primarily addresses the issue of molecular property prediction, particularly in fields such as drug discovery and material design, focusing on how to effectively utilize large-scale unlabeled molecular data to improve prediction performance. The paper proposes a new framework called D&D (Denoise and Distill), aimed at solving the following core problems:

1. **How to efficiently utilize a large amount of unlabeled molecular data**: Since obtaining true property labels of molecules is costly, it becomes crucial to extract useful information from unlabeled data through self-supervised learning methods.
2. **Limitations of existing 2D graph pre-training methods**: Although pre-training methods based on 2D graph structures can learn certain molecular representations, the performance improvement in downstream tasks is not significant, especially when data augmentation on graph structures may disrupt their topological structure.
3. **Advantages and challenges of 3D structure pre-training**: Utilizing 3D molecular structures for pre-training can significantly improve performance, but it requires accurate 3D coordinate information, which is often difficult to obtain on a large scale in practical applications.

To address the above issues, the paper proposes the D&D framework, which includes two steps:

- **Step 1 (Denoise)**: First, a 3D teacher model is pre-trained through a denoising task of 3D molecular conformations. This method involves adding Gaussian noise and then attempting to restore the original state, which can approximately learn the force field information in physical space.
- **Step 2 (Distill)**: Next, the knowledge of the 3D teacher model is transferred to a 2D student model, i.e., a 2D graph encoder, through cross-modal knowledge distillation. In this way, the 2D model can benefit from the knowledge brought by 3D pre-training even when only 2D molecular graphs are available.

In this manner, the D&D framework not only leverages the advantages brought by 3D information but also avoids the need for expensive and hard-to-obtain 3D coordinate information in downstream tasks. Experimental results show that the 2D graph encoder under the D&D framework can effectively mimic the behavior of the 3D conformation encoder and exhibits superior performance and label efficiency in various molecular property prediction tasks.
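The two steps (Denoise, then Distill) can be illustrated on toy data. The following is a minimal NumPy sketch, not the authors' implementation. It assumes that the denoiser is trained to predict the added Gaussian noise and that distillation uses a mean squared error between student and teacher representations. The summary specifies neither choice, and the function names, array shapes, and the noise scale `sigma` are hypothetical. The encoders themselves are replaced by random stand-in arrays.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(coords, sigma, rng):
    """Perturb 3D atom coordinates; return the noisy coordinates and the noise."""
    noise = rng.normal(0.0, sigma, size=coords.shape)
    return coords + noise, noise

def denoising_loss(predicted_noise, true_noise):
    """Step 1 (Denoise), assumed form: MSE between predicted and added noise."""
    return float(np.mean((predicted_noise - true_noise) ** 2))

def distillation_loss(student_repr, teacher_repr):
    """Step 2 (Distill), assumed form: MSE between 2D student and 3D teacher outputs."""
    return float(np.mean((student_repr - teacher_repr) ** 2))

# Toy conformer: 5 atoms with random 3D coordinates (hypothetical values).
coords = rng.normal(size=(5, 3))
noisy_coords, noise = add_gaussian_noise(coords, sigma=0.1, rng=rng)

# An ideal denoiser recovers the noise, so its loss is (numerically) zero.
perfect_loss = denoising_loss(noisy_coords - coords, noise)

# A denoiser that always predicts zero noise incurs a positive loss.
trivial_loss = denoising_loss(np.zeros_like(noise), noise)

# Stand-ins for per-atom representations from the frozen 3D teacher and
# the 2D student; here the student is offset from the teacher by 0.5.
teacher_repr = rng.normal(size=(5, 8))
student_repr = teacher_repr + 0.5
distill = distillation_loss(student_repr, teacher_repr)

print(perfect_loss, trivial_loss, distill)
```

In an actual training loop, the first loss would update the 3D teacher on conformers, and the second would update only the 2D graph encoder while the teacher stays fixed, so the student never needs coordinates at finetuning time.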

3D Denoisers are Good 2D Teachers: Molecular Pretraining via Denoising and Cross-Modal Distillation

Pre-training with fractional denoising to enhance molecular property prediction

Sliced Denoising: A Physics-Informed Molecular Pre-Training Method

Fractional Denoising for 3D Molecular Pre-training

Automated 3D Pre-Training for Molecular Property Prediction

Unified 2D and 3D Pre-Training of Molecular Representations

Leveraging 2D molecular graph pretraining for improved 3D conformer generation with graph neural networks

3D-Transformer: Molecular Representation with Transformer in 3D Space

3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information

3D Molecular Pretraining via Localized Geometric Generation

Coordinating Cross-modal Distillation for Molecular Property Prediction

3D Infomax improves GNNs for Molecular Property Prediction

Data-Free Adversarial Distillation

Quantum-Informed Molecular Representation Learning Enhancing ADMET Property Prediction

Dual-view Molecular Pre-training

May the Force be with You: Unified Force-Centric Pre-Training for 3D Molecular Conformations

3D graph contrastive learning for molecular property prediction

Two-Stage Pretraining for Molecular Property Prediction in the Wild

Pre-training Molecular Graph Representation with 3D Geometry