3D Denoisers are Good 2D Teachers: Molecular Pretraining via Denoising and Cross-Modal Distillation

Sungjun Cho, Dae-Woong Jeong, Sung Moon Ko, Jinwoo Kim, Sehui Han, Seunghoon Hong, Honglak Lee, Moontae Lee
2023-09-08
Abstract: Pretraining molecular representations from large unlabeled data is essential for molecular property prediction due to the high cost of obtaining ground-truth labels. While there exist various 2D graph-based molecular pretraining approaches, these methods struggle to show statistically significant gains in predictive performance. Recent work has thus instead proposed 3D conformer-based pretraining under the task of denoising, which led to promising results. During downstream finetuning, however, models trained with 3D conformers require accurate atom-coordinates of previously unseen molecules, which are computationally expensive to acquire at scale. In light of this limitation, we propose D&D, a self-supervised molecular representation learning framework that pretrains a 2D graph encoder by distilling representations from a 3D denoiser. With denoising followed by cross-modal knowledge distillation, our approach enjoys use of knowledge obtained from denoising as well as painless application to downstream tasks with no access to accurate conformers. Experiments on real-world molecular property prediction datasets show that the graph encoder trained via D&D can infer 3D information based on the 2D graph and shows superior performance and label-efficiency against other baselines.
Machine Learning, Artificial Intelligence, Chemical Physics
What problem does this paper attempt to address?
The paper primarily addresses molecular property prediction, which matters in fields such as drug discovery and material design, and focuses on how to exploit large-scale unlabeled molecular data to improve predictive performance. It proposes a new framework called D&D (Denoise and Distill) that targets the following core problems:

1. **Using large amounts of unlabeled molecular data efficiently**: Because ground-truth property labels for molecules are costly to obtain, it is crucial to extract useful information from unlabeled data through self-supervised learning.
2. **Limitations of existing 2D graph pretraining methods**: Although pretraining on 2D graph structures can learn some molecular representations, the resulting gains on downstream tasks are not statistically significant, especially since data augmentations applied to molecular graphs can disrupt their topological structure.
3. **Advantages and challenges of 3D structure pretraining**: Pretraining on 3D molecular conformers through denoising can substantially improve performance, but such models then require accurate 3D coordinates of previously unseen molecules during downstream finetuning, and these are computationally expensive to obtain at scale.

To address these issues, D&D consists of two steps:

- **Step 1 (Denoise)**: A 3D teacher model is first pretrained on a denoising task over 3D molecular conformers. Gaussian noise is added to the atom coordinates and the model learns to recover the original structure, which can be interpreted as approximately learning the molecular force field.
- **Step 2 (Distill)**: The knowledge of the 3D teacher is then transferred to a 2D student model, i.e., a 2D graph encoder, through cross-modal knowledge distillation. As a result, the 2D model benefits from 3D pretraining even when only 2D molecular graphs are available.
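The denoising objective of Step 1 can be illustrated with a minimal, dependency-free Python sketch. It assumes the common noise-prediction formulation: every coordinate is perturbed with Gaussian noise of scale `sigma`, and the model regresses the added noise under a mean-squared error. The names `perturb`, `denoising_loss`, `predict_noise`, and `zero_model` are hypothetical, and the 3D teacher network is replaced by an arbitrary callable rather than an actual 3D architecture.

```python
import random
from typing import Callable, List, Tuple

# A conformer as a list of (x, y, z) atom positions; units are arbitrary here.
Coords = List[Tuple[float, float, float]]


def perturb(coords: Coords, sigma: float, rng: random.Random) -> Tuple[Coords, Coords]:
    """Add i.i.d. Gaussian noise N(0, sigma^2) to every coordinate.

    Returns (noisy conformer, added noise).
    """
    noise = [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
             for _ in coords]
    noisy = [(x + ex, y + ey, z + ez)
             for (x, y, z), (ex, ey, ez) in zip(coords, noise)]
    return noisy, noise


def denoising_loss(predict_noise: Callable[[Coords], Coords],
                   coords: Coords, sigma: float, rng: random.Random) -> float:
    """Mean squared error between the predicted and the true per-coordinate noise.

    `predict_noise` is a hypothetical stand-in for the 3D teacher; a real model
    would also see atom types, and its weights would be trained to minimize this loss.
    """
    noisy, noise = perturb(coords, sigma, rng)
    pred = predict_noise(noisy)
    errors = [(p - e) ** 2
              for p_atom, e_atom in zip(pred, noise)
              for p, e in zip(p_atom, e_atom)]
    return sum(errors) / len(errors)


rng = random.Random(0)
# Illustrative toy conformer: 1,000 random atom positions (not real data).
conformer = [(rng.uniform(-3, 3), rng.uniform(-3, 3), rng.uniform(-3, 3))
             for _ in range(1000)]


def zero_model(noisy: Coords) -> Coords:
    """Baseline that ignores the geometry and always predicts zero noise."""
    return [(0.0, 0.0, 0.0) for _ in noisy]


# For this baseline the expected loss is sigma**2 = 0.01.
print(denoising_loss(zero_model, conformer, sigma=0.1, rng=rng))
```

A model that ignores its input scores about `sigma**2`, so a teacher can only do better by using the geometry of the noisy conformer, which is what pushes it to learn structural (force-field-like) information.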
In this way, D&D exploits the benefits of 3D information while avoiding the need for expensive, hard-to-obtain 3D coordinates in downstream tasks. Experimental results show that the 2D graph encoder trained with D&D can effectively mimic the behavior of the 3D conformer encoder and, compared with other baselines, achieves superior performance and label efficiency across a range of molecular property prediction tasks.
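The distillation step, and why downstream inference then needs no coordinates, can be illustrated with a toy, dependency-free Python sketch. Everything in it is an illustrative assumption rather than the paper's architecture: `teacher_3d` is a hypothetical frozen stand-in for the pretrained 3D denoiser that maps a conformer to per-atom features; `student_2d` is a hypothetical linear 2D encoder over graph-only features (atom type and degree); and the student is fit by plain gradient descent on a per-atom mean-squared error against the teacher. The water coordinates are approximate and used only as example input.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def teacher_3d(coords: Sequence[Tuple[float, float, float]],
               bonds: Sequence[Tuple[int, int]]) -> Matrix:
    """Hypothetical frozen 3D teacher: per atom, [mean bonded distance, distance
    to centroid]. Needs coordinates; assumes every atom has at least one bond."""
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    neighbor_d: List[List[float]] = [[] for _ in range(n)]
    for i, j in bonds:
        d = math.dist(coords[i], coords[j])
        neighbor_d[i].append(d)
        neighbor_d[j].append(d)
    return [[sum(ds) / len(ds), math.dist(c, centroid)]
            for c, ds in zip(coords, neighbor_d)]


def graph_features(atoms: Sequence[str], bonds: Sequence[Tuple[int, int]]) -> Matrix:
    """2D-only student input per atom: [bias, is_O, is_H, degree]."""
    degree = [0] * len(atoms)
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    return [[1.0, float(a == "O"), float(a == "H"), float(deg)]
            for a, deg in zip(atoms, degree)]


def student_2d(w: Matrix, x: Matrix) -> Matrix:
    """Hypothetical linear 2D student: one output row per atom."""
    return [[sum(wjk * xk for wjk, xk in zip(wj, xi)) for wj in w] for xi in x]


def node_distill_loss(student: Matrix, teacher: Matrix) -> float:
    """Mean squared error between student and teacher atom representations."""
    diffs = [(s - t) ** 2 for srow, trow in zip(student, teacher)
             for s, t in zip(srow, trow)]
    return sum(diffs) / len(diffs)


def distill(x: Matrix, target: Matrix, steps: int = 2000, lr: float = 0.05) -> Matrix:
    """Gradient descent on the node-level loss; only the student weights change."""
    n, d, k = len(x), len(target[0]), len(x[0])
    w = [[0.0] * k for _ in range(d)]
    for _ in range(steps):
        s = student_2d(w, x)  # predictions at the current (pre-update) weights
        for j in range(d):
            for m in range(k):
                grad = 2.0 / (n * d) * sum((s[i][j] - target[i][j]) * x[i][m]
                                           for i in range(n))
                w[j][m] -= lr * grad
    return w


atoms = ["O", "H", "H"]
bonds = [(0, 1), (0, 2)]
coords = [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)]  # approx. water

x = graph_features(atoms, bonds)  # 2D input only
t = teacher_3d(coords, bonds)     # frozen teacher targets (uses coordinates)
w = distill(x, t)
print(node_distill_loss(student_2d(w, x), t))  # close to 0 after training
```

Only the student is updated: the teacher consumes coordinates once, during pretraining, whereas `student_2d(w, graph_features(atoms, bonds))` can be evaluated on a new molecule from its graph alone, mirroring how D&D avoids conformers at finetuning time.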