Sliced Denoising: A Physics-Informed Molecular Pre-Training Method

Yuyan Ni, Shikun Feng, Wei-Ying Ma, Zhi-Ming Ma, Yanyan Lan
2023-11-03
Abstract: While molecular pre-training has shown great potential in enhancing drug discovery, the lack of a solid physical interpretation in current methods raises concerns about whether the learned representation truly captures the underlying explanatory factors in observed data, ultimately resulting in limited generalization and robustness. Although denoising methods offer a physical interpretation, their accuracy is often compromised by ad-hoc noise design, leading to inaccurate learned force fields. To address this limitation, this paper proposes a new method for molecular pre-training, called sliced denoising (SliDe), which is based on the classical mechanical intramolecular potential theory. SliDe utilizes a novel noise strategy that perturbs bond lengths, angles, and torsion angles to achieve better sampling over conformations. Additionally, it introduces a random slicing approach that circumvents the computationally expensive calculation of the Jacobian matrix, which is otherwise essential for estimating the force field. By aligning with physical principles, SliDe shows a 42% improvement in the accuracy of estimated force fields compared to current state-of-the-art denoising methods, and thus outperforms traditional baselines on various molecular property prediction tasks.
Biomolecules, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses the lack of a solid physical interpretation in current molecular pre-training methods, a gap that matters for their use in drug discovery. Without such an interpretation, it is unclear whether the learned representations capture the explanatory factors behind the observed data, and this limits their generalization and robustness. Existing denoising methods do admit a physical interpretation, but their ad-hoc noise design leads to inaccurate learned force fields. The authors therefore propose sliced denoising (SliDe), a pre-training method grounded in the classical-mechanics theory of intramolecular potentials. SliDe perturbs bond lengths, bond angles, and torsion (dihedral) angles, which gives better sampling over conformations. It also introduces a random slicing technique that avoids explicitly computing the Jacobian matrix, a step that is computationally expensive but otherwise essential for estimating the force field. The authors report that SliDe improves the accuracy of the estimated force field by 42% over current state-of-the-art denoising methods and outperforms traditional baselines on various molecular property prediction tasks, including tasks on the QM9 and MD17 datasets.
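To illustrate why random slicing can sidestep the full Jacobian, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes the Jacobian in question relates internal coordinates (bond lengths and an angle) to Cartesian coordinates, and it uses a hypothetical toy map `internal_to_cartesian` for a bent triatomic molecule. The helper names `full_jacobian` and `sliced_jvp` are likewise made up for this example. The point is only that a product of the Jacobian with one random direction needs a single pair of function evaluations, whereas the dense Jacobian needs one pair per input dimension.

```python
import numpy as np


def internal_to_cartesian(q):
    """Hypothetical toy map: internal coordinates (r1, r2, theta) to the
    flattened Cartesian coordinates of a bent triatomic in the xy-plane."""
    r1, r2, theta = q
    central = np.array([0.0, 0.0, 0.0])
    first = np.array([r1, 0.0, 0.0])
    second = np.array([r2 * np.cos(theta), r2 * np.sin(theta), 0.0])
    return np.concatenate([central, first, second])


def full_jacobian(f, q, eps=1e-6):
    """Dense Jacobian by central differences.
    Cost: one pair of evaluations of f per input dimension."""
    q = np.asarray(q, dtype=float)
    cols = []
    for i in range(q.size):
        step = np.zeros_like(q)
        step[i] = eps
        cols.append((f(q + step) - f(q - step)) / (2 * eps))
    return np.stack(cols, axis=1)


def sliced_jvp(f, q, v, eps=1e-6):
    """Jacobian-vector product J(q) @ v along one direction v.
    Cost: a single pair of evaluations of f, independent of q.size."""
    q = np.asarray(q, dtype=float)
    return (f(q + eps * v) - f(q - eps * v)) / (2 * eps)


rng = np.random.default_rng(0)
q0 = np.array([0.96, 0.96, np.deg2rad(104.5)])  # water-like geometry, illustrative only
v = rng.standard_normal(q0.size)                # random slicing direction

J = full_jacobian(internal_to_cartesian, q0)
jvp = sliced_jvp(internal_to_cartesian, q0, v)
max_abs_error = float(np.max(np.abs(J @ v - jvp)))
```

In this toy case the two routes agree up to finite-difference error. For a real molecule the number of internal coordinates grows with the number of atoms, so working with products along a few random directions becomes much cheaper than forming the dense matrix.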