EnviroDetaNet: Pretrained E(3)-equivariant Message-Passing Neural Networks with Multi-Level Molecular Representations for Organic Molecule Spectra Prediction

Tong Zhu, Yuzhi Xu, Daqian Bian, Cheng-Wei Ju, Fanyu Zhao, Pujun Xie, Yuanqing Wang, Wei Hu, Zhenrong Sun, John Zhang
DOI: https://doi.org/10.26434/chemrxiv-2024-scchg
2024-09-06
Abstract: Fast and accurate spectral prediction plays a crucial role in molecular design within fields such as pharmaceutical and materials science. Nevertheless, predicting molecular spectra typically requires quantum chemistry calculations, posing significant challenges for fast predictions and high-throughput screening. In this paper, we propose an equivariant, fast, and robust model, named EnviroDetaNet, which integrates molecular environment information. EnviroDetaNet employs an E(3)-equivariant message-passing neural network combining intrinsic atomic properties, spatial features, and environmental information, allowing it to comprehensively capture both local and global molecular information. Compared to state-of-the-art models, EnviroDetaNet excels in various predictive tasks and maintains high accuracy even with a 50% reduction in training data, demonstrating strong generalization capabilities. Ablation studies confirm that molecular environment information is crucial for improving model stability and accuracy. EnviroDetaNet also shows outstanding performance in spectral predictions for complex molecular systems, making it a powerful tool for accelerating molecular discovery.
Chemistry
What problem does this paper attempt to address?
The paper addresses a key challenge in molecular spectroscopy: predicting the spectra of organic molecules quickly and accurately for applications such as drug discovery and materials science. Spectral prediction has traditionally relied on quantum chemistry calculations such as Density Functional Theory (DFT) and Ab Initio Molecular Dynamics (AIMD). These methods are reliable but computationally expensive, which makes them impractical for real-time prediction or high-throughput screening. To address this, the paper proposes EnviroDetaNet, an equivariant model that incorporates molecular environment information to predict molecular spectra efficiently and accurately. EnviroDetaNet is built on an E(3)-equivariant message-passing neural network that combines intrinsic atomic properties, spatial features, and environmental information, so that it captures both the local and the global features of a molecule. Compared with existing state-of-the-art models, EnviroDetaNet performs well across a range of prediction tasks and keeps high accuracy even when the training data is reduced by 50%, which the authors take as evidence of strong generalization. Ablation studies further show that molecular environment information is important for the model's stability and accuracy. The authors also report strong performance on spectral prediction for complex molecular systems and present EnviroDetaNet as a tool for accelerating molecular discovery.
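The summary describes an E(3)-equivariant message-passing network that mixes intrinsic atomic properties, spatial features, and environment information, but it does not give the layer definitions. The block below is therefore only a minimal, generic sketch under stated assumptions, not EnviroDetaNet's actual architecture. A single hypothetical scalar `z_emb` per atom stands in for intrinsic atomic properties, a hypothetical scalar `env` per atom stands in for environment information, and distances are expanded in a Gaussian radial basis with a cosine cutoff (a common choice in molecular message-passing networks, assumed here). All names, sizes, and weights (`N_RBF`, `CUTOFF`, `w_rbf`, `message_pass`) are illustrative. What the sketch does show is the general mechanism behind E(3) equivariance: messages are weighted only by invariant quantities (interatomic distances), so scalar outputs are unchanged by rotations, reflections, and translations, while vector outputs built from unit bond directions transform with the molecule.

```python
import math
import random

# Hypothetical hyperparameters for this sketch (not from the paper).
N_RBF = 8      # number of Gaussian radial basis functions
CUTOFF = 5.0   # neighbor cutoff radius (same length unit as the coordinates)


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def norm(a):
    return math.sqrt(sum(x * x for x in a))


def rbf(d, n=N_RBF, cutoff=CUTOFF):
    """Expand a distance in Gaussian radial basis functions (spatial feature)."""
    centers = [cutoff * k / (n - 1) for k in range(n)]
    gamma = 10.0 / cutoff
    return [math.exp(-gamma * (d - c) ** 2) for c in centers]


def cosine_cutoff(d, cutoff=CUTOFF):
    """Smoothly decay interactions to zero at the cutoff radius."""
    return 0.5 * (math.cos(math.pi * d / cutoff) + 1.0) if d < cutoff else 0.0


def message_pass(z_emb, env, pos, w_rbf):
    """One generic equivariant message-passing step (hypothetical).

    z_emb: per-atom scalar standing in for intrinsic atomic properties
    env:   per-atom scalar standing in for environment information
    pos:   list of [x, y, z] coordinates
    w_rbf: weights that turn the radial expansion into a scalar filter
    Returns invariant scalar features s and equivariant vector features v.
    """
    n = len(pos)
    s = [0.0] * n
    v = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r_ij = sub(pos[j], pos[i])
            d = norm(r_ij)
            fc = cosine_cutoff(d)
            if fc == 0.0:
                continue
            # The filter depends only on the distance, so it is E(3)-invariant.
            filt = fc * sum(w * b for w, b in zip(w_rbf, rbf(d)))
            # The message mixes the neighbor's intrinsic and environment features.
            msg = filt * (z_emb[j] + env[j])
            s[i] += msg
            # Invariant weight times unit bond direction: transforms with the molecule.
            for k in range(3):
                v[i][k] += msg * r_ij[k] / d
    return s, v


def rotate_z(p, theta):
    """Rotate a 3-vector about the z axis by theta radians."""
    c, sn = math.cos(theta), math.sin(theta)
    return [c * p[0] - sn * p[1], sn * p[0] + c * p[1], p[2]]


# Toy five-atom system with made-up features.
random.seed(0)
pos = [[random.uniform(-1.5, 1.5) for _ in range(3)] for _ in range(5)]
z_emb = [1.0, 0.6, 0.6, 0.8, 0.3]
env = [0.2, 0.1, 0.1, 0.3, 0.0]
w_rbf = [0.5] * N_RBF

theta = 0.7
s, v = message_pass(z_emb, env, pos, w_rbf)
s_rot, v_rot = message_pass(z_emb, env, [rotate_z(p, theta) for p in pos], w_rbf)
# Expected to be at floating-point round-off level, since scalars are invariant.
print(max(abs(a - b) for a, b in zip(s, s_rot)))
```

Rotating the input coordinates with `rotate_z` leaves `s` unchanged and rotates each vector in `v` by the same angle. Building that symmetry into the layer, rather than learning it from augmented data, is one reason equivariant networks tend to be data-efficient, which is consistent with (though not a demonstration of) the reduced-training-data results the paper reports.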