Deep Molecular Representation Learning via Fusing Physical and Chemical Information

Shuwen Yang, Ziyao Li, Guojie Song, Lingsheng Cai
2021-11-28
Abstract: Molecular representation learning is the first yet vital step in combining deep learning and molecular science. To push the boundaries of molecular representation learning, we present PhysChem, a novel neural architecture that learns molecular representations via fusing physical and chemical information of molecules. PhysChem is composed of a physicist network (PhysNet) and a chemist network (ChemNet). PhysNet is a neural physical engine that learns molecular conformations through simulating molecular dynamics with parameterized forces; ChemNet implements geometry-aware deep message-passing to learn chemical / biomedical properties of molecules. Two networks specialize in their own tasks and cooperate by providing expertise to each other. By fusing physical and chemical information, PhysChem achieved state-of-the-art performances on MoleculeNet, a standard molecular machine learning benchmark. The effectiveness of PhysChem was further corroborated on cutting-edge datasets of SARS-CoV-2.
Quantitative Methods, Machine Learning
What problem does this paper attempt to address?
The paper addresses the problem of molecular representation learning. Specifically, it aims to improve learned molecular representations by integrating physical and chemical information about molecules. To this end, the researchers propose a neural network architecture called PhysChem, which consists of two parts: the Physicist Network (PhysNet) and the Chemist Network (ChemNet).

- **Physicist Network (PhysNet)**: This part simulates molecular dynamics to learn the conformations of molecules. It tracks the positions and momenta of the atoms in a molecule, computes parameterized forces between them, and advances the simulation according to the laws of classical mechanics. PhysNet does not require conformation data for the target molecules, which makes it applicable to cases such as drug candidates generated by neural networks.
- **Chemist Network (ChemNet)**: This part uses a message-passing framework to capture the properties of atoms and chemical bonds, and from them learns chemical/biomedical properties. ChemNet generates messages from atomic states and local geometric structures, then uses them to update the states of atoms and chemical bonds.

The two networks collaborate by sharing information: PhysNet uses the chemical bond states from ChemNet to generate torsional forces, while ChemNet uses the local geometric structures of the intermediate conformations provided by PhysNet.

Experimental results in the paper show that PhysChem outperforms existing techniques on multiple benchmark datasets (including MoleculeNet and SARS-CoV-2 related datasets), achieving state-of-the-art performance in both molecular conformation learning and property prediction tasks.

In summary, the main contribution of this paper is PhysChem, a new molecular representation learning method that integrates physical and chemical information to improve the quality of molecular representations and thereby the performance of downstream tasks such as drug discovery.
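The division of labor between the two networks can be illustrated with a toy sketch in plain Python. It is not the paper's implementation. The harmonic bond force, the constants `k`, `rest`, `dt`, and `damping`, the `exp(-distance)` message weight, and all function names are assumptions chosen for illustration, and nothing here is learned. The sketch also passes information in one direction only (geometry into the message-passing step), whereas the summary above describes bond states flowing back into the force computation as well.

```python
import math

def bond_forces(q, bonds, k=1.0, rest=1.0):
    """Harmonic force along each bond. k and rest are fixed here; they stand
    in for the parameterized forces a PhysNet-like module would learn."""
    f = [[0.0, 0.0, 0.0] for _ in q]
    for i, j in bonds:
        d = [q[j][a] - q[i][a] for a in range(3)]   # vector from atom i to atom j
        r = math.sqrt(sum(x * x for x in d))        # assumes atoms do not coincide
        mag = k * (r - rest)                        # > 0 when the bond is stretched
        for a in range(3):
            f[i][a] += mag * d[a] / r               # pull i toward j
            f[j][a] -= mag * d[a] / r               # equal and opposite on j
    return f

def dynamics_step(q, p, bonds, dt=0.05, damping=0.1):
    """One semi-implicit Euler step on positions q and momenta p (unit masses).
    The damping factor is an assumption that lets the toy system settle."""
    f = bond_forces(q, bonds)
    n = len(q)
    p = [[(1.0 - damping) * p[i][a] + dt * f[i][a] for a in range(3)] for i in range(n)]
    q = [[q[i][a] + dt * p[i][a] for a in range(3)] for i in range(n)]
    return q, p

def message_pass(h, q, bonds):
    """Geometry-aware update: each atom adds its bonded neighbor's state,
    weighted by exp(-distance) taken from the current conformation."""
    out = [list(v) for v in h]
    for i, j in bonds:
        w = math.exp(-math.dist(q[i], q[j]))
        for a in range(len(h[i])):
            out[i][a] += w * h[j][a]
            out[j][a] += w * h[i][a]
    return out

def run_sketch(n_blocks=3, steps_per_block=50):
    """Interleave dynamics and message passing on a two-atom toy molecule."""
    q = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]   # bond starts stretched to length 2
    p = [[0.0, 0.0, 0.0] for _ in q]
    h = [[1.0, 0.0], [0.0, 1.0]]             # one-hot toy atom states
    bonds = [(0, 1)]
    for _ in range(n_blocks):
        for _ in range(steps_per_block):
            q, p = dynamics_step(q, p, bonds)
        h = message_pass(h, q, bonds)        # uses the intermediate conformation
    return q, p, h
```

Calling `run_sketch()` relaxes the stretched bond toward the rest length while the atom states are mixed using distances from the evolving conformation. In a learned model, the force parameters and message functions would be trained rather than fixed, so this only shows the data flow.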