SE3Set: Harnessing equivariant hypergraph neural networks for molecular representation learning

Hongfei Wu, Lijun Wu, Guoqing Liu, Zhirong Liu, Bin Shao, Zun Wang
2024-05-26
Abstract: In this paper, we develop SE3Set, an SE(3) equivariant hypergraph neural network architecture tailored for advanced molecular representation learning. Hypergraphs are not merely an extension of traditional graphs; they are pivotal for modeling high-order relationships, a capability that conventional equivariant graph-based methods lack due to their inherent limitations in representing intricate many-body interactions. To achieve this, we first construct hypergraphs by proposing a new fragmentation method that considers both chemical and three-dimensional spatial information of the molecular system. We then design SE3Set, which incorporates equivariance into the hypergraph neural network. This ensures that the learned molecular representations are invariant to spatial transformations, thereby providing robustness essential for accurate prediction of molecular properties. SE3Set has shown performance on par with state-of-the-art (SOTA) models for small molecule datasets like QM9 and MD17. It excels on the MD22 dataset, achieving a notable improvement of approximately 20% in accuracy across all molecules, which highlights the prevalence of complex many-body interactions in larger molecules. This exceptional performance of SE3Set across diverse molecular structures underscores its transformative potential in computational chemistry, offering a route to more accurate and physically nuanced modeling.
Machine Learning, Artificial Intelligence, Computational Physics
What problem does this paper attempt to address?
This paper addresses how to improve molecular representation learning, particularly for complex many-body interactions. The authors propose SE3Set, an SE(3)-equivariant hypergraph neural network architecture designed for molecular systems. Traditional graph neural networks (GNNs) are useful in molecular modeling but often struggle to capture high-order interactions, because each edge connects only two atoms. Hypergraph neural networks (HGNNs), in contrast, connect multiple vertices with a single hyperedge and can therefore better represent many-body phenomena such as electron delocalization and collective vibrations.
The method has two parts. First, a hypergraph is constructed with a new fragmentation approach that combines chemical and 3D spatial information about the molecule. Second, the SE3Set model introduces equivariance into the hypergraph neural network, ensuring that the learned molecular representation is invariant to spatial transformations and thus improving the accuracy of molecular property prediction. The model consists of embedding layers, attention blocks, and output heads; the equivariant hypergraph attention blocks are the crucial component, since they update both node and hyperedge features to capture precise molecular structure. The paper also reviews existing work on GNNs and HGNNs and explains how the improved fragmentation method and the SE3Set architecture overcome some of their limitations.
On the small-molecule datasets QM9 and MD17, SE3Set performs comparably to state-of-the-art (SOTA) models. On the large-molecule dataset MD22, it reduces the mean absolute error by approximately 20%, which indicates that its advantage in handling many-body interactions is most pronounced for large molecules.
Additionally, the paper conducts ablation studies that highlight the importance of the fragmentation method and of the architectural choices in the model, showing the potential of SE3Set for molecular property prediction. Overall, this work offers new insights into molecular representation learning and, according to the authors, a route to more accurate modeling in computational chemistry.
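To make two ideas from the summary concrete, the following is a minimal sketch in plain Python; it is not the authors' implementation. It shows (1) how a hypergraph can be stored as a node-by-hyperedge incidence matrix, where one hyperedge may contain more than two atoms, and (2) how a scalar feature built from interatomic distances stays unchanged under a rigid motion. The toy coordinates, the fragments, the particular rotation (about the z axis only), and all function names are assumptions chosen for illustration; the paper's fragmentation method and attention blocks are not reproduced here.

```python
import math

# Hypothetical toy molecule: 5 atoms with 3D coordinates (arbitrary units).
coords = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.5, 1.0, 0.0),
    (0.5, 1.8, 0.3),
    (-0.5, 1.0, 0.2),
]

# Hypothetical fragments: each hyperedge is a tuple of atom indices.
# Fragments may overlap and may contain more than two atoms.
hyperedges = [(0, 1, 2), (2, 3, 4), (0, 4)]


def incidence_matrix(n_nodes, hyperedges):
    """Return H with H[v][e] = 1 if node v belongs to hyperedge e, else 0."""
    H = [[0] * len(hyperedges) for _ in range(n_nodes)]
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v][e] = 1
    return H


def rotate_z_then_translate(p, angle, shift):
    """Apply one example rigid motion: rotate about z, then translate."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])


def hyperedge_feature(coords, members):
    """Scalar feature: mean pairwise distance among atoms in the hyperedge."""
    pairs = [(i, j) for a, i in enumerate(members) for j in members[a + 1:]]
    return sum(math.dist(coords[i], coords[j]) for i, j in pairs) / len(pairs)


H = incidence_matrix(len(coords), hyperedges)

# Move the whole molecule rigidly and recompute the per-hyperedge features.
moved = [rotate_z_then_translate(p, 0.7, (2.0, -1.0, 0.5)) for p in coords]
before = [hyperedge_feature(coords, e) for e in hyperedges]
after = [hyperedge_feature(moved, e) for e in hyperedges]

# Distances are preserved by rotations and translations, so the features agree
# up to floating-point rounding.
features_unchanged = all(
    math.isclose(a, b, rel_tol=1e-9) for a, b in zip(before, after)
)
print(H)
print(features_unchanged)
```

A distance-based scalar like this is only the simplest case. An equivariant network additionally carries vector or tensor features that rotate together with the molecule, which is what allows direction-dependent quantities such as forces to be predicted consistently.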