VN-EGNN: E(3)-Equivariant Graph Neural Networks with Virtual Nodes Enhance Protein Binding Site Identification

Florian Sestak, Lisa Schneckenreiter, Johannes Brandstetter, Sepp Hochreiter, Andreas Mayr, Günter Klambauer
2024-04-11
Abstract: Being able to identify regions within or around proteins, to which ligands can potentially bind, is an essential step to develop new drugs. Binding site identification methods can now profit from the availability of large amounts of 3D structures in protein structure databases or from AlphaFold predictions. Current binding site identification methods heavily rely on graph neural networks (GNNs), usually designed to output E(3)-equivariant predictions. Such methods turned out to be very beneficial for physics-related tasks like binding energy or motion trajectory prediction. However, the performance of GNNs at binding site identification is still limited potentially due to the lack of dedicated nodes that model hidden geometric entities, such as binding pockets. In this work, we extend E(n)-Equivariant Graph Neural Networks (EGNNs) by adding virtual nodes and applying an extended message passing scheme. The virtual nodes in these graphs are dedicated quantities to learn representations of binding sites, which leads to improved predictive performance. In our experiments, we show that our proposed method VN-EGNN sets a new state-of-the-art at locating binding site centers on COACH420, HOLO4K and PDBbind2020.
Machine Learning, Artificial Intelligence, Biomolecules
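For reference, the abstract builds on the EGNN layer of Satorras et al. (2021). The equations below are that standard base layer, not VN-EGNN's extended scheme: h_i are node features, x_i node coordinates, a_ij optional edge attributes, N(i) the neighbors of node i, C a normalization constant (for example 1/(M−1) for M nodes), and φ_e, φ_x, φ_h learned MLPs.

```latex
\begin{aligned}
\mathbf{m}_{ij} &= \phi_e\big(\mathbf{h}_i,\ \mathbf{h}_j,\ \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2,\ a_{ij}\big) \\
\mathbf{x}_i' &= \mathbf{x}_i + C \sum_{j \in \mathcal{N}(i)} (\mathbf{x}_i - \mathbf{x}_j)\, \phi_x(\mathbf{m}_{ij}) \\
\mathbf{m}_i &= \sum_{j \in \mathcal{N}(i)} \mathbf{m}_{ij} \\
\mathbf{h}_i' &= \phi_h(\mathbf{h}_i,\ \mathbf{m}_i)
\end{aligned}
```

Coordinates enter only through squared distances and relative vectors x_i − x_j. Rotating, reflecting, or translating the input therefore transforms every x_i' in the same way and leaves every h_i' unchanged, which is the E(n)-equivariance the abstract refers to.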
What problem does this paper attempt to address?
This paper addresses protein binding site identification, that is, finding the regions within or around a protein to which ligands can potentially bind. This is a critical computational step in drug discovery. The large number of 3D protein structures now available from structure databases and from AlphaFold predictions creates new opportunities for this task. Current methods rely mainly on graph neural networks (GNNs), especially E(3)-equivariant GNNs, which perform well on physics-related tasks such as binding energy or motion trajectory prediction. Their performance at binding site identification is still limited, however, possibly because they lack dedicated nodes that can model hidden geometric entities such as binding pockets.

The paper proposes VN-EGNN (Virtual Node E(3)-equivariant Graph Neural Network), which extends E(n)-equivariant GNNs (EGNNs) by adding virtual nodes and adopting an extended message-passing scheme. The virtual nodes are dedicated to learning representations of binding sites. Because their coordinates are updated during message passing, their final positions serve as predictions of binding site centers, and their learned features form a useful neural representation of the binding sites. The paper also discusses known limitations of GNNs, namely limited expressive power, over-smoothing, and over-squashing, and argues that virtual nodes can alleviate these issues. The authors present VN-EGNN as the first architecture of this kind, an E(3)-equivariant GNN incorporating virtual nodes, that does not rely on prior knowledge.

In experiments on the COACH420, HOLO4K, and PDBbind2020 benchmarks, VN-EGNN sets a new state of the art at locating binding site centers.
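The virtual-node idea can be illustrated with a minimal NumPy sketch. It is a simplified, hypothetical two-phase layer, not the paper's exact message-passing scheme. Residues send EGNN-style messages to K virtual nodes, which move and update their features, and the updated virtual nodes then send feature messages back to the residues. All names (`make_mlp`, `vn_layer`, `predict_centers`), the feature width, the random untrained weights, and the choice to keep residue coordinates fixed are assumptions for illustration. How VN-EGNN initializes virtual nodes is not reproduced here: their starting positions are simply passed in. The final virtual-node coordinates play the role of predicted binding site centers, and the last lines check E(3) equivariance numerically.

```python
import numpy as np


def make_mlp(in_dim, out_dim, seed, hidden=16):
    """Tiny two-layer MLP with random, untrained weights (hypothetical, illustrative only)."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.3, size=(in_dim, hidden))
    w2 = rng.normal(scale=0.3, size=(hidden, out_dim))
    return lambda z: np.tanh(z @ w1) @ w2


F = 8  # feature width for residue and virtual nodes (assumed value)
phi_e_rv = make_mlp(2 * F + 1, F, seed=1)  # residue -> virtual message
phi_x_rv = make_mlp(F, 1, seed=2)          # scalar weight for the coordinate update
phi_h_v = make_mlp(2 * F, F, seed=3)       # virtual-node feature update
phi_e_vr = make_mlp(2 * F + 1, F, seed=4)  # virtual -> residue message
phi_h_r = make_mlp(2 * F, F, seed=5)       # residue feature update


def pairwise_messages(h_dst, x_dst, h_src, x_src, phi_e):
    """EGNN-style messages from every source node to every destination node."""
    diff = x_dst[:, None, :] - x_src[None, :, :]   # (D, S, 3) relative vectors x_i - x_j
    d2 = np.sum(diff ** 2, axis=-1, keepdims=True)  # (D, S, 1) invariant squared distances
    D, S = diff.shape[:2]
    pair = np.concatenate(
        [np.broadcast_to(h_dst[:, None, :], (D, S, h_dst.shape[1])),
         np.broadcast_to(h_src[None, :, :], (D, S, h_src.shape[1])),
         d2],
        axis=-1,
    )
    return diff, phi_e(pair)                        # messages: (D, S, F)


def vn_layer(h, x, hv, xv):
    """One simplified layer: residues -> virtual nodes, then virtual nodes -> residues."""
    # Phase A: residues message the virtual nodes, which move and update their features.
    diff, m = pairwise_messages(hv, xv, h, x, phi_e_rv)
    xv = xv + np.mean(diff * phi_x_rv(m), axis=1)   # equivariant coordinate step, C = 1/N
    hv = hv + phi_h_v(np.concatenate([hv, m.sum(axis=1)], axis=-1))
    # Phase B: updated virtual nodes message the residues (features only in this sketch).
    _, m_back = pairwise_messages(h, x, hv, xv, phi_e_vr)
    h = h + phi_h_r(np.concatenate([h, m_back.sum(axis=1)], axis=-1))
    return h, x, hv, xv


def predict_centers(h, x, xv0, n_layers=3):
    """Return virtual-node coordinates after n_layers as candidate binding site centers."""
    hv = np.zeros((xv0.shape[0], h.shape[1]))      # invariant initial virtual-node features
    xv = xv0
    for _ in range(n_layers):
        h, x, hv, xv = vn_layer(h, x, hv, xv)
    return xv, hv


rng = np.random.default_rng(0)
N, K = 30, 4
h = rng.normal(size=(N, F))                          # invariant residue features (made up)
x = rng.normal(scale=5.0, size=(N, 3))               # residue coordinates (made up)
xv0 = x.mean(axis=0) + rng.normal(scale=3.0, size=(K, 3))  # hypothetical starting positions

centers, hv = predict_centers(h, x, xv0)

# Apply a random orthogonal transform Q (rotation or reflection) plus a translation t
# to residues and virtual-node starting positions alike.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = np.array([1.0, -2.0, 3.0])
centers_T, hv_T = predict_centers(h, x @ Q.T + t, xv0 @ Q.T + t)

print(np.allclose(centers_T, centers @ Q.T + t))  # True: centers transform with the input
print(np.allclose(hv_T, hv))                      # True: features are invariant
```

The check passes because coordinates only ever enter through relative vectors and squared distances, so any trained weights substituted for the random ones would keep the same property. Training the virtual nodes to land on annotated pocket centers is not shown.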