A Graph Neural Network Approach

Niccolò Pancino, Caterina Gallegati, Fiamma Romagnoli, Pietro Bongini, Monica Bianchini
DOI: https://doi.org/10.3390/ijms25115870
IF: 5.6
2024-05-29
International Journal of Molecular Sciences
Abstract: Protein–protein interactions (PPIs) are fundamental processes governing cellular functions, crucial for understanding biological systems at the molecular level. Compared to experimental methods for PPI prediction and site identification, computational deep learning approaches represent an affordable and efficient solution to tackle these problems. Since protein structure can be summarized as a graph, graph neural networks (GNNs) represent the ideal deep learning architecture for the task. In this work, PPI prediction is modeled as a node-focused binary classification task using a GNN to determine whether a generic residue is part of the interface. Biological data were obtained from the Protein Data Bank in Europe (PDBe), leveraging the Protein Interfaces, Surfaces, and Assemblies (PISA) service. To gain a deeper understanding of how proteins interact, the data obtained from PISA were assembled into three datasets: Whole, Interface, and Chain, consisting of data on the whole protein, couples of interacting chains, and single chains, respectively. These three datasets correspond to three different nuances of the problem: identifying interfaces between protein complexes, between chains of the same protein, and interface regions in general. The results indicate that GNNs are capable of solving each of the three tasks with very good performance levels.
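The abstract's framing (a protein as a graph, with each residue a node labeled as interface or non-interface) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not the authors' pipeline: the 6 Å distance cutoff, the one-dimensional toy features, the mean-aggregation update, and the hand-picked weights passed to `classify_nodes`. A real GNN would learn its parameters from the PISA-derived data.

```python
import math

def build_contact_graph(coords, cutoff=6.0):
    """Link residues i and j when their coordinates are closer than `cutoff`.

    The cutoff value is an illustrative assumption, not taken from the paper.
    """
    n = len(coords)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

def message_passing_step(features, neighbors):
    """One round of mean aggregation over each node and its neighbors."""
    updated = []
    for i, nbrs in enumerate(neighbors):
        group = [features[i]] + [features[j] for j in nbrs]
        dim = len(features[i])
        updated.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return updated

def classify_nodes(features, weights, bias, threshold=0.5):
    """Node-level binary classification with a logistic output per residue."""
    labels = []
    for vec in features:
        z = sum(w * x for w, x in zip(weights, vec)) + bias
        prob = 1.0 / (1.0 + math.exp(-z))
        labels.append(1 if prob >= threshold else 0)
    return labels

# Toy protein: four residues on a line; the last one is far from the others.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.5, 0.0, 0.0), (20.0, 0.0, 0.0)]
features = [[1.0], [0.0], [1.0], [0.0]]  # hypothetical per-residue feature

neighbors = build_contact_graph(coords)
hidden = message_passing_step(features, neighbors)
labels = classify_nodes(hidden, weights=[4.0], bias=-1.0)  # hand-picked, not learned
```

The point of the sketch is the shape of the task: the prediction for each residue depends on its own features and on those of its spatial neighbors, and the output is one binary label per node.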
biochemistry & molecular biology; chemistry, multidisciplinary
What problem does this paper attempt to address?
The paper addresses the prediction of interaction sites in protein–protein interactions (PPIs) using graph neural networks (GNNs). Its objectives can be summarized as follows:

1. **PPI Prediction and Site Identification**: Interactions between proteins are crucial for understanding cellular functions. Experimental methods are reliable but limited by time and cost, so the study explores deep learning methods, especially GNNs, as an efficient and cost-effective route to PPI prediction and site identification.
2. **Application of Graph Neural Networks**: Because protein structures can naturally be represented as graphs, GNNs are a natural choice for this task. They operate directly on graph-structured data, performing feature extraction and classification while preserving spatial information.
3. **Different Levels of PPI Analysis**: To gain a deeper understanding of how proteins interact, the study constructed three datasets: Whole, Interface, and Chain. These correspond to different levels of the problem: identifying interfaces between protein complexes, interfaces between chains within the same protein, and general interface regions, respectively.
4. **Experimental Results**: GNNs achieved good performance on all three datasets. They performed best on the Whole dataset, correctly identifying about 81% of interaction nodes while maintaining high precision. Results on the Interface dataset were also good, and although performance declined on the most general Chain dataset, it remained at a good level.

In summary, the study develops a GNN-based method for predicting interaction sites between proteins and validates its effectiveness and generalization ability on three datasets at different levels of the problem.
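The results above are phrased in terms of the share of interaction nodes identified and of precision. As a hedged illustration of how such node-level figures are typically computed, the sketch below evaluates precision and recall for a binary labeling. Reading the "about 81%" figure as recall is an interpretation of the summary, and the labels are made-up toy data, not results from the paper.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary node labels (1 = interface residue)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy labels for eight residues (hypothetical, for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
```

Recall answers "what fraction of true interface residues were found", while precision answers "what fraction of predicted interface residues were correct". Reporting both matters here because interface residues are usually a minority of all nodes.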