Abstract:

Background: Protein-protein interactions (PPIs) are central to many biological processes. Because experimental methods for identifying PPIs are time-consuming and expensive, it is important to develop automated computational methods that predict PPIs more reliably. Various machine learning methods have been proposed, including a sequence-based deep learning technique that has achieved promising results. However, that technique focuses only on sequence information and ignores the structural information of PPI networks. Structural information about the nodes of a PPI network, such as their degree, position, and neighboring nodes in the graph, has been shown to be informative for PPI prediction.

Results: To address the challenge of representing graph information, we introduce an improved graph representation learning method. Our model predicts PPIs from both sequence information and graph structure. Moreover, our study takes advantage of a representation learning model and employs a graph-based deep learning method for PPI prediction, which shows superiority over existing sequence-based methods. Our method achieves state-of-the-art accuracy of 99.15% on the Human Protein Reference Database (HPRD) dataset and also obtains the best results on the Database of Interacting Proteins (DIP) Human, Drosophila, Escherichia coli (E. coli), and Caenorhabditis elegans (C. elegans) datasets.

Conclusion: Here, we introduce the signed variational graph auto-encoder (S-VGAE), an improved graph representation learning method that automatically learns to encode graph structure into low-dimensional embeddings. Experimental results demonstrate that our method outperforms existing sequence-based methods on several datasets. We also demonstrate the robustness of our model on very sparse networks and its generalization to a new dataset that combines four datasets: HPRD, E. coli, C. elegans, and Drosophila.

What problem does this paper attempt to address?

The core problem that this paper attempts to solve is **how to use graph-structure information and sequence information to predict protein-protein interactions (PPIs) more accurately**. Specifically, the researchers focus on the following points:

1. **Background problems**: Protein-protein interactions (PPIs) play a crucial role in many biological processes, such as signal transduction, immune response, and cell proliferation. Experimental methods (such as yeast two-hybrid and affinity purification) can detect PPIs, but they are time-consuming, costly, and have a high false-positive rate. It is therefore important to develop efficient computational methods to predict PPIs.
2. **Limitations of existing methods**:
   - Traditional machine-learning methods mainly rely on sequence information and ignore the structural information in the PPI network (such as the degree, position, and neighboring nodes of each node).
   - Although deep-learning methods perform well in feature extraction, most of them focus only on sequence data and fail to fully utilize the graph-structure information of the PPI network.
3. **Research objectives**: Propose a deep-learning method that combines graph-structure information and sequence information to predict PPIs more accurately. To this end, the authors introduce an improved graph representation-learning model, the **Signed Variational Graph Auto-Encoder (S-VGAE)**, which treats the PPI network as an undirected weighted graph and combines it with sequence features for modeling.

---

### Formula summary

The main formulas involved in the paper relate to the evaluation metrics and the model architecture:

#### 1. Evaluation metrics

The paper uses the following formulas to measure model performance:

- **Accuracy**: $$ \text{Accuracy}=\frac{\text{TP}+\text{TN}}{\text{TP}+\text{TN}+\text{FP}+\text{FN}} $$
- **Sensitivity**: $$ \text{Sensitivity}=\frac{\text{TP}}{\text{TP}+\text{FN}} $$
- **Specificity**: $$ \text{Specificity}=\frac{\text{TN}}{\text{TN}+\text{FP}} $$
- **Precision**: $$ \text{Precision}=\frac{\text{TP}}{\text{TP}+\text{FP}} $$
- **F-score**: $$ F\text{-score}=2\cdot\frac{\text{Precision}\cdot\text{Sensitivity}}{\text{Precision}+\text{Sensitivity}} $$

Here, $\text{TP}$, $\text{TN}$, $\text{FP}$, and $\text{FN}$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

#### 2. Core ideas of the S-VGAE model

The S-VGAE model is based on the Variational Graph Auto-Encoder (VGAE) and improves its cost function so that it focuses on high-confidence interaction information. Specific improvements include:

- **Modify the cost function**: Only consider high-confidence interaction information.
- **Assign different weights**: Assign different signs to different interactions in the adjacency matrix to enhance the influence of negative interactions.
- **Classifier design**: Use a simple three-layer softmax classifier instead of the generative model for the final prediction.

---

### Core contributions of the solution

1. **Combination of graph-structure and sequence information**: The S-VGAE model considers not only the graph-structure information of the PPI network (such as the degree, position, and neighbor relationships of nodes) but also protein sequence features, thereby improving prediction performance.
2. *
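The evaluation metrics listed in the formula summary can be computed directly from the four confusion-matrix counts. The following is a minimal Python sketch of those formulas, not code from the paper: the names `ConfusionCounts`, `safe_div`, and `evaluate` are hypothetical, the example counts are made up for illustration, and returning 0.0 for an empty denominator is an assumption the source does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for the four confusion-matrix counts."""
    tp: int  # true positives: interacting pairs predicted as interacting
    tn: int  # true negatives: non-interacting pairs predicted as non-interacting
    fp: int  # false positives: non-interacting pairs predicted as interacting
    fn: int  # false negatives: interacting pairs predicted as non-interacting


def safe_div(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (an assumption)."""
    return numerator / denominator if denominator else 0.0


def evaluate(c: ConfusionCounts) -> dict:
    """Compute accuracy, sensitivity, specificity, precision, and F-score."""
    sensitivity = safe_div(c.tp, c.tp + c.fn)
    precision = safe_div(c.tp, c.tp + c.fp)
    return {
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        "sensitivity": sensitivity,
        "specificity": safe_div(c.tn, c.tn + c.fp),
        "precision": precision,
        "f_score": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the paper.
    metrics = evaluate(ConfusionCounts(tp=90, tn=85, fp=15, fn=10))
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Keeping sensitivity and precision as intermediate values mirrors the F-score formula above, which is defined in terms of those two quantities rather than the raw counts.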

Graph-based prediction of Protein-protein interactions with attributed signed graph embedding
