HN-PPISP: a Hybrid Network Based on MLP-Mixer for Protein-Protein Interaction Site Prediction.

Yan Kang, Yulong Xu, Xinchao Wang, Bin Pu, Xuekun Yang, Yulong Rao, Jianguo Chen
DOI: https://doi.org/10.1093/bib/bbac480
IF: 9.5
2023-01-01
Briefings in Bioinformatics
Abstract: Motivation: Biological experimental approaches to protein-protein interaction (PPI) site prediction are critical for understanding the mechanisms of biochemical processes but are time-consuming and laborious. With the development of Deep Learning (DL) techniques, methods based on the widely used Convolutional Neural Network (CNN) have been proposed to address these problems. Although significant progress has been made, these methods still have limitations in encoding the characteristics of each amino acid in protein sequences. Current methods cannot efficiently explore the nature of the Position Specific Scoring Matrix (PSSM), secondary structure and raw protein sequences by processing them all together. For PPI site prediction, how to effectively model the PPI context with attention to prediction remains an open problem. In addition, the long-distance dependencies of PPI features are important, which is very challenging for many CNN-based methods because the innate capacity of CNNs makes it difficult for them to outperform auto-regressive models such as Transformers. Results: To effectively mine the properties of PPI features, a novel hybrid neural network named HN-PPISP is proposed, which integrates a Multi-layer Perceptron Mixer (MLP-Mixer) module for local feature extraction and a two-stage multi-branch module for global feature capture. The model combines the merits of Transformer, TextCNN and Bi-LSTM as a powerful alternative for PPI site prediction. On the one hand, this is the first application of an advanced Transformer (i.e. MLP-Mixer) with a hybrid network for sequence-based PPI prediction. On the other hand, unlike existing methods that treat global features altogether, the proposed two-stage multi-branch hybrid module first assigns different attention scores to the input features and then encodes the features through different branch modules.
In the first stage, different improved attention modules are hybridized to extract features from the raw protein sequences, secondary structure and PSSM, respectively. In the second stage, a multi-branch network is designed to aggregate information from both branches in parallel. The two branches encode the features and extract dependencies through several operations such as TextCNN, Bi-LSTM and different activation functions. Experimental results on real-world public datasets show that our model consistently achieves state-of-the-art performance over seven strong baselines. Availability: The source code of the HN-PPISP model is available at https://github.com/y1xu05/HN-PPISP.
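As a rough illustration of the MLP-Mixer idea named in the abstract, the sketch below implements one generic Mixer block (token mixing followed by channel mixing, each with layer normalization and a residual connection) in plain Python. It is not the authors' implementation, which is in the repository linked above. The window of 7 residues, the 20 channels per residue, the hidden width of 16, the omission of biases and all function names are assumptions chosen only for this example.

```python
import math
import random

def layer_norm(row, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]

def gelu(v):
    """Tanh approximation of the GELU activation."""
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mlp(x, w1, w2):
    """Two-layer perceptron applied to every row of x (biases omitted)."""
    hidden = [[gelu(v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def make_params(tokens, channels, hidden, seed=0):
    """Random weights for one block; sizes and scale are illustrative."""
    rng = random.Random(seed)
    def init(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "tok_w1": init(tokens, hidden), "tok_w2": init(hidden, tokens),
        "ch_w1": init(channels, hidden), "ch_w2": init(hidden, channels),
    }

def mixer_block(x, params):
    """Apply one Mixer block to x of shape (tokens, channels)."""
    # Token mixing: transpose so each row is one channel across all residues,
    # mix along the residue axis, then transpose back and add the residual.
    normed = [layer_norm(row) for row in x]
    mixed = transpose(mlp(transpose(normed), params["tok_w1"], params["tok_w2"]))
    x = add(x, mixed)
    # Channel mixing: mix the features of each residue independently.
    normed = [layer_norm(row) for row in x]
    return add(x, mlp(normed, params["ch_w1"], params["ch_w2"]))

# Hypothetical input: a window of 7 residues with 20 features each.
rng = random.Random(1)
window = [[rng.random() for _ in range(20)] for _ in range(7)]
params = make_params(tokens=7, channels=20, hidden=16)
output = mixer_block(window, params)
print(len(output), len(output[0]))
```

The block preserves the input shape, so several such blocks can be stacked before a classifier that scores each residue. How the window is built and how the features are encoded are left to the paper and its source code.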