Abstract:
Background: Increasing evidence indicates that protein-protein interactions (PPIs) play important roles in many aspects of the structural and functional organization of a cell. Uncovering new PPIs therefore remains an important topic in the biomedical domain. Although various feature extraction methods combined with machine learning approaches have improved PPI prediction, there is still room for improvement through novel and effective feature extraction methods and classifiers.

Method: In this study, we proposed a sequence-based feature extraction method called LCPSSMMF, which combines a local coding position-specific scoring matrix (PSSM) with multifeatures fusion. First, we used a novel local coding method based on the PSSM to build a new matrix (CPSSM). The advantage of this method is that it incorporates both global and local feature extraction, so it can account for interactions between residues in both continuous and discontinuous regions of amino acid sequences. Second, we adopted two different feature extraction methods, Local Average Group (LAG) and Bigram Probability (BP), to capture key feature information from the evolutionary information embedded in the CPSSM. Finally, the feature vectors were obtained with a multifeatures fusion method.

Result: To evaluate the proposed feature extraction approach, we used a support vector machine (SVM) as the prediction classifier and applied the method to yeast and human PPI datasets. LCPSSMMF achieved prediction accuracies of 93.43% and 90.41% on the yeast and human datasets, respectively. We also compared the proposed method with previous sequence-based approaches on the yeast dataset using the same SVM classifier, and LCPSSMMF significantly outperformed several other state-of-the-art methods.
These results indicate that the LCPSSMMF approach captures more local and global discriminatory information than almost all previous methods and performs well in identifying PPIs. To facilitate future proteomics research, we developed the LCPSSMMFSVM server, which is freely available for academic use at http://219.219.62.123:8888/LCPSSMMFSVM .
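To make the two descriptors in the Method section more concrete, here is a minimal Python sketch of LAG and BP features computed from a PSSM, followed by a simple fusion into one vector per protein pair. It is not the authors' implementation, and several choices are assumptions: the abstract does not describe the local coding step that produces the CPSSM, so the sketch works on a plain L×20 PSSM; scores are squashed with a sigmoid; LAG uses 20 contiguous row groups; BP sums products of consecutive rows; and "fusion" is plain concatenation. All function names are hypothetical.

```python
"""Hypothetical sketch: LAG and BP descriptors from a PSSM, fused by concatenation.

Assumptions (not from the LCPSSMMF paper): the input is a plain L x 20 PSSM
(the CPSSM local coding step is not reproduced), scores are squashed with a
sigmoid, LAG uses 20 contiguous groups, and fusion is simple concatenation.
"""
import math
from typing import List

Matrix = List[List[float]]


def sigmoid_normalize(pssm: Matrix) -> Matrix:
    """Map raw PSSM log-odds scores into the interval (0, 1)."""
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in pssm]


def local_average_group(pssm: Matrix, groups: int = 20) -> List[float]:
    """Average each of the 20 columns within `groups` contiguous row segments."""
    length = len(pssm)
    if length < groups:
        raise ValueError("sequence is shorter than the number of groups")
    features: List[float] = []
    for g in range(groups):
        # Integer boundaries give every segment at least one row when length >= groups.
        segment = pssm[g * length // groups:(g + 1) * length // groups]
        for col in range(20):
            features.append(sum(row[col] for row in segment) / len(segment))
    return features  # groups * 20 values


def bigram_probability(pssm: Matrix) -> List[float]:
    """B[m][n] = sum_i P[i][m] * P[i+1][n] over consecutive rows, flattened to 400 values."""
    return [
        sum(pssm[i][m] * pssm[i + 1][n] for i in range(len(pssm) - 1))
        for m in range(20)
        for n in range(20)
    ]


def protein_features(pssm: Matrix) -> List[float]:
    """Fuse LAG (400) and BP (400) into one 800-dimensional vector per protein."""
    norm = sigmoid_normalize(pssm)
    return local_average_group(norm) + bigram_probability(norm)


def pair_features(pssm_a: Matrix, pssm_b: Matrix) -> List[float]:
    """Concatenate both proteins' vectors into one 1600-dimensional SVM input."""
    return protein_features(pssm_a) + protein_features(pssm_b)


if __name__ == "__main__":
    # Toy integer "PSSMs" of lengths 25 and 30; real ones would typically come from PSI-BLAST.
    toy_a = [[(i * 7 + j) % 11 - 5 for j in range(20)] for i in range(25)]
    toy_b = [[(i + j * 3) % 9 - 4 for j in range(20)] for i in range(30)]
    print(len(pair_features(toy_a, toy_b)))  # 1600
```

Each 1600-dimensional pair vector could then be passed to an SVM (for example scikit-learn's `sklearn.svm.SVC`), with interacting and non-interacting pairs as the two classes; the kernel and its parameters would need tuning and are not specified in the abstract.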

An Efficient Feature Extraction Technique Based on Local Coding PSSM and Multifeatures Fusion for Predicting Protein-Protein Interactions
