A MapReduce Based Parallel SVM for Large-Scale Predicting Protein–protein Interactions

Zhu-Hong You, Jian-Zhong Yu, Lin Zhu, Shuai Li, Zhen-Kun Wen
DOI: https://doi.org/10.1016/j.neucom.2014.05.072
IF: 6
2014-01-01
Neurocomputing
Abstract: Protein–protein interactions (PPIs) are crucial to most biochemical processes, including metabolic cycles, DNA transcription and replication, and signaling cascades. Although high-throughput experimental techniques have generated a large amount of PPI data for different species, the number of known interactions is still small compared with the total number of possible PPIs. Furthermore, experimental methods for identifying PPIs are both time-consuming and expensive. It is therefore urgent, and challenging, to develop automated computational methods that predict PPIs efficiently and accurately. In this article, we propose a novel MapReduce-based parallel SVM model for large-scale prediction of protein–protein interactions using only protein sequence information. First, local sequential features, represented by an autocorrelation descriptor, are extracted from the protein sequences. The MapReduce framework is then employed to train support vector machine (SVM) classifiers in a distributed way, which significantly reduces training time while maintaining a high level of accuracy. The experimental results demonstrate that the proposed parallel algorithms can not only handle large-scale PPI datasets but also perform well in terms of speedup and accuracy. Consequently, the proposed approach can be considered a promising and powerful new tool for large-scale PPI prediction, offering excellent performance in less time.
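The feature-extraction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which autocorrelation variant, physicochemical properties, or lag range the authors use, so everything below is an assumption: the sketch uses the auto covariance form, a single property (the Kyte-Doolittle hydropathy index), and hypothetical function names `auto_covariance` and `pair_features`. It shows only how a variable-length sequence could be turned into a fixed-length vector before SVM training, not the paper's actual descriptor.

```python
from statistics import mean

# Kyte-Doolittle hydropathy index. Assumption: chosen only for illustration;
# the abstract does not name the properties used in the paper.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(sequence, max_lag, scale=HYDROPATHY):
    """Return [AC(1), ..., AC(max_lag)] for one protein sequence.

    AC(lag) = 1 / (n - lag) * sum_i (x_i - mean) * (x_{i+lag} - mean),
    where x_i is the property value of the residue at position i.
    """
    values = [scale[residue] for residue in sequence]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mu = mean(values)
    centered = [v - mu for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def pair_features(seq_a, seq_b, max_lag):
    """Concatenate the descriptors of two proteins into one feature vector."""
    return auto_covariance(seq_a, max_lag) + auto_covariance(seq_b, max_lag)
```

Under these assumptions, every protein pair yields a vector of length `2 * max_lag` regardless of sequence length, which is what lets the pairs be split into independent partitions and handed to separately trained SVMs.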