A Deep Learning Framework for Identifying Essential Proteins Based on Protein-Protein Interaction Network and Gene Expression Data

Min Zeng, Min Li, Zhihui Fei, Fang-Xiang Wu, Yaohang Li, Yi Pan
DOI: https://doi.org/10.1109/bibm.2018.8621551
2018-01-01
Abstract: Identifying essential proteins is of vital importance for disease study and drug design. Many topology-based and machine learning-based methods have been proposed to identify essential proteins. However, traditional topology-based methods focus only on explicitly described characteristics of network topology and are not expressive enough to capture the complexity of the connectivity patterns observed in biological networks. In addition, identifying essential proteins is an imbalanced learning problem, because there are significantly more non-essential proteins than essential ones, and few machine learning-based methods take this imbalance into consideration. We propose a new deep learning framework to tackle these limitations. In our model, we use the node2vec technique to learn topological features from the protein-protein interaction (PPI) network without manual feature selection. To overcome the imbalance of the dataset, we use a sampling method that is not biased toward either the majority or the minority class in any training step and that tends to make full use of all samples over the whole training process. To evaluate the performance of our model, we test it on an S. cerevisiae dataset. Our results show that it greatly outperforms topology-based methods including DC, BC, CC, EC, NC, LAC, PeC and WDC. It also outperforms machine learning-based methods including support vector machine (SVM), decision tree, random forest and AdaBoost.
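The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one common way to obtain the two properties it describes: balanced classes in every training step, and use of all samples over the whole run. It assumes that each epoch keeps every minority (essential) sample and draws a fresh, equally sized random subset of the majority (non-essential) class. The function name `balanced_epochs` and the protein identifiers are hypothetical and are not taken from the paper.

```python
import random


def balanced_epochs(minority, majority, num_epochs, seed=0):
    """Yield one class-balanced, shuffled training set per epoch.

    Assumed scheme (not confirmed by the paper): all minority samples are
    kept in every epoch, and the majority class is re-sampled without
    replacement each epoch, so different majority samples are seen over time.
    Labels: 1 = minority (essential), 0 = majority (non-essential).
    """
    rng = random.Random(seed)
    k = len(minority)
    if k > len(majority):
        raise ValueError("minority must not be larger than majority")
    for _ in range(num_epochs):
        drawn = rng.sample(majority, k)  # fresh majority subset each epoch
        epoch = [(x, 1) for x in minority] + [(x, 0) for x in drawn]
        rng.shuffle(epoch)
        yield epoch


if __name__ == "__main__":
    # Hypothetical protein identifiers, for illustration only.
    essential = ["E1", "E2", "E3"]
    non_essential = [f"N{i}" for i in range(12)]
    for i, epoch in enumerate(balanced_epochs(essential, non_essential, 3)):
        positives = sum(label for _, label in epoch)
        print(f"epoch {i}: {len(epoch)} samples, {positives} essential")
```

Each epoch is balanced by construction, while repeated re-sampling lets the model see most of the majority class across epochs instead of discarding a fixed portion of it, as one-off undersampling would.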