Protein Function Prediction Based on Kernel Logistic Regression with 2-order Graphic Neighbor Information
Jingwei Liu
DOI: https://doi.org/10.48550/arXiv.1207.4463
2012-07-18
Quantitative Methods
Abstract: To improve the accuracy of protein function prediction from protein-protein interaction data, this paper proposes a 2-order graphic neighbor information feature extraction method based on an undirected simple graph, extending the 1-order graphic neighbor feature extraction method. The chi-square test is also used for feature combination. To demonstrate the effectiveness of the 2-order graphic neighbor feature, four logistic regression models are investigated on the two feature sets: logistic regression (LR), diffusion kernel logistic regression (DKLR), polynomial kernel logistic regression (PKLR), and radial basis function (RBF) kernel logistic regression (RBF KLR). Experimental results for protein function prediction on the Yeast Proteome Database (YPD), using the protein-protein interaction data of the Munich Information Center for Protein Sequences (MIPS), show that 2-order graphic neighbor information of proteins can significantly improve the average overall percentage of protein function prediction, especially with RBF KLR. With a new 5-top chi-square feature combination method, RBF KLR achieves a 99.05% average overall percentage on the 2-order neighbor feature combination set.
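The abstract does not define the 2-order neighbor feature, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes that a protein's 1-order neighbors are its direct interaction partners, that its 2-order neighbors are the proteins exactly two edges away in the undirected simple graph, and that the feature for each order counts how many of those neighbors carry each function label. The function names, the toy graph, and the labels F1 and F2 are all hypothetical.

```python
from collections import Counter


def first_order_neighbors(adj, node):
    """Direct interaction partners of `node` (self-loops ignored)."""
    return set(adj[node]) - {node}


def second_order_neighbors(adj, node):
    """Nodes reachable in exactly two steps: neighbors of neighbors,
    excluding the node itself and its direct neighbors (assumed definition)."""
    first = first_order_neighbors(adj, node)
    second = set()
    for nb in first:
        second |= set(adj[nb])
    return second - first - {node}


def neighbor_function_counts(adj, functions, node, order):
    """Count how many order-1 or order-2 neighbors carry each function label."""
    if order == 1:
        neighbors = first_order_neighbors(adj, node)
    elif order == 2:
        neighbors = second_order_neighbors(adj, node)
    else:
        raise ValueError("order must be 1 or 2")
    counts = Counter()
    for nb in neighbors:
        counts.update(functions.get(nb, ()))
    return counts


# Hypothetical toy interaction graph: path a-b-c-d plus the edge b-e.
adj = {
    "a": {"b"},
    "b": {"a", "c", "e"},
    "c": {"b", "d"},
    "d": {"c"},
    "e": {"b"},
}
# Hypothetical function annotations for each protein.
functions = {
    "a": {"F1"},
    "b": {"F2"},
    "c": {"F1", "F2"},
    "d": {"F1"},
    "e": {"F2"},
}

order1 = neighbor_function_counts(adj, functions, "a", 1)
order2 = neighbor_function_counts(adj, functions, "a", 2)
```

In this toy graph, protein a has a single direct partner (b), so its 1-order counts see only one annotated neighbor, while its 2-order counts also draw on c and e. Count vectors of this kind could then serve as inputs to a logistic regression or kernel logistic regression classifier.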