Abstract: Predicting the subcellular localization of human proteins is a challenging problem, especially when unknown query proteins have no significant homology to proteins of known subcellular location and when more locations need to be covered. To tackle this challenge, protein samples are represented by hybridizing the gene ontology (GO) database with amphiphilic pseudo amino acid composition (PseAA). Based on this representation, a novel ensemble classifier, called "Hum-PLoc", was developed by fusing many basic individual classifiers through a voting system. The "engine" of these basic classifiers was the KNN (K-nearest neighbor) rule. As a demonstration, the ensemble classifier was tested on human proteins among the following 12 locations: (1) centriole; (2) cytoplasm; (3) cytoskeleton; (4) endoplasmic reticulum; (5) extracell; (6) Golgi apparatus; (7) lysosome; (8) microsome; (9) mitochondrion; (10) nucleus; (11) peroxisome; (12) plasma membrane. To remove redundancy and homology bias, none of the proteins investigated here had ≥ 25% sequence identity to any other protein in the same subcellular location. The overall success rates obtained via the jackknife cross-validation test and the independent dataset test were 81.1% and 85.0%, respectively, which are more than 50% higher than those obtained by other existing methods on the same stringent datasets. Furthermore, an incisive and compelling analysis was given to show that the overwhelmingly high success rate obtained by the new predictor is by no means due to a trivial utilization of the GO annotations. This is because, for proteins annotated as "subcellular location unknown" in the Swiss-Prot database, most (more than 99%) of the corresponding GO numbers in the GO database are also annotated as "cellular component unknown".
The information and clues for predicting the subcellular locations of proteins are actually buried in a series of tedious GO numbers, just as they are buried in a pile of complicated amino acid sequences, although in a different manner and at a different "depth". To dig out the knowledge about their locations, a sophisticated operation engine is needed. The current predictor is one such engine and has proved to be a very powerful one. The Hum-PLoc classifier is available as a web server at http://202.120.37.186/bioinf/hum.
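The idea of fusing KNN-based classifiers through a voting system, and of scoring the result with a jackknife test, can be illustrated with a minimal sketch. It is not the Hum-PLoc implementation: the abstract does not specify how the basic classifiers differ, so here they are assumed to differ only in K, the feature vectors are toy two-dimensional points rather than GO/PseAA descriptors, and the names `knn_predict`, `ensemble_predict`, `jackknife_accuracy`, `toy_X`, and `toy_y` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Predict a label by majority vote among the k nearest training samples."""
    # Rank training samples by Euclidean distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def ensemble_predict(train_X, train_y, query, ks=(1, 3, 5)):
    """Fuse several KNN classifiers (assumed here to differ only in k) by voting."""
    votes = Counter(knn_predict(train_X, train_y, query, k) for k in ks)
    return votes.most_common(1)[0][0]


def jackknife_accuracy(X, y, ks=(1, 3, 5)):
    """Jackknife (leave-one-out) test: predict each sample from all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += ensemble_predict(rest_X, rest_y, X[i], ks) == y[i]
    return hits / len(X)


# Toy data (made up for illustration): two well-separated clusters.
toy_X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2],
         [5.0, 5.1], [5.1, 5.0], [5.2, 5.1], [5.1, 5.2]]
toy_y = ["nucleus"] * 4 + ["cytoplasm"] * 4

print(ensemble_predict(toy_X, toy_y, [0.0, 0.0]))
print(jackknife_accuracy(toy_X, toy_y))
```

In the jackknife loop each sample is removed from the training set before it is predicted, so no sample ever votes for itself. This is what makes the reported success rate a test of generalization rather than of memorization.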

Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization
