Abstract: Predicting the subcellular localization of human proteins is a challenging problem, especially when unknown query proteins do not have significant homology to proteins of known subcellular locations and when more locations need to be covered. To tackle this challenge, protein samples are represented by hybridizing the gene ontology (GO) database and amphiphilic pseudo amino acid composition (PseAA). Based on this representation, a novel ensemble classifier, called "Hum-PLoc", was developed by fusing many basic individual classifiers through a voting system. The "engine" of these basic classifiers was the KNN (K-nearest neighbor) rule. As a demonstration, the ensemble classifier was tested on human proteins among the following 12 locations: (1) centriole; (2) cytoplasm; (3) cytoskeleton; (4) endoplasmic reticulum; (5) extracell; (6) Golgi apparatus; (7) lysosome; (8) microsome; (9) mitochondrion; (10) nucleus; (11) peroxisome; (12) plasma membrane. To remove redundancy and homology bias, none of the proteins investigated here had ≥25% sequence identity to any other in the same subcellular location. The overall success rates obtained via the jackknife cross-validation test and the independent dataset test were 81.1% and 85.0%, respectively, which are more than 50% higher than those obtained by other existing methods on the same stringent datasets. Furthermore, an analysis was given to show that the high success rate obtained by the new predictor is by no means due to a trivial use of the GO annotations. This is because, for those proteins annotated "subcellular location unknown" in the Swiss-Prot database, most (more than 99%) of their corresponding GO numbers in the GO database are also annotated "cellular component unknown".
The information and clues for predicting the subcellular locations of proteins are buried in a series of tedious GO numbers, just as they are buried in a pile of complicated amino acid sequences, although in a different manner and at a different "depth". To dig out the knowledge about their locations, a sophisticated operation engine is needed. The current predictor is one such engine, and has proved to be a very powerful one. The Hum-PLoc classifier is available as a web server at http://202.120.37.186/bioinf/hum.
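
The abstract describes an ensemble of KNN classifiers fused by voting and evaluated with a jackknife (leave-one-out) test. The following is a minimal sketch of that general idea, not the Hum-PLoc implementation. Several choices are assumptions made for illustration: the basic classifiers differ only in their value of K, distance is Euclidean, and the toy 2-D feature vectors merely stand in for the GO/PseAA descriptors. The function names (`knn_predict`, `ensemble_predict`, `jackknife_accuracy`) are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Label `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs. Euclidean distance
    is an assumption; the abstract does not state the metric.
    """
    neighbours = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def ensemble_predict(train, query, ks=(1, 3, 5)):
    """Fuse several KNN classifiers (one per k, an assumed design) by voting."""
    votes = Counter(knn_predict(train, query, k) for k in ks)
    return votes.most_common(1)[0][0]


def jackknife_accuracy(samples, ks=(1, 3, 5)):
    """Leave-one-out test: each sample is predicted from all the others."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if ensemble_predict(rest, features, ks) == label:
            hits += 1
    return hits / len(samples)


# Toy data: invented 2-D points, two of the twelve locations only.
samples = [
    ((0.0, 0.1), "nucleus"),
    ((0.2, 0.0), "nucleus"),
    ((0.1, 0.3), "nucleus"),
    ((0.0, 0.2), "nucleus"),
    ((5.0, 5.1), "cytoplasm"),
    ((5.2, 4.9), "cytoplasm"),
    ((4.8, 5.0), "cytoplasm"),
    ((5.1, 5.3), "cytoplasm"),
]

predicted = ensemble_predict(samples, (0.1, 0.1))
accuracy = jackknife_accuracy(samples)
print(predicted, accuracy)
```

In the jackknife loop each protein is removed from the training set before it is predicted, so a sample never votes for itself. This is why the abstract reports the jackknife rate separately from the independent dataset rate.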

PScL-HDeep: Image-Based Prediction of Protein Subcellular Location in Human Tissue Using Ensemble Learning of Handcrafted and Deep Learned Features with Two-Layer Feature Selection.

PScL-DDCFPred: an ensemble deep learning-based approach for characterizing multiclass subcellular localization of human proteins from bioimage data

Prediction of Human Protein Subcellular Localization Using Deep Learning

Bioimage-Based Prediction of Protein Subcellular Location in Human Tissue with Ensemble Features and Deep Networks.

ImPLoc: a Multi-Instance Deep Learning Model for the Prediction of Protein Subcellular Localization Based on Immunohistochemistry Images.

PScL-2LSAESM: bioimage-based prediction of protein subcellular localization by integrating heterogeneous features with the two-level SAE-SM and mean ensemble method

An Artificial Intelligence-Based Stacked Ensemble Approach for Prediction of Protein Subcellular Localization in Confocal Microscopy Images

Single-cell Subcellular Protein Localisation Using Novel Ensembles of Diverse Deep Architectures

Protein Subcellular Localization Prediction by Concatenation of Convolutional Blocks for Deep Features Extraction From Microscopic Images

Deep Model-Based Feature Extraction for Predicting Protein Subcellular Localizations from Bio-Images.

HAR_Locator: a novel protein subcellular location prediction model of immunohistochemistry images based on hybrid attention modules and residual units

SCLpred-EMS: subcellular localization prediction of endomembrane system and secretory pathway proteins by Deep N-to-1 Convolutional Neural Networks

Protein subcellular localization based on deep image features and criterion learning strategy

DeepSP: A Deep Learning Framework for Spatial Proteomics.

Image-Based Human Protein Subcellular Location Prediction Using Local Tetra Patterns Descriptor

HPSLPred: An Ensemble Multi-Label Classifier for Human Protein Subcellular Location Prediction with Imbalanced Source

Human Protein Subcellular Localization with Integrated Source and Multi-label Ensemble Classifier.

Deep Learning-Based Classification of Protein Subcellular Localization from Immunohistochemistry Images

Extracting Cellular Location of Human Proteins Using Deep Learning

Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization

Image-based Classification of Protein Subcellular Location Patterns in Human Reproductive Tissue by Ensemble Learning Global and Local Features