DMLDA-LocLIFT: Identification of multi-label protein subcellular localization using DMLDA dimensionality reduction and LIFT classifier
Qi Zhang, Shan Li, Bin Yu, Qingmei Zhang, Yu Han, Yan Zhang, Qin Ma
DOI: https://doi.org/10.1016/j.chemolab.2020.104148
IF: 4.175
2020-11-01
Chemometrics and Intelligent Laboratory Systems
Abstract:<p>Multi-label proteins, which occur in two or more subcellular locations, play a vital role in cell development and metabolism. Predicting and analyzing multi-label subcellular localization (SCL) can offer new perspectives on drug target identification and new drug design. However, determining multi-label protein SCL through biological experiments is expensive and labor-intensive. Therefore, large-scale SCL prediction with machine learning methods has become a popular research topic in bioinformatics. In this study, a novel multi-label learning method for protein SCL prediction, called DMLDA-LocLIFT, is proposed. First, dipeptide composition (DC), encoding based on grouped weight (EBGW), pseudo amino acid composition (PseAAC), gene ontology (GO) and pseudo position specific scoring matrix (PsePSSM) are employed to encode protein sequences. Then, direct multi-label linear discriminant analysis (DMLDA) is used to remove noise from the fused feature vector. Finally, the resulting optimal feature vectors are input into the multi-label learning with Label-specIfic FeaTures (LIFT) classifier for prediction. Leave-one-out cross-validation (LOOCV) shows that the overall actual accuracy on the Gram-negative bacteria, Gram-positive bacteria, plant, virus and human datasets is 98.6%, 99.6%, 97.9%, 94.7% and 96.1%, respectively, which is clearly better than that of other state-of-the-art prediction methods. The proposed model can effectively predict the SCL of multi-label proteins and provide a reference for experimental identification of SCL. The source code and datasets are available at <a href="https://github.com/QUST-AIBBDRC/DMLDA-LocLIFT/">https://github.com/QUST-AIBBDRC/DMLDA-LocLIFT/</a>.</p>
automation & control systems; computer science, artificial intelligence; instruments & instrumentation; statistics & probability; mathematics, interdisciplinary applications; chemistry, analytical
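
Two pieces of the pipeline named in the abstract are simple enough to illustrate directly. The sketch below is not taken from the authors' repository. It shows the textbook form of dipeptide composition (DC), a 400-dimensional vector of adjacent residue-pair frequencies, and overall actual accuracy computed as an exact-match rate over label sets, which is how the metric is commonly defined in the multi-label SCL literature (the abstract itself does not define it). The function names, the handling of non-standard residues, and the label representation are all assumptions made for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order: AA, AC, AD, ...
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400-dimensional dipeptide frequency vector of a sequence.

    Assumption of this sketch: pairs containing a non-standard residue
    (for example 'X') are skipped and do not count toward the total.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2].upper()
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid pairs): return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def overall_actual_accuracy(true_labels, predicted_labels):
    """Fraction of proteins whose predicted location set matches the true set exactly.

    Both arguments are sequences of iterables of location labels, one per
    protein. A prediction with any missing or extra location counts as wrong.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true_labels and predicted_labels must have equal length")
    if not true_labels:
        return 0.0
    exact = sum(set(t) == set(p) for t, p in zip(true_labels, predicted_labels))
    return exact / len(true_labels)
```

In a full pipeline, the DC vector would be concatenated with the other encodings (EBGW, PseAAC, GO, PsePSSM) before dimensionality reduction. Those steps, along with DMLDA and LIFT, are not reproduced here.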