Application of Bag of Words Model in the Prediction of Protein Subcellular Location

Nan ZHAO, Liang ZHANG, Wei XUE, Xiongfei WANG, Shougang REN
DOI: https://doi.org/10.3969/j.issn.1673-1689.2017.03.011
2017-01-01
Abstract: Previous researchers have done extensive work on protein feature extraction and subcellular localization prediction. Earlier studies showed that the prediction accuracy obtained with traditional feature extraction algorithms is low. To improve accuracy, this study extracts features from protein sequences using a bag of words model combined with a traditional protein feature extraction algorithm. First, the K-means algorithm is used to construct a feature dictionary. Then, the bag of words features of each protein sequence are counted against this dictionary. Finally, the extracted features are fed into an SVM classifier to predict the protein's subcellular location. The results show that the prediction accuracy of subcellular localization is improved.
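The dictionary-plus-counting pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which traditional feature is used or how sequences are segmented, so here each sequence is cut into hypothetical sliding windows (width 5, step 1) and each window is described by its amino acid composition as a stand-in local descriptor. K-means (plain Lloyd iterations) clusters all training windows into K "words", and each sequence becomes a normalized histogram of how many of its windows fall nearest to each word. The function names, window width, K = 4, and the three toy sequences are all invented for illustration.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def window_features(seq, width=5, step=1):
    """Hypothetical local descriptor (assumption): amino acid composition
    of each sliding window of the sequence."""
    feats = []
    for start in range(0, len(seq) - width + 1, step):
        counts = Counter(seq[start:start + width])
        feats.append([counts[a] / width for a in AMINO_ACIDS])
    return feats


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(centroids, x):
    """Index of the dictionary word (centroid) closest to x."""
    return min(range(len(centroids)), key=lambda j: sq_dist(centroids[j], x))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; the k centroids form the feature dictionary."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bow_histogram(seq, dictionary, width=5):
    """Bag of words feature: fraction of the sequence's windows assigned
    to each dictionary word."""
    hist = [0] * len(dictionary)
    feats = window_features(seq, width)
    for f in feats:
        hist[nearest(dictionary, f)] += 1
    total = len(feats) or 1  # avoid division by zero for very short sequences
    return [h / total for h in hist]


# Toy training sequences (invented, not from the paper).
train = [
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MALWMRLLPLLALLALWGPDPA",
    "MSTNPKPQRKTKRNTNRRPQDV",
]
all_windows = [f for s in train for f in window_features(s)]
dictionary = kmeans(all_windows, k=4)
X = [bow_histogram(s, dictionary) for s in train]
```

Each row of `X` is a fixed-length vector regardless of sequence length, which is what makes it usable as classifier input. For the final step, these vectors and their location labels could be passed to an SVM, for example scikit-learn's `sklearn.svm.SVC` via `SVC().fit(X, y)`; the kernel and parameters the paper used are not stated in the abstract.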