Identification of Type VI Effector Proteins Using a Novel Ensemble Classifier

Chunyu Wang, Jialin Li, Ying Zhang, Maozu Guo
DOI: https://doi.org/10.1109/access.2020.2985111
IF: 3.9
2020-01-01
IEEE Access
Abstract: The type VI secretion system (T6SS) delivers effector proteins (type VI secretion system effectors, termed T6SEs) into neighboring target cells. Many human pathogens express T6SEs, including Vibrio cholerae, Burkholderia spp., and Pseudomonas aeruginosa. T6SEs play vital roles in the competitive survival and pathogenesis of bacterial populations. Several machine-learning methods can already distinguish T6SEs from non-T6SEs, but we believe their predictive performance can be improved further. Herein, we therefore propose a more powerful ensemble predictor for identifying T6SEs. First, we construct a benchmark dataset from existing studies and databases. Next, we use k-separated-bigrams-PSSM, a feature encoding derived from the position-specific scoring matrix (PSSM), to convert the protein sequences into numerical feature vectors. The synthetic minority oversampling technique (SMOTE) is then employed to address class imbalance in the training data. Finally, we apply a soft voting strategy to build an ensemble model that combines six fine-tuned base classifiers. In independent testing, the proposed model achieves an accuracy (ACC) of 99.0%, a Matthews correlation coefficient (MCC) of 97.8%, a sensitivity (SN) of 97.1%, and a specificity (SP) of 100%.
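To make the k-separated-bigrams-PSSM encoding concrete, here is a minimal sketch. The abstract does not give the formula, so this assumes the commonly used definition: for an L x 20 PSSM P and a gap k, compute T_k(m, n) = sum over i = 1..L-k of P(i, m) * P(i+k, n), then flatten the 20 x 20 matrix into 400 features. The function name `k_separated_bigrams` is hypothetical, and the choice of k and any PSSM normalization (for example a sigmoid applied beforehand) are assumptions, not details confirmed by the paper.

```python
from typing import List

Matrix = List[List[float]]


def k_separated_bigrams(pssm: Matrix, k: int) -> List[float]:
    """Flatten the k-separated-bigram matrix of an L x C PSSM (C = 20 for proteins).

    Assumed definition: T[m][n] = sum_{i} pssm[i][m] * pssm[i + k][n]
    over all positions i where i + k is still inside the sequence.
    """
    length = len(pssm)
    if k < 1 or k >= length:
        raise ValueError("k must satisfy 1 <= k < sequence length")
    n_cols = len(pssm[0])
    # Accumulator for the C x C bigram matrix.
    bigrams = [[0.0] * n_cols for _ in range(n_cols)]
    for i in range(length - k):
        row_a, row_b = pssm[i], pssm[i + k]
        for m in range(n_cols):
            for n in range(n_cols):
                # Co-occurrence score of residue type m at i and type n at i + k.
                bigrams[m][n] += row_a[m] * row_b[n]
    # Row-major flattening: 20 x 20 = 400 features for a real PSSM.
    return [value for row in bigrams for value in row]
```

A real PSSM (typically produced by PSI-BLAST) has 20 columns, so each protein yields a fixed-length 400-dimensional vector regardless of its sequence length, which is what lets standard classifiers consume it.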
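The SMOTE step can be illustrated with a short, dependency-free sketch of the classic SMOTE idea: each synthetic sample lies at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. This is an illustrative reimplementation, not the paper's code; the function name `smote`, the cycling order over minority samples, and the defaults `k=5` and `seed=0` are assumptions. In practice a library such as imbalanced-learn's `SMOTE` would normally be used.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the k nearest minority neighbours (Euclidean) of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    synthetic = []
    for t in range(n_new):
        i = t % len(minority)  # cycle through the minority samples in order
        x = minority[i]
        y = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> y
        synthetic.append([a + gap * (b - a) for a, b in zip(x, y)])
    return synthetic
```

Consistent with the abstract, oversampling of this kind should be applied only to the training data; the independent test set keeps its original class distribution so that the reported metrics are not inflated by synthetic samples.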
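Soft voting combines base classifiers by averaging their predicted class probabilities rather than counting their hard labels. The sketch below shows the binary case with optional per-model weights; the function name `soft_vote`, the 0.5 threshold, and the weighting scheme are assumptions, and the paper's six specific base classifiers and any weights it uses are not reproduced here. Each inner list holds one model's positive-class (T6SE) probabilities for every sample.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None,
              threshold: float = 0.5) -> Tuple[List[float], List[int]]:
    """Weighted average of positive-class probabilities from several models.

    probs[c][s] is model c's probability that sample s is positive.
    Returns the averaged probabilities and the resulting 0/1 labels.
    """
    n_models = len(probs)
    if weights is None:
        weights = [1.0] * n_models  # plain (unweighted) soft voting
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(probs[0])
    averaged = [
        sum(w * p[s] for w, p in zip(weights, probs)) / total
        for s in range(n_samples)
    ]
    # Predict positive only when the averaged probability exceeds the threshold.
    labels = [1 if a > threshold else 0 for a in averaged]
    return averaged, labels


avg, labels = soft_vote([[0.9, 0.2, 0.6], [0.8, 0.4, 0.3], [0.7, 0.3, 0.5]])
print(labels)  # [1, 0, 0]
```

Averaging probabilities lets a confident model outweigh several uncertain ones, which is the usual motivation for soft rather than hard (majority) voting.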