iEsGene-ZCPseKNC: Identify Essential Genes Based on Z Curve Pseudo k-Tuple Nucleotide Composition

Jiahai Chen, Yongmin Liu, Qing Liao, Bin Liu
DOI: https://doi.org/10.1109/access.2019.2952237
IF: 3.9
2019-01-01
IEEE Access
Abstract: As an important technique for synthetic biology, computational identification of essential genes will facilitate the development of related fields such as genome analysis and drug design. The identification of prokaryotic essential genes has been extensively studied, with most work focusing on essential genes in bacteria. Archaea, an important domain of prokaryotes, show high variance in genome size. However, no predictor is available for predicting essential genes in archaea. In this paper, we developed the first computational predictor for essential genes in archaea, called iEsGene-ZCPseKNC. To capture the sequence patterns of essential genes, a new feature called Z curve pseudo k-tuple nucleotide composition (ZCPseKNC) was proposed, which incorporates the advantages of both the Z curve and pseudo k-tuple nucleotide composition (PseKNC). To overcome the problems caused by the imbalanced training set, the SMOTE algorithm was employed to further improve the predictive performance of iEsGene-ZCPseKNC. Evaluated by the rigorous jackknife test on a benchmark dataset, the iEsGene-ZCPseKNC predictor outperformed the predictors based on the Z curve and on PseKNC alone, indicating that iEsGene-ZCPseKNC is useful for identifying essential genes in archaea and would be a powerful tool for genome analysis. A user-friendly web server for the iEsGene-ZCPseKNC predictor was established and can be accessed at http://bliulab.net/iEsGene-ZCPseKNC/.
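The abstract names two ingredients of the ZCPseKNC feature without defining them. As a minimal illustrative sketch, the Python below computes the standard whole-sequence Z curve components (purine vs. pyrimidine, amino vs. keto, weak vs. strong hydrogen bonding) and a plain k-tuple nucleotide composition. It is not the authors' ZCPseKNC formulation, which the abstract does not specify; the function names `z_curve_components` and `k_tuple_composition` are hypothetical, and the pseudo (correlation) terms of PseKNC, SMOTE, and the classifier are all omitted.

```python
from collections import Counter
from itertools import product


def z_curve_components(seq):
    """Return the (x, y, z) Z curve components from base frequencies.

    Assumption: this is the simple whole-sequence form, not the
    phase-specific or k-tuple variant the paper may use.
    """
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    a, c, g, t = (counts[b] / total for b in "ACGT")
    x = (a + g) - (c + t)  # purine vs. pyrimidine
    y = (a + c) - (g + t)  # amino vs. keto
    z = (a + t) - (g + c)  # weak vs. strong hydrogen bonding
    return x, y, z


def k_tuple_composition(seq, k=2):
    """Return the normalized frequency of every k-tuple over ACGT."""
    seq = seq.upper()
    tuples = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[tup] for tup in tuples)
    return {tup: (counts[tup] / total if total else 0.0) for tup in tuples}
```

A feature vector for a gene could then be formed by concatenating the three Z curve values with the 4^k composition values; how the paper actually fuses the two representations is not stated in the abstract.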