iORI-ENST: Identifying Origin of Replication Sites Based on Elastic Net and Stacking Learning

Y. Yao, S. Zhang, Y. Liang
DOI: https://doi.org/10.1080/1062936x.2021.1895884
IF: 3.681
2021-01-01
SAR and QSAR in Environmental Research
Abstract: DNA replication is not only the basis of biological inheritance but also the most fundamental process in all living organisms. It plays a crucial role in the cell-division cycle and in the regulation of gene expression. Hence, accurate identification of origin of replication sites (ORIs) is important for further understanding the regulatory mechanism of gene expression and for treating genetic diseases. In this paper, a novel model, iORI-ENST, is designed for identifying ORIs. First, we extract features by combining mono-nucleotide binary encoding with dinucleotide-based spatial autocorrelation. Next, elastic net is used as the feature selection method to select the optimal feature set. Stacking learning, which combines random forest, AdaBoost, gradient boosting decision tree, extra trees and support vector machine, is then employed to classify ORIs and non-ORIs. The model identifies ORIs on the benchmark datasets S-1 and S-2 with accuracies of 91.41% and 95.07%, respectively. An independent dataset, S-3, is used to verify the validity and transferability of the model, and its accuracy on that dataset reaches 91.10%. Compared with state-of-the-art methods, the model achieves better performance. The results show that the model is a feasible, effective and powerful tool for identifying ORIs. The source code and datasets are available at https://github.com/YingyingYao/iORI-ENST.
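To illustrate the first feature-extraction step named in the abstract, the following is a minimal sketch of mono-nucleotide binary encoding. The bit order assigned to each base and the names `NUCLEOTIDE_BITS` and `binary_encode` are assumptions made for illustration; the abstract does not specify them, and the authors' repository may implement this step differently.

```python
# Assumed one-hot mapping; the abstract does not state the bit order.
NUCLEOTIDE_BITS = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def binary_encode(sequence):
    """Encode a DNA sequence as a flat list of 4 * len(sequence) bits."""
    features = []
    for base in sequence.upper():
        if base not in NUCLEOTIDE_BITS:
            # Ambiguous bases such as N are rejected in this sketch.
            raise ValueError(f"unsupported nucleotide: {base!r}")
        features.extend(NUCLEOTIDE_BITS[base])
    return features


if __name__ == "__main__":
    print(binary_encode("ACGT"))
```

Under this scheme every sequence of length n yields a 4n-dimensional vector, so fixed-length samples produce feature vectors of equal size that could then be passed to a feature selector and classifier.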