RF‐SVM: Identification of DNA‐binding Proteins Based on Comprehensive Feature Representation Methods and Support Vector Machine

Yanping Zhang, Jianwei Ni, Ya Gao
DOI: https://doi.org/10.1002/prot.26229
2021-01-01
Proteins: Structure, Function, and Bioinformatics
Abstract: Protein‐DNA interactions play an important role in biological processes such as DNA replication, repair, and modification. Identifying DNA‐binding proteins is one of the most important steps toward understanding their functions. We propose a DNA‐binding protein predictor, RF‐SVM, which combines four types of features: pseudo amino acid composition (PseAAC), amino acid distribution (AAD), adjacent amino acid composition frequency (ACF), and Local‐DPP. The Random Forest algorithm is used to select the top 174 features, which are then used to build the predictor model with a support vector machine (SVM) on the training dataset UniSwiss‐Tr. Finally, RF‐SVM is compared with other existing methods on the test dataset UniSwiss‐Tst. The experimental results show that RF‐SVM achieves an accuracy of 84.25%. We also find that the amino acid physicochemical properties OOBM770101(H), CIDH920104(H), MIYS990104(H), NISK860101(H), VINM940103(H), and SNEP660101(A) contribute to the prediction of DNA‐binding proteins. The main code and datasets are available at https://github.com/NiJianWei996/RF-SVM.
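The two-stage idea in the abstract (rank features with a Random Forest, keep the top 174, then train an SVM on the reduced set) can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' implementation: the synthetic data stands in for the PseAAC, AAD, ACF, and Local‐DPP feature matrix, and the total feature count (300), the impurity-based ranking, the number of trees, the RBF kernel, and the feature scaling step are all assumptions not stated in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

TOP_K = 174  # number of features retained, as stated in the abstract

# Synthetic stand-in for the combined feature matrix (assumption:
# 300 candidate features; the real total is not given in the abstract).
X, y = make_classification(
    n_samples=400, n_features=300, n_informative=30, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stage 1: rank features by Random Forest importance on the training set only.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
top_idx = np.argsort(forest.feature_importances_)[::-1][:TOP_K]

# Stage 2: train an SVM on the selected features (kernel choice is an assumption).
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm.fit(X_train[:, top_idx], y_train)

# Evaluate on held-out data using the same selected columns.
accuracy = svm.score(X_test[:, top_idx], y_test)
print(f"Selected {len(top_idx)} features; held-out accuracy = {accuracy:.3f}")
```

Ranking features on the training split alone keeps the held-out split untouched by the selection step, which mirrors the separation between UniSwiss‐Tr and UniSwiss‐Tst described above.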