Prediction of DNA-Binding Protein Using Random Forest and Elastic Net

Qiujian Chen, Lei Li, Kun Yang, Rong Long, Feng Shi
DOI: https://doi.org/10.1109/FSKD.2017.8393048
2017-01-01
Abstract: Recognizing DNA-binding proteins is meaningful work because these proteins play vital roles in many biological processes. To reveal the connection between the intrinsic information of a protein and the binding force between DNA and protein, a 314-dimensional feature vector is used as input for DNA-binding protein prediction. All 314 values are numeric features encoded from multiple properties of the protein and identified as the more important ones. A large number of numerical experiments with 5-fold cross-validation are performed to find the optimal parameters and to construct models with random forest and elastic net. These experiments show that the box-counting dimension, the information entropies of the chaos game representation, and the information entropies of the dipeptide composition are the more crucial features. On the test dataset, the random forest and elastic net models of this study perform slightly better than DNA-Prot: the Matthews correlation coefficient (MCC) is 0.7374 and 0.7591, and the accuracy (ACC) is 0.8750 and 0.8698, respectively. On independent dataset1 and independent dataset2, the models achieve slightly lower MCC and ACC values than DNA-Prot [1].
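The abstract reports results as MCC and ACC without stating how they are computed. As a minimal sketch, assuming the standard definitions from binary confusion-matrix counts, the two metrics can be calculated as below. The function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, chosen only for illustration (not from the paper).
tp, tn, fp, fn = 90, 80, 20, 10
print("ACC:", accuracy(tp, tn, fp, fn))
print("MCC:", mcc(tp, tn, fp, fn))
```

Unlike ACC, MCC uses all four cells of the confusion matrix, so it remains informative when the binding and non-binding classes are imbalanced. This is one reason the two metrics can rank a pair of models differently, as they do for the test dataset above.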