DBPPred-PDSD: Machine Learning Approach for Prediction of DNA-binding Proteins Using Discrete Wavelet Transform and Optimized Integrated Features Space

Farman Ali, Muhammad Kabir, Muhammad Arif, Zar Nawab Khan Swati, Zaheer Ullah Khan, Matee Ullah, Dong-Jun Yu
DOI: https://doi.org/10.1016/j.chemolab.2018.08.013
IF: 4.175
2018-01-01
Chemometrics and Intelligent Laboratory Systems
Abstract: DNA-binding proteins play a crucial role in various biological processes, such as the regulation of DNA modification, repair, replication, and transcription. These proteins are also widely involved in the production of drugs, antibiotics, and steroids. Many computational approaches have been developed to identify DNA-binding proteins, but some are time-consuming and expensive, while others are laborious. Developing computational methods that identify DNA-binding proteins with high precision therefore remains a challenging task. In this work, we developed a new computational approach, named DBPPred-PDSD, with improved predictive power for DNA-binding proteins. We employed two datasets and extracted features via Split Amino Acid Composition (SAAC) and the Position Specific Scoring Matrix (PSSM). We then applied the Discrete Wavelet Transform (DWT) to the PSSM to extract dominant features. From these feature spaces, optimal features were selected by Maximum Relevance and Minimum Redundancy (mRMR) and fused. To obtain highly informative features, we applied Support Vector Machine-Recursive Feature Elimination (SVM-RFE) to the fused set and supplied the result to two well-known classifiers, Support Vector Machine (SVM) and Random Forest (RF). On three tests, namely jackknife cross-validation, 10-fold cross-validation, and an independent test, our model with the SVM classifier achieved a higher success rate than existing methods in the literature.
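The two feature extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the segment lengths, the wavelet family, the decomposition depth, or the sub-band statistics, so the 25-residue terminal segments, the Haar wavelet, the two decomposition levels, and the mean and standard deviation summaries below are all assumptions chosen for illustration. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Return the 20 amino acid frequencies of a segment (all zeros if empty)."""
    n = len(segment)
    return [segment.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def saac(sequence, n_term=25, c_term=25):
    """Split Amino Acid Composition: composition of the N-terminal, middle,
    and C-terminal segments, concatenated into 60 values.
    The segment lengths are assumed defaults, not taken from the paper."""
    if len(sequence) <= n_term + c_term:
        raise ValueError("sequence too short for the chosen segment lengths")
    head = sequence[:n_term]
    middle = sequence[n_term:len(sequence) - c_term]
    tail = sequence[len(sequence) - c_term:]
    return composition(head) + composition(middle) + composition(tail)


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])  # pad odd-length signals by repeating the last value
    s = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return approx, detail


def dwt_features(column, levels=2):
    """Summarise each sub-band of one PSSM column by its mean and standard
    deviation (an assumed choice of statistics). The column must be long
    enough that every level still has coefficients to decompose."""
    features = []
    approx = list(column)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        features += [mean(detail), pstdev(detail)]
    features += [mean(approx), pstdev(approx)]
    return features
```

In a pipeline like the one the abstract describes, `dwt_features` would be applied to each of the 20 columns of a protein's PSSM, and the resulting vector would be combined with the `saac` output before feature selection and classification. Summarising sub-bands by fixed statistics gives every protein a feature vector of the same length regardless of sequence length, which is what a classifier such as an SVM requires.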