RFDTI: Using Rotation Forest with Feature Weighted for Drug-Target Interaction Prediction from Drug Molecular Structure and Protein Sequence

Lei Wang, Zhu-Hong You, Li-Ping Li, Xin Yan
DOI: https://doi.org/10.1101/2020.01.06.895755
2020-01-01
Abstract: The identification and prediction of Drug-Target Interactions (DTIs) is the basis for screening drug candidates and plays a vital role in the development of innovative drugs. However, because biological experimental methods are time-consuming and costly, traditional drug target identification technologies are difficult to apply on a large scale. Therefore, in silico methods are urgently needed to predict drug-target interactions in a genome-wide manner. In this article, we design a new in silico approach, named RFDTI, that predicts DTIs by combining a Feature weighted Rotation Forest (FwRF) classifier with protein amino acid sequence information. This model has two main advantages: a) it uses fused data from protein sequences and drug molecular fingerprints, which carries information about both sides of the interaction; b) it uses a classifier with feature selection ability, which can effectively remove noisy information and improve prediction performance. More specifically, we first use the Position-Specific Score Matrix (PSSM) to convert protein sequences into numerical form and then apply the Pseudo Position-Specific Score Matrix (PsePSSM) to extract their features. A unified numerical descriptor is then formed by combining these features with molecular fingerprints representing drug information. Finally, the FwRF is applied to the Enzyme, Ion Channel, GPCR, and Nuclear Receptor data sets. The results of five-fold cross-validation show that the prediction accuracy of this approach reaches 91.68%, 88.11%, 84.72% and 78.33% on the four benchmark data sets, respectively. To further validate the performance of RFDTI, we compare it with other existing methods and with a Support Vector Machine (SVM) model. In addition, 7 of the 10 highest-scoring predicted novel DTIs were validated by relevant databases.
The cross-validation results indicate that RFDTI is feasible for predicting the relationships between drugs and targets, and can help in the discovery of new candidate drugs.
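
The feature construction described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used PsePSSM formulation (per-column means of the PSSM, followed by mean squared differences between rows separated by a lag); the abstract does not state the exact variant, the lag count, or the fingerprint type, so the names `pse_pssm`, `fuse`, and `max_lag` are hypothetical, and the PSSM is assumed to be already normalized. A real PSSM has 20 columns; the toy matrix below uses 2 to keep the arithmetic easy to follow.

```python
from typing import List, Sequence


def pse_pssm(pssm: Sequence[Sequence[float]], max_lag: int = 2) -> List[float]:
    """Turn a variable-length PSSM (one row per residue) into a fixed-length vector.

    Assumed formulation, not confirmed by the abstract:
      - composition part: mean of every column over all rows
      - correlation part: for each lag, mean squared difference between
        rows i and i + lag, computed per column
    Output length is n_cols * (1 + max_lag).
    """
    n_rows = len(pssm)
    if n_rows <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    n_cols = len(pssm[0])

    # Composition part: average score of each column.
    features = [sum(row[j] for row in pssm) / n_rows for j in range(n_cols)]

    # Correlation part: sequence-order information at each lag.
    for lag in range(1, max_lag + 1):
        span = n_rows - lag
        for j in range(n_cols):
            total = sum((pssm[i][j] - pssm[i + lag][j]) ** 2 for i in range(span))
            features.append(total / span)
    return features


def fuse(protein_features: Sequence[float], drug_fingerprint: Sequence[int]) -> List[float]:
    """Concatenate protein features with a binary drug fingerprint (hypothetical helper)."""
    return list(protein_features) + [float(bit) for bit in drug_fingerprint]


# Toy data: a 3-residue "protein" with 2 score columns and a 3-bit fingerprint.
toy_pssm = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
protein_vec = pse_pssm(toy_pssm, max_lag=1)
pair_vec = fuse(protein_vec, [1, 0, 1])
```

Each drug-target pair is then represented by one such fused vector, which is the kind of input a classifier such as FwRF or an SVM would be trained on.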