Abstract:

Background: Identifying Drug-Protein Interactions (DPIs) is crucial in drug discovery. The high dimensionality of drug and protein features makes accurate interaction prediction difficult and calls for computational techniques. Docking-based methods require 3D protein structures, while ligand-based methods depend on known ligands and neglect protein structure. The chemogenomics-based approach using machine learning is therefore preferred, because it considers both drug and protein characteristics when predicting DPIs.

Methods: In machine learning, feature selection plays a vital role: it improves model performance, reduces overfitting, enhances interpretability, and makes the learning process more efficient. It helps extract meaningful patterns from drug and protein data while eliminating irrelevant or redundant information, resulting in more effective models. Classification is equally important: it enables pattern recognition, predictive modeling, anomaly detection, data exploration, and automation, and it supports accurate and efficient decision-making in DPI prediction. For this work, protein data were sourced from the KEGG database and drug data from the DrugBank database.

Results: To address the imbalance among Drug-Protein Pairs (DPPs), several balancing techniques were employed: Random Over-Sampling (ROS), the Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive SMOTE. Because drugs and proteins are described by a large number of features, feature selection was necessary, and four methods were evaluated: Correlation, Information Gain (IG), Chi-Square (CS), and Relief. Multiple classification methods, including Support Vector Machines (SVM), Random Forest (RF), AdaBoost, and Logistic Regression (LR), were used to predict DPIs. Finally, this research identifies the best-performing balancing, feature selection, and classification methods for accurate DPI prediction.

Conclusion: This comprehensive approach aims to overcome the limitations of existing methods and to provide more reliable and efficient predictions in drug-protein interaction studies.
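
The balancing and feature-selection steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, the function names (`random_over_sample`, `information_gain`, `top_k_features`) are hypothetical, and the toy binary data are invented purely for illustration. It shows random over-sampling of the minority class followed by ranking discrete features by information gain.

```python
import math
import random
from collections import Counter


def random_over_sample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(column, y):
    """IG of one discrete feature: H(y) - sum_v p(v) * H(y | feature = v)."""
    total = len(y)
    conditional = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, y) if v == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(y) - conditional


def top_k_features(X, y, k):
    """Indices of the k features with the highest information gain."""
    scores = [information_gain([row[j] for row in X], y)
              for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Invented toy data: 6 drug-protein pairs, 3 binary features,
# label 1 = interacting (minority class), label 0 = non-interacting.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 1, 1]]
y = [1, 1, 0, 0, 0, 0]

X_bal, y_bal = random_over_sample(X, y)           # balance the classes first
selected = top_k_features(X_bal, y_bal, k=1)      # then rank features by IG
```

In this toy set, the first feature alone separates the two classes, so it is the one an information-gain ranking keeps. A classifier such as SVM, RF, AdaBoost, or LR would then be trained on the balanced data restricted to the selected features.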

Feature selection and transduction for prediction of molecular bioactivity for drug design

Predicting Drug-Target Interactions Between New Drugs and New Targets via Pairwise K-nearest Neighbor and Automatic Similarity Selection.

An Ensemble Learning-Based Method for Inferring Drug-Target Interactions Combining Protein Sequences and Drug Fingerprints

Improved Prediction Of Drug-Target Interactions Based On Ensemble Learning With Fuzzy Local Ternary Pattern

Prediction of Effective Drug Combinations by Chemical Interaction, Protein Interaction and Target Enrichment of KEGG Pathways

Predicting Drug-Target Interaction Networks Based on Functional Groups and Biological Features.

A Systematic Prediction Of Multiple Drug-Target Interactions From Chemical, Genomic, And Pharmacological Data

Drug-Protein Interactions Prediction Models Using Feature Selection and Classification Techniques

Computational Prediction of Drug-Target Interactions Using Chemical, Biological, and Network Features

Identification of potential drug-targets by combining evolutionary information extracted from frequency profiles and molecular topological structures

Computational Prediction of DrugTarget Interactions Using Chemical, Biological, and Network Features

XGB-DrugPred: computational prediction of druggable proteins using eXtreme gradient boosting and optimized features set

Incorporating Chemical Sub-Structures and Protein Evolutionary Information for Inferring Drug-Target Interactions

A Computational Approach For Predicting Drug-Target Interactions From Protein Sequence And Drug Substructure Fingerprint Information

A deep learning method for drug-target affinity prediction based on sequence interaction information mining

Drug-target Affinity Prediction Method Based on Multi-Scale Information Interaction and Graph Optimization

Computational prediction of drug-target interactions using chemogenomic approaches: an empirical survey.

Prediction of Drug–Target Interactions by Combining Dual-Tree Complex Wavelet Transform with Ensemble Learning Method

Prediction of Drug Pathway-based Disease Classes using Multiple Properties of Drugs

Analysis and prediction of drug–drug interaction by minimum redundancy maximum relevance and incremental feature selection

Identification of Drug-Target Interactions Via Multiple Information Integration.