Machine learning-based virtual screening for STAT3 anticancer drug target.
Abdul Wadood, Amar Ajmal, Muhammad Junaid, Ashfaq Ur Rehman, Reaz Uddin, Syed Sikander Azam, Alam zeb Khan, Asad Ali
DOI: https://doi.org/10.2174/1381612828666220728120523
IF: 3.31
2022-08-03
Current Pharmaceutical Design
Abstract: Background: The signal transducers and activators of transcription (STAT) family consists of seven structurally similar and highly conserved members: STAT1, STAT2, STAT3, STAT4, STAT5a, STAT5b, and STAT6. The STAT3 signaling cascade is activated by upstream kinase signals and proceeds through phosphorylation, homo-dimerization, nuclear translocation, and DNA binding, resulting in the expression of target genes involved in tumor cell proliferation, metastasis, angiogenesis, and immune editing. STAT3 hyperactivation has been documented in a number of tumors, including head and neck, breast, lung, liver, kidney, prostate, and pancreatic cancers, as well as multiple myeloma and acute myeloid leukemia. Drug discovery is a time-consuming and costly process; bringing a single drug to market may take ten to fifteen years. Machine learning algorithms are fast and effective and are commonly used in fields such as drug discovery. These algorithms are well suited to the virtual screening of large compound libraries, where molecules are classified as active or inactive. Objective: The present work aims to perform machine learning-based virtual screening against the STAT3 drug target. Methods: Machine learning models, namely k-nearest neighbor, support vector machine, Gaussian naïve Bayes, and random forest, were developed to classify compounds as active or inactive inhibitors of STAT3. Ten-fold cross-validation was used for model validation. A test dataset prepared from the ZINC database was then screened using the random forest model. A total of 20 compounds were predicted as active against STAT3, with 88% accuracy. These twenty compounds were then docked into the active site of STAT3. The two complexes with good docking scores, together with the reference compound, were subjected to a 100 ns MD simulation. Results: The random forest model outperformed all other models. Compared to the standard reference compound, the top two hits showed greater stability and compactness. Conclusion: Our predicted hits have the potential to inhibit STAT3 overexpression and thereby combat STAT3-associated diseases. Keywords: Machine learning, STAT3, Virtual screening, Docking, MD simulation, drug target.
pharmacology & pharmacy
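
The validation step named in the Methods (ten-fold cross-validation of an active/inactive classifier) can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state which molecular descriptors, software, or hyperparameters were used. The sketch assumes each compound is represented as a set of "on" fingerprint bits, uses a small hand-written k-nearest neighbor classifier with Tanimoto similarity, and runs on randomly generated toy data. All function names (tanimoto, knn_predict, k_fold_accuracy, make_toy_data) are hypothetical.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_predict(train, query, k=3):
    """Majority vote (1 = active, 0 = inactive) over the k most similar items."""
    neighbours = sorted(
        train, key=lambda item: tanimoto(item[0], query), reverse=True
    )[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0


def k_fold_accuracy(data, n_folds=10, k=3, seed=0):
    """Mean accuracy over n_folds; every item is held out exactly once."""
    items = data[:]
    random.Random(seed).shuffle(items)
    folds = [items[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the one currently held out.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        correct = sum(knn_predict(train, fp, k) == label for fp, label in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def make_toy_data(n_per_class=50, seed=0):
    """Toy stand-in for a labelled compound set (an assumption, not real data).

    'Actives' draw 12 bits from positions 0-39 and 'inactives' from 30-69,
    so the two classes overlap only partially.
    """
    rng = random.Random(seed)
    data = []
    for label, bit_range in ((1, range(0, 40)), (0, range(30, 70))):
        for _ in range(n_per_class):
            data.append((frozenset(rng.sample(bit_range, 12)), label))
    return data


if __name__ == "__main__":
    accuracy = k_fold_accuracy(make_toy_data(), n_folds=10, k=3)
    print(f"10-fold cross-validated accuracy: {accuracy:.2f}")
```

In practice the same loop would be repeated for each candidate model (k-nearest neighbor, support vector machine, Gaussian naïve Bayes, random forest), and the model with the best cross-validated performance would be the one applied to the unlabelled screening library.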