Unsupervised machine learning, QSAR modelling and web tool development for streamlining the lead identification process of antimalarial flavonoids

J.H. Zothantluanga, D. Chetia, S. Rajkhowa, A.K. Umar
DOI: https://doi.org/10.1080/1062936X.2023.2169347
IF: 3.681
2023-02-06
SAR and QSAR in Environmental Research
Abstract: Identification of lead compounds with the traditional laboratory approach is expensive and time-consuming. Nowadays, in silico techniques have emerged as a promising approach for lead identification. In this study, we aim to develop robust and predictive 2D-QSAR models to identify lead flavonoids by predicting the IC50 against Plasmodium falciparum. We applied machine learning algorithms (Principal component analysis followed by K-means clustering) and Pearson correlation analysis to select 9 molecular descriptors (MDs) for model building. We selected and validated the three best QSAR models after execution of multiple linear regression (MLR) 100 times with different combinations of MDs. The developed models have fulfilled the five principles for QSAR models as specified by the Organization for Economic Co-operation and Development. The outcome of the study is a reliable and sustainable in silico method of IC50 (Mean ± SD) prediction that will positively impact the antimalarial drug development process by reducing the money and time required to identify potential antimalarial lead compounds from the class of flavonoids. We also developed a web tool (JazQSAR, https://etflin.com/news/4) to offer an easily accessible platform for the developed QSAR models.
Keywords: environmental sciences; toxicology; computer science, interdisciplinary applications; chemistry, multidisciplinary; mathematical & computational biology
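
The abstract's descriptor-selection and MLR steps can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: the PCA and K-means stage is omitted, the data are synthetic, the descriptor names (d1 to d4), the 0.9 correlation threshold, and the exhaustive search over two-descriptor combinations (standing in for the 100 MLR runs) are all assumptions made for illustration. It shows only the general idea of dropping redundant descriptors by Pearson correlation and then ranking MLR models built from the remaining ones.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.9):
    """Greedy filter: keep a descriptor only if |r| with every kept one is below threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def fit_mlr(X, y):
    """Ordinary least squares via the normal equations; returns [intercept, coef_1, ...]."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y], solved by Gauss-Jordan with partial pivoting.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y, beta):
    """Coefficient of determination of the fitted model on the given data."""
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Synthetic data (assumption): d3 nearly duplicates d1, d4 is unrelated noise,
# and the response y (a stand-in for an activity value) depends on d1 and d2 only.
rng = random.Random(0)
n = 40
d1 = [rng.gauss(0, 1) for _ in range(n)]
d2 = [rng.gauss(0, 1) for _ in range(n)]
d3 = [2 * a + rng.gauss(0, 0.05) for a in d1]
d4 = [rng.gauss(0, 1) for _ in range(n)]
y = [0.3 + 1.5 * a - 0.8 * b + rng.gauss(0, 0.1) for a, b in zip(d1, d2)]
descriptors = {"d1": d1, "d2": d2, "d3": d3, "d4": d4}

# Step 1: remove descriptors that are redundant with ones already kept.
kept = drop_correlated(descriptors, threshold=0.9)

# Step 2: fit one MLR model per descriptor combination and rank by R^2.
results = []
for combo in itertools.combinations(kept, 2):
    X = [list(row) for row in zip(*(descriptors[name] for name in combo))]
    beta = fit_mlr(X, y)
    results.append((r_squared(X, y, beta), combo, beta))
best_r2, best_combo, best_beta = max(results, key=lambda item: item[0])

print("kept descriptors:", kept)
print("best combination:", best_combo, "R^2 =", round(best_r2, 3))
```

Ranking by training R^2 alone is a simplification. A QSAR model meant to satisfy the OECD principles would also need external validation and a defined applicability domain, which this sketch does not attempt.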