Synergy Between Machine Learning and Natural Products Cheminformatics: Application to the Lead Discovery of Anthraquinone Derivatives
Said Moshawih, Hui Poh Goh, Nurolaini Kifli, Azam Che Idris, Hayati Yassin, Vijay Kotra, Khang Wen Goh, Kai Bin Liew, Long Chiau Ming
DOI: https://doi.org/10.1111/cbdd.14062
2022-05-03
Abstract: This review examines the effect of machine learning, combined with the expanding chemical space of natural products, on the discovery and development of new drugs, with a focused look at anthraquinone derivatives for optimizing lead compounds. Cheminformatics, including machine learning (ML) techniques, has opened up a new horizon in drug discovery. This is owing to the vast expansion of chemical space, with rapidly growing numbers of expected hits and lead compounds that match druggable macromolecular targets, in particular among natural macrocyclic compounds. Owing to their structural complexity, uniqueness, and diversity, natural products (NPs) could occupy a larger space in pharmaceuticals, allowing the industry to pursue more selective leads in the nanomolar range of binding affinity. ML is an essential part of each step of the drug design pipeline, such as target prediction, compound library preparation, and lead optimization. Notably, molecular mechanics and molecular dynamics simulations, induced-fit docking, and free energy perturbation are essential for predicting the best binding poses, binding free energy values, and molecular mechanics force fields. These applications benefit from artificial intelligence (AI), which reduces the computational cost of such simulations. This review aimed to describe chemical space and compound libraries related to NPs. High‐throughput screening used for fractionating NPs and high‐throughput virtual screening were reviewed, together with their strategies and significance. Particular emphasis was given to AI approaches, ML tools, algorithms, and techniques, especially for the discovery of macrocyclic drugs, including anthraquinone derivatives, and to approaches in computer‐aided and ML‐based drug discovery. Anthraquinone macrocycles can be optimized with ML tools to develop new lead compounds for diverse medicinal uses, such as in cancer, infectious diseases, and metabolic disorders.
The power of principal component analysis in understanding relevant protein conformations and in the molecular modeling of protein–polyphenol and protein–ligand interactions was also presented. Apart from being a concise reference on cheminformatics, this review is a useful text for understanding the application of ML‐based algorithms to molecular dynamics simulation and to in silico absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction.