HDAC3i-Finder: A Machine Learning-Based Computational Tool to Screen for HDAC3 Inhibitors

Shan Li, Yu Ding, Miaomiao Chen, Ya Chen, Johannes Kirchmair, Zihao Zhu, Song Wu, Jie Xia
DOI: https://doi.org/10.1002/minf.202000105
IF: 4.05
2021-01-01
Molecular Informatics
Abstract: Histone deacetylase 3 (HDAC3) is a potential drug target for the treatment of human diseases such as cancer, chronic inflammation, neurodegenerative diseases and diabetes. Machine learning (ML), an essential cheminformatics approach, has been widely used for QSAR modeling; however, no ML-based QSAR model has yet been applied to HDAC3. To this end, we carefully compiled a set of 1098 compounds from the ChEMBL database that have been assayed against HDAC3 and calculated three different sets of molecular features for each compound: two-dimensional Mordred descriptors, MACCS keys (166 bits) and Morgan2 fingerprints (1024 bits). Five ML classifiers, i.e. k-Nearest Neighbour (KNN), Support Vector Machine (SVM), Random Forest (RF), eXtreme Gradient Boosting (XGBoost) and Deep Neural Network (DNN), were trained on each feature set and optimized for classification. The resulting 15 models (5 classifiers × 3 feature sets) were carefully compared, and the best-performing one was the XGBoost model based on the Morgan2 fingerprints, i.e. XGBoost_morgan2. Evaluated on a well-curated benchmarking set named MUBD-HDAC3, this model achieved a high early ROC enrichment at a 0.5 % false-positive rate (ROCE0.5%) of 41.02. A further retrospective screening of an annotated chemical library in PubChem demonstrated that the best model could identify 8 HDAC3 inhibitors with novel scaffolds while assaying only 1 % of the compounds. To make this model accessible to the scientific community, we developed a Python GUI application named HDAC3i-Finder to facilitate prospective screening for HDAC3 inhibitors. The source code of HDAC3i-Finder is available at https://github.com/jwxia2014/HDAC3i-Finder.
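The abstract reports two early-recognition measures: ROC enrichment (ROCE), which is the true-positive rate divided by the false-positive rate at a fixed false-positive rate (here 0.5 %), and the number of inhibitors recovered when only the top 1 % of a ranked library is assayed. The sketch below shows how both could be computed from a list of model scores and binary labels (1 = inhibitor, 0 = non-inhibitor or decoy). It is an illustration under our own assumptions, not code from HDAC3i-Finder or the MUBD-HDAC3 benchmark: the function names `roc_enrichment` and `top_fraction_hits` are hypothetical, tied scores are left in input order rather than averaged, and the ROCE cutoff is taken at the k-th ranked decoy.

```python
import math
from typing import Sequence


def _ranked_labels(scores: Sequence[float], labels: Sequence[int]) -> list[int]:
    """Return labels (1 = active, 0 = inactive/decoy) ordered by descending score.

    Tied scores keep their input order (sorted() is stable); no tie averaging.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def roc_enrichment(scores: Sequence[float], labels: Sequence[int],
                   fpr: float = 0.005) -> float:
    """Hypothetical helper: ROCE = TPR / FPR, read off at the k-th ranked decoy."""
    if not 0 < fpr <= 1:
        raise ValueError("fpr must be in (0, 1]")
    ranked = _ranked_labels(scores, labels)
    n_act = sum(ranked)
    n_dec = len(ranked) - n_act
    if n_act == 0 or n_dec == 0:
        raise ValueError("need at least one active and one decoy")
    # Number of decoys that defines the cutoff; round() absorbs float noise
    # such as 0.07 * 100 == 7.000000000000001 before taking the ceiling.
    k = max(1, math.ceil(round(fpr * n_dec, 9)))
    actives_seen = decoys_seen = 0
    for lab in ranked:
        if lab:
            actives_seen += 1
        else:
            decoys_seen += 1
            if decoys_seen == k:
                break  # all actives ranked ahead of the k-th decoy are counted
    return (actives_seen / n_act) / (k / n_dec)


def top_fraction_hits(scores: Sequence[float], labels: Sequence[int],
                      fraction: float = 0.01) -> tuple[int, float]:
    """Hypothetical helper: actives in the top `fraction` of the ranked list,
    plus the enrichment factor (hit rate in that slice / overall hit rate)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = _ranked_labels(scores, labels)
    n_top = max(1, math.ceil(round(fraction * len(ranked), 9)))
    hits = sum(ranked[:n_top])
    n_act = sum(ranked)
    ef = (hits / n_top) / (n_act / len(ranked)) if n_act else 0.0
    return hits, ef


if __name__ == "__main__":
    # Toy data only: model scores and labels (1 = inhibitor, 0 = non-inhibitor).
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
    labels = [1, 0, 1, 0, 0, 1, 0, 0]
    print("ROCE at 20% FPR:", roc_enrichment(scores, labels, fpr=0.2))
    print("hits, EF in top 25%:", top_fraction_hits(scores, labels, fraction=0.25))
```

Note that with the default 0.5 % rate and fewer than 200 decoys, the floor k = 1 makes the effective false-positive rate 1/n_decoys, which is larger than 0.5 %; the function divides by that effective rate so the ratio stays consistent.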
What problem does this paper attempt to address?