Combining Multi-Dimensional Molecular Fingerprints to Predict the hERG Cardiotoxicity of Compounds

Weizhe Ding, Yang Nan, Juanshu Wu, Chenyang Han, Xiangxin Xin, Siyuan Li, Hongsheng Liu, Li Zhang
DOI: https://doi.org/10.1016/j.compbiomed.2022.105390
IF: 7.7
2022-01-01
Computers in Biology and Medicine
Abstract: Drug toxicity has recently become a critical problem that carries heavy medical and economic burdens. Acquired long QT syndrome (acLQTS) is an acquired cardiac ion channel disease caused by drugs blocking the hERG channel. It is therefore necessary to avoid cardiotoxicity in drug design, and computational models have been widely used to address this problem. In this study, we collected a hERG inhibitor dataset containing 8671 compounds and featurized these compounds with traditional molecular fingerprints (Baseline2D, ECFP4, PropertyFP, and 3DFP) and the newly proposed molecular dynamics fingerprint (MDFP). Regression models were then built with four machine learning algorithms on each of these fingerprints and on the combined multi-dimensional molecular fingerprint (MultiFP). After cross-validation and validation on an independent test dataset, the best model was the consensus of the four algorithms trained on MultiFP. This model outperforms recently published methods for hERG cardiotoxicity prediction, with an RMSE of 0.531 and an R² of 0.653 on the test dataset. Feature importance analysis and correlation analysis identified novel structural features and molecular dynamics features that are highly associated with the hERG inhibition of compounds. Our findings provide new insight into multi-dimensional molecular fingerprints and consensus models for hERG cardiotoxicity prediction.
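The workflow summarized in the abstract (combine fingerprint blocks, average the predictions of several regressors, score with RMSE and R²) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not say how the consensus is formed, so an unweighted mean is used; R² is taken to be the coefficient of determination; and the helper names `concat_fingerprints`, `consensus`, `rmse`, and `r2` are hypothetical, not taken from the paper.

```python
import math


def concat_fingerprints(*blocks):
    """Join per-compound feature blocks into one vector per compound.

    Each block is a list of rows, one row per compound, in the same order.
    This only mimics the idea of a combined fingerprint such as MultiFP.
    """
    return [sum((list(row) for row in rows), []) for rows in zip(*blocks)]


def consensus(predictions):
    """Unweighted mean of several models' predictions (assumed scheme)."""
    return [sum(vals) / len(vals) for vals in zip(*predictions)]


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy, made-up numbers: two feature blocks for three compounds.
block_2d = [[1, 0], [0, 1], [1, 1]]
block_md = [[0.2], [0.5], [0.9]]
combined = concat_fingerprints(block_2d, block_md)

# Toy predictions from two hypothetical regressors, then their consensus.
y_true = [5.0, 6.0, 7.0]
model_a = [5.5, 6.0, 6.5]
model_b = [4.5, 6.5, 7.5]
y_cons = consensus([model_a, model_b])
print(combined, y_cons, rmse(y_true, y_cons), r2(y_true, y_cons))
```

In practice the feature blocks would come from a cheminformatics toolkit and the predictions from trained regressors; the sketch only shows how the pieces named in the abstract fit together.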