Prediction of CYP450 Enzyme-Substrate Selectivity Based on the Network-Based Label Space Division Method

Xiaoqi Shan, Xiangeng Wang, Cheng-dong Li, Yanyi Chu, Yufang Zhang, Yi Xiong, Dong-Qing Wei
DOI: https://doi.org/10.1021/acs.jcim.9b00749
IF: 6.162
2019-01-01
Journal of Chemical Information and Modeling
Abstract: A drug may be metabolized by multiple cytochrome P450 (CYP450) isoforms. Predicting the metabolic fate of drugs is important for preventing drug-drug interactions during the development of novel pharmaceuticals. In this study, prediction of CYP450 enzyme-substrate selectivity is formulated as a multilabel learning task. First, we compared the performance of feature combinations drawn from four categories of features: physicochemical property descriptors, mol2vec descriptors, extended connectivity fingerprints, and molecular access system (MACCS) key fingerprints. After identifying the best feature combination, we applied seven multilabel models: multilabel k-nearest neighbor (ML-kNN), multilabel twin support vector machine, and five network-based label space division (NLSD) methods (NLSD-MLP, NLSD-XGB, NLSD-EXT, NLSD-RF, and NLSD-SVM). Six of these models (ML-kNN, NLSD-MLP, NLSD-XGB, NLSD-EXT, NLSD-RF, and NLSD-SVM) outperform the previous work. NLSD-XGB achieves the best performance, with average top-1, top-2, and top-3 prediction success of 91.1%, 96.2%, and 98.2%, respectively. Compared with the previous work, NLSD-XGB shows a significant improvement of over 11% in top-1 prediction success in the 10 times repeated 5-fold cross-validation test and of over 14% in the 10 times repeated hold-out test. To the best of our knowledge, this is the first application of network-based label space division to drug metabolism, and it performs well on this task.
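The abstract reports results as top-k prediction success but does not define the metric. The sketch below assumes a common reading: a compound counts as a success if at least one of its k highest-scored isoform labels is a true label. The function name `top_k_success`, the scores, and the label sets are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Set


def top_k_success(scores: Sequence[Sequence[float]],
                  true_labels: Sequence[Set[int]],
                  k: int) -> float:
    """Fraction of samples whose k highest-scored labels include a true label.

    Assumed definition of "top-k prediction success"; the paper may differ.
    """
    if len(scores) != len(true_labels):
        raise ValueError("scores and true_labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, truth in zip(scores, true_labels):
        # Rank label indices by descending score; ties keep the lower index first.
        ranked = sorted(range(len(row)), key=lambda j: -row[j])
        if truth.intersection(ranked[:k]):
            hits += 1
    return hits / len(scores)


# Hypothetical scores for three compounds over four isoform labels (0..3).
scores = [
    [0.10, 0.20, 0.15, 0.90],
    [0.60, 0.30, 0.70, 0.20],
    [0.05, 0.80, 0.40, 0.30],
]
# Hypothetical true label sets; a compound may have several isoforms.
true_labels = [{3}, {0}, {2, 3}]

for k in (1, 2, 3):
    print(f"top-{k} success: {top_k_success(scores, true_labels, k):.3f}")
```

Under this definition the metric cannot decrease as k grows, which matches the ordering of the reported top-1, top-2, and top-3 figures.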