Abstract: Conventional machine learning (ML) and deep learning (DL) play a key role in predicting the selectivity of kinase inhibitors. A number of models built on available datasets can predict the kinase profiles of compounds, but the relative advantages and disadvantages of ML and DL for such tasks remain controversial. In this study, we constructed a comprehensive benchmark dataset of kinase inhibitors comprising 141,086 unique compounds and 216,823 well-defined bioassay data points for 354 kinases. We then systematically compared the performance of 12 ML and DL methods on the kinase profiling prediction task. Extensive experimental results reveal the following. (1) Descriptor-based ML models generally slightly outperform fingerprint-based ML models in predictive performance, and random forest (RF), an ensemble learning approach, displays the best overall predictive performance. (2) Single-task graph-based DL models are generally inferior to conventional descriptor- and fingerprint-based ML models, whereas the corresponding multi-task models generally improve the average accuracy of kinase profile prediction. For example, the multi-task FP-GNN model outperforms the conventional descriptor- and fingerprint-based ML models with an average AUC of 0.807. (3) Fusion models based on voting and stacking can further improve performance on the kinase profiling prediction task; specifically, the RF::AtomPairs + FP2 + RDKitDes fusion model performs best, with the highest average AUC of 0.825 on the test sets. These findings provide useful guidance for choosing ML and DL methods for kinase profiling prediction tasks. Finally, an online platform called KIPP (https://kipp.idruglab.cn) and Python software were developed based on the best models to support kinase profiling prediction, as well as various kinase inhibitor identification tasks including virtual screening, compound repositioning, and target fishing.
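Finding (3) mentions voting-based fusion, which can be illustrated with a minimal soft-voting sketch in standard-library Python. The sketch makes several assumptions:

- Each base model has already produced a probability of activity for the same compounds. Examples would be RF models trained on AtomPairs, FP2, and RDKit descriptors for one kinase.
- The fusion rule is an unweighted average of those probabilities with a 0.5 threshold. The abstract does not state the authors' exact voting rule or threshold.
- The function name `soft_vote` and the example probabilities are hypothetical.

Stacking, the other fusion strategy named, would instead train a second-level model on the base models' outputs. It is not shown here.

```python
from statistics import fmean


def soft_vote(prob_lists, threshold=0.5):
    """Fuse base-model predictions by soft voting (unweighted probability averaging).

    prob_lists: one list per base model, each giving the predicted probability
    of "active" for the same compounds in the same order. The equal weighting
    and the 0.5 threshold are illustrative assumptions, not the paper's settings.
    Returns (fused probabilities, binary labels).
    """
    if not prob_lists:
        raise ValueError("need at least one base model")
    n_compounds = len(prob_lists[0])
    if any(len(probs) != n_compounds for probs in prob_lists):
        raise ValueError("all base models must score the same compounds")
    # zip(*...) regroups the scores so each tuple holds one compound's predictions
    fused = [fmean(scores) for scores in zip(*prob_lists)]
    labels = [1 if p >= threshold else 0 for p in fused]
    return fused, labels


# Hypothetical probabilities from three base models for the same three compounds
rf_atompairs = [0.9, 0.2, 0.6]
rf_fp2 = [0.8, 0.4, 0.3]
rf_rdkitdes = [0.7, 0.3, 0.5]
fused, labels = soft_vote([rf_atompairs, rf_fp2, rf_rdkitdes])
print(labels)  # [1, 0, 0]
```

The three hypothetical base models average to about 0.8, 0.3, and 0.47. Only the first compound crosses the threshold. Weighted voting would replace the plain mean with per-model weights, for example weights tuned on the validation set.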

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper aims to solve several key problems in kinase inhibitor selectivity prediction:

1. **Limitations of existing datasets**:
   - The datasets currently used for kinase profiling prediction cover relatively few kinases, which limits the generality of the resulting models.
   - They also contain relatively few compounds, which may limit the generalization ability of the models built on them.
2. **Limitations of single molecular representations and machine-learning methods**:
   - Existing models are usually built on one specific molecular representation (such as molecular descriptors or fingerprints) and a limited set of machine-learning methods. Without a systematic screen of different molecular features and algorithms, model performance may not be optimal.
3. **Insufficient use of deep-learning methods**:
   - Most existing kinase profiling prediction models rely on conventional machine-learning algorithms (such as KNN, NB, SVM, and RF). Advanced deep-learning algorithms, especially graph neural networks (GNNs), have rarely been applied to kinase profiling prediction.
4. **Limited availability of tools**:
   - Reported kinase profiling prediction models have not been integrated into easy-to-use tools (such as local software packages or online platforms), which limits their use by both experts and non-experts.

### Solutions

To address these problems, the authors constructed a comprehensive benchmark dataset for kinase profiling prediction (called KinaseNet) covering 354 kinases. They then systematically compared multiple machine-learning and deep-learning methods on it. The specific steps are:

1. **Dataset construction**:
   - 141,086 unique compounds and 216,823 well-defined bioassay data points were collected from multiple sources, covering 354 kinases of the human kinome.
   - The dataset was randomly divided into a training set (80%), a validation set (10%), and a test set (10%).
2. **Model construction and evaluation**:
   - 136,290 prediction models were built using three types of molecular representation (molecular descriptors, five different molecular fingerprints, and molecular graphs) and 12 machine-learning and deep-learning algorithms.
   - The models were compared using specificity (SP/TNR), sensitivity (SE/TPR/recall), balanced accuracy (BA), F1-score, Matthews correlation coefficient (MCC), and the area under the receiver operating characteristic curve (AUC).
3. **Application of the best models**:
   - Based on the comparison, the authors developed an online platform, KIPP (https://kipp.idruglab.cn), and Python software. These support kinase inhibitor drug discovery tasks, including virtual screening, compound repositioning, and target fishing.

### Main findings

1. **Descriptor-based machine-learning models generally slightly outperform fingerprint-based machine-learning models.**
2. **Single-task graph-based deep-learning models generally underperform conventional descriptor- and fingerprint-based machine-learning models, but the corresponding multi-task models improve the average accuracy of kinase profiling prediction.**
3. **Fusion models (based on voting and stacking) further improve kinase profiling prediction. The RF::AtomPairs + FP2 + RDKitDes fusion model performs best, with an average AUC of 0.825 on the test sets.**

These findings provide useful guidance for selecting machine-learning and deep-learning methods for kinase profiling prediction tasks.
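For a single kinase's test set, the six evaluation metrics listed above can be computed with the minimal standard-library Python sketch below. It makes these assumptions:

- Labels are binary (1 = active against the kinase, 0 = inactive).
- Model scores lie in [0, 1].
- The decision threshold is 0.5. The summary does not state how actives were defined or which threshold the authors used.

The function names are hypothetical. AUC uses the pairwise (Mann-Whitney) formulation, which is quadratic in set size and meant only for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_div(num, den):
    # Return 0.0 for an undefined ratio (e.g. no predicted actives), a common convention
    return num / den if den else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, scores, threshold=0.5):
    """Compute SE, SP, BA, F1, MCC at an assumed threshold, plus threshold-free AUC."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    se = safe_div(tp, tp + fn)  # sensitivity = TPR = recall
    sp = safe_div(tn, tn + fp)  # specificity = TNR
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * se, precision + se)
    mcc = safe_div(tp * tn - fp * fn,
                   math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"SE": se, "SP": sp, "BA": (se + sp) / 2, "F1": f1,
            "MCC": mcc, "AUC": roc_auc(y_true, scores)}


# Hypothetical test-set labels and model scores for one kinase
metrics = classification_metrics([1, 1, 0, 0, 1, 0], [0.9, 0.4, 0.35, 0.1, 0.8, 0.6])
print({name: round(value, 3) for name, value in metrics.items()})
```

In this hypothetical example, the 0.5 threshold gives TP = 2, TN = 2, FP = 1, and FN = 1. SE, SP, BA, and F1 are therefore all 2/3, and MCC is 1/3. AUC is 8/9, since eight of the nine active/inactive pairs are ranked correctly.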

Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors
