Lessons learned during the journey of data: from experiment to model for predicting kinase affinity, selectivity, polypharmacology, and resistance

Raquel López-Ríos de Castro, Jaime Rodríguez-Guerra, David Schaller, Talia B. Kimber, Corey Taylor, Jessica B. White, Michael Backenköhler, Alexander Payne, Ben Kaminow, Iván Pulido, Sukrit Singh, Paula Linh Kramer, Guillermo Pérez-Hernández, Andrea Volkamer, John D. Chodera

DOI: https://doi.org/10.1101/2024.09.10.612176

2024-09-10

Abstract: Recent advances in machine learning (ML) are reshaping drug discovery. Structure-based ML methods use physically-inspired models to predict binding affinities from protein:ligand complexes. These methods promise to enable the integration of data for many related targets, which addresses issues related to data scarcity for single targets and could enable generalizable predictions for a broad range of targets, including mutants. In this work, we report our experiences in building KinoML, a novel framework for ML in target-based small molecule drug discovery with an emphasis on structure-enabled methods. KinoML focuses currently on kinases as the relative structural conservation of this protein superfamily, particularly in the kinase domain, means it is possible to leverage data from the entire superfamily to make structure-informed predictions about binding affinities, selectivities, and drug resistance. Some key lessons learned in building KinoML include: the importance of reproducible data collection and deposition, the harmonization of molecular data and featurization, and the choice of the right data format to ensure reusability and reproducibility of ML models. As a result, KinoML allows users to easily achieve three tasks: accessing and curating molecular data; featurizing this data with representations suitable for ML applications; and running reproducible ML experiments that require access to ligand, protein, and assay information to predict ligand affinity. Despite KinoML focusing on kinases, this framework can be applied to other proteins. The lessons reported here can help guide the development of platforms for structure-enabled ML in other areas of drug discovery.

Biophysics

What problem does this paper attempt to address?

The paper addresses how to collect, process, and apply kinase-related data in structure-based machine learning (ML) methods so as to improve predictions of binding affinity, selectivity, polypharmacology, and resistance in drug discovery. Specifically:

1. **Constructing the KinoML Framework**: The paper introduces KinoML, a new ML framework for target-based small molecule drug discovery that emphasizes structure-enabled methods and currently focuses on the kinase protein superfamily. Because kinases are structurally conserved, particularly in the kinase domain, KinoML can draw on data from the entire superfamily to make structure-informed predictions of binding affinity, selectivity, and drug resistance.
2. **Overcoming Data Challenges**: Despite the vast amount of kinase-related data, organizing it while ensuring its accuracy and reproducibility is a significant challenge. The paper discusses how to acquire and curate data from different sources and emphasizes the importance of adhering to the FAIR principles (Findable, Accessible, Interoperable, and Reusable).
3. **Comparison of Ligand-Based and Structure-Based Methods**: The paper compares ligand-based methods with structure-based methods in kinase drug discovery, noting that structure-based methods may generalize better because they can integrate information about the relevant targets, especially when dealing with mutants.
4. **Lessons Learned**: The authors share key lessons from the development process, including the importance of reproducible data collection and deposition, the harmonization of molecular data and featurization, and the choice of appropriate data formats. These lessons apply beyond kinase research and can be extended to other areas of drug discovery.

In summary, the paper aims to address data processing and model training issues in structure-based kinase drug discovery by constructing a modular and extensible ML framework, and it reports practical solutions and lessons learned.

A Hybrid Structure-Based Machine Learning Approach for Predicting Kinase Inhibition by Small Molecules

Leveraging Machine Learning and AlphaFold2 Steering to Discover State-Specific Inhibitors Across the Kinome

Leveraging multiple data types for improved compound-kinase bioactivity prediction

Machine learning in preclinical drug discovery

Transformative Target-Based Approaches

Lessons learnt from machine learning in early stages of drug discovery

Docking-informed machine learning for kinome wide affinity prediction

Integrated Molecular Modeling and Machine Learning for Drug Design

Machine learning framework to predict pharmacokinetic profile of small molecule drugs based on chemical structure

A comprehensive exploration of the druggable conformational space of protein kinases using AI-predicted structures

MinKLIFSAI: a simple machine learning approach toward selective kinase inhibitor

Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors

Applications of machine learning in drug discovery and development

Novel Big Data-Driven Machine Learning Models for Drug Discovery Application

Modern machine‐learning for binding affinity estimation of protein–ligand complexes: Progress, opportunities, and challenges

KinomeMETA: meta-learning enhanced kinome-wide polypharmacology profiling

Machine Learning Small Molecule Properties in Drug Discovery

PharML.Bind: Pharmacologic Machine Learning for Protein-Ligand Interactions

Integrating Molecular Dynamics and Machine Learning Algorithms to Predict the Functional Profile of Kinase Ligands

The Development and Application of KinomePro-DL: A Deep Learning Based Online Small Molecule Kinome Selectivity Profiling Prediction Platform