Interpretable deep-learning pKa prediction for small molecule drugs via atomic sensitivity analysis

Joseph DeCorte, Benjamin Brown, Jens Meiler

DOI: https://doi.org/10.26434/chemrxiv-2024-hr692

2024-06-12

Abstract: Machine learning (ML) models play a crucial role in predicting properties essential to drug development, such as a drug’s log-scale acid-dissociation constant (pKa). Despite recent architectural advances, these models often generalize poorly to novel compounds due to a scarcity of ground-truth data. Further, these models lack interpretability, in part due to a dependence on explicit encodings of input molecules’ molecular substructures. To this end, atomic-resolution information is accessible in chemical structures by observing model response to atomic perturbations of an input molecule; however, no methods exist that systematically utilize this information for model and molecular analysis. Here, we present BCL-XpKa, a substructure-independent, deep neural network (DNN)-based pKa predictor that generalizes well to novel small molecules. BCL-XpKa discretizes pKa prediction from a regression problem into a multitask-classification problem, which accumulates data for prediction at biologically relevant pH values and records the model’s uncertainty in its prediction as a discrete distribution for each pKa prediction. BCL-XpKa outperforms modern ML pKa predictors and accurately models the effects of common molecular modifications on a molecule’s ionizability. We then leverage BCL-XpKa’s substructure independence to introduce atomic sensitivity analysis (ASA), which quickly decomposes a molecule’s predicted pKa value into its respective atomic contributions without model retraining. When paired with BCL-XpKa, ASA shows that BCL-XpKa has implicitly learned high-resolution information about molecular substructures. We further demonstrate ASA’s utility in structure preparation for protein-ligand docking by identifying ionization sites in 97.8% and 83.4% of complex small molecule acids and bases, respectively. We then apply ASA with BCL-XpKa to understand the physicochemical liabilities and guide optimization of a recently published KRAS-degrading PROTAC.
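To make the discretization step concrete, here is a minimal sketch of how a continuous pKa label could be turned into a classification target and how a predicted discrete distribution could be collapsed back into a single pKa estimate. The bin range (0 to 14), the 1-unit bin width, the one-hot encoding, and the function names `pka_to_bins` and `expected_pka` are all assumptions made for illustration; the abstract does not specify BCL-XpKa's actual binning or decoding scheme.

```python
from typing import List


def pka_to_bins(pka: float, lo: float = 0.0, hi: float = 14.0,
                width: float = 1.0) -> List[float]:
    """Encode a continuous pKa as a one-hot vector over fixed-width bins.

    Assumed scheme (not taken from the paper): bins cover [lo, hi) and
    values outside that range are clipped into the first or last bin.
    """
    n_bins = int(round((hi - lo) / width))
    clipped = min(max(pka, lo), hi - 1e-9)
    idx = int((clipped - lo) // width)
    label = [0.0] * n_bins
    label[idx] = 1.0
    return label


def expected_pka(probs: List[float], lo: float = 0.0,
                 width: float = 1.0) -> float:
    """Collapse a discrete distribution over bins into one pKa estimate.

    Uses the probability-weighted mean of the bin centers; the spread of
    `probs` can be read as a rough indication of model uncertainty.
    """
    total = sum(probs)
    centers = [lo + (i + 0.5) * width for i in range(len(probs))]
    return sum(p * c for p, c in zip(probs, centers)) / total


# Example: a label for pKa 4.76, and a prediction split between two bins.
label = pka_to_bins(4.76)
split_prediction = [0.0] * 4 + [0.5, 0.5] + [0.0] * 8
estimate = expected_pka(split_prediction)
```

In this sketch a prediction spread evenly over the 4 to 5 and 5 to 6 bins decodes to the boundary between them, which is one simple way a discrete output can still yield a continuous estimate.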

Chemistry

What problem does this paper attempt to address?

This paper aims to improve the accuracy and generalization of acid-dissociation constant (pKa) prediction for small-molecule drugs while making the model more interpretable. Specifically:

1. **Improve prediction accuracy**: The paper develops a multitask classifier, BCL-XpKa, that recasts continuous pKa regression as a multitask-classification problem. This concentrates training data at biologically relevant pH values and reports each prediction as a discrete distribution that records the model's uncertainty.
2. **Enhance model generalization**: Existing machine-learning methods often perform poorly on novel compounds, in part because they rely on explicitly encoded molecular substructure features. BCL-XpKa instead uses local atomic-environment embeddings, so it does not depend on specific molecular substructures.
3. **Improve model interpretability**: The paper introduces atomic sensitivity analysis (ASA), which perturbs the input molecule atom by atom and observes the model's response. This decomposes a molecule's predicted pKa into per-atom contributions without retraining the model.
4. **Application examples**: The paper applies BCL-XpKa and ASA to drug-design tasks, including identifying ionization sites during structure preparation for protein-ligand docking and analyzing a recently published KRAS-degrading PROTAC (proteolysis-targeting chimera, a bifunctional molecule that induces degradation of a target protein). There, ASA identifies the atoms that drive ionization, which helps explain the molecule's physicochemical liabilities and guides its optimization.
In summary, the paper addresses the limited accuracy, generalization, and interpretability of existing pKa prediction models by changing the model architecture and introducing a new interpretation method, with the goal of providing more useful tools for drug design.
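The perturbation idea behind ASA can be sketched as a model-agnostic loop: predict once on the intact molecule, then re-predict with each atom perturbed in turn and record the change. The sketch below is illustrative only. The masking perturbation, the `atomic_sensitivity` function, the string-based `Atom` representation, and the `toy_predict` stand-in model are all assumptions; the summary does not describe how the paper actually perturbs atoms or encodes molecules.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: an element symbol stands in for whatever
# per-atom descriptor the real model consumes.
Atom = str


def atomic_sensitivity(atoms: Sequence[Atom],
                       predict: Callable[[Sequence[Atom]], float],
                       mask: Atom = "*") -> List[float]:
    """Return one score per atom: baseline prediction minus the prediction
    obtained when that atom alone is replaced by a neutral placeholder.

    The model is only queried, never retrained.
    """
    baseline = predict(atoms)
    scores = []
    for i in range(len(atoms)):
        perturbed = list(atoms)
        perturbed[i] = mask  # assumed perturbation: mask a single atom
        scores.append(baseline - predict(perturbed))
    return scores


def toy_predict(atoms: Sequence[Atom]) -> float:
    """Stand-in model, NOT BCL-XpKa: an additive toy rule used only to
    exercise the loop above."""
    shift = {"N": 1.5, "O": -2.0}
    return 7.0 + sum(shift.get(a, 0.0) for a in atoms)


molecule = ["C", "C", "N", "O"]
scores = atomic_sensitivity(molecule, toy_predict)
```

With the additive toy model, each atom's score equals its own contribution to the prediction. A real, non-additive model would give scores that also reflect each atom's context, which is where a per-atom decomposition becomes informative.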
