MolAnchor – Explaining Compound Predictions Based on Substructures

Jürgen Bajorath, Alec Lamens

DOI: https://doi.org/10.26434/chemrxiv-2024-vwvph

2024-08-08

Abstract: In medicinal chemistry, the impact of machine learning remains limited if predictions are not understood, which often precludes experimental follow-up. Therefore, chemically intuitive approaches that aid in model understanding and interpretation at the molecular level of detail are sought after. While feature attribution methods quantifying feature importance for model decisions are widely used in many areas, they must typically be combined with visualization techniques, if possible, to render the results accessible from a chemical viewpoint. On the other hand, there are approaches such as counterfactuals that yield closely related chemical structures with different prediction outcomes, providing direct access to structural features that critically influence model decisions. Herein, we introduce another approach designed to rationalize chemical predictions based on molecular structure. Therefore, we adapt principles underlying the anchor concept from explainable artificial intelligence (XAI) and alter them for molecular machine learning. The resulting method, termed MolAnchor, systematically identifies substructures in test compounds that determine property predictions, thus ensuring chemical interpretability. The MolAnchor methodology is made freely available to the medicinal chemistry community as a part of our study.

Chemistry

What problem does this paper attempt to address?

The main objective of this paper is to propose a new method, called MolAnchor, that explains compound predictions based on molecular structure. Specifically, the researchers aim to address the following issues:

1. **Enhancing the interpretability of machine learning predictions**: In medicinal chemistry, machine learning is increasingly widely applied, but understanding and interpreting model predictions remains a challenge. This limits the trust in these predictions and their adoption by experimental scientists.
2. **Developing methods suitable for chemists**: Existing explanation techniques such as feature attribution methods can quantify the importance of features, but usually require additional visualization tools before the results can be understood from a chemical perspective. Counterfactual methods, in contrast, generate compounds that are structurally similar to the original compound but have different prediction results, thereby directly revealing the key structural features that influence model decisions.
3. **Introducing the MolAnchor method**: Inspired by the concept of "anchors" from explainable artificial intelligence (XAI), the researchers developed MolAnchor, a new method that explains compound activity predictions by identifying substructures within test compounds. The method ensures chemical interpretability by decomposing compounds into substructures and systematically identifying the substructures that determine the prediction results.

In summary, the main aim of this study is to enhance the interpretability and practicality of machine learning in medicinal chemistry through the new MolAnchor method, thereby better guiding experimental design and decision-making.
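
The idea of identifying the substructures that determine a prediction can be illustrated with a small toy sketch. This is not the authors' implementation: the fragment labels, the `toy_predict` model, and the `find_anchor` helper are all hypothetical, and the exhaustive enumeration used here stands in for whatever search and sampling strategy MolAnchor actually applies. The sketch treats a compound as a set of fragment labels and looks for the smallest subset whose presence keeps the model's prediction unchanged, no matter which of the remaining fragments are kept or removed.

```python
from itertools import combinations


def find_anchor(fragments, predict, precision_threshold=1.0):
    """Return (anchor, precision) for the smallest stable fragment subset.

    fragments: iterable of hashable fragment labels (hypothetical input format).
    predict:   callable mapping a frozenset of fragments to a class label.
    """
    fragments = list(fragments)
    target = predict(frozenset(fragments))  # prediction for the full compound

    # Try candidate anchors from smallest to largest.
    for size in range(1, len(fragments) + 1):
        for candidate in combinations(fragments, size):
            anchor = frozenset(candidate)
            rest = [f for f in fragments if f not in anchor]

            # Keep the anchor fixed and enumerate every subset of the
            # remaining fragments (exhaustive; only feasible for toy sizes).
            hits = total = 0
            for k in range(len(rest) + 1):
                for kept in combinations(rest, k):
                    total += 1
                    hits += predict(anchor | frozenset(kept)) == target

            precision = hits / total
            if precision >= precision_threshold:
                return anchor, precision

    # Only reached for an empty fragment list.
    return frozenset(fragments), 1.0


def toy_predict(present):
    """Assumed toy model: 'active' whenever a sulfonamide fragment is present."""
    return "active" if "sulfonamide" in present else "inactive"


compound = ["benzene", "sulfonamide", "methyl"]
anchor, precision = find_anchor(compound, toy_predict)
print(sorted(anchor), precision)
```

In this toy setting the search returns the single fragment the model depends on. A real application would replace the fragment labels with substructures obtained from an actual decomposition scheme and `toy_predict` with a trained model, and would likely need sampling rather than full enumeration.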

ComABAN: refining molecular representation with the graph attention mechanism to accelerate drug discovery

PrefixMol: Target- and Chemistry-aware Molecule Design Via Prefix Embedding

Explainable Fragment-Based Molecular Property Attribution

Explainable machine learning predictions of dual-target compounds reveal characteristic structural features

Interpretable deep-learning pKa prediction for small molecule drugs via atomic sensitivity analysis

Explainable Molecular Property Prediction: Aligning Chemical Concepts with Predictions via Language Models

What can Attribution Methods show us about Chemical Language Models?

Explainability Techniques for Chemical Language Models

Exploiting Structural Information in Patent Specifications for Key Compound Prediction

PocketAnchor: Learning Structure-based Pocket Representations for Protein-Ligand Interaction Prediction

Accurate Clinical Toxicity Prediction using Multi-task Deep Neural Nets and Contrastive Molecular Explanations

MAGNet: Motif-Agnostic Generation of Molecules from Shapes

MOL-MOE: Learning Drug Molecular Characterization Based on Mixture of Expert Mechanism

Synergizing Chemical Structures and Bioassay Descriptions for Enhanced Molecular Property Prediction in Drug Discovery

Machine Learning Small Molecule Properties in Drug Discovery

Unveiling Molecular Secrets: An LLM-Augmented Linear Model for Explainable and Calibratable Molecular Property Prediction

Extracting structural motifs from pair distribution function data of nanostructures using explainable machine learning

Extracting medicinal chemistry intuition via preference machine learning

MEG: Generating Molecular Counterfactual Explanations for Deep Graph Networks