MolAnchor – Explaining Compound Predictions Based on Substructures

Jürgen Bajorath, Alec Lamens
DOI: https://doi.org/10.26434/chemrxiv-2024-vwvph
2024-08-08
Abstract: In medicinal chemistry, the impact of machine learning remains limited if predictions are not understood, which often precludes experimental follow-up. Therefore, chemically intuitive approaches that aid in model understanding and interpretation at the molecular level of detail are sought after. While feature attribution methods quantifying feature importance for model decisions are widely used in many areas, they must typically be combined with visualization techniques, if possible, to render the results accessible from a chemical viewpoint. On the other hand, there are approaches such as counterfactuals that yield closely related chemical structures with different prediction outcomes, providing direct access to structural features that critically influence model decisions. Herein, we introduce another approach designed to rationalize chemical predictions based on molecular structure. To this end, we adapt principles underlying the anchor concept from explainable artificial intelligence (XAI) and modify them for molecular machine learning. The resulting method, termed MolAnchor, systematically identifies substructures in test compounds that determine property predictions, thus ensuring chemical interpretability. The MolAnchor methodology is made freely available to the medicinal chemistry community as part of our study.
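For orientation, the anchor concept referred to in the abstract was originally formulated in the XAI literature (Ribeiro et al., 2018, "Anchors: High-Precision Model-Agnostic Explanations") roughly as follows. Here $f$ is the model, $x$ the instance being explained, $A$ a rule that holds for $x$, $\mathcal{D}(z \mid A)$ a distribution of perturbed instances $z$ for which $A$ still holds, and $\tau$ a precision threshold. The original paper states the precision constraint probabilistically; the version below is simplified. Reading $A$ as "the compound contains substructure $S$" is an illustrative assumption; the abstract does not spell out how MolAnchor adapts these definitions.

```latex
\begin{aligned}
\operatorname{prec}(A) &= \mathbb{E}_{\mathcal{D}(z \mid A)}\big[\mathbb{1}[f(x) = f(z)]\big] \ge \tau, \qquad A(x) = 1,\\
\operatorname{cov}(A) &= \mathbb{E}_{\mathcal{D}(z)}\big[A(z)\big],\\
A^{*} &= \arg\max_{A \,:\, \operatorname{prec}(A) \ge \tau} \operatorname{cov}(A).
\end{aligned}
```

In words, an anchor is a rule that, once fixed, keeps the model's prediction unchanged for most perturbations of everything else, and among such rules the one that applies to the largest share of inputs is preferred.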
Chemistry
What problem does this paper attempt to address?
The main objective of this paper is to propose MolAnchor, a new method for explaining compound predictions based on molecular structure. Specifically, the researchers aim to address the following issues:

1. **Enhancing the interpretability of machine learning predictions**: Although machine learning is increasingly applied in medicinal chemistry, understanding and interpreting model predictions remains a challenge. Predictions that are not understood are less likely to be trusted and adopted by experimental scientists and often do not lead to experimental follow-up.
2. **Developing methods suitable for chemists**: Existing explanation techniques such as feature attribution methods quantify feature importance but usually require additional visualization to make the results accessible from a chemical perspective. Counterfactual methods, by contrast, generate compounds that are structurally similar to the original compound but receive different predictions, thereby directly revealing key structural features that influence model decisions.
3. **Introducing the MolAnchor method**: Inspired by the concept of "anchors" in explainable artificial intelligence (XAI), the researchers developed MolAnchor, a method that explains compound property predictions by identifying substructures within test compounds. It ensures chemical interpretability by decomposing compounds into substructures and systematically identifying those that determine the prediction.

In summary, the study aims to enhance the interpretability and practical utility of machine learning in medicinal chemistry through MolAnchor, thereby better guiding experimental design and decision-making.
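The substructure-identification idea described above can be illustrated with a simplified, anchor-style search. This is a minimal sketch under stated assumptions, not the MolAnchor implementation. Compounds are represented as hypothetical sets of fragment labels (in practice a cheminformatics toolkit would produce the fragments). The classifier is a toy rule. Perturbations randomly drop, replace, or keep fragments outside the candidate anchor. The threshold `tau = 0.95` and the function names `perturb`, `precision`, and `find_anchor` are illustrative choices. The search returns the smallest fragment set whose presence keeps the prediction unchanged in at least a fraction `tau` of perturbed samples.

```python
import itertools
import random
from typing import Callable, FrozenSet, Sequence, Tuple

# Hypothetical representation: a compound is a set of substructure (fragment) labels.
Fragments = FrozenSet[str]
Model = Callable[[Fragments], int]


def perturb(compound: Fragments, anchor: Fragments,
            pool: Sequence[str], rng: random.Random) -> Fragments:
    """Keep the anchor fixed; randomly drop, replace, or keep each other fragment."""
    sample = set(anchor)
    for frag in sorted(compound - anchor):  # sorted for reproducible sampling
        r = rng.random()
        if r < 1 / 3:
            continue  # drop the fragment
        if r < 2 / 3:
            sample.add(rng.choice(pool))  # replace it with a random pool fragment
        else:
            sample.add(frag)  # keep the original fragment
    return frozenset(sample)


def precision(model: Model, compound: Fragments, anchor: Fragments,
              pool: Sequence[str], n_samples: int, rng: random.Random) -> float:
    """Fraction of perturbed samples whose prediction matches the original one."""
    target = model(compound)
    hits = sum(model(perturb(compound, anchor, pool, rng)) == target
               for _ in range(n_samples))
    return hits / n_samples


def find_anchor(model: Model, compound: Fragments, pool: Sequence[str],
                tau: float = 0.95, n_samples: int = 500,
                seed: int = 0) -> Tuple[Fragments, float]:
    """Return the smallest fragment subset with precision >= tau (exhaustive search)."""
    rng = random.Random(seed)
    frags = sorted(compound)
    for size in range(1, len(frags) + 1):
        best = None
        for combo in itertools.combinations(frags, size):
            anchor = frozenset(combo)
            p = precision(model, compound, anchor, pool, n_samples, rng)
            if p >= tau and (best is None or p > best[1]):
                best = (anchor, p)
        if best is not None:
            return best
    # Fallback (empty compound or tau > 1): return the whole compound.
    return compound, 1.0


# Toy usage: a hypothetical classifier that calls a compound "active" (1)
# whenever it contains a sulfonamide fragment.
def toy_model(frags: Fragments) -> int:
    return int("sulfonamide" in frags)


compound = frozenset({"benzene", "sulfonamide", "methyl", "amide"})
pool = ["benzene", "pyridine", "methyl", "amide", "hydroxyl", "chloro"]
anchor, p = find_anchor(toy_model, compound, pool)
print(sorted(anchor), p)  # ['sulfonamide'] 1.0
```

Exhaustive enumeration over fragment combinations is only practical for compounds with few fragments. The original anchors algorithm from the XAI literature uses a bandit-guided beam search instead, and MolAnchor's own search and perturbation strategy may differ from both.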