MARS: A neurosymbolic approach for interpretable drug discovery

Lauren Nicole DeLong, Yojana Gadiya, Paola Galdi, Jacques D. Fleuriot, Daniel Domingo-Fernández
2024-10-14
Abstract: Neurosymbolic (NeSy) artificial intelligence describes the combination of logic or rule-based techniques with neural networks. Compared to neural approaches, NeSy methods often possess enhanced interpretability, which is particularly promising for biomedical applications like drug discovery. However, since interpretability is broadly defined, there are no clear guidelines for assessing the biological plausibility of model interpretations. To assess interpretability in the context of drug discovery, we devise a novel prediction task, called drug mechanism-of-action (MoA) deconvolution, with an associated, tailored knowledge graph (KG), MoA-net. We then develop the MoA Retrieval System (MARS), a NeSy approach for drug discovery which leverages logical rules with learned rule weights. Using this interpretable feature alongside domain knowledge, we find that MARS and other NeSy approaches on KGs are susceptible to reasoning shortcuts, in which the prediction of true labels is driven by "degree-bias" rather than the domain-based rules. Subsequently, we demonstrate ways to identify and mitigate this. Thereafter, MARS achieves performance on par with current state-of-the-art models while producing model interpretations aligned with known MoAs.
Artificial Intelligence, Machine Learning, Logic in Computer Science
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses the mechanism-of-action (MoA) problem in drug discovery (DD). Specifically, the authors propose a new prediction task, MoA deconvolution, to evaluate the interpretability of neurosymbolic (NeSy) methods and other knowledge graph (KG)-based methods. The goal of MoA deconvolution is to computationally reveal how drugs achieve their pharmacological effects through a series of physical and molecular interactions.

### Background and Challenges

1. **Current State of Drug Discovery**:
   - Drug discovery typically involves screening thousands of small-molecule compounds.
   - Computational methods, especially KG-based methods, are widely used to accelerate this process.
   - However, most state-of-the-art techniques rely on "black-box" models that lack transparency and interpretability.
2. **Importance of Mechanism-of-Action**:
   - Understanding the MoA is crucial for revealing how drugs work and what side effects they may have.
   - Traditional computational methods often rely on associative patterns rather than mechanistic patterns, which limits their ability to reveal drug mechanisms.
3. **Advantages and Challenges of Neurosymbolic Methods**:
   - NeSy methods combine logical rules and neural networks, offering enhanced interpretability and better integration of domain knowledge.
   - However, because interpretability is broadly defined, there are currently no clear evaluation standards, especially for novel tasks like MoA deconvolution.

### Solution

1. **Proposing a New Prediction Task**:
   - The authors propose MoA deconvolution as a new task to evaluate interpretability.
   - A dedicated knowledge graph, MoA-net, is generated for benchmarking this task.
2. **Developing the MARS System**:
   - MARS (MoA Retrieval System) is a NeSy method that improves interpretability by dynamically updating rule weights.
   - MARS uses logical rules and learned rule weights, combined with domain knowledge, to predict drug-biological process relationships.
3. **Identifying and Mitigating Reasoning Shortcuts**:
   - The authors found that MARS and other NeSy methods are susceptible to "degree bias," where predictions rely more on node connectivity than on domain knowledge.
   - By removing reverse edges and pruning the knowledge graph, the authors demonstrate how to identify and mitigate these reasoning shortcuts.

### Experiments and Results

1. **Performance Evaluation**:
   - Extensive benchmarking was conducted on MoA-net and its variants, including various baseline methods and state-of-the-art NeSy methods.
   - Results show that MARS P2H performs well on both standard and pruned metrics, with a significant performance improvement after removing reverse edges.
2. **External Validation**:
   - On a test set of drug-biological process pairs with known MoAs, MARS P2H successfully recovered all 33 known MoAs.

### Main Contributions

1. Proposing a new prediction task, MoA deconvolution, to evaluate the interpretability of NeSy methods and other KG-based methods.
2. Providing a dedicated knowledge graph, MoA-net, for benchmarking the MoA deconvolution task.
3. Developing the MARS system, which improves interpretability by dynamically updating rule weights.
4. Identifying, through the high interpretability of MARS, a critical reasoning shortcut affecting some NeSy models.
5. Demonstrating how to test for and mitigate this reasoning shortcut, making MARS more aligned with domain knowledge.
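
The reverse-edge removal mentioned under "Identifying and Mitigating Reasoning Shortcuts" can be illustrated with a toy example. The sketch below is not the MARS implementation or the MoA-net schema: the triples, the relation names, the `_inv` suffix used to mark reverse edges, and the helper functions are all assumptions made for illustration. It shows one plausible way in which reverse edges inflate connectivity, by letting walks loop back through well-connected nodes, and how dropping them leaves only the forward drug → protein → biological process routes.

```python
from collections import Counter

# Toy triples (head, relation, tail). All node and relation names are
# hypothetical and are not taken from MoA-net.
TRIPLES = [
    ("drug_A", "interacts_with", "protein_1"),
    ("drug_A", "interacts_with", "protein_2"),
    ("drug_B", "interacts_with", "protein_1"),
    ("protein_1", "participates_in", "bp_X"),
    ("protein_2", "participates_in", "bp_X"),
    # Reverse edges, marked here by an assumed "_inv" suffix.
    ("protein_1", "interacts_with_inv", "drug_A"),
    ("bp_X", "participates_in_inv", "protein_1"),
    ("bp_X", "participates_in_inv", "protein_2"),
]


def remove_reverse_edges(triples, suffix="_inv"):
    """Keep only triples whose relation is not marked as a reverse edge."""
    return [t for t in triples if not t[1].endswith(suffix)]


def out_degree(triples):
    """Count outgoing edges per node."""
    return Counter(head for head, _, _ in triples)


def count_walks(triples, source, target, max_hops):
    """Count directed walks from source that end at target within max_hops."""
    adjacency = {}
    for head, _, tail in triples:
        adjacency.setdefault(head, []).append(tail)

    frontier = Counter({source: 1})  # node -> number of walks ending there
    total = 0
    for _ in range(max_hops):
        next_frontier = Counter()
        for node, n_walks in frontier.items():
            for neighbour in adjacency.get(node, []):
                next_frontier[neighbour] += n_walks
        total += next_frontier[target]
        frontier = next_frontier
    return total


pruned = remove_reverse_edges(TRIPLES)

# With reverse edges, walks can bounce back and forth through bp_X.
print(count_walks(TRIPLES, "drug_A", "bp_X", max_hops=4))  # 8
# Without them, only the two forward two-hop routes remain.
print(count_walks(pruned, "drug_A", "bp_X", max_hops=4))   # 2
print(out_degree(TRIPLES)["bp_X"], out_degree(pruned)["bp_X"])  # 2 0
```

In this toy graph the number of walks reaching `bp_X` drops from 8 to 2 once reverse edges are removed, which gives a rough intuition for why a path-based model might favour highly connected nodes regardless of the rules it is meant to follow.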