Automated Reasoning in Systems Biology: a Necessity for Precision Medicine

Pedro Zuidberg Dos Martires, Vincent Derkinderen, Luc De Raedt, Marcus Krantz
2024-10-17
Abstract: Recent developments in AI have reinvigorated pursuits to advance the (life) sciences using AI techniques, thereby creating a renewed opportunity to bridge different fields and find synergies. Headlines for AI and the life sciences have been dominated by data-driven techniques, for instance, to solve protein folding with next to no expert knowledge. In contrast to this, we argue for the necessity of a formal representation of expert knowledge - either to develop explicit scientific theories or to compensate for the lack of data. Specifically, we argue that the fields of knowledge representation (KR) and systems biology (SysBio) exhibit important overlaps that have been largely ignored so far. This, in turn, means that relevant scientific questions are ready to be answered using the right domain knowledge (SysBio), encoded in the right way (SysBio/KR), and by combining it with modern automated reasoning tools (KR). Hence, the formal representation of domain knowledge is a natural meeting place for SysBio and KR. On the one hand, we argue that such an interdisciplinary approach will advance the field SysBio by exposing it to industrial-grade reasoning tools and thereby allowing novel scientific questions to be tackled. On the other hand, we see ample opportunities to move the state-of-the-art in KR by tailoring KR methods to the field of SysBio, which comes with challenging problem characteristics, e.g. scale, partial knowledge, noise, or sub-symbolic data. We stipulate that this proposed interdisciplinary research is necessary to attain a prominent long-term goal in the health sciences: precision medicine.
Computational Engineering, Finance, and Science
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to provide the system-level understanding required for precision medicine by integrating techniques from the fields of Knowledge Representation (KR) and Systems Biology (SysBio). Specifically, the authors believe that current modeling methods and techniques have limitations when dealing with Signal Transduction Networks (STNs) and cannot meet the needs of precision medicine. Therefore, they advocate using formal knowledge representation methods to compensate for insufficient or noisy data, and propose logical formulas (including probabilistic and neural extensions) as the natural intersection between the two fields.

### Main problem decomposition

1. **Lack of system-level understanding**: Precision medicine requires deriving a systematic functional understanding from molecular-level observations. However, current research progress is relatively slow, partly due to the complexity of the research methods themselves and the lack of formality in the languages used to represent and reason about biomedical knowledge.
2. **Limitations of existing techniques**: Current systems biology modeling techniques (such as micro-state, component-level, and reaction-dependent formalisms) face challenges when dealing with large-scale signal transduction networks. For example, the micro-state model needs to enumerate all possible state combinations, which is infeasible in complex biological systems, while the component-level formalism loses crucial detail.
3. **Combination of knowledge representation and automated reasoning**: To overcome the above problems, the authors emphasize the importance of introducing KR techniques into SysBio. In particular, logical formulas can be an effective tool to represent and reason about the complex behaviors of biological systems, thereby better handling noisy data and partially known information.
4. **Application requirements of precision medicine**: The ultimate goal is to achieve precision medicine through these improved technical means, that is, to tailor treatment plans to the medical history and genetic background of individual patients. To this end, it is necessary to develop models that can handle large-scale, highly complex biological signal transduction networks, and these models must be able to accurately describe the behavior of the system at different resolutions.

### Formula examples

The logical formulas mentioned in the paper are used to represent the protein phosphorylation process:

\[ R_{t+1} \leftrightarrow \neg S_{1,t} \land ATP_t \]

\[ S_{1,t+1} \leftrightarrow S_{1,t} \lor R_t \]

where:

- \(R_t\) represents whether the reaction occurs at time step \(t\).
- \(S_{1,t}\) represents whether the specific position (site 1) of the protein is phosphorylated at time step \(t\).
- \(ATP_t\) represents whether ATP molecules are present at time step \(t\).

These formulas describe how the states of reaction \(R\) and site \(S_1\) depend on each other over time.

### Summary

The core problem of the paper is to explore how to overcome the limitations of current modeling methods by combining the techniques of KR and SysBio to achieve the system-level understanding required for precision medicine. The authors propose using logical formulas and other KR tools as solutions and emphasize the importance of interdisciplinary cooperation.
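
To make the two phosphorylation formulas given above concrete, here is a minimal Python sketch that reads them as synchronous Boolean update rules and steps them forward in time. It is an illustration only, not the authors' implementation: the function names `step` and `simulate` are hypothetical, and treating ATP as an externally supplied input at every time step is an assumption, since the formulas do not say how \(ATP_t\) evolves.

```python
from typing import Dict, List

State = Dict[str, bool]


def step(state: State, atp_next: bool) -> State:
    """Apply one synchronous update of the two formulas to a Boolean state."""
    return {
        # R_{t+1} <-> (not S_{1,t}) and ATP_t
        "R": (not state["S1"]) and state["ATP"],
        # S_{1,t+1} <-> S_{1,t} or R_t
        "S1": state["S1"] or state["R"],
        # Assumption: ATP availability is an external input, not derived.
        "ATP": atp_next,
    }


def simulate(initial: State, atp_inputs: List[bool]) -> List[State]:
    """Return the trajectory [state_0, state_1, ...] for the given ATP inputs."""
    trajectory = [initial]
    for atp in atp_inputs:
        trajectory.append(step(trajectory[-1], atp))
    return trajectory


if __name__ == "__main__":
    # Unphosphorylated site, no reaction yet, ATP available.
    start = {"R": False, "S1": False, "ATP": True}
    for t, s in enumerate(simulate(start, [True, True, True])):
        print(t, s)
```

Starting from an unphosphorylated site with ATP present, the reaction fires at step 1, the site becomes phosphorylated at step 2, and the reaction switches off at step 3 because the site is already modified. A plain forward simulation like this checks only one trajectory. The automated reasoning tools the paper argues for would instead work over all states consistent with the formulas.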