Optimizing Data-driven Causal Discovery Using Knowledge-guided Search

Uzma Hasan, Md Osman Gani
2024-07-08
Abstract: Learning causal relationships solely from observational data often fails to reveal the underlying causal mechanisms due to the vast search space of possible causal graphs, which can grow exponentially, especially for greedy algorithms using score-based approaches. Leveraging prior causal information, such as the presence or absence of causal edges, can help restrict and guide the score-based discovery process, leading to a more accurate search. In the healthcare domain, prior knowledge is abundant from sources like medical journals, electronic health records (EHRs), and clinical intervention outcomes. This study introduces a knowledge-guided causal structure search (KGS) approach that utilizes observational data and structural priors (such as causal edges) as constraints to learn the causal graph. KGS leverages prior edge information between variables, including the presence of a directed edge, the absence of an edge, and the presence of an undirected edge. We extensively evaluate KGS in multiple settings using synthetic and benchmark real-world datasets, as well as in a real-life healthcare application related to oxygen therapy treatment. To obtain causal priors, we use GPT-4 to retrieve relevant literature information. Our results show that structural priors of any type and amount enhance the search process, improving performance and optimizing causal discovery. This guided strategy ensures that the discovered edges align with established causal knowledge, enhancing the trustworthiness of findings while expediting the search process. It also enables a more focused exploration of causal mechanisms, potentially leading to more effective and personalized healthcare solutions.
Artificial Intelligence
What problem does this paper attempt to address?
The paper primarily aims to address two key issues in the field of causal discovery:

1. **Optimizing the data-driven causal discovery process**: Traditional greedy algorithms must search a space of possible causal graphs that grows exponentially, which makes the search inefficient and computationally expensive. Introducing prior knowledge to guide the search significantly reduces the number of states that need to be explored, thereby improving search efficiency.
2. **Utilizing existing knowledge to improve causal structure learning**: In fields such as healthcare, there is a wealth of prior knowledge (e.g., knowledge obtained from electronic health records, clinical trials, etc.) that can be used in the causal discovery process. However, most existing causal discovery methods rely solely on data-driven approaches and do not fully utilize this prior knowledge.

This study proposes a new method called Knowledge-Guided Causal Structure Search (KGS), which aims to effectively integrate such prior knowledge into the causal discovery process. Specifically, the KGS method uses three types of prior knowledge constraints to guide the search process:

- **Directed Edges**: A causal relationship is known to exist, and its direction is known.
- **Forbidden Edges**: No causal relationship exists between the two variables.
- **Undirected Edges**: A causal relationship is known to exist, but its direction is unknown.

Through extensive experimental evaluation on synthetic and real-world datasets, the paper demonstrates that the KGS method can effectively utilize these prior knowledge constraints to improve the accuracy of causal discovery, reduce the search space, and lower computational complexity. Additionally, the paper explores how large language models (such as GPT-4) can be used to extract causal prior knowledge from relevant literature, further enhancing the practicality of the KGS method.
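To make the effect of the three kinds of edge priors concrete, the following minimal Python sketch enumerates every DAG over a small set of variables and counts how many remain once a prior is imposed. It is an illustration only, not the KGS algorithm from the paper: KGS is a guided greedy score-based search, whereas this sketch uses brute-force enumeration and no scoring. The function names (`is_acyclic`, `enumerate_dags`, `satisfies_priors`) and the variable labels are hypothetical, and treating a forbidden edge as "no edge in either direction" is an assumption made here for simplicity.

```python
from itertools import combinations, product


def is_acyclic(nodes, edges):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return visited == len(nodes)


def enumerate_dags(nodes):
    """Yield every DAG over `nodes` as a frozenset of (parent, child) edges."""
    pairs = list(combinations(nodes, 2))
    # Each pair is either unconnected (0), a -> b (1), or b -> a (2).
    for choice in product((0, 1, 2), repeat=len(pairs)):
        edges = set()
        for (a, b), c in zip(pairs, choice):
            if c == 1:
                edges.add((a, b))
            elif c == 2:
                edges.add((b, a))
        if is_acyclic(nodes, edges):
            yield frozenset(edges)


def satisfies_priors(edges, directed=(), forbidden=(), undirected=()):
    """Check a candidate graph against the three kinds of edge priors."""
    # Directed prior: the edge must be present with exactly this orientation.
    if any((a, b) not in edges for a, b in directed):
        return False
    # Forbidden prior (assumption: no edge in either direction).
    if any((a, b) in edges or (b, a) in edges for a, b in forbidden):
        return False
    # Undirected prior: the pair must be connected; orientation is left open.
    if any((a, b) not in edges and (b, a) not in edges for a, b in undirected):
        return False
    return True


if __name__ == "__main__":
    variables = ["A", "B", "C"]  # hypothetical variable names
    dags = list(enumerate_dags(variables))
    print("all DAGs:", len(dags))
    for label, priors in [
        ("directed A->B", {"directed": [("A", "B")]}),
        ("forbidden A-B", {"forbidden": [("A", "B")]}),
        ("undirected A-B", {"undirected": [("A", "B")]}),
    ]:
        kept = [g for g in dags if satisfies_priors(g, **priors)]
        print(label, "->", len(kept), "candidate DAGs remain")
```

Even with three variables, a single prior of any type removes a noticeable share of the candidate graphs, which is the intuition behind using priors to restrict a score-based search. Brute-force enumeration is feasible only for a handful of variables, so a practical method would apply the same checks while moving between neighboring graphs rather than listing all of them.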