Mining Medical Causality for Diagnosis Assistance.
Sendong Zhao
DOI: https://doi.org/10.1145/3018661.3022752
2017-01-01
Abstract:In the medical context, causal knowledge usually refers to causal relations between diseases and symptoms, living habits and diseases, symptoms which get better and therapy, drugs and side-effects, etc [3]. All these causal relations are usually in medical literature, forum and clinical cases and compose the core part of medical diagnosis. Therefore, mining these causal knowledge to predict disease and recommend therapy is of great value for assisting patients and professionals. The task of mining these causal knowledge for diagnosis assistance can be decomposed into four constitutes: (1) mining medical causality from text; (2) medical treatment effectiveness measurement; (3) disease prediction and (4) explicable medical treatment recommendation. However, these tasks have never been systemically studied before. For my PhD thesis, I plan to formally define the problem of mining medical domain causality for diagnosis assistance and propose methods to solve this problem. 1. Ming these textual causalities can be very useful for discovering new knowledge and making decisions. Many studies have been done for causal extraction from the text [1, 4, 5]. However, all these studies are based on pattern or causal triggers, which greatly limit their power to extract causality and rarely consider the frequency of co-occurrence and contextual semantic features. Besides, none of them take the transitivity rules of causality leading to reject those causalities which can be easily get by simple inference. Therefore, we formally define the task of mining causality via frequency of event co-occurrence, semantic distance between event pairs and transitivity rules of causality, and present a factor graph to combine these three resources for causality mining. 2. Treatment effectiveness analysis is usually taken as a subset of causal analysis on observational data. For such real observational data, PSM and RCM are two dominant methods. On one hand, it is usually difficult for PSM to find the matched cases due to the sparsity of symptom. On the other hand, we should check every possible (symptom, treatment) pair by exploiting RCM, leading to make the characteristic of exploding up, especially when we want to check the causal relation between a combination of symptoms and a combination of drugs. Besides, the larger number of symptom or treatment in the combination the less number of patient case retrieved, which lead to the lack of statistical significance. Specifically, patients tend to take tens of herbs as the treatment each time in Traditional Chinese Medicine (TCM). Therefore, how to evaluate the effectiveness of herbs separately and jointly is really a big challenge. This is also a very fundamental research topic supporting many downstream applications. 3. Both hospitals and on-line forums have accumulated sheer amount of records, such as clinical text data and online diagnosis Q&A pairs. The availability of such data in large volume enables automatic disease prediction. There are some papers on disease prediction with electronic health record (EHR) [2], but the research on disease prediction with raw symptoms is still necessary and challenging. Therefore, we propose a general new idea of using the rich contextual information of diseases and symptoms to bridge the gap of disease candidates and symptoms, and detach it from the specific way of implementing the idea using network embedding. 4. Recommendation in medical domain is usually a decision-making issue, which requires the ability of explaining "why". The ability of explaining "why" are basically from two paths. Consider the recommendation suggest you eat more vegetables. You probably do not believe it if there is nothing attached. But if the recommendation gives the literally reasons why eating more vegetables is good you might like to take this suggestion. Consider another scenario, if the recommendation gives you the data of the contrast which show that people who eat more vegetables are healthier than those eat less, it is certain that you also want to take this recommendation. Based on these two intuitions, we present a recommendation model based on proofs which are either literally reasons or difference from contrast. This work was supported by the 973 program (No. 2014CB340503) and the NSFC (No. 61133012 and No. 61472107).