KG-TREAT: Pre-training for Treatment Effect Estimation by Synergizing Patient Data with Knowledge Graphs

Ruoqi Liu, Lingfei Wu, Ping Zhang
2024-03-06
Abstract: Treatment effect estimation (TEE) is the task of determining the impact of various treatments on patient outcomes. Current TEE methods fall short due to reliance on limited labeled data and challenges posed by sparse and high-dimensional observational patient data. To address the challenges, we introduce a novel pre-training and fine-tuning framework, KG-TREAT, which synergizes large-scale observational patient data with biomedical knowledge graphs (KGs) to enhance TEE. Unlike previous approaches, KG-TREAT constructs dual-focus KGs and integrates a deep bi-level attention synergy method for in-depth information fusion, enabling distinct encoding of treatment-covariate and outcome-covariate relationships. KG-TREAT also incorporates two pre-training tasks to ensure a thorough grounding and contextualization of patient data and KGs. Evaluation on four downstream TEE tasks shows KG-TREAT's superiority over existing methods, with an average improvement of 7% in Area under the ROC Curve (AUC) and 9% in Influence Function-based Precision of Estimating Heterogeneous Effects (IF-PEHE). The effectiveness of our estimated treatment effects is further affirmed by alignment with established randomized clinical trial findings.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses several key issues in Treatment Effect Estimation (TEE):

1. **Dependence on Limited Labeled Data**: Current TEE methods rely on limited labeled data, which restricts their generalization and their accuracy when the underlying relationships are complex.
2. **Sparse and High-Dimensional Observational Patient Data**: These characteristics make it difficult for existing methods to capture the complex relationships among covariates, treatments, and outcomes, resulting in biased estimates.
3. **Challenges in Applying Foundation Models**: Foundation models trained on large-scale datasets can improve generalization, but the high dimensionality and sparsity typical of medical data make them hard to apply directly.

To address these issues, the authors propose KG-TREAT, a pre-training and fine-tuning framework that enhances TEE by combining large-scale observational patient data with biomedical knowledge graphs (KGs). Unlike previous methods, KG-TREAT constructs dual-focus KGs and introduces a deep bi-level attention synergy method for in-depth information fusion, so that treatment-covariate and outcome-covariate relationships are encoded separately. KG-TREAT also includes two pre-training tasks to ensure thorough grounding and contextualization of patient data and KGs.

Experimental results show that KG-TREAT outperforms existing methods on four downstream TEE tasks, with an average improvement of 7% in Area Under the ROC Curve (AUC) and 9% in Influence Function-based Precision of Estimating Heterogeneous Effects (IF-PEHE). The estimated treatment effects are also consistent with findings from established randomized clinical trials, which further supports the model's effectiveness.
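
For readers new to TEE, the following is a minimal, illustrative sketch of the two quantities the summary refers to: a treatment effect and a PEHE-style error. It is not KG-TREAT and not the paper's evaluation code. The function names `average_treatment_effect` and `root_pehe` are hypothetical, the toy data are made up for illustration, and the difference-in-means estimator is only valid under the assumption that treatment was assigned at random. The PEHE-style error here requires known ground-truth individual effects, which exist only in synthetic settings; IF-PEHE, as its name indicates, is an influence-function-based variant, and this sketch does not implement it.

```python
import math


def average_treatment_effect(outcomes, treatments):
    """Naive difference-in-means estimate of the average treatment effect.

    Assumption: treatment is randomized. On observational data this estimate
    is generally confounded, which is the problem TEE methods try to address.
    """
    treated = [y for y, t in zip(outcomes, treatments) if t == 1]
    control = [y for y, t in zip(outcomes, treatments) if t == 0]
    if not treated or not control:
        raise ValueError("need at least one treated and one control unit")
    return sum(treated) / len(treated) - sum(control) / len(control)


def root_pehe(true_effects, estimated_effects):
    """Root mean squared error between true and estimated individual effects.

    Assumption: ground-truth individual effects are available, which is only
    the case for synthetic or semi-synthetic data.
    """
    if len(true_effects) != len(estimated_effects):
        raise ValueError("effect lists must have the same length")
    if not true_effects:
        raise ValueError("effect lists must not be empty")
    squared = [(a - b) ** 2 for a, b in zip(true_effects, estimated_effects)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data (made up): binary outcomes and binary treatment indicators.
toy_outcomes = [1, 0, 1, 1, 0, 0]
toy_treatments = [1, 1, 1, 0, 0, 0]
toy_ate = average_treatment_effect(toy_outcomes, toy_treatments)

# Toy individual effects (made up): one estimate is off by 1.0, one is exact.
toy_error = root_pehe([1.0, 0.0], [0.0, 0.0])
```

A lower PEHE-style error means the estimated individual effects are closer to the true ones.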