KG-TREAT: Pre-training for Treatment Effect Estimation by Synergizing Patient Data with Knowledge Graphs

Ruoqi Liu, Lingfei Wu, Ping Zhang

2024-03-06

Abstract: Treatment effect estimation (TEE) is the task of determining the impact of various treatments on patient outcomes. Current TEE methods fall short due to reliance on limited labeled data and challenges posed by sparse and high-dimensional observational patient data. To address the challenges, we introduce a novel pre-training and fine-tuning framework, KG-TREAT, which synergizes large-scale observational patient data with biomedical knowledge graphs (KGs) to enhance TEE. Unlike previous approaches, KG-TREAT constructs dual-focus KGs and integrates a deep bi-level attention synergy method for in-depth information fusion, enabling distinct encoding of treatment-covariate and outcome-covariate relationships. KG-TREAT also incorporates two pre-training tasks to ensure a thorough grounding and contextualization of patient data and KGs. Evaluation on four downstream TEE tasks shows KG-TREAT's superiority over existing methods, with an average improvement of 7% in Area under the ROC Curve (AUC) and 9% in Influence Function-based Precision of Estimating Heterogeneous Effects (IF-PEHE). The effectiveness of our estimated treatment effects is further affirmed by alignment with established randomized clinical trial findings.

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses several key issues in treatment effect estimation (TEE):

1. **Dependence on limited labeled data**: Current TEE methods rely on limited labeled data, which restricts how well they generalize and how accurately they model complex relationships.
2. **Sparse and high-dimensional observational patient data**: These properties make it difficult for existing methods to capture the relationships among covariates, treatments, and outcomes, which leads to biased estimates.
3. **Challenges in applying foundation models**: Foundation models trained on large-scale datasets can improve generalization, but the high dimensionality and sparsity of medical data make them hard to apply directly.

To address these issues, the authors propose KG-TREAT, a pre-training and fine-tuning framework that enhances TEE by combining large-scale observational patient data with biomedical knowledge graphs (KGs). Unlike previous methods, KG-TREAT constructs dual-focus KGs and introduces a deep bi-level attention synergy method for in-depth information fusion, so that treatment-covariate and outcome-covariate relationships are encoded separately. KG-TREAT also includes two pre-training tasks to ensure thorough grounding and contextualization of patient data and KGs.

Experimental results show that KG-TREAT outperforms existing methods on four downstream TEE tasks, with an average improvement of 7% in Area under the ROC Curve (AUC) and 9% in Influence Function-based Precision of Estimating Heterogeneous Effects (IF-PEHE). The treatment effects estimated by the model are also consistent with findings from established randomized clinical trials (RCTs), which further supports its effectiveness in practice.
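To make the estimation problem concrete, the following minimal sketch shows why a naive comparison of treated and untreated patients is biased in observational data, and how a covariate-adjusted estimator can be scored with PEHE. It is not KG-TREAT: the synthetic data, the linear two-model ("T-learner") estimator, and all function names are assumptions made purely for illustration. It also uses plain PEHE, which needs the true effects that only a simulation provides, rather than the influence-function-based IF-PEHE reported in the paper.

```python
import numpy as np

def simulate(n=2000, seed=0):
    """Synthetic observational data with a known heterogeneous effect (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 3))                    # patient covariates
    propensity = 1.0 / (1.0 + np.exp(-x[:, 0]))    # treatment probability depends on x0 (confounding)
    t = rng.binomial(1, propensity)                # observed treatment assignment
    tau = 1.0 + 0.5 * x[:, 1]                      # true individual treatment effect
    y = x[:, 0] + tau * t + rng.normal(scale=0.1, size=n)  # observed outcome
    return x, t, y, tau

def fit_linear(x, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, x):
    """Apply fitted coefficients (intercept first) to covariates."""
    return np.column_stack([np.ones(len(x)), x]) @ coef

def estimate_effects(x, t, y):
    """T-learner: fit one outcome model per treatment arm and take the difference."""
    coef_treated = fit_linear(x[t == 1], y[t == 1])
    coef_control = fit_linear(x[t == 0], y[t == 0])
    return predict_linear(coef_treated, x) - predict_linear(coef_control, x)

def pehe(tau_hat, tau):
    """Root mean squared error between estimated and true individual effects."""
    return float(np.sqrt(np.mean((tau_hat - tau) ** 2)))

x, t, y, tau = simulate()
tau_hat = estimate_effects(x, t, y)
naive_ate = y[t == 1].mean() - y[t == 0].mean()    # ignores confounding through x0

print(f"true ATE:      {tau.mean():.3f}")
print(f"naive ATE:     {naive_ate:.3f}")
print(f"T-learner ATE: {tau_hat.mean():.3f}")
print(f"PEHE:          {pehe(tau_hat, tau):.3f}")
```

In this toy setting the outcome models are correctly specified and low-dimensional, so the adjusted estimate recovers the true effect closely while the naive difference does not. The sparse, high-dimensional patient records described above are exactly where such simple outcome models tend to break down, which is the gap the paper targets with pre-training and knowledge graphs.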
