A pancreatic cancer risk prediction model (Prism) developed and validated on large-scale US clinical data

Kai Jia,Steven Kundrot,Matvey B Palchuk,Jeff Warnick,Kathryn Haapala,Irving D Kaplan,Martin Rinard,Limor Appelbaum
DOI: https://doi.org/10.1016/j.ebiom.2023.104888
IF: 11.205
EBioMedicine
Abstract:Background: Pancreatic Duct Adenocarcinoma (PDAC) screening can enable early-stage disease detection and long-term survival. Current guidelines use inherited predisposition, with about 10% of PDAC cases eligible for screening. Using Electronic Health Record (EHR) data from a multi-institutional federated network, we developed and validated a PDAC RISk Model (Prism) for the general US population to extend early PDAC detection. Methods: Neural Network (PrismNN) and Logistic Regression (PrismLR) were developed using EHR data from 55 US Health Care Organisations (HCOs) to predict PDAC risk 6-18 months before diagnosis for patients 40 years or older. Model performance was assessed using Area Under the Curve (AUC) and calibration plots. Models were internal-externally validated by geographic location, race, and time. Simulated model deployment evaluated Standardised Incidence Ratio (SIR) and other metrics. Findings: With 35,387 PDAC cases, 1,500,081 controls, and 87 features per patient, PrismNN obtained a test AUC of 0.826 (95% CI: 0.824-0.828) (PrismLR: 0.800 (95% CI: 0.798-0.802)). PrismNN's average internal-external validation AUCs were 0.740 for locations, 0.828 for races, and 0.789 (95% CI: 0.762-0.816) for time. At SIR = 5.10 (exceeding the current screening inclusion threshold) in simulated model deployment, PrismNN sensitivity was 35.9% (specificity 95.3%). Interpretation: Prism models demonstrated good accuracy and generalizability across diverse populations. PrismNN could find 3.5 times more cases at comparable risk than current screening guidelines. The small number of features provided a basis for model interpretation. Integration with the federated network provided data from a large, heterogeneous patient population and a pathway to future clinical deployment. Funding: Prevent Cancer Foundation, TriNetX, Boeing, DARPA, NSF, and Aarno Labs.
What problem does this paper attempt to address?