Genetics, primary care records and lifestyle factors for short-term dynamic risk prediction of colorectal cancer: prospective study of asymptomatic and symptomatic UK Biobank participants

Samantha H Y Ip, Hannah Harrison, Juliet Usher-Smith, Matthew E Barclay, Jonathan Tyrer, Joe Dennis, Xin Yang, Michael Lush, Cristina Renzi, Nora Pashayan, Spiros Denaxas, Georgios Lyratzopoulos, Antonis C Antoniou, Angela M Wood
DOI: https://doi.org/10.1101/2023.12.21.23300244
2024-08-26
Abstract:

Objectives: To quantify the contributions of polygenic scores, primary care records (presenting symptoms, medical history and common blood tests) and lifestyle factors to short-term risk prediction of colorectal cancer (CRC), both in all individuals and in symptomatic individuals.

Design: Prospective cohort study.

Setting: UK Biobank, with follow-up until 2018.

Participants: All participants with linked primary care records (n=160,507), and a subcohort of participants with a recent presentation (within the last two years) of a symptom associated with CRC (n=42,782).

Main outcome measures: The outcome was the first recorded CRC diagnosis within two years. Dynamic risk models with time-varying predictors were derived in a super-landmark framework. Contributions to model discrimination were quantified with novel, inclusion-order-agnostic Shapley values of Harrell's C-index, estimated using cross-validation.

Results: C-indices [95% CIs] were 0.73 [0.72 to 0.73] and 0.69 [0.68 to 0.70] for the models derived in all participants and in symptomatic participants, respectively. The Shapley contributions to model discrimination [95% CIs] of the predictor groups differed between the two sets of participants. For all participants (with symptomatic participants in parentheses), the contributions were 33% [25% to 42%] (34% [9% to 75%]) for core predictors (e.g., age, sex, smoking); 16% [8% to 26%] (8% [-21% to 35%]) for polygenic scores; 32% [19% to 43%] (41% [16% to 73%]) for primary care blood tests; 11% [4% to 17%] (9% [-25% to 37%]) for primary care medical history; 6% [0% to 11%] (-5% [-32% to 13.4%]) for additional lifestyle factors; and 3% [-2% to 7%] (13% [-19% to 41%]) for symptoms.

Conclusions: Polygenic scores contribute substantially to short-term risk prediction of CRC in both the general and the symptomatic populations; however, information in primary care records (including presenting symptoms, medical history and common blood tests) contributes more. The additional lifestyle risk factors, which are not routinely collected in primary care, contribute only a small amount.
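The abstract relies on two computations that a short sketch can make concrete: Harrell's C-index for censored follow-up, and Shapley values that split the gain in C-index among predictor groups by averaging each group's marginal gain over every possible order of inclusion (one reading of "inclusion-order-agnostic"). The Python below is a minimal sketch under stated assumptions, not the authors' implementation. The toy cohort, the group names (`core`, `polygenic`, `blood_tests`) and their additive per-subject scores are hypothetical. The sketch skips the super-landmark model fitting, time-varying predictors and cross-validation described above. It also assumes that contributions are reported as each group's share of the total gain over an uninformative model (C = 0.5).

```python
from itertools import combinations
from math import factorial


def harrell_c(times, events, risk):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. It is concordant when the subject who failed first
    has the higher risk score; tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def shapley_values(players, value_fn):
    """Exact Shapley values over all coalitions of `players`.

    Each player's marginal gain value_fn(S | {p}) - value_fn(S) is weighted by
    |S|! (n - |S| - 1)! / n!, which equals averaging that gain over every
    order in which the players could be added.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {p}) - value_fn(s))
        phi[p] = total
    return phi


# Hypothetical toy cohort: follow-up time and event indicator (1 = CRC diagnosed).
times = [2, 5, 1, 8, 3, 7]
events = [1, 0, 1, 1, 0, 1]

# Hypothetical per-subject score contributed by each predictor group; a real
# analysis would refit the risk model for every subset of groups instead.
scores = {
    "core": [0.5, 0.25, 0.75, 0.0, 0.25, 0.5],
    "polygenic": [0.25, 0.5, 0.5, 0.25, 0.0, 0.0],
    "blood_tests": [0.75, 0.0, 0.5, 0.25, 0.5, 0.25],
}
groups = list(scores)


def value(coalition):
    """C-index of an additive risk score built from the groups in `coalition`.

    The empty coalition gives every subject the same score, so C = 0.5.
    """
    risk = [sum(scores[g][i] for g in coalition) for i in range(len(times))]
    return harrell_c(times, events, risk)


phi = shapley_values(groups, value)
total_gain = value(frozenset(groups)) - value(frozenset())  # equals sum of phi (efficiency)
for g in groups:
    print(f"{g}: phi = {phi[g]:.3f}, share = {phi[g] / total_gain:.0%}")
```

Exact enumeration needs 2^n coalitions per group, which stays small for the six predictor groups named in the abstract (64 subsets), although in practice each subset would require refitting and cross-validating the risk model.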