Incorporating External Risk Information with the Cox Model under Population Heterogeneity: Applications to Trans-Ancestry Polygenic Hazard Scores

Di Wang, Wen Ye, Ji Zhu, Gongjun Xu, Weijing Tang, Matthew Zawistowski, Lars G. Fritsche, Kevin He
DOI: https://doi.org/10.48550/arXiv.2302.11123
2023-02-22
Abstract: Polygenic hazard score (PHS) models designed for European ancestry (EUR) individuals provide ample information regarding survival risk discrimination. Incorporating such information can improve the performance of risk discrimination in an internal small-sized non-EUR cohort. However, given that external EUR-based model and internal individual-level data come from different populations, ignoring population heterogeneity can introduce substantial bias. In this paper, we develop a Kullback-Leibler-based Cox model (CoxKL) to integrate internal individual-level time-to-event data with external risk scores derived from published prediction models, accounting for population heterogeneity. Partial-likelihood-based KL information is utilized to measure the discrepancy between the external risk information and the internal data. We establish the asymptotic properties of the CoxKL estimator. Simulation studies show that the integration model by the proposed CoxKL method achieves improved estimation efficiency and prediction accuracy. We applied the proposed method to develop a trans-ancestry PHS model for prostate cancer and found that integrating a previously published EUR-based PHS with an internal genotype data of African ancestry (AFR) males yielded considerable improvement on the prostate cancer risk discrimination.
Methodology, Applications
What problem does this paper attempt to address?
This paper addresses how to integrate external risk information with internal individual-level data to improve survival risk discrimination when the two sources come from heterogeneous populations. Specifically, it asks how an existing polygenic hazard score (PHS) model built on European-ancestry (EUR) individuals can be used to improve prostate cancer risk assessment in non-European-ancestry (non-EUR) populations. Applying a EUR-based PHS model directly to non-EUR populations has limited utility, while building a PHS model from non-EUR data alone is hampered by small sample size, high dimensionality, and a low signal-to-noise ratio. A method is therefore needed that combines the risk information in published EUR-based prediction models with individual-level data from non-EUR cohorts, measures the heterogeneity between the populations automatically, and balances the contribution of each information source. To this end, the authors propose a Cox model based on Kullback-Leibler (KL) information (CoxKL), which integrates internal individual-level time-to-event data with external risk scores derived from published prediction models while accounting for population heterogeneity. By measuring the discrepancy between the external risk information and the internal data through partial-likelihood-based KL information, CoxKL improves estimation efficiency and prediction accuracy while remaining computationally efficient. The paper also extends CoxKL to CoxKL-LASSO, which handles high-dimensional trans-ancestry PHS problems such as developing a trans-ancestry PHS model for prostate cancer in African-ancestry (AFR) men.
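To make the idea of a KL-penalized Cox objective concrete, here is a minimal illustrative sketch in pure Python. It is not the authors' exact formulation: the specific form of the penalty (a KL divergence, at each observed event time, between the risk-set probabilities implied by the external score and those implied by the internal linear predictor), the tuning weight `eta`, and all function names are assumptions made for illustration. Tied event times are not given special treatment.

```python
import math

def linear_predictor(beta, X):
    """Internal risk score x_i^T beta for every subject."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

def risk_sets(times):
    """For each subject i, the indices still at risk at time t_i."""
    return [[j for j, tj in enumerate(times) if tj >= ti] for ti in times]

def log_softmax_over(scores, idx):
    """Log of exp(score_j) / sum_k exp(score_k), restricted to idx."""
    m = max(scores[j] for j in idx)
    lse = m + math.log(sum(math.exp(scores[j] - m) for j in idx))
    return [scores[j] - lse for j in idx]

def neg_partial_loglik(beta, X, times, events):
    """Negative Cox partial log-likelihood (no tie correction)."""
    lp = linear_predictor(beta, X)
    total = 0.0
    for i, rs in enumerate(risk_sets(times)):
        if events[i]:
            total -= log_softmax_over(lp, rs)[rs.index(i)]
    return total

def kl_penalty(beta, X, times, events, ext_score):
    """Assumed penalty: sum over event times of KL(external || internal),
    where each distribution is over who in the risk set fails."""
    lp = linear_predictor(beta, X)
    total = 0.0
    for i, rs in enumerate(risk_sets(times)):
        if events[i]:
            log_p = log_softmax_over(ext_score, rs)  # external model
            log_q = log_softmax_over(lp, rs)         # internal model
            total += sum(math.exp(a) * (a - b) for a, b in zip(log_p, log_q))
    return total

def penalized_objective(beta, X, times, events, ext_score, eta):
    """eta = 0 ignores the external score; larger eta trusts it more."""
    return (neg_partial_loglik(beta, X, times, events)
            + eta * kl_penalty(beta, X, times, events, ext_score))

if __name__ == "__main__":
    # Toy data: one covariate, four subjects, one censored observation.
    X = [[0.5], [1.0], [-0.3], [0.2]]
    times = [2.0, 4.0, 1.0, 3.0]
    events = [1, 0, 1, 1]
    ext_score = [0.1, 0.9, -0.2, 0.4]  # hypothetical external risk scores
    for eta in (0.0, 0.5, 2.0):
        print(eta, penalized_objective([0.7], X, times, events, ext_score, eta))
```

In this sketch, `eta` plays the role of the weight that balances the two information sources: at `eta = 0` the objective reduces to the ordinary Cox partial likelihood on the internal data, and the penalty vanishes whenever the internal linear predictor ranks and scales risk exactly as the external score does. In practice such a weight would typically be chosen by cross-validation on the internal cohort.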