Cox model inference for relative hazard and pure risk from stratified weight-calibrated case-cohort data

Lola Etievant, Mitchell H. Gail
DOI: https://doi.org/10.1007/s10985-024-09621-2
2024-04-03
Lifetime Data Analysis
Abstract: The case-cohort design obtains complete covariate data only on cases and on a random sample (the subcohort) of the entire cohort. Subsequent publications described the use of stratification and weight calibration to increase efficiency of estimates of Cox model log-relative hazards, and there has been some work estimating pure risk. Yet there are few examples of these options in the medical literature, and we could not find programs currently online to analyze these various options. We therefore present a unified approach and R software to facilitate such analyses. We used influence functions adapted to the various design and analysis options together with variance calculations that take the two-phase sampling into account. This work clarifies when the widely used "robust" variance estimate of Barlow (Biometrics 50:1064–1072, 1994) is appropriate. The corresponding R software, CaseCohortCoxSurvival, facilitates analysis with and without stratification and/or weight calibration, for subcohort sampling with or without replacement. We also allow for phase-two data to be missing at random for stratified designs. We provide inference not only for log-relative hazards in the Cox model, but also for cumulative baseline hazards and covariate-specific pure risks. We hope these calculations and software will promote wider use of more efficient and principled design and analysis options for case-cohort studies.
statistics & probability, mathematics, interdisciplinary applications
What problem does this paper attempt to address?
The paper aims to make efficient Cox proportional hazards analysis of case-cohort data easier to carry out, and to extend inference beyond relative hazards to pure risk. Specifically, the authors focus on how stratification and weight calibration can improve the efficiency of estimates of log-relative hazards and cumulative baseline hazards, and on how to handle phase-two data that are missing at random in stratified designs. The paper also presents accompanying R software so that researchers can apply these design and analysis options.

### Main problems

1. **Improve estimation efficiency**:
   - **Stratification**: Improve efficiency by sampling the subcohort within strata.
   - **Weight calibration**: Further improve efficiency by calibrating the design weights.
2. **Handle missing data**:
   - Provide methods for stratified designs in which phase-two data are missing at random.
3. **Provide software tools**:
   - Develop the R package `CaseCohortCoxSurvival` so that researchers can conduct these analyses easily.

### Background

- **Case-cohort design**: This design is useful in large cohort studies because complete covariate data are collected only for the cases and for a random subcohort, which reduces the data-collection workload.
- **Limitations of existing methods**: Stratification and weight calibration have been described in the literature, but they are rarely used in practice, partly because convenient software is lacking and the technical literature is complex.

### Solutions

- **Unified method**: The authors propose a unified approach based on influence functions adapted to each design and analysis option, with variance calculations that account for the two-phase sampling. It provides inference for log-relative hazards, cumulative baseline hazards, and covariate-specific pure risks, and it clarifies when the widely used "robust" variance estimate of Barlow (1994) is appropriate.
- **R software**: The R package `CaseCohortCoxSurvival` supports analysis with and without stratification and/or weight calibration, for subcohort sampling with or without replacement, and allows phase-two data to be missing at random in stratified designs.

### Conclusions

- Stratification and weight calibration can improve the efficiency of estimates of log-relative hazards and cumulative baseline hazards.
- The R package makes these design and analysis options easier to use, which the authors hope will promote their wider adoption in case-cohort studies.
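
To make the ideas of stratified design weights and weight calibration concrete, here is a minimal Python sketch on a toy cohort. It does not use or reproduce the `CaseCohortCoxSurvival` API, and everything in it is an assumption made for illustration: the toy data, the field names (`stratum`, `case`, `subcohort`, `z`), the helper names `design_weights` and `calibrate_linear`, and the weighting convention (weight 1 for every case, inverse stratum sampling fraction for subcohort non-cases). The calibration step uses simple linear calibration on one auxiliary variable; the paper may use a different calibration form and different auxiliary variables.

```python
from collections import Counter

# Hypothetical toy cohort: z is an auxiliary variable known for everyone.
rows = [
    ("A", False, True, 1.0), ("A", False, True, 3.0), ("A", True, False, 2.0),
    ("A", False, False, 2.5), ("A", False, False, 0.5), ("A", False, False, 1.5),
    ("B", True, True, 4.0), ("B", False, True, 5.0),
    ("B", False, False, 6.0), ("B", False, False, 4.5),
]
cohort = [
    {"id": i, "stratum": s, "case": c, "subcohort": sc, "z": z}
    for i, (s, c, sc, z) in enumerate(rows)
]


def design_weights(cohort):
    """Return {id: weight} for phase-two members (cases and subcohort members).

    Assumed convention: each case gets weight 1; a non-case in the subcohort
    gets n_stratum / m_stratum, the inverse of its stratum sampling fraction.
    """
    n = Counter(p["stratum"] for p in cohort)                    # stratum sizes
    m = Counter(p["stratum"] for p in cohort if p["subcohort"])  # subcohort sizes
    weights = {}
    for p in cohort:
        if p["case"]:
            weights[p["id"]] = 1.0
        elif p["subcohort"]:
            weights[p["id"]] = n[p["stratum"]] / m[p["stratum"]]
    return weights


def calibrate_linear(cohort, weights, aux="z"):
    """Linear calibration on the auxiliary vector a = (1, z).

    Calibrated weights w_i = d_i * (1 + lam0 + lam1 * z_i) are chosen so that
    the weighted phase-two totals of 1 and z equal the full-cohort totals.
    """
    by_id = {p["id"]: p for p in cohort}
    t0 = float(len(cohort))            # cohort total of the constant 1
    t1 = sum(p[aux] for p in cohort)   # cohort total of z
    s0 = sum(weights.values())
    s1 = sum(d * by_id[i][aux] for i, d in weights.items())
    s2 = sum(d * by_id[i][aux] ** 2 for i, d in weights.items())
    # Solve the 2x2 system [[s0, s1], [s1, s2]] @ lam = [t0 - s0, t1 - s1].
    det = s0 * s2 - s1 * s1
    r0, r1 = t0 - s0, t1 - s1
    lam0 = (s2 * r0 - s1 * r1) / det
    lam1 = (s0 * r1 - s1 * r0) / det
    return {i: d * (1.0 + lam0 + lam1 * by_id[i][aux]) for i, d in weights.items()}


d = design_weights(cohort)
w = calibrate_linear(cohort, d)
z_of = {p["id"]: p["z"] for p in cohort}
calibrated_z_total = sum(w[i] * z_of[i] for i in w)
cohort_z_total = sum(p["z"] for p in cohort)
```

After calibration, the weighted total of `z` over the phase-two sample matches the total over the whole cohort. That matching is the mechanism by which calibration borrows cohort-wide information; how much efficiency it gains depends on how informative the auxiliary variables are.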