Alternative Analysis Methods for Time to Event Endpoints Under Nonproportional Hazards: A Comparative Analysis
Ray S. Lin, Ji Lin, Satrajit Roychoudhury, Keaven M. Anderson, Tianle Hu, Bo Huang, Larry F Leon, Jason J.Z. Liao, Rong Liu, Xiaodong Luo, Pralay Mukhopadhyay, Rui Qin, Kay Tatsuoka, Xuejing Wang, Yang Wang, Jian Zhu, Tai-Tsang Chen, Renee Iacona
DOI: https://doi.org/10.1080/19466315.2019.1697738
2020-01-27
Statistics in Biopharmaceutical Research
Abstract: The log-rank test is most powerful under proportional hazards (PH). In practice, non-PH patterns are often observed in clinical trials, such as in immuno-oncology; therefore, alternative methods are needed to restore the efficiency of statistical testing. Three categories of testing methods were evaluated: weighted log-rank tests, Kaplan–Meier curve-based tests (including weighted Kaplan–Meier and restricted mean survival time), and combination tests (including the Breslow test, Lee's combo test, and the MaxCombo test). Nine scenarios representing PH and various non-PH patterns were simulated. The power, Type I error, and effect estimate of each method were compared. In general, all tests control Type I error well. No single test is most powerful across all scenarios. In the absence of prior knowledge about whether the underlying pattern is PH or non-PH, the MaxCombo test is relatively robust across patterns. Because the treatment effect changes over time under non-PH, no single measure may comprehensively represent the overall profile of the treatment effect. Thus, multiple measures of the treatment effect should be prespecified as sensitivity analyses to describe the totality of the data. Supplementary materials for this article are available online.
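As an illustration of the combination idea behind MaxCombo, the following is a minimal Python sketch, not the authors' code. It assumes the four Fleming-Harrington G(ρ, γ) weights commonly used for MaxCombo, namely (0,0), (0,1), (1,0), (1,1), each evaluated at the pooled left-continuous Kaplan–Meier estimate S(t−). It also assumes two-arm data coded as `groups` (1 = treatment, 0 = control) and `events` (1 = event, 0 = censored). The function names `risk_table`, `fh_weight`, and `maxcombo` are hypothetical. The sketch computes each weighted log-rank Z statistic, their maximum, and the correlation matrix among them.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[float, float, float]  # (pooled S(t-), E1 - O1, hypergeometric variance)


def risk_table(times: Sequence[float], events: Sequence[int],
               groups: Sequence[int]) -> List[Row]:
    """Per distinct event time: pooled KM just before t, E1 - O1, and variance.

    Assumed coding: groups 1 = treatment, 0 = control; events 1 = event, 0 = censored.
    Subjects censored at an event time are still counted as at risk at that time.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    s_minus = 1.0
    rows: List[Row] = []
    for t in event_times:
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e == 1 and g == 1)
        p1 = n1 / n
        # Positive when the treatment arm has fewer events than expected.
        e_minus_o = d * p1 - d1
        var = d * p1 * (1 - p1) * (n - d) / (n - 1) if n > 1 else 0.0
        rows.append((s_minus, e_minus_o, var))
        s_minus *= 1 - d / n  # pooled Kaplan-Meier update after time t
    return rows


def fh_weight(s: float, rho: float, gamma: float) -> float:
    """Fleming-Harrington G(rho, gamma) weight evaluated at S(t-)."""
    return s ** rho * (1 - s) ** gamma


def maxcombo(times, events, groups,
             combos=((0, 0), (0, 1), (1, 0), (1, 1))):
    """Return (Z per weight, max Z, correlation matrix of the Z statistics)."""
    rows = risk_table(times, events, groups)
    weights = [[fh_weight(s, rho, gamma) for s, _, _ in rows]
               for rho, gamma in combos]
    scores = [sum(w * em for w, (_, em, _) in zip(ws, rows)) for ws in weights]
    k = len(combos)
    # Cov(U_a, U_b) = sum_j w_a(t_j) * w_b(t_j) * V_j
    cov = [[sum(wa * wb * v for wa, wb, (_, _, v)
                in zip(weights[a], weights[b], rows))
            for b in range(k)] for a in range(k)]
    z = [scores[a] / math.sqrt(cov[a][a]) for a in range(k)]
    corr = [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(k)]
            for a in range(k)]
    return z, max(z), corr
```

FH(0,0) reproduces the standard log-rank statistic. FH(0,1) up-weights late differences, as seen with delayed effects, and FH(1,0) up-weights early ones. The four statistics are correlated, so the one-sided MaxCombo p-value would be 1 − P(max Z_k < z_max) under a multivariate normal with the returned correlation matrix, which can be computed, for example, with `scipy.stats.multivariate_normal.cdf`. That step is omitted here to keep the sketch dependency-free.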
mathematical & computational biology, statistics & probability