Abstract
We conducted a systematic comparison of statistical methods for analyzing time-to-event outcomes under a range of proportional and non-proportional hazard (NPH) scenarios. Using data from recently published oncology trials, we compared the log-rank test, still by far the most widely used option, against several alternatives: the MaxCombo test, the Restricted Mean Survival Time (RMST) difference test, the Generalized Gamma model, and the Generalized F model. The methods were evaluated on power, type I error rate, and time-dependent bias in the estimated survival probability and median survival time. In addition to the real data, we simulated three hypothetical scenarios with crossing hazards, chosen so that the early and late effects “cancel out”, and used them to assess how well each method detects time-specific and overall treatment effects. We implemented novel metrics for the time-dependent bias in treatment effect estimates to provide a more comprehensive evaluation in NPH scenarios. Based on the type I error rate, power, and time-dependent bias of each approach, we provide recommendations for each NPH scenario.
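To make the crossing-hazards setting and the RMST difference concrete, the following is a minimal sketch and not the authors' simulation design. It assumes a two-piece exponential hazard for the treated arm and a constant hazard for the control arm; the rates, change point, sample size, and truncation time `tau` are hypothetical values chosen only for illustration, and all function names are our own.

```python
import random

def simulate_piecewise_exponential(n, rate_early, rate_late, change_time, rng):
    """Draw n event times whose hazard is rate_early before change_time
    and rate_late afterwards (hypothetical crossing-hazards arm)."""
    times = []
    for _ in range(n):
        t = rng.expovariate(rate_early)
        if t > change_time:
            # Memoryless property: restart at the change point with the late hazard.
            t = change_time + rng.expovariate(rate_late)
        times.append(t)
    return times

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event=1, censored=0)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def rmst(curve, tau):
    """Restricted mean survival time: area under the KM step function on [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)
    return area

# Illustrative values only (assumptions, not taken from the article).
rng = random.Random(0)
tau = 3.0
control = simulate_piecewise_exponential(500, 1.0, 1.0, 1.0, rng)
treated = simulate_piecewise_exponential(500, 1.5, 0.5, 1.0, rng)

km_control = kaplan_meier(control, [1] * len(control))
km_treated = kaplan_meier(treated, [1] * len(treated))
rmst_difference = rmst(km_treated, tau) - rmst(km_control, tau)
print(f"RMST difference up to tau={tau}: {rmst_difference:.3f}")
```

In this sketch the treated arm has the higher hazard early and the lower hazard late, so the two hazard functions cross at the change point. The RMST difference summarizes the gap between the two survival curves up to `tau` without assuming proportional hazards, which is one reason it is considered as an alternative to the log-rank test in such settings.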
