Abstract
Background:
Progression-free survival (PFS) has been adopted as the primary endpoint in many randomized controlled trials, and can be determined much earlier than overall survival (OS). We investigated whether PFS is a good surrogate endpoint for OS in trials of first-line treatment for epithelial ovarian cancer (EOC), and whether this relationship has changed with the introduction of new treatment types.
Methods:
In a meta-analysis, we identified summary data [hazard ratio (HR) and median time] from published randomized controlled trials. Linear regression was used to assess the association between treatment effects on PFS and OS overall, and for subgroups defined by treatment type, postprogression survival (PPS) and established prognostic factors.
Results:
Correlation between HRs for PFS and OS, in 26 trials with 30 treatment comparisons comprising 24,870 patients, was modest (r2 = 0.52, weighted by trial sample size). The correlation diminished with recency: preplatinum/paclitaxel era, r2 = 0.66; platinum/paclitaxel, r2 = 0.44; triplet combinations, r2 = 0.22; biologicals, r2 = 0.30. The median PPS increased over time for the experimental (Ptrend = 0.03) and control arms (Ptrend = 0.003). The difference in median PPS between treatment arms strongly correlated with the difference in median OS (r2 = 0.83). In trials where the control therapy had median PPS of less than 18 months, correlation between PFS and OS was stronger (r2 = 0.64) than where the median PPS was longer (r2 = 0.48).
Conclusions:
In EOC, correlation in the relative treatment effect between PFS and OS in first-line platinum-based chemotherapy randomized controlled trials is moderate and has weakened with increasing availability of effective salvage therapies.
Introduction
Epithelial ovarian cancer (EOC) remains a highly lethal disease, despite improvements in treatment over the last three decades that have increased the median survival but not the proportion of women cured. 1 Most patients with stage III disease relapse within 2 years after debulking surgery and platinum-based chemotherapy, and more than half die within 5 years. 2 There is an urgent need to accelerate development of active new treatments.
Overall survival (OS) has traditionally been regarded as the gold standard primary endpoint for phase III randomized controlled trials evaluating the efficacy of new treatments for EOC.3,4 Demonstrating an improvement in OS requires trials to be larger, with longer follow up and hence greater cost. Most patients now receive multiple postprogression treatments, including chemotherapy, biological-targeted therapies and surgery, which can significantly confound and dilute the effects of the investigational therapy on the OS endpoint, 5 and could impede the development of new potentially active therapies. Progression-free survival (PFS) can be determined earlier than OS and has potential both as an independent, valid endpoint and as a surrogate for OS in certain circumstances. PFS is unaffected by postprogression therapies and may provide earlier evidence of efficacy of new treatments, which can expedite regulatory approval. The consensus of the Gynaecological Cancer InterGroup (GCIG), which includes 29 academic international trials groups, was that while OS remains the gold standard for demonstrating benefit in first-line trials, PFS assessed using validated assessment tools is a valid primary endpoint for phase III trials of first-line therapies for ovarian cancer. 6 Furthermore, the GCIG statement recognizes that differences in OS may be increasingly difficult to demonstrate in first-line trials given the availability of active therapies following progression.
In patients with recurrent ovarian cancer, other goals of treatment, including time to treatment failure, improvement in cancer-related symptoms and delaying time to subsequent therapy, are important in considering the benefit of new therapies, apart from improvement in survival outcomes.
While recognizing that PFS may be a valid endpoint in its own right, it is of interest to consider to what extent improving PFS would be expected to translate to a benefit in OS at a trial level. Furthermore, for future first-line trials in EOC, the value of these survival endpoints for determining the benefit of new therapeutics remains important. Evaluation of the surrogacy relationship between PFS and OS at a trial level will continue to have value in guiding future trial design.
Since previous work evaluating the relationship between PFS and OS in first-line trials of EOC,7,8 multiple new trials have been conducted with active agents subsequently available in clinical practice. We therefore performed a new literature-based meta-analysis with the primary objective of quantifying the strength of the relationship between the relative treatment effects on PFS and OS in phase III randomized controlled trials of first-line treatments for EOC. We further evaluated, as secondary objectives, the potential impact of the increased availability and number of salvage therapies over time, the duration of PPS, and the impact of known prognostic factors on the relationship between PFS and OS.
Methods
Search strategy
We searched MEDLINE, EMBASE and the Cochrane Central Register of Controlled Trials (1 January 1996–30 June 2012) using search terms ‘ovarian neoplasms’ or ‘ovarian cancer/carcinoma’, ‘chemotherapy’ and ‘clinical trials’ (supplemental file S1). The search strategy was limited to studies in humans and in the English language. Conference proceedings, references of relevant review articles, citations of included studies, and trial cooperative-group websites were hand searched.
Study selection
All randomized phase III trials of first-line therapy in patients with stages IC–IV EOC in which the control and intervention arms contained a platinum chemotherapy backbone were eligible for inclusion. Trials that included planned interval debulking were allowed. Trials were required to report relative treatment effects for both PFS and OS. If these data were incomplete, trials were still included if sufficient information could be retrieved from published Kaplan–Meier curves. Trials of maintenance therapies or high-dose chemotherapy with stem cell rescue were ineligible.
Availability of anticancer agents for recurrent EOC over time
The timing of the availability of anticancer treatments for recurrent EOC was recorded as the year of approval by the United States Food and Drug Administration (US FDA) for any clinical indication, as recorded on its website. 9 Data were collected only for treatments with demonstrated activity in EOC 10 that could potentially be used for treatment of recurrent disease.
Data extraction
For each included trial, we extracted the trial name, year of publication or conference presentation, summary statistics of clinicopathologic characteristics (stage, performance status, extent of debulking), type, and median duration of chemotherapy per treatment arm. We also recorded the number of patients who were randomized and who progressed and died, for each treatment arm. We extracted data for hazard ratios (HRs) and 95% confidence intervals (CIs), and median OS and PFS durations. In some trials, where cases of death from causes other than ovarian cancer were censored observations, time to progression was used as the surrogate endpoint instead of PFS. In this review, we considered time to progression and PFS as interchangeable endpoints, given that most patients with advanced ovarian cancer survive beyond the first relapse.
Data on adjusted HRs were used in preference to unadjusted HRs whenever both results were available. In cases where multiple publications of the same trial were available, the results with maximum follow up were used. In trials where there were more than two treatment arms, we obtained the HRs and 95% CIs from the pairwise comparison between the experimental treatments against a common control therapy, and we treated each comparison independently. If HRs and CIs were not reported, they were estimated using the methods described by Parmar and colleagues. 11
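As an illustration of how a trial-level effect and its precision can be recovered from published summary data, the sketch below back-calculates the log HR and its standard error from a reported HR and 95% CI. This is one widely used back-calculation and is shown only as an example; the text does not state which of the approaches in reference 11 was applied to each trial. The function name and the input values are hypothetical and are not taken from any included trial.

```python
import math

def log_hr_and_se_from_ci(hr, ci_lower, ci_upper, z=1.959964):
    """Return (log HR, standard error of log HR) from an HR and its 95% CI.

    Assumes the CI is symmetric on the log scale, so its width on that
    scale equals 2 * z * SE.
    """
    log_hr = math.log(hr)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z)
    return log_hr, se

# Hypothetical comparison (illustrative values only).
log_hr, se = log_hr_and_se_from_ci(hr=0.80, ci_lower=0.64, ci_upper=1.00)
print(round(log_hr, 3), round(se, 3))
```

The same relationship can be inverted to rebuild a CI from a log HR and its standard error when only those quantities are available.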
Data were extracted independently by two authors (KS, SL), and discrepancies were resolved by consensus. Preferred reporting items for systematic reviews and meta-analyses (PRISMA) reporting guidelines were followed for applicable items and the study selection process was summarized in a flow diagram (Figure 1). 12 Publication bias is not a major consideration for this analysis and was not assessed.

PRISMA diagram/flow chart.
Statistical analysis
Because a larger difference in treatment effect for PFS (surrogate endpoint) is assumed biologically to translate into a larger difference in OS (true endpoint), a linear model was fitted by the use of ordinary least-squares regression. We inspected residual versus predicted plots and performed diagnostic tests for normality and heteroscedasticity (nonconstant error variance) to assess consistency with the assumptions of linear regression. All analyses were performed unweighted and then weighted by trial size.
We reported r2, the trial-level coefficient of determination (the squared correlation coefficient) between the treatment effects on PFS and OS, both unweighted and weighted by trial size, as derived from the regression models. Any r2 value of 0.72 or greater was considered a strong correlation, and r2 from 0.49 to less than 0.72 was considered a modest correlation. 7 The 95% CIs of r2 values were obtained by the bootstrap method with 1000 replications.
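The following is a minimal sketch of a trial-size-weighted r2 with a bootstrap CI, written in Python with NumPy rather than the STATA code used for the analysis. Several choices are assumptions made for illustration: the regression is run on log HRs (the text does not state the scale), the CI is a percentile bootstrap that resamples treatment comparisons, and the data are hypothetical. The function names are likewise illustrative.

```python
import numpy as np

def weighted_r2(x, y, w):
    """r2 from a weighted least-squares fit of y on x (with intercept)."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns weighted least squares into ordinary least squares.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = y - X @ beta
    y_bar = np.average(y, weights=w)
    ss_res = np.sum(w * resid ** 2)
    ss_tot = np.sum(w * (y - y_bar) ** 2)
    return 1.0 - ss_res / ss_tot

def bootstrap_ci(x, y, w, n_boot=1000, seed=0):
    """Percentile 95% CI for the weighted r2, resampling comparisons with replacement."""
    rng = np.random.default_rng(seed)
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        if np.ptp(x[idx]) == 0 or np.ptp(y[idx]) == 0:
            continue  # skip degenerate resamples with no variation
        stats.append(weighted_r2(x[idx], y[idx], w[idx]))
    lower, upper = np.percentile(stats, [2.5, 97.5])
    return lower, upper

def label(r2):
    """Apply the thresholds stated in the text (0.72 strong, 0.49 modest)."""
    if r2 >= 0.72:
        return "strong"
    if r2 >= 0.49:
        return "modest"
    return "below modest"

# Hypothetical treatment comparisons (illustrative values only).
hr_pfs = np.array([0.70, 0.85, 0.95, 1.00, 1.05, 0.80, 0.90, 1.10])
hr_os = np.array([0.75, 0.90, 0.98, 0.97, 1.02, 0.88, 0.95, 1.04])
n_patients = np.array([400, 800, 1200, 600, 300, 1500, 900, 500])

x, y = np.log(hr_pfs), np.log(hr_os)
r2 = weighted_r2(x, y, n_patients)
ci = bootstrap_ci(x, y, n_patients)
print(label(r2), round(r2, 2), [round(v, 2) for v in ci])
```

Setting all weights to 1 gives the unweighted r2, which matches the squared Pearson correlation between the two sets of treatment effects.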
Subgroup analyses were also carried out for trials that examined different treatment paradigms from different eras: before the use of platinum or paclitaxel as control therapies, when platinum and taxanes were used as control therapies, and in the trials exploring triplet therapies and biological therapies. We also classified these trials into subgroups on the basis of the median distributions according to the proportion of patients with different prognostic characteristics (stage, performance status, and extent of debulking). Sensitivity analyses were performed to evaluate the extent to which the relationship changed with differing proportions of established baseline prognostic factors.
We also tested for the correlation between the difference in median PPS of experimental versus control treatment arms and the difference in median OS. The median PPS of a treatment arm was defined as the difference between the median OS and the median PFS. The difference in the median PPS between the treatment arms for trials conducted at different times was examined by classifying trials by the year of the first patient accrual, or if this was not available, the year of the first trial publication. Differences in associations between the HR for PFS and the HR for OS were also evaluated for PPS at the cutoff point of 18 months for the control therapy. This cutoff point was chosen on the basis of a prior study of simulated data, 13 which reported a strong correlation for PPS less than 18 months and a moderate to weak correlation for PPS of 18 months or longer.
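The definitions in this paragraph can be sketched as follows. The code computes the median PPS of each arm as median OS minus median PFS, the between-arm differences, and the 18-month grouping based on the control arm. The function names and input values are hypothetical and serve only to make the definitions concrete.

```python
def median_pps(median_os, median_pfs):
    """Median PPS of an arm: median OS minus median PFS (months)."""
    return median_os - median_pfs

def pps_summary(exp_os, exp_pfs, ctl_os, ctl_pfs, cutoff=18.0):
    """Summarize PPS for one treatment comparison (all inputs in months)."""
    exp_pps = median_pps(exp_os, exp_pfs)
    ctl_pps = median_pps(ctl_os, ctl_pfs)
    return {
        "pps_experimental": exp_pps,
        "pps_control": ctl_pps,
        "pps_difference": exp_pps - ctl_pps,  # experimental minus control
        "os_difference": exp_os - ctl_os,     # experimental minus control
        # Subgroup is defined by the control arm's median PPS.
        "control_pps_group": "<18 months" if ctl_pps < cutoff else ">=18 months",
    }

# Hypothetical comparison (illustrative values only).
summary = pps_summary(exp_os=40.0, exp_pfs=18.0, ctl_os=36.0, ctl_pfs=16.0)
print(summary)
```

Because the medians are subtracted directly, this quantity is a summary-level approximation and not a patient-level measure of survival after progression.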
We performed sensitivity analyses to examine the impact on the overall results of excluding trials of: (1) intraperitoneal treatment, given that participants in these trials were likely to have complete surgical debulking and hence an overall better prognosis; and (2) biological therapies, as many trials in other advanced cancers had shown a significant relative PFS advantage but no OS difference.
Analyses used STATA, version 14 (StataCorp: College Station, TX, USA).
Results
In total, 26 trials with 30 treatment comparisons and comprising 24,870 patients were included (Figure 1 and Table 1). Most of the patients in these studies had advanced EOC (median of rates for all treatment arms 72.5% stage III, 17% stage IV). Overall, two studies14,15 contained multiple comparisons among different experimental therapies and a common control arm. There were two trials15,16 of biological therapies and another two trials17,18 of intraperitoneal therapy. In total, seven comparisons reported an improvement in PFS (upper limit of the 95% CI for HR <1.00 or reported p < 0.05) and four comparisons reported an improvement in OS (Table 1). In five trials, at least one HR was not reported and had to be calculated.19–23 One trial used time to tumour progression in place of PFS. 19
26 trials and 30 comparisons included in the analysis (including biologics).
Year of publication.
Adjusted hazard ratio reported.
Hazard ratio extrapolated from available information.
Time to tumour progression reported.
Result not given or able to be extracted.
Trials of intraperitoneal therapies. All treatments were given intravenously except where indicated.
alt, alternating; bev, bevacizumab; CAP, cyclophosphamide, adriamycin, and cisplatin; carbo, carboplatin; CI, confidence interval; cis, cisplatin; cyclo, cyclophosphamide; HR, hazard ratio; IP, intraperitoneal; IV, intravenous; OS, overall survival; PFS, progression-free survival; PLD, pegylated liposomal doxorubicin; PT, cisplatin/taxol; NR, not reached; tax, paclitaxel; TC, taxol/carbo; TEC, taxol/epirubicin/carbo.
Figure 2 is a plot of the HR for PFS versus the HR for OS. The notable outlier was a trial comparing cisplatin-paclitaxel with cisplatin-cyclophosphamide, the first to compare two platinum combinations and to include a platinum-taxane combination. Both PFS and OS were significantly better in the experimental arm. 24 Another outlier trial compared cisplatin-paclitaxel with carboplatin-paclitaxel, and reported a nonsignificant difference between the treatment arms for both PFS and OS. 28 When all trials were included, the correlation between HRs for PFS and OS was moderate (unweighted r2, 0.53, 95% CI 0.23–0.72; r2 weighted by sample size, 0.52, 95% CI 0.30–0.67).

Correlation between hazard ratios for progression-free and overall survival (all trials). The linear regression line is shown. The circles indicate the weighting according to trial size.
Data on PPS available from 22 treatment comparisons showed a trend to an increasing median PPS over time for both the experimental (Ptrend = 0.03) and control arms (Ptrend = 0.003) [Figure 3(a)]. The difference in median PPS between treatment arms strongly correlated with the difference in median OS [unweighted r2, 0.75; 95% CI 0.36–0.92; r2 weighted by sample size, 0.83, 95% CI 0.58–0.92; Figure 3(b)]. Details of postprogression therapy were reported for five trials.24,26,27,29,34,40

(a). Median postprogression survival by treatment arm over time. The lines show predicted relationships in the experimental arm (solid line) and the control arm (dashed line). The weights according to trial size are shown by squares in the experimental arm and circles in the control arm.
Correlations between HRs for PFS and OS varied for different treatment eras (Figure 4): preplatinum/taxane (n = 8; unweighted r2, 0.61, 95% CI 0.01–0.90; r2 weighted by sample size, 0.66, 95% CI 0.02–0.96), platinum/paclitaxel (n = 11; unweighted r2, 0.44, 95% CI 0.01–0.77; r2 weighted by sample size, 0.44, 95% CI 0.01–0.77), triplet combination therapies (n = 7; unweighted r2, 0.25, 95% CI 0.00–0.66; r2 weighted by sample size, 0.22, 95% CI 0.00–0.66), and novel therapies (n = 4; unweighted r2, 0.21, 95% CI 0.00–1.00; r2 weighted by sample size, 0.30, 95% CI 0.00–0.56). Correlations between HRs for PFS and OS also varied according to PPS.

Correlation between hazard ratios for progression-free and overall survival by treatment regimen in different eras, weighted by sample size: (a) preplatinum/paclitaxel; (b) platinum/paclitaxel; (c) triplet combinations; (d) biological and other novel therapies.
In trials (n = 8) where the median PPS was less than 18 months with control therapy, the correlation weighted by sample size was higher (unweighted r2, 0.55, 95% CI 0.01–0.98; r2 weighted by sample size, 0.64, 95% CI 0.00–0.98) than in those trials (n = 18) in which the median PPS was at least 18 months (unweighted r2, 0.59, 95% CI 0.32–0.85; r2 weighted by sample size, 0.48, 95% CI 0.14–0.71; Figure 5).

Correlations between hazard ratios for progression-free and overall survival according to postprogression survival. (a) Median postprogression survival less than 18 months. (b) Median postprogression survival at least 18 months.
In subgroup analyses, trials that included 10% or more patients (median distribution of trial populations) with Eastern Cooperative Oncology Group (ECOG) performance status ⩾2 had stronger correlation between PFS and OS (n = 9; unweighted r2, 0.79, 95% CI 0.12–0.76; r2 weighted by sample size, 0.76, 95% CI 0.14–0.74) than trials with less than 10% performance status ⩾2 patients (n = 18; unweighted r2, 0.52, 95% CI 0.04–0.94; r2 weighted by sample size, 0.53, 95% CI 0.04–0.94; Figure 6). When trials with more patients with stage IV disease (18% or greater of trial populations; median distribution of trial populations) were compared with those with fewer patients (less than 18% of trial population with stage IV disease), the correlations were similar (r2 weighted by sample size, 0.49 versus 0.48).

Correlations between hazard ratios for progression-free and overall survival according to the proportion of patients with poor performance status. (a) Fewer than 10% of patients with Eastern Cooperative Oncology Group performance status ⩾2; (b) 10% or more patients with Eastern Cooperative Oncology Group performance status ⩾2.
Table 2 lists the year of US FDA approval of anticancer agents with clinical activity in EOC. Since paclitaxel was approved in 1992, the number of active agents has almost doubled, expanding the options for subsequent lines of therapies beyond the initial trial therapy.
Available salvage therapies for recurrent ovarian cancer.
US FDA, United States Food and Drug Administration.
In sensitivity analyses, excluding trials of intraperitoneal treatment (unweighted r2, 0.49, 95% CI 0.18–0.69; r2 weighted by sample size, 0.49, 95% CI 0.26–0.66), and trials of biological therapies (unweighted r2, 0.58, 95% CI 0.27–0.77; r2 weighted by sample size, 0.58, 95% CI 0.32–0.76), did not change the overall results significantly.
Discussion
For PFS to be useful as a surrogate endpoint at trial level, a strong correlation between the relative treatment effects on PFS and OS is required. 41 Correlations between PFS and OS have been stronger in studies examining a limited number of EOC trials that included contemporary standard platinum-based therapies (r2 ranges from 0.85 42 to 0.94 7), but not in more recent trials, particularly those including biological-targeted and other novel therapies. Moreover, in two different trials conducted almost 10 years apart, the median PPS in EOC almost doubled in cohorts of patients treated with the same therapy of carboplatin-gemcitabine.43,44 We sought to address this question given its important implications for future trial design, selection of endpoints, drug approvals by regulatory bodies, and healthcare funding. 41
In clinical trials of advanced EOC, there was only a moderate correlation (r2 = 0.52) between the treatment effects on PFS and OS. When the correlations were examined for different treatment paradigms based on clinical trials conducted in different eras, the relationship between the HRs for PFS and OS was weaker for more recent regimens. Our finding of a significant trend to an increase in the median PPS over time and a strong correlation (weighted r2 = 0.83) between the relative effects of treatment on PPS and OS supports the hypothesis that postprogression therapy can dilute the relationship between PFS and OS. This analysis is limited by the inability to adjust for baseline characteristics in the absence of individual patient data. It is therefore best considered hypothesis generating, with the aim of encouraging further research.
The results of this study differ from the findings of earlier studies, which reported strong correlations in relative treatment effect between PFS and OS.7,42 One possible explanation for this difference might be changes in the definition of PFS over time. Before 2000, World Health Organization criteria 45 or clinical progression criteria were used to define disease progression in clinical trials. In some of the earlier trials, a second-look laparotomy was planned,25,27 or was reported to have occurred, 28 and the extent to which the laparotomy findings influenced assessment of progression is unclear from published information. Since then, new guidelines to evaluate the response to treatment and to define progression using both imaging and CA125 levels have been introduced and widely adopted in EOC trials.46,47
It is more likely that the impact and greater availability of more effective salvage therapies explain the dilution of the previously observed relationship between the relative effects of treatment on PFS and OS. Few of the trials included in this study provided any details of postprogression therapies or the proportion of patients who crossed over to receive the active experimental therapy at progression. Of all the included trials, only a single study 24 of the six published before 2000 showed a statistically significant benefit of the experimental treatment over control for PFS. In contrast, 6 trials or comparisons15,17,22,39,48 of 18 published after 2000 reported a statistically significant benefit in favour of the experimental treatment.
The duration of PPS affects the relationship between the relative treatment effects for PFS and OS. Broglio and Berry 13 used simulated data to demonstrate that the probability of a statistically significant difference in OS between treatment arms lessens with increasing duration of PPS, despite a statistically significant difference for PFS. Our results in EOC trials support the findings of Broglio and Berry (Figure 3), although our results are limited by reliance on events occurring following randomization, and should therefore be considered exploratory.
It is possible that improved imaging modalities and the increasing use of CA125 to define progression could result in earlier detection of disease recurrence and hence inflate PPS in the more recently conducted trials. However, we do not believe that these factors alone would account for all the improvement in PPS. Availability of effective salvage therapies remains the most likely explanation for the increased PPS over time. This is supported by the differing results of two second-line studies conducted almost a decade apart, the Oceans trial 43 and the AGO-OVAR2.25 trial. 44 Both had a carboplatin-gemcitabine arm. The PFS with carboplatin-gemcitabine in the AGO trial was 8.4 months and in the Oceans trial it was 8.6 months, but the median OS was respectively 18.0 and 32.9 months. The eligibility criteria were very similar, but in the Oceans trial patients had a median of 5 (range 1–14) lines of subsequent treatment, which almost certainly accounted for the significantly longer PPS after second-line therapy.
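Applying the definition of median PPS used in this analysis (median OS minus median PFS) to the carboplatin-gemcitabine arms quoted above gives the following worked arithmetic. The subscripts are labels introduced here for illustration, and the subtraction of medians is a summary-level approximation.

```latex
\begin{align*}
\mathrm{PPS}_{\text{AGO}}    &= 18.0 - 8.4 = 9.6 \text{ months} \\
\mathrm{PPS}_{\text{Oceans}} &= 32.9 - 8.6 = 24.3 \text{ months}
\end{align*}
```

With near-identical median PFS in the two trials (8.4 versus 8.6 months), almost all of the 14.9-month difference in median OS arises after progression.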
Our hypothesis of the influence of salvage therapies diluting the relationship between relative treatment effects on PFS and OS is further supported by sensitivity analyses of trials that included a greater proportion of patients with an ECOG performance status of 2 or greater. In trials with 10% or more patients with performance status ⩾2, PFS and OS correlated more strongly than in those with less than 10%. We speculate that patients with a poor performance status were less likely to receive second-line salvage therapies, and therefore the relationship between the relative treatment effects on PFS and OS was not compromised.
Our work has a number of limitations. Published summary data, instead of individual patient data, means analyses could not be adjusted for baseline prognostic factors that affect OS or for the number and type of salvage therapies used after initial disease progression. We were also unable to examine the individual patient-level correlations between PFS and OS, which would require individual patient data. Our work is limited to clinical trials of platinum-based chemotherapies because these treatments are considered optimal and standard first-line therapy for advanced EOC. 10 The result of this study might not be applicable to trials of nonplatinum regimens.
This study has evaluated the relationship between PFS and OS in first-line trials of EOC in the modern era and has demonstrated that the correlation between treatment effects for PFS and OS has weakened. We expect that this relationship will continue to decline with the increasing availability of treatment options, including crossover to the active experimental treatment following disease progression. Therefore, it is increasingly unlikely that future trials will demonstrate a relative improvement in treatment effect for OS with first-line therapy. Using OS as the primary endpoint will require larger, longer trials in order for first-line treatments to demonstrate an OS benefit. The financial and opportunity costs of such trials make this approach largely infeasible. Other approaches include designing trials so that crossover is not allowed, while recognizing that access to other salvage therapies will still occur outside trials. Trials could also be designed with standardized postprogression treatments, 49 and meta-analyses of trials of agents of a similar class could be planned prospectively. Furthermore, novel statistical approaches, such as penalized Cox regression, 49 that incorporate external estimates of the impact of salvage therapies in order to adjust and preserve the randomized comparisons between different treatment groups could be considered. Finally, a measure of net clinical benefit, such as quality-adjusted PFS, 50 could be considered for treatment recommendations, which would be appropriate even if a relative advantage of OS has not been demonstrated.
Our findings support the fifth GCIG consensus statement, 6 which advocates the use of PFS as the primary trial endpoint in first-line trials of advanced EOC, but this approach does have limitations. Unlike OS, PFS is more prone to bias, and consequently strict definitions of progression and mandated intervals between imaging studies in trials are essential.1,4 The value of PFS as the primary endpoint continues to be an issue of ongoing debate, and PFS should be supported and underpinned by additional endpoints, such as patient-reported outcomes, time to second disease progression (PFS2), and time to first and second subsequent treatments.1,41,51,52 Alternatively, endpoints such as quality-adjusted PFS,50,53 which represent a measure of net clinical benefit, could be used as primary endpoints and for clinical decision making and regulatory approval. It is also important to demonstrate no OS detriment if PFS is used as the primary endpoint.
In conclusion, the relative treatment effects for PFS and OS are moderately correlated in first-line trials using platinum-based chemotherapy for advanced EOC. This relationship has weakened with time and increasing availability of effective salvage therapies.
Supplemental Material
Supplemental material, example_search_strategy_SJOQUIST_forTAM_-_supplementary_file_S1 for Progression-free survival as a surrogate endpoint for overall survival in modern ovarian cancer trials: a meta-analysis by Katrin M. Sjoquist, Sarah J. Lord, Michael L. Friedlander, Robert John Simes, Ian C. Marschner and Chee Khoon Lee in Therapeutic Advances in Medical Oncology
Footnotes
Acknowledgements
The authors thank Rhana Pike, from the NHMRC Clinical Trials Centre, who assisted with the manuscript.
Funding
This work was supported in part by NHMRC Program grant 1037786.
Conflict of interest statement
The authors declare that there is no conflict of interest.
References
