Abstract

“Looking for markers in specimens collected at the time of diagnosis may present us with signals that are more associated with consequence than cause and, hence, are destined to alert us only to late-stage disease.”
Over the last decade, early detection of ovarian cancer has achieved a prominent position in the cancer research agenda. The argument that early detection may be one of the most practical ways to reduce cancer morbidity and mortality in the near term is compelling – local stage disease is associated with long-term survival while advanced disease has a poor prognosis [1,101]. This strongly suggests that finding and removing tumors when still confined to the ovary should confer substantial improvement in survival.
Developing an ovarian cancer screening program has proven particularly challenging because the screening test must identify disease early with exceptionally high specificity. Why the emphasis on specificity? A definitive diagnosis requires abdominal surgery, so false positives have a considerable negative impact on women. In addition, because ovarian cancer is relatively rare, the number of women affected by even a small false-positive rate can overwhelm the number who might benefit. Roughly four in every 10,000 postmenopausal women are diagnosed with ovarian cancer annually. With even a 1% false-positive rate, screening 10,000 postmenopausal women would subject 100 women without cancer to the anxiety, risks, morbidity and cost of a false-positive test and surgery for the benefit of four women. The minimum acceptable positive predictive value is generally considered to be 10%.
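The arithmetic behind these figures can be made explicit. The sketch below is illustrative only: it takes the incidence of four per 10,000 and the 10% target from the text, assumes a hypothetical test with perfect sensitivity (an assumption not made in the text), and uses function names invented for this example.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Fraction of positive screens that are true cancers."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


def min_specificity_for_ppv(prevalence, sensitivity, target_ppv):
    """Lowest specificity at which the PPV reaches target_ppv."""
    true_pos = prevalence * sensitivity
    # PPV >= target  <=>  false_pos <= true_pos * (1 - target) / target
    max_false_pos = true_pos * (1.0 - target_ppv) / target_ppv
    return 1.0 - max_false_pos / (1.0 - prevalence)


PREVALENCE = 4 / 10_000   # annual incidence quoted in the text
SENSITIVITY = 1.0         # assumption: every cancer is detected

ppv = positive_predictive_value(PREVALENCE, SENSITIVITY, specificity=0.99)
needed = min_specificity_for_ppv(PREVALENCE, SENSITIVITY, target_ppv=0.10)
print(f"PPV at 99% specificity: {ppv:.1%}")
print(f"Specificity needed for 10% PPV: {needed:.2%}")
```

Under these assumptions, a 1% false-positive rate yields a positive predictive value of just under 4%, and reaching 10% would require a specificity of roughly 99.6%. Any sensitivity below 100% pushes the required specificity higher still.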
Recent research has focused on developing a blood-based biomarker screen to be used either alone or in combination with transvaginal sonography (TVS). CA125, the first ovarian cancer biomarker identified [2], is the longstanding benchmark in this effort. CA125 is elevated in approximately 80% of ovarian cancer cases but only 50% of early-stage tumors [3]. The role of CA125 in early detection of ovarian cancer is still not fully established. In case–control studies conducted in large biorepositories, elevations in mean CA125 concentrations have been found 3–5 years prior to clinical diagnosis [4,5], but because of the high variability in levels among women without disease, these differences in means do not give rise to a screening threshold that can distinguish women with and without cancer with sufficient accuracy.
Recent advances in molecular technologies have provided new avenues for large-scale biomarker discovery. The burgeoning fields of genomics, proteomics, metabonomics and other ‘omic’ assays have offered opportunities and hope that we can discern a cancer signature in biospecimens [6–10]. These and more traditional efforts have identified numerous candidate biomarkers that appear to distinguish women with and without cancer.
However, some reports from these early biomarker discoveries have been overinterpreted, primarily when study design limitations are overlooked. High-dimensional studies sifting through thousands of biomarkers produce a large number of ‘statistically significant’ results, many of which may be chance findings. This problem occurs less frequently now that multiple testing issues and the necessary corrective steps are better known, but the literature likely contains many false discoveries.
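The scale of the multiple testing problem is easy to illustrate. The sketch below is a minimal example, not a description of what any cited study did: it counts the chance findings expected when no marker is truly associated with disease, then applies the Benjamini-Hochberg step-up rule, one commonly used correction. The marker count, significance level and p-values are hypothetical.

```python
def expected_chance_findings(n_markers, alpha):
    """Expected number of 'significant' results if no marker is truly associated."""
    return n_markers * alpha


def benjamini_hochberg(p_values, fdr=0.05):
    """Return indices of p-values declared significant by the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is declared significant.
    return sorted(order[:cutoff_rank])


# Hypothetical screen of 10,000 candidate markers at the conventional 0.05 level.
print(f"Expected chance findings: {expected_chance_findings(10_000, 0.05):.0f}")

# Hypothetical p-values for ten candidate markers.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
unadjusted = [i for i, p in enumerate(p_values) if p <= 0.05]
adjusted = benjamini_hochberg(p_values, fdr=0.05)
print(f"Significant without correction: {len(unadjusted)}")
print(f"Significant after BH correction: {len(adjusted)}")
```

In this toy example five of the ten markers pass the unadjusted 0.05 threshold, but only two survive the correction.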
More fundamental limitations are attributable to the cross-sectional design and biological samples used to conduct these studies. Typically, these assays are performed in specimens collected at the time of clinical diagnosis in women with cancer and compared to similar samples from women without cancer. For ovarian cancer, tissue or blood is collected during surgery from women with symptomatic and predominantly late-stage disease. While a valid cancer signature may be found in these specimens, it is not at all clear which, if any, of these elements would be present at earlier stages. Some studies address this concern by specifically studying specimens from early-stage cases [11–13]. However, clinically detected early-stage cancers are generally not histologically representative of the tumors destined to be diagnosed at a late stage [1], so enriching the study population with early-stage cancers may not help.
Such studies are also constrained by the difficulty in obtaining completely comparable specimens from women without disease. For tissue-based assays, the comparison group is made up of women undergoing surgery primarily for other nonmalignant gynecologic conditions – samples that are not representative of ‘normal’ women at risk of ovarian cancer. For blood-based discovery work, identifying a more generalizable comparison group is easier, but the challenge comes in controlling aspects of blood collection and processing that may differ from the cancer group (e.g., exposure to anesthesia, fasting status and processing times). For novel markers, one cannot know in advance which, if any, of these factors will confound the contrasts and potentially produce misleading results such as those recently noted for prolactin [14,15].
Despite these methodological challenges, some novel biomarkers have been identified and validated in cross-sectional studies. To date, the most promising candidate is human epididymis protein 4 (HE4), a stable 4-disulfide core protein associated with the WFDC2 gene that is overexpressed in ovarian cancer, particularly serous and endometrioid histologies [16]. Serum HE4 concentrations have been shown to be elevated in ovarian cancer cases in multiple studies, but the classification performance of HE4 neither surpasses nor substantially improves upon that of CA125 alone [17,18]. Validation studies of most other single biomarkers show even more modest discriminatory power [19].
Cross-sectional validation studies such as those described above do not allow estimation of lead time – the interval between when a tumor could have been detected by a marker and the actual time of clinical diagnosis. Few biomarkers have been studied in prediagnostic specimens because of the difficulty in obtaining these specimens. A recent report on six biomarkers in 34 cases is disappointing [5]. Subtle changes in CA125, HE4 and mesothelin levels began approximately 3 years prior to diagnosis but did not reach a detectable level until within 1 year of diagnosis. The strongest signal in these data again came from CA125 and suggests that a CA125-based screening program would produce an average lead time of less than 1 year.
Whether a few months of lead time is sufficient to impact survival is an open question. A thoughtful modeling exercise suggests that tumors may be at a late stage as much as 1 year before diagnosis [20]. If so, to achieve a shift in staging, an effective screening test will need to detect cancers at least 1 year before clinical presentation.
Statistical approaches to improve the performance of a biomarker screen have also been offered. One idea is to use a panel of markers, combining them with various statistical approaches [16,19,21–23]. These methods have the potential to optimize the diagnostic performance of available biomarkers, but if the algorithms are developed in studies with the design limitations noted above, they are likely to exhibit similar limitations and potential biases. In addition, the performance of any algorithm developed by an iterative analysis process must be assessed in an independent study population.
Another statistical approach to improve biomarker performance is to tailor the threshold of positivity to each woman using her history of biomarker measures. The Risk of Ovarian Malignancy (ROM) algorithm [24] and the Parametric Empirical Bayes algorithm [25] are two approaches aimed at detecting when a woman's biomarker levels change from her own normal levels. These approaches are most useful when a biomarker exhibits high variability among women without disease.
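The idea of a personalized threshold can be sketched with a simple normal-normal hierarchical model. This is a simplified illustration in the spirit of the empirical Bayes approach, not the published implementation of either algorithm: the function name, the assumption of normally distributed (for example, log-transformed) marker values, and all numerical parameters below are hypothetical.

```python
from statistics import NormalDist, mean


def peb_threshold(history, pop_mean, between_var, within_var, specificity=0.99):
    """Personalized positivity threshold for a woman's next marker value.

    Assumes each woman's true baseline ~ N(pop_mean, between_var) and each
    measurement = baseline + noise, with noise ~ N(0, within_var).
    """
    n = len(history)
    total_var = between_var + within_var
    b1 = between_var / total_var  # weight a single prior measurement would receive
    if n == 0:
        shrink = 0.0              # no history: fall back to the population mean
        centre = pop_mean
    else:
        # Weight on her own mean grows toward 1 as her history lengthens.
        shrink = between_var / (between_var + within_var / n)
        centre = pop_mean + shrink * (mean(history) - pop_mean)
    # Predictive spread of her next value given her history.
    predictive_sd = (total_var * (1.0 - b1 * shrink)) ** 0.5
    z = NormalDist().inv_cdf(specificity)
    return centre + z * predictive_sd


# Hypothetical population parameters on a log scale.
params = dict(pop_mean=2.5, between_var=0.25, within_var=0.05)

population_rule = peb_threshold([], **params)
low_baseline = peb_threshold([2.0, 2.1, 1.9], **params)
high_baseline = peb_threshold([3.0, 3.1, 2.9], **params)
print(f"Threshold with no history:       {population_rule:.2f}")
print(f"Threshold for low-baseline woman:  {low_baseline:.2f}")
print(f"Threshold for high-baseline woman: {high_baseline:.2f}")
```

With these made-up parameters, a woman whose past values sit well below the population mean receives a lower threshold than the single population-wide cutoff, so a rise from her own baseline is flagged sooner. The gain depends on between-woman variability being large relative to within-woman variability, which is the condition described above.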
Confirmation of early detection is not sufficient to justify screening. Detecting cancer early moves the date of diagnosis forward but may only prolong life with cancer rather than truly extending it. Before launching a large and expensive national screening program, we need a randomized trial of screening with a mortality end point to distinguish between lead time bias and true survival benefit and to help address the potential for overdiagnosis. A Japanese randomized trial of screening with both CA125 and TVS in 82,487 postmenopausal women gave only mildly encouraging results. Trial results suggest that screening with both modalities identifies a somewhat larger fraction of cancers in early-stage disease, but with a high false-positive rate [26]. Mortality results are not available. Two large trials are also currently ongoing. The US-based Prostate, Lung, Colorectal and Ovarian Cancer (PLCO) trial is testing whether the combination of annual screening for elevations in CA125 and abnormalities detected by ultrasound or pelvic exam will reduce ovarian cancer mortality in postmenopausal women. Among the 34,521 women in the screened arm, the reported positive predictive value was less than 1.3%, and 72% of screen-detected cancers were late-stage [27]. The UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS) randomized 202,638 postmenopausal women to one of three arms: TVS alone; a multimodal arm using CA125 in the ROM algorithm to identify women for TVS; and no screening. Reports from the UKCTOCS prevalence screen are more encouraging, with excellent sensitivity, specificity and positive predictive value for the initial screen in the multimodality arm (89.4%, 99.8% and 43.3%, respectively) [28]. Mortality results of both ongoing trials are expected within the next few years.
What is the prognosis for biomarker screening? Despite the considerable effort and resources devoted to finding novel ovarian cancer biomarkers, none can yet compete with CA125. Furthermore, of those evaluated, none add substantially to the discriminatory power of CA125 alone. While disappointing, these findings should be used to challenge the paradigm of our biomarker discovery research. Looking for markers in specimens collected at the time of diagnosis may present us with signals that are more associated with consequence than cause and, hence, are destined to alert us only to late-stage disease. A more direct path to early detection biomarkers may be needed, redeploying biomarker discovery efforts to prediagnostic specimens.
In addition, we should not expect the rigorous requirements for a cost-effective screening program to be met by biomarkers alone. The incidence of ovarian cancer is low enough and the impact of false-positive screens is serious enough to require nearly perfect specificity for screening women of average risk. In order to meet this, all available tools – risk-based screening, biomarkers and improved imaging – will be required.
In 2004, the US Preventive Services Task Force recommended against routine ovarian cancer screening [102]. None of the more recent findings override this guideline. We must await the results of the two large-scale trials. Whether the results are positive or not, these trials will yield vital data to guide the next steps in biomarker research and subsequent guidelines for practice.
Footnotes
Funding was provided by National Cancer Institute (Specialized Program of Research Excellence grant P50 CA83636). The author has no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.
