Abstract
The National Institute for Health and Clinical Excellence (NICE) guidelines have sparked heated debate regarding the role of carbohydrate antigen 125 (CA-125) in ovarian cancer (OC) detection. Recent evidence calls into question the use of CA-125 in diagnostic algorithms, given the better performance of human epididymis protein 4 (HE4) vs. CA-125 in ruling OC in. This is an important consideration, since combined measurements are not cost-effective. The quality of this evidence is, however, threatened by important gaps related to study design, enrolled populations and analytical issues. For instance, despite the clinical need to prioritize the evaluation of biomarker performance in early stage tumours, sound evidence on this is not yet available. In addition, results should be interpreted cautiously because of wide differences in the types of assay employed and in the diagnostic thresholds adopted for HE4. Comparability among results obtained by different commercially available HE4 assays, together with an objective establishment of analytical goals, is essential for the optimal clinical application of this marker.
Introduction
Updated ovarian cancer (OC) guidelines by the National Institute for Health and Clinical Excellence (NICE) were meant to improve the early recognition and management of OC and to raise awareness of survival trends after OC diagnosis. 1 Despite advances in chemotherapy strategies, over the past decade the survival rate has remained almost unchanged in several countries. Different factors have been investigated and recognized as influencing the outcome of OC. 2 Aggressiveness, presence of co-morbidity, clinical course of the disease and, in particular, stage at tumour diagnosis have been identified as important prognostic factors.3,4 Furthermore, shortcomings in the quality, accessibility and availability of diagnostic tests have been shown to significantly delay diagnosis and consequently to lower survival rates. Therefore, new clinical recommendations appeared necessary to better exploit available diagnostic tools in primary and secondary care settings. In particular, carbohydrate antigen 125 (CA-125) has been recommended as the appropriate serum biomarker to identify OC in symptomatic women presenting to their general practitioner (GP) in primary care and in suspicious cases referred to secondary care, even gaining a leading position in the decisional pathway. 1 In primary care, a CA-125 value above the diagnostic threshold should be followed by an ultrasound scan (US) of the abdomen and pelvis. In secondary care, a moderate/high score of the risk of malignancy index (RMI), which includes US and CA-125 results in addition to the pre/postmenopausal status, should prompt referral to a specialist multidisciplinary team. 1
The implementation of NICE guidelines has, however, sparked wide debate regarding the sequence of diagnostic tests in the recommended algorithm and, in particular, the critical role of CA-125 determination.5–8 Several authors disagree with the NICE recommendation assigning a primary role to CA-125 absolute concentrations and to a dichotomous interpretation of results (≥ or <35 kU/L) in the decisional algorithms.5,7,8 A large body of literature has emphasized the poor sensitivity of CA-125, which fails to detect more than 50% of OC at International Federation of Gynaecology and Obstetrics (FIGO) stage I and is increased in fewer than 80% of serous and endometrioid OCs, the most common and aggressive histological tumour subtypes.9,10 On the other hand, its modest specificity further limits its effectiveness for rule-in purposes. 10 Over the past 20 years, the pitfalls of CA-125 application in all settings have been widely acknowledged by Guideline Development Groups (GDGs) and GPs. As a matter of fact, several groups of gynaecologists and gynaecological oncologists, as well as of biochemists, have called for a revision of the role of CA-125 in the decisional algorithm,5,7,8 despite recent evidence on the clinical value of newer biomarkers proposed to aid in OC diagnosis. 11
Challenging evidence
To judge whether CA-125 determination is fit for purpose, the level and strength of evidence supporting NICE recommendations should be taken into consideration. The GDG recognized that the recommendations regarding primary care rely on indirect evidence concerning CA-125 diagnostic performance in secondary care settings. 1 However, screened ‘high-risk’ populations in primary care differ significantly in case mix as well as in OC prevalence. Large randomized screening trials, using different study designs and diagnostic algorithms including CA-125 measurement, have demonstrated a wide variability in OC prevalence within similar settings.12–14 Consequently, the capability of CA-125 to predict OC in symptomatic women presenting to their GP might reasonably be overestimated. Theoretically, given an average OC prevalence of ∼0.04% in the postmenopausal population consulting in primary care, 1 a diagnostic test would need a specificity of >99.6% together with a sensitivity of >75% to achieve a positive predictive value of 10%, which is considered cost-effective (i.e., 10 operations for each case of OC detected). 15 Clearly, these goals of diagnostic performance cannot currently be met by any proposed decisional algorithm, even though the pre-test probability could be increased by restricting screening to more selected individuals (i.e., postmenopausal women with a strong family history or inherited predisposition). 16
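The relationship between prevalence, sensitivity, specificity and positive predictive value (PPV) discussed above follows from Bayes' rule and can be sketched as below. This is a minimal illustration only: the function names are hypothetical, and the inputs are the rounded figures quoted in the text (prevalence ∼0.04%, sensitivity 75%, specificity 99.6%, target PPV 10%).

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """PPV from Bayes' rule; all arguments are fractions between 0 and 1."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def specificity_for_target_ppv(prevalence, sensitivity, target_ppv):
    """Specificity needed to reach target_ppv at a given prevalence and sensitivity."""
    true_pos = sensitivity * prevalence
    # Rearranged from PPV = TP / (TP + FP)
    false_pos = true_pos * (1.0 - target_ppv) / target_ppv
    return 1.0 - false_pos / (1.0 - prevalence)


prevalence = 0.0004  # ~0.04%, the rounded prevalence quoted in the text
ppv_at_bounds = positive_predictive_value(prevalence, 0.75, 0.996)
spec_needed = specificity_for_target_ppv(prevalence, 0.75, 0.10)
print(f"PPV at 75% sensitivity and 99.6% specificity: {ppv_at_bounds:.1%}")
print(f"Specificity needed for a 10% PPV at 75% sensitivity: {spec_needed:.2%}")
```

With these rounded inputs, the PPV computed exactly at the quoted bounds is about 7%, and a 10% PPV is reached only at a specificity of about 99.7%; the quoted thresholds are therefore best read as minimum requirements, which underlines how demanding the performance goals are at such a low prevalence.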
On the other hand, randomized clinical trials (RCTs) have preliminarily reported disappointing results on the survival benefit of OC screening programmes.12–14 The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening RCT showed that screening healthy women by CA-125 and transvaginal US does not lower mortality when compared with usual care, as 72% of detected OC cases were at late stages (FIGO stages III and IV). Furthermore, this result was associated with an increase in invasive medical procedures and related costs. 12 Preliminary data from the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS) have shown that screening postmenopausal women deemed at intermediate risk, by an algorithm including the evaluation of serial CA-125 measurements plus transvaginal US, might be more cost-effective than the PLCO strategy. 13 The UKCTOCS protocol, by employing more stringent selection criteria, the longitudinal evaluation of diagnostic tests and the referral of suspicious cases to a gynaecological oncologist, has resulted in a higher proportion of detected early stage tumours with respect to PLCO (48% vs. 22%). However, in the UKCTOCS as well as in the UK Familial Ovarian Cancer Screening Study (UKFOCSS), 14 which employs a similar diagnostic algorithm in women at high risk for OC, there is so far no evidence of a likely mortality benefit. Expectations for symptomatic women in primary care cannot be higher than those for the screening of asymptomatic populations. Although opinions are controversial,17–19 the high proportion (∼75%) of diagnoses made only after metastasis in asymptomatic patients and the lack of disease-specific symptoms could reasonably threaten any effort to decrease mortality. 20
In contrast to primary care, the evidence supporting the role of serum CA-125 in secondary care to predict OC in women with suspected malignancy has been considered strong by the GDG. Notably, this assumption relies mainly on the fact that a CA-125 increase in this setting may occur in 90% of patients at an advanced OC stage, but disregards similar increases occurring in several benign conditions or other malignancies. 21
Considering the previous remarks altogether, the discouraging results from a preliminary audit of NICE guideline application are not surprising. It has recently been observed that guideline implementation has resulted in an increased workload for clinical laboratories without any significant diagnostic improvement, and has raised quandaries about the management of patients with increased CA-125 concentrations and no evidence of OC on imaging. 22 On the other hand, the possibility of an isolated finding of CA-125 concentrations exceeding the threshold, or of a CA-125 increasing trend not associated with clinical evidence, is well established. 23 As a consequence, additional tumour markers have been advocated to improve on the diagnostic specificity and lead time of the single biomarker for effectively ruling OC in.24,25 In particular, serum human epididymis protein 4 (HE4) has successfully emerged from a wide spectrum of newly proposed markers11,24 and, according to the preliminary evidence, NICE suggested that it be measured together with CA-125. 1 However, after the release of NICE guidelines, the number of studies making head-to-head comparisons between the diagnostic performance of CA-125 and HE4 increased suddenly (from 6 to >100). This followed the spread of HE4 assays on automated laboratory platforms and the development of novel risk algorithms for OC diagnosis including both markers. In a recent systematic review (SR) and meta-analysis, we reported rather promising evidence in favour of HE4 with respect to CA-125. 11 In summary, HE4 was shown to significantly outperform CA-125 in identifying OC (positive likelihood ratio, 13.0 vs. 4.2), giving HE4 a greater capability for ruling OC in. Interestingly, the SR showed that in 60% of studies meeting the selection criteria HE4 had a diagnostic sensitivity higher than or at least comparable to that of CA-125. 11 This agrees with the reported higher percentage of HE4 vs. CA-125 positive results in serous (93% vs. 80%) and in endometrioid (100% vs. 75%) OCs, although HE4, unlike CA-125, is not expressed in the mucinous OC type. 24 Notably, the mucinous type has, however, a low incidence (5–10%), whereas serous and endometrioid subtypes represent ∼60–75% of epithelial OCs, serous being the most aggressive one. 26 In this framework, it is important to remember that 10% of OCs are not of epithelial origin and are consequently not detectable by either CA-125 or HE4; the use of additional biomarkers should possibly be addressed in this context. 24
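To illustrate what the difference between the two positive likelihood ratios quoted above (13.0 for HE4 vs. 4.2 for CA-125) means in practice, the following minimal sketch converts a pre-test probability into a post-test probability via odds. The 10% pre-test probability is a purely hypothetical value chosen for illustration and is not taken from the cited studies; the function name is likewise an assumption.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Post-test probability after a positive result, computed via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


pre_test = 0.10  # hypothetical pre-test probability, for illustration only
for marker, lr_plus in (("HE4", 13.0), ("CA-125", 4.2)):
    probability = post_test_probability(pre_test, lr_plus)
    print(f"{marker}: LR+ {lr_plus} -> post-test probability {probability:.0%}")
```

Under this assumed pre-test probability, a positive HE4 result would raise the probability of OC to roughly 59%, against roughly 32% for a positive CA-125 result.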
Although the higher rule-in capability of HE4 strongly supports the replacement of CA-125 with HE4 and discourages their combined measurement, no change has so far been proposed to the NICE diagnostic pathway. In particular, the cautionary principle of retaining CA-125 as first-line marker and adding HE4 to improve the rule-in capability appears to lack supporting evidence and clearly conflicts with cost-effectiveness outcomes. In this regard, it is relevant to consider the results of a recently published study evaluating, by mathematical modelling, the feasibility of using biomarkers for early detection of OC (i.e., at the lowest tumour size). 27 The statistical simulation showed that a 1-mm detection goal for tumour size is achievable by finding a biomarker with the highest specificity or the highest analytical sensitivity. 27 This result, in addition to the results from our SR, encourages the use of HE4 as a stand-alone test in future clinical research on the early recognition of OC, which is currently very inadequate. On the other hand, this suggests the need to thoroughly assess assay performance and define quality requirements before consolidating any clinical use of HE4. Accordingly, the National Academy of Clinical Biochemistry and the European Group for Tumor Markers make general recommendations about methodological issues concerning reagent stability, assay interferences, internal quality control and external quality assessment, adoption of appropriate reference intervals and protocols for changing assays.28,29
Are we ready for HE4 clinical introduction?
Despite preliminary evidence on the capability of HE4 to identify OC, further important issues should be overcome before its introduction into clinical practice in order to considerably increase its effectiveness in different diagnostic frameworks (screening context, primary/secondary care). According to the results of our SR, 11 some gaps are related to the quality of the research, which needs to be improved and tailored to more specific clinical questions (i.e., evaluation of marker performance in early tumour stage and in postmenopausal women only). Other gaps concern methodological issues as well as the interpretation and reporting of HE4 results.
Quality of research and clinical questions
Studies evaluating the diagnostic performance of serum human epididymis protein 4 in the detection of ovarian cancer at an early stage (International Federation of Gynaecology and Obstetrics stages I/II).
OC, ovarian cancer; BCG, benign gynaecological disease; Sens, sensitivity; CI, confidence interval; Spec, specificity; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
Original cut-off in pg/L, converted to pmol/L by using a multiplication factor of 0.004.
Studies evaluating the diagnostic performance of serum human epididymis protein 4 in the detection of ovarian cancer in postmenopausal women.
OC, ovarian cancer; BCG, benign gynaecological disease; Sens, sensitivity; CI, confidence interval; Spec, specificity; LR+, positive likelihood ratio; LR−, negative likelihood ratio.
Original cut-off in pg/L, converted to pmol/L by using a multiplication factor of 0.004.
Assessing analytical performance and result comparability
To meet the need for more guidance on the appropriate application and interpretation of HE4 measurement, fundamental tasks should be entrusted to biochemists before its clinical introduction. First, assessing the degree of comparability among the different commercially available HE4 assays through correlation studies appears to be mandatory in the traceability era.42,43 As a matter of fact, the diagnostic improvement reported in the literature following the introduction of HE4 has recently resulted in a rapid spread of marker measurement on different automated immunoassay platforms. For all marketed methods, validation data including information on inter-method bias have to be produced well before their clinical implementation. The strong association between diagnostic accuracy and good comparability of test results among different assays is well known. 44 Working on result harmonization serves guideline purposes, as it allows an effective application of recommendations advocating the adoption of common reference intervals and decision limits. 45 Furthermore, the lack of data on assay comparability does not allow marker results from studies employing different assays to be adjusted, so that data pooling might be precluded, with a consequent loss of the accumulated clinical experience. For CA-125, a rather good comparability between methods has been shown, even if concerns have been raised about the selection of 35 kU/L as a common diagnostic threshold for all assays. 46
To assess whether the performance of a new assay is acceptable, analytical goals should be defined. Notably, an unacceptable analytical bias for a new method may produce a systematic shift of marker results, causing changes in clinical decision making and even increasing health costs. 47 The assessment of the biological variation of the marker may offer an objective and practical approach to derive analytical goals; however, no data are currently available for HE4.
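As an illustration of how analytical goals may be derived from biological variation, the sketch below applies the widely used 'desirable' specifications (imprecision ≤ 0.5 × within-subject CV; bias ≤ 0.25 × the square root of the sum of squared within- and between-subject CVs; allowable total error = 1.65 × imprecision goal + bias goal). These conventional formulas are an assumption introduced here and are not stated in this article. Since no biological variation data exist for HE4, the example uses the CA-125 estimates quoted later in the text (24.7% and 54.6%); the function name is hypothetical.

```python
import math


def desirable_goals(cv_within, cv_between):
    """Desirable analytical goals (in %) from biological variation CVs (in %)."""
    imprecision = 0.5 * cv_within
    bias = 0.25 * math.sqrt(cv_within ** 2 + cv_between ** 2)
    total_error = 1.65 * imprecision + bias
    return {"imprecision": imprecision, "bias": bias, "total_error": total_error}


# CA-125 biological variation estimates quoted in this article (as CV, %)
goals = desirable_goals(cv_within=24.7, cv_between=54.6)
for name, value in goals.items():
    print(f"Desirable {name}: {value:.1f}%")
```

The same calculation could be repeated for HE4 once reliable estimates of its within- and between-subject variation become available.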
Selecting adequate interpretation strategies
The optimization of HE4 decision thresholds is another main outcome needed in order to improve clinical decision making. As emerged from our SR, HE4 cut-offs differed across the available studies even when the same assay was used, and only some authors adopted different limits for pre- and postmenopausal status. 11 No consensus has yet been reached in this framework and there are still clear discrepancies between the reference intervals recommended by various manufacturers in their package inserts, even for assays employing the same antibodies and differing only in the signal detection technology. In particular, despite compelling data on the dependence of HE4 concentrations upon age and, in particular, on the two-fold increase in post- vs. premenopausal status,35,48 specific recommendations for differentiating threshold levels have not been made.
In general, the concept of a fixed marker threshold level for clinical decision making in OC diagnosis and relapse monitoring is waning. The failure to detect asymptomatic early-stage OC and early recurrence by resorting to CA-125 threshold levels has been widely emphasized.49,50 On the other hand, in the screening setting, the evaluation of the CA-125 changing pattern from serial marker measurements was found to significantly increase both diagnostic sensitivity and specificity. 13 In particular, a preclinical rising trend of CA-125 concentrations was strongly associated with malignancy, whereas static or falling patterns indicated a disease-free steady state. 49 A warning was, however, issued by the UK Medical Research Council/European Organisation for Research and Treatment of Cancer on the monitoring of OC relapse by CA-125 changes in patients following completion of primary treatment. In this framework, a rising CA-125 trajectory was shown to reflect not resistance to treatment, but the administration of platinum-based chemotherapy. Furthermore, deflections in the rising trends were associated with the activity of hormonal treatments (i.e., tamoxifen). 50
Considering the previous remarks, several efforts are currently directed at finding and validating diagnostic algorithms incorporating CA-125 and HE4 absolute concentrations or changes, together with individual data (age, US parameters), for screening or primary/secondary care.51,52 Since the introduction of HE4, some manufacturers have recommended the Risk of Ovarian Malignancy Algorithm (ROMA) to triage patients with a pelvic mass. 52 Although ROMA™ has received clearance from the US Food and Drug Administration (FDA), several concerns have been raised about its clinical validation, given the controversial results on its diagnostic accuracy when compared with CA-125 and HE4 alone.41,53,54 A recent SR has reported ROMA as a promising predictor of epithelial OC to replace CA-125, even showing higher sensitivity than HE4 as a stand-alone test. 55 However, in our opinion, a more cautious message should be conveyed, taking into account the several limitations of the individual studies considered, which suggest an underestimation of HE4 diagnostic performance. Notably, most of the included studies employed a single decision threshold for detecting HE4 positivity and only one used specific limits for pre- and postmenopausal women. 39 The above-reported promising association between marker concentration trend and diagnostic anticipation has encouraged the development of new and more complex algorithms considering changes over time in biomarker concentrations. Interestingly, algorithms based on longitudinal interpretation of the biomarker course appear to overcome the limits of those incorporating dichotomous results according to the single threshold rule, thereby improving the early detection of malignancy. 51
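For readers unfamiliar with how ROMA combines the two markers, a minimal sketch is given below. The coefficients are not reported in this article: they are those commonly quoted for ROMA in the literature and in manufacturers' documentation, and should be verified against the relevant package insert before any use. The function name and the input values are hypothetical, and assay-specific risk cut-offs are deliberately omitted.

```python
import math


def roma_score(he4_pmol_l, ca125_ku_l, postmenopausal):
    """ROMA predicted probability (%) from HE4 (pmol/L) and CA-125 (kU/L).

    Coefficients are the commonly quoted ones (an assumption, not taken
    from this article); check the assay package insert before relying on them.
    """
    if he4_pmol_l <= 0 or ca125_ku_l <= 0:
        raise ValueError("marker concentrations must be positive")
    ln_he4 = math.log(he4_pmol_l)
    ln_ca125 = math.log(ca125_ku_l)
    if postmenopausal:
        predictive_index = -8.09 + 1.04 * ln_he4 + 0.732 * ln_ca125
    else:
        predictive_index = -12.0 + 2.38 * ln_he4 + 0.0626 * ln_ca125
    # Logistic transform of the predictive index, expressed as a percentage
    return 100.0 / (1.0 + math.exp(-predictive_index))


# Hypothetical marker values, for illustration only
print(f"Premenopausal:  {roma_score(70.0, 35.0, postmenopausal=False):.1f}%")
print(f"Postmenopausal: {roma_score(70.0, 35.0, postmenopausal=True):.1f}%")
```

The sketch makes explicit that menopausal status enters the algorithm only through the choice of coefficients, while each marker still contributes as a single absolute concentration, which is the limitation discussed above.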
Unbiased interpretation of marker changes
As mentioned above, different approaches (i.e., fixed cut-offs, algorithms, evaluation of marker trends) have been tested to classify patients according to CA-125 and HE4 results in several settings, such as screening symptom-free or symptomatic populations, triaging women with a pelvic mass, staging patients with proven OC, monitoring response to therapy and detecting OC relapses. To date, the evaluation of marker changes, even when integrated in longitudinal algorithms, seems to be the most promising criterion to increase diagnostic accuracy. However, when interpreting serial marker measurements, some pitfalls should be taken into consideration in establishing generalized diagnostic rules. First, a great heterogeneity of tumour marker patterns (in particular for CA-125) has been observed among women in several frameworks, and this theoretically underscores the need for incorporating individual-specific decision criteria in diagnostic protocols. 17 In addition, a large body of literature has shown that dealing with moderate variations in serum tumour marker concentrations, such as those expected in the screening and relapse-monitoring contexts, is quite difficult, as changes in serial results are not due only to disease progression/remission. Pre-analytical, analytical and within-subject biological sources of variation represent a variable and relevant share of what constitutes a significant difference between serial results. 56 All these sources of variability are accounted for in the estimation of the reference change value (RCV). 57 For the most popular tumour markers, including CA-125, the rationale for adopting the RCV to evaluate marker changes from serial measurements, in search of a more sensitive and specific means of identifying early disease, is strong. This relies on the availability of data on biological variability. 58
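The RCV mentioned above is conventionally computed as √2 × Z × √(CV_P² + CV_A² + CV_I²), where CV_P, CV_A and CV_I are the pre-analytical, analytical and within-subject biological CVs and Z = 1.96 for a two-sided 95% probability. This formula is the standard one from the laboratory medicine literature and is not spelled out in this article. The sketch below applies it to CA-125 using the within-subject CV reported in this article (24.7%); the analytical CV of 5% and the serial results are hypothetical values, and the function names are assumptions.

```python
import math


def reference_change_value(cv_analytical, cv_within, cv_preanalytical=0.0, z=1.96):
    """RCV (%) for two serial results; all CVs are given in %."""
    combined_cv = math.sqrt(cv_preanalytical ** 2 + cv_analytical ** 2 + cv_within ** 2)
    return math.sqrt(2.0) * z * combined_cv


def is_significant_change(previous, current, rcv_percent):
    """True if the relative change between two serial results exceeds the RCV."""
    change_percent = 100.0 * (current - previous) / previous
    return abs(change_percent) > rcv_percent


# cv_analytical = 5.0 is an assumed imprecision; 24.7 is the CA-125
# within-subject CV quoted in this article
rcv = reference_change_value(cv_analytical=5.0, cv_within=24.7)
print(f"RCV for CA-125: {rcv:.1f}%")
print(is_significant_change(30.0, 45.0, rcv))  # hypothetical rise from 30 to 45 kU/L
print(is_significant_change(30.0, 60.0, rcv))  # hypothetical rise from 30 to 60 kU/L
```

Under these assumptions the RCV is close to 70%, so a rise from 30 to 45 kU/L (+50%) would not be flagged as significant, whereas a doubling would.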
For CA-125, the average intra-individual variation (24.7% as CV) is considerable, but lower than the inter-individual one (54.6% as CV). 58 This implies an index of individuality (II) (i.e., the ratio between intra- and inter-individual variation) 59 of ∼0.45, which results in a very limited utility of reference intervals for marker interpretation, in agreement with the clinical evidence. Accordingly, the hypothesis that clinically significant CA-125 increases can be reliably identified by evaluating serial marker changes using the RCV has a strong biological background. The biological variability data provide a well-founded explanation for the clinical failure of fixed cut-off levels for CA-125. According to the approach suggested by Fraser and Petersen, 60 the number of specimens $n$ that should be collected to ensure that the mean marker result is within ±10% of the individual’s homeostatic set point can be obtained from the following formula: $n = 1.96^2 \times (CV_A^2 + CV_I^2)/10^2$, where $CV_A$ is the analytical CV and $CV_I$ the within-subject biological CV (both in %). Using previously published imprecision data, 61 for CA-125 it can be estimated that each patient should theoretically undergo up to 24 marker measurements to achieve a sufficiently accurate estimate of the CA-125 individual set point!
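The two calculations in this paragraph can be reproduced with the short sketch below. The biological variation figures are those quoted in the text; the analytical CV of 3% is a hypothetical value chosen for illustration, since the published imprecision data (reference 61) are not reproduced here. Function names are likewise assumptions.

```python
import math


def index_of_individuality(cv_within, cv_between):
    """Ratio between within-subject and between-subject biological CV."""
    return cv_within / cv_between


def specimens_needed(cv_analytical, cv_within, closeness_percent=10.0, z=1.96):
    """Specimens needed so that the mean lies within +/- closeness_percent
    of the homeostatic set point (CVs in %), rounded up to a whole number."""
    n = z ** 2 * (cv_analytical ** 2 + cv_within ** 2) / closeness_percent ** 2
    return math.ceil(n)


ii = index_of_individuality(cv_within=24.7, cv_between=54.6)
n = specimens_needed(cv_analytical=3.0, cv_within=24.7)  # 3.0% is an assumed CV
print(f"Index of individuality: {ii:.2f}")
print(f"Specimens needed: {n}")
```

With the assumed analytical CV the estimate is 24 specimens, in line with the figure given above; because the within-subject variation dominates the sum, the result is only weakly sensitive to the analytical imprecision.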
It is evident that, when considering a possible replacement of CA-125 with HE4, information on HE4 biological variation is mandatory to understand its biological behaviour and, consequently, how to deal with marker concentrations and changes. For instance, given possible differences in HE4 biological variability due to hormonal status, we could expect to manage marker results and changes differently in pre- and postmenopausal women. Although this option has not been considered in practice for CA-125, the available studies on the estimation of biological variability for this marker have strongly supported this hypothesis.23,62,63
In evaluating the literature on biological variability, concerns may arise about the study results, often supporting the need for re-evaluating the available information.64,65 Basically, a well-defined study protocol that precludes disturbances of marker homeostasis and minimizes pre-analytical sources of variability is recommended. In this framework, the number, duration and frequency of sample collections, as well as the definition of the individual’s steady state (e.g., menstrual cycle in premenopausal women), may unpredictably affect the estimation of biological variability. In particular, the greatest care should be taken in assuring the same conditions for all enrolled subjects (e.g., blood drawn at mid-cycle) and in defining strict selection criteria to avoid spurious marker increases (e.g., due to smoking, menstruation, etc.).66–69 Unfortunately, the few available studies on CA-125 biological variability disregarded these critical issues, collecting serial samples at variable time intervals for several individuals,23,62,63 pooling marker results from pre- and postmenopausal subjects,62,63 including samples drawn during menstruation,23,62,63 or drawn from patients with potentially interfering conditions (i.e., smokers, patients with other malignancies).23,62,63 In only one study 23 were all samples analyzed in the same analytical batch, as recommended. 56 Therefore, it is evident that CA-125 biological variability data need to be revised and updated using a more rigorous experimental approach.
Concluding remarks
Faced with the low incidence of the disease and the lack of reported survival benefit, stringent performance is required for the clinical introduction of effective OC biomarkers. In addition, the histological heterogeneity of OC necessitates a focus on the most life-threatening and aggressive subtype, namely high-grade serous tumour. Although preferentially expressed by this OC subtype, CA-125, employed as a stand-alone marker, cannot detect the disease at an early manageable stage and its rule-in capability is further hindered by low specificity. The updated evidence confirms the better diagnostic performance of HE4 compared with CA-125, 11 even if this is probably not enough to definitively incorporate HE4 in clinical practice guidelines. Further steps are required in order to prioritize clinically relevant research questions (i.e., biomarker performance in early OC identification) and to provide generalizable study results. Accordingly, we have highlighted the need to improve study design as well as to thoroughly investigate the methodological issues that have emerged and that threaten the strength and level of scientific evidence.
The rule of thumb for introducing a new biomarker is that it should effectively increase OC discrimination and add information to existing biomarkers. HE4 as a stand-alone test is likely to accomplish a better identification of the malignancy but, owing to the similar expression profile, its addition to CA-125 does not increase the OC detection rate by more than a few percent. 53 Therefore, it should be clear that the HE4 and CA-125 combination is unfavourable from a cost-benefit perspective. The replacement of CA-125 with HE4 appears advisable, but more focus is needed on the evaluation of assay performance and comparability and on the definition of appropriate interpretation strategies. In favour of HE4 with respect to CA-125, the complete characterization of the protein structure is an overriding prerequisite to overcome possible methodological limitations (e.g., stability of antibodies, analytical sensitivity, lack of assay harmonization), with a definitive benefit for diagnostic performance. On the other hand, data on the biological variability of HE4 should be generated to better disclose which interpretative criteria may aid in disease identification. In this process, the laboratory assumes the role of ‘gatekeeper’ to prevent erroneous results and over-requesting. The deletion of obsolete biomarkers together with the introduction of new ones is known to exert a relevant impact on both laboratory and total health-care costs, and is therefore fundamental to clinical governance. 70 In this perspective, the proposal to replace CA-125 with HE4 should be considered. Costs may play an important role in this framework, which involves several stakeholders (clinicians, GPs, laboratorians, hospital administrators, patients) in the delivery, payment and receipt of health care. 71 Often the introduction of a new biomarker has been reported to change not only clinical practice, but also the reimbursement rate. However, sound clinical evidence should exert a relevant impact on reimbursement models, which for novel biomarkers have been proposed to shift from a cost-based to a value-based and evidence-based approach. 72
Footnotes
Acknowledgements
We would like to thank Federica Braga (Cattedra di Biochimica Clinica e Biologia Molecolare Clinica, Dipartimento di Scienze Biomediche e Cliniche ‘Luigi Sacco’, Università degli Studi, Milano, Italy) and Patrizia La Musta (Campus Cascina Rosa, Fondazione IRCCS Istituto Nazionale Tumori Milano, Italy) for their assistance in retrieving the cited original articles.
Declarations of conflicting interests
None declared.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Ethical approval
Not applicable.
Guarantor
SF and MP.
Contributorship
MP conceived the study and revised the manuscript. SF wrote the first draft of the manuscript and researched literature. SF and MP edited and approved the final version of the manuscript.
