Is serum human epididymis protein 4 ready for prime time?

Abstract

The National Institute for Health and Clinical Excellence (NICE) guidelines have sparked heated debate regarding the role of carbohydrate antigen 125 (CA-125) in ovarian cancer (OC) detection. Recent literature and evidence call into question the use of CA-125 in diagnostic algorithms, given the better performance of human epididymis protein 4 (HE4) vs. CA-125 in ruling OC in. This is an important consideration, since combined measurements are not cost-effective. The quality of this evidence is, however, threatened by important gaps related to study design, enrolled populations and analytical issues. For instance, despite the clinical need to prioritize the evaluation of biomarker performance in early-stage tumours, sound evidence on this point is not yet available. In addition, results should be interpreted cautiously because of wide differences in the types of assays employed and in the diagnostic thresholds adopted for HE4. Comparability among results obtained by the different commercially available HE4 assays, together with the objective establishment of analytical goals, is essential for the optimal clinical application of this marker.

Keywords

Commutability; reference change value; biological variability; ovarian cancer; biomarker

Introduction

Updated ovarian cancer (OC) guidelines by the National Institute for Health and Clinical Excellence (NICE) were meant to improve the early recognition and management of OC and to raise awareness of survival trends after OC diagnosis.¹ Despite advances in chemotherapy strategies, the survival rate has remained almost unchanged over the past decade in several countries. Several factors have been investigated and recognized as influencing the outcome of OC.² Aggressiveness, the presence of co-morbidity, the clinical course of the disease and, in particular, tumour stage at diagnosis have been identified as important prognostic factors.^3,4 Furthermore, shortcomings in the quality, accessibility and availability of diagnostic tests have emerged as causes of significant diagnostic delay and, consequently, of lower survival rates. New clinical recommendations therefore appeared necessary to better exploit the available diagnostic tools in primary and secondary care settings. In particular, carbohydrate antigen 125 (CA-125) has been recommended as the appropriate serum biomarker to identify OC in symptomatic women presenting to a general practitioner (GP) in primary care and in suspicious cases referred to secondary care, and it has even gained a leading position in the decisional pathway.¹ In primary care, a CA-125 value above the diagnostic threshold should be followed by an ultrasound scan (US) of the abdomen and pelvis. In secondary care, a moderate/high risk of malignancy index (RMI) score, which combines US findings, the CA-125 result and pre/postmenopausal status, should prompt referral to a specialist multidisciplinary team.¹

The implementation of NICE guidelines has, however, sparked wide debate regarding the sequence of diagnostic tests in the recommended algorithm and, in particular, the critical role of CA-125 determination.^5–8 Several authors disagree with the NICE recommendation that assigns a primary role in the decisional algorithms to absolute CA-125 concentrations and to a dichotomous interpretation of results (≥ or <35 kU/L).^5,7,8 A large body of literature has emphasized the poor sensitivity of CA-125, which fails to detect more than 50% of OCs at International Federation of Gynaecology and Obstetrics (FIGO) stage I and is raised in fewer than 80% of serous and endometrioid OCs, the most common and aggressive histological tumour subtypes.^9,10 Its modest specificity further limits its effectiveness for ruling OC in.¹⁰ Over the past 20 years, the pitfalls of CA-125 in all settings have been widely acknowledged by Guideline Development Groups (GDGs) and GPs. Indeed, several groups of gynaecologists and gynaecological oncologists, as well as of biochemists, have called for a revision of the role of CA-125 in the decisional algorithm,^5,7,8 also in light of recent evidence on the clinical value of newer biomarkers proposed to aid OC diagnosis.¹¹

Challenging evidence

To judge whether CA-125 determination is fit for purpose, the level and strength of the evidence supporting the NICE recommendations should be taken into consideration. The GDG recognized that the recommendations regarding primary care rely on indirect evidence concerning CA-125 diagnostic performance in secondary care settings.¹ However, 'high-risk' populations screened in primary care differ significantly in case mix as well as in OC prevalence. Large randomized screening trials, using different study designs and diagnostic algorithms that include CA-125 measurement, have demonstrated wide variability in OC prevalence within similar settings.^12–14 Consequently, the capability of CA-125 to predict OC in symptomatic women presenting to a GP is likely to be overestimated. Theoretically, an average OC prevalence of ∼0.04% in the postmenopausal population consulting in primary care¹ means that a diagnostic test should reach a specificity of >99.6% together with a sensitivity of >75% to achieve a positive predictive value (PPV) of 10%, which is considered cost-effective (i.e., 10 operations for each case of OC detected).¹⁵ Clearly, these performance goals cannot currently be met by any proposed decisional algorithm, even though the pre-test probability could be increased by restricting screening to more selected individuals (e.g., postmenopausal women with a strong family history or an inherited predisposition).¹⁶
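The PPV targets above follow from Bayes' theorem. As an illustration only (the Python function names below are hypothetical and not taken from any cited study), the sketch computes PPV from sensitivity, specificity and prevalence, and inverts the relation to obtain the specificity needed for a target PPV. Under these simple assumptions, the required specificity depends on the sensitivity assumed: at a prevalence of 0.04%, a perfectly sensitive test reaches a PPV of 10% at a specificity of about 99.64%, and a sensitivity of 75% raises the requirement to about 99.73%.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem (all inputs as proportions)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def required_specificity(sensitivity: float, prevalence: float, target_ppv: float) -> float:
    """Lowest specificity that still yields target_ppv for a given sensitivity and prevalence."""
    true_pos = sensitivity * prevalence
    # PPV = TP / (TP + FP)  =>  FP = TP * (1 - PPV) / PPV
    max_false_pos = true_pos * (1.0 - target_ppv) / target_ppv
    # FP = (1 - specificity) * (1 - prevalence)
    return 1.0 - max_false_pos / (1.0 - prevalence)


prevalence = 0.0004  # ~0.04% OC prevalence quoted in the text
for sens in (0.75, 1.00):
    spec = required_specificity(sens, prevalence, target_ppv=0.10)
    print(f"sensitivity {sens:.2f}: specificity >= {spec:.4f}")
# A PPV of 10% corresponds to about 1 / 0.10 = 10 operations per OC detected.
```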

Moreover, randomized clinical trials (RCTs) have reported discouraging preliminary results on the survival benefit of OC screening programmes.^12–14 The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening RCT showed that screening healthy women with CA-125 and transvaginal US did not lower mortality compared with usual care, as 72% of the OC cases detected were at late stages (FIGO stages III and IV). Furthermore, this result was associated with an increase in invasive medical procedures and related costs.¹² Preliminary data from the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS) have shown that screening postmenopausal women at intermediate risk with an algorithm that evaluates serial CA-125 measurements plus transvaginal US might be more cost-effective than the PLCO strategy.¹³ By employing more stringent selection criteria, longitudinal evaluation of diagnostic tests and referral of suspicious cases to a gynaecological oncologist, the UKCTOCS protocol detected a higher proportion of early-stage tumours than PLCO (48% vs. 22%). However, neither the UKCTOCS nor the UK Familial Ovarian Cancer Screening Study (UKFOCSS),¹⁴ which employed a similar diagnostic algorithm in women at high risk for OC, has so far provided evidence of a likely mortality benefit. Expectations for symptomatic women in primary care cannot be higher than for screening of the asymptomatic population. Despite conflicting opinions,^17–19 the high proportion (∼75%) of diagnoses made only after metastatic spread in asymptomatic patients and the lack of disease-specific symptoms could reasonably undermine any effort to decrease mortality.²⁰

In contrast to primary care, the evidence supporting the role of serum CA-125 in secondary care for predicting OC in women with suspected malignancy has been considered strong by the GDG. Notably, this assumption mainly relies on the fact that a CA-125 increase may occur in 90% of patients with advanced-stage OC in this setting, but it disregards similar increases occurring in several benign conditions and other malignancies.²¹

Considering the previous remarks altogether, the discouraging results of a preliminary audit of NICE guideline application are not surprising. It has recently been observed that guideline implementation has increased the workload of clinical laboratories without any significant diagnostic improvement, and has raised quandaries about the management of patients with increased CA-125 concentrations and no evidence of OC on imaging.²² Moreover, the possibility of an isolated CA-125 concentration exceeding the threshold, or of an increasing CA-125 trend not associated with clinical evidence, is well established.²³ As a consequence, additional tumour markers have been advocated to improve the diagnostic specificity and lead time of the single biomarker, so as to rule OC in effectively.^24,25 In particular, serum human epididymis protein 4 (HE4) has emerged from a wide spectrum of newly proposed markers^11,24 and, on the basis of preliminary evidence, NICE suggested measuring it together with CA-125.¹ After the release of the NICE guidelines, the number of studies comparing head-to-head the diagnostic performance of CA-125 and HE4 increased sharply (from 6 to >100). This followed the spread of HE4 assays on automated laboratory platforms and the development of novel risk algorithms for OC diagnosis that include both markers. In a recent systematic review (SR) and meta-analysis, we reported rather promising evidence in favour of HE4 with respect to CA-125.¹¹ In summary, HE4 significantly outperformed CA-125 in identifying OC (positive likelihood ratio, 13.0 vs. 4.2), indicating a greater capability of HE4 to rule OC in. Interestingly, the SR showed that in 60% of the studies meeting the selection criteria, HE4 had a diagnostic sensitivity higher than or at least comparable to that of CA-125.¹¹ This agrees with the reported higher percentage of HE4 vs. CA-125 positive results in serous (93% vs. 80%) and endometrioid (100% vs. 75%) OCs, although HE4, unlike CA-125, is not expressed in the mucinous OC type.²⁴ The mucinous type, however, has a low incidence (5–10%), whereas serous and endometrioid subtypes represent ∼60–75% of epithelial OCs, serous being the most aggressive.²⁶ In this framework, it is important to remember that 10% of OCs are not of epithelial origin and are consequently not detectable by either CA-125 or HE4; the use of additional biomarkers should possibly be addressed in this context.²⁴
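Likelihood ratios translate directly into post-test probabilities through odds. The following Python sketch is illustrative only: the 10% pre-test probability is a hypothetical value chosen for the example, not a figure from the SR, and the function name is our own.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pre-test probability via odds (Bayes' theorem)."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


PRE_TEST = 0.10  # hypothetical pre-test probability of OC, for illustration only
for marker, lr_positive in (("HE4", 13.0), ("CA-125", 4.2)):
    print(f"{marker}: post-test probability {post_test_probability(PRE_TEST, lr_positive):.1%}")
```

With this hypothetical pre-test probability, a positive result with LR+ 13.0 raises the probability of OC to about 59%, whereas LR+ 4.2 raises it to about 32%.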

Although the higher rule-in capability of HE4 strongly supports the replacement of CA-125 with HE4 and discourages their combined measurement, no change has so far been proposed to the NICE diagnostic pathway. In particular, the cautionary principle of retaining CA-125 as the first-line marker and adding HE4 to improve rule-in capability lacks supporting evidence and clearly contrasts with cost-effectiveness outcomes. In this regard, it is relevant to consider the results of a recently published study that evaluated, by mathematical modelling, the feasibility of using biomarkers for the early detection of OC (i.e., at the smallest tumour size).²⁷ The statistical simulation showed that a 1-mm detection goal for tumour size is achievable with a biomarker of the highest specificity or the highest analytical sensitivity.²⁷ This result, together with the results of our SR, encourages the use of HE4 as a stand-alone test in future clinical research on the early recognition of OC, which is currently very inadequate. At the same time, it underlines the need to thoroughly assess assay performance and define quality requirements before consolidating any clinical use of HE4. Accordingly, the National Academy of Clinical Biochemistry and the European Group for Tumor Markers make general recommendations on methodological issues concerning reagent stability, assay interferences, internal quality control and external quality assessment, adoption of appropriate reference intervals and protocols for changing assays.^28,29

Are we ready for HE4 clinical introduction?

Despite preliminary evidence on the capability of HE4 to identify OC, further important issues must be resolved before its introduction into clinical practice, in order to considerably increase its effectiveness in different diagnostic frameworks (screening, primary/secondary care). According to the results of our SR,¹¹ some gaps relate to the quality of the research, which needs to be improved and tailored to more specific clinical questions (e.g., evaluation of marker performance in early tumour stages and in postmenopausal women only). Other gaps concern methodological issues as well as the interpretation and reporting of HE4 results.

Quality of research and clinical questions

As differences in study design, clinical source of patients and eligibility criteria imply different pre-test probabilities, high variability in HE4 diagnostic performance estimates across individual studies has been observed, and a possible overestimation of diagnostic power has been suspected.¹¹ A further risk of bias has been associated with the uneven distribution of enrolled patients by pre- and postmenopausal status, OC histological subtype and FIGO tumour stage.¹¹ Under these conditions, well-designed prospective RCTs are required to reinforce the preliminary evidence. However, this is not enough if research findings do not meet clinical expectations for early OC detection. The clinical question is thus rather clear: can we significantly lower the rate of late OC diagnosis?³⁰ In this regard, concerns about the applicability of results arise, in agreement with the cautionary conclusion of the SR.¹¹ Two main drawbacks should be overcome by future research. The first is the need to specifically provide HE4 diagnostic performance in the detection of OC at early stages, considering that serum CA-125 is reported to rise in ∼50% of FIGO stage I and ∼90% of stage II OCs.³¹ However, in the available studies,^32–34 the number of subjects with early-stage tumours is quite low, and this, together with the heterogeneity of study design and differences in the assays and thresholds employed, does not allow an accurate estimate of HE4 diagnostic performance (Table 1).

Table 1.

Studies evaluating the diagnostic performance of serum human epididymis protein 4 in the detection of ovarian cancer at an early stage (International Federation of Gynaecology and Obstetrics stages I/II).

Authors (reference)	No. of patients (early OC vs. BCG)	Assay	Threshold	Sens (95% CI)	Spec (95% CI)	LR+ (95% CI)	LR– (95% CI)
Escudero et al.³²	24 vs. 289	Abbott Architect	140 pmol/L	0.58 (0.37–0.78)	0.99 (0.96–1.00)	42.2 (15.0–118.1)	0.42 (0.26–0.68)
Abdel-Azeez et al.³³	13 vs. 24	Fujirebio manual ELISA	72 pmol/L	0.85 (0.55–0.98)	0.75 (0.53–0.90)	3.4 (1.6–7.0)	0.21 (0.06–0.75)
Nolen et al.³⁴	63 vs. 158	Luminex multiplexed ELISA	*73.7 pmol/L	0.84 (0.73–0.92)	0.70 (0.62–0.77)	2.8 (2.1–3.7)	0.23 (0.13–0.40)

OC, ovarian cancer; BCG, benign gynaecological disease; Sens, sensitivity; CI, confidence interval; Spec, specificity; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

*Original cut-off in pg/L, converted to pmol/L using a multiplication factor of 0.004.
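The LR columns in Table 1 follow from the point estimates of sensitivity and specificity (LR+ = sens/(1 − spec); LR− = (1 − sens)/spec). The hypothetical Python helper below recomputes them from the rounded values in the table. Small last-digit differences are expected, and the Escudero et al. row shows how strongly LR+ depends on specificity close to 1: the displayed specificity of 0.99 gives an LR+ of 58, whereas the tabulated 42.2 was presumably derived from unrounded counts.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Rounded point estimates (sensitivity, specificity) copied from Table 1.
TABLE1 = {
    "Escudero et al.": (0.58, 0.99),
    "Abdel-Azeez et al.": (0.85, 0.75),
    "Nolen et al.": (0.84, 0.70),
}

for study, (sens, spec) in TABLE1.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{study:20s} LR+ {lr_pos:5.1f}  LR- {lr_neg:.2f}")
```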

The second issue relates to the diagnostic ability of HE4 in the subgroup of postmenopausal women. It is noteworthy that both the pre-test probability of OC and HE4 concentrations differ widely between pre- and postmenopausal populations.³⁵ A number of publications have reported HE4 results in postmenopausal women, but most of them did not meet the selection criteria documented in the SR.¹¹ At the time of preparation of the present manuscript, we were, however, able to retrieve three further recent studies evaluating HE4 in postmenopausal women that fulfilled the SR selection criteria,^36–38 to be added to those previously selected in the SR.^34,39–41 Table 2 lists the study characteristics. Once again, the results on biomarker performance should be interpreted cautiously because of differences across studies in the types of assays employed, the diagnostic thresholds adopted and OC prevalence, which preclude any data pooling.

Table 2.

Studies evaluating the diagnostic performance of serum human epididymis protein 4 in the detection of ovarian cancer in postmenopausal women.

Authors (reference)	No. of patients (OC vs. BCG)	Assay	Threshold	Sens (95% CI)	Spec (95% CI)	LR+ (95% CI)	LR− (95% CI)
Nolen et al.³⁴	169 vs. 140	Luminex multiplexed ELISA	*73.7 pmol/L	0.83 (0.77–0.89)	0.84 (0.77–0.90)	5.3 (3.6–7.8)	0.20 (0.14–0.28)
Bandiera et al.³⁹	87 vs. 96	Abbott Architect	140 pmol/L	0.78 (0.68–0.86)	0.99 (0.94–1.00)	75.0 (10.6–528.9)	0.22 (0.15–0.33)
Molina et al.⁴⁰	84 vs. 59	Abbott Architect	150 pmol/L	0.85 (0.75–0.91)	0.95 (0.86–0.99)	16.6 (5.5–50.3)	0.16 (0.10–0.27)
Van Gorp et al.⁴¹	119 vs. 86	Fujirebio ELISA	150 pmol/L	0.53 (0.44–0.61)	0.92 (0.84–0.97)	6.5 (3.1–13.5)	0.51 (0.42–0.63)
Karlsen et al.³⁶	203 vs. 279	Abbott Architect	140 pmol/L	0.82 (0.76–0.87)	0.90 (0.85–0.93)	7.9 (5.5–11.2)	0.20 (0.15–0.27)
Novotny et al.³⁷	21 vs. 256	Abbott Architect	140 pmol/L	0.71 (0.48–0.89)	0.88 (0.83–0.91)	5.7 (3.8–8.7)	0.33 (0.17–0.64)
Chan et al.³⁸	43 vs. 53	Abbott Architect	112 pmol/L	0.53 (0.38–0.69)	0.98 (0.90–1.00)	28.4 (4.0–201.5)	0.47 (0.34–0.65)

OC, ovarian cancer; BCG, benign gynaecological disease; Sens, sensitivity; CI, confidence interval; Spec, specificity; LR+, positive likelihood ratio; LR−, negative likelihood ratio.

*Original cut-off in pg/L, converted to pmol/L using a multiplication factor of 0.004.
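The differences in case mix across the studies in Table 2 can be made explicit by computing the proportion of OC cases among enrolled women and the PPV that each study's sensitivity and specificity would yield at that proportion. The Python sketch below is illustrative: the enrolment proportion is not a population prevalence, and the helper names are hypothetical.

```python
# (OC cases, benign controls, sensitivity, specificity) copied from Table 2.
TABLE2 = {
    "Nolen et al.": (169, 140, 0.83, 0.84),
    "Bandiera et al.": (87, 96, 0.78, 0.99),
    "Molina et al.": (84, 59, 0.85, 0.95),
    "Van Gorp et al.": (119, 86, 0.53, 0.92),
    "Karlsen et al.": (203, 279, 0.82, 0.90),
    "Novotny et al.": (21, 256, 0.71, 0.88),
    "Chan et al.": (43, 53, 0.53, 0.98),
}


def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def case_mix_summary(table: dict) -> dict:
    """Map each study to (proportion of OC cases, PPV at that proportion)."""
    summary = {}
    for study, (n_oc, n_benign, sens, spec) in table.items():
        proportion = n_oc / (n_oc + n_benign)
        summary[study] = (proportion, ppv(sens, spec, proportion))
    return summary


for study, (proportion, value) in case_mix_summary(TABLE2).items():
    print(f"{study:16s} OC proportion {proportion:6.1%}  PPV {value:6.1%}")
```

The proportion of OC cases ranges from about 8% (Novotny et al.) to about 59% (Molina et al.), so the same sensitivity and specificity would translate into very different predictive values across these study populations.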

Assessing analytical performance and result comparability

To meet the need for more guidance on the appropriate application and interpretation of HE4 measurement, biochemists have fundamental tasks to fulfil before its clinical introduction. First, assessing the degree of comparability among the different commercially available HE4 assays through correlation studies appears to be mandatory in the traceability era.^42,43 Indeed, the diagnostic improvement reported in the literature with HE4 has recently led to a rapid spread of marker measurement on different automated immunoassay platforms. For all marketed methods, validation data, including information on inter-method bias, have to be produced well before their clinical implementation. The strong association between diagnostic accuracy and good comparability of test results among different assays is well known.⁴⁴ Work on result harmonization serves guideline purposes, as it allows effective application of recommendations advocating the adoption of common reference intervals and decision limits.⁴⁵ Furthermore, the lack of data on assay comparability makes it impossible to adjust marker results from studies employing different assays, so that data pooling may be precluded, with a consequent loss of accumulated clinical experience. For CA-125, rather good comparability between methods has been shown, even though concerns have been raised about the selection of 35 kU/L as a common diagnostic threshold for all assays.⁴⁶
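Correlation coefficients alone do not quantify the bias between assays. One commonly used complement, shown here as an alternative technique rather than the method of the cited studies, is a Bland–Altman analysis of paired results. The Python sketch computes the mean percentage difference and approximate 95% limits of agreement; the function name and the paired HE4 values are invented for illustration, and regression approaches such as Passing–Bablok are often used alongside it.

```python
from statistics import mean, stdev


def bland_altman_percent(method_a: list[float], method_b: list[float]) -> tuple[float, float, float]:
    """Mean % difference (A vs. B) and approximate 95% limits of agreement for paired results."""
    if len(method_a) != len(method_b) or len(method_a) < 3:
        raise ValueError("need at least three paired results of equal length")
    # Percentage difference relative to the mean of each pair.
    diffs = [100.0 * (a - b) / ((a + b) / 2.0) for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired HE4 results (pmol/L) measured on the same samples by two assays.
assay_x = [45.0, 60.0, 82.0, 140.0, 210.0, 400.0]
assay_y = [50.0, 66.0, 88.0, 150.0, 230.0, 430.0]
bias, lower, upper = bland_altman_percent(assay_x, assay_y)
print(f"bias {bias:.1f}% (95% limits of agreement {lower:.1f}% to {upper:.1f}%)")
```

A constant percentage bias, as in this invented example, could in principle be handled by assay-specific thresholds, whereas wide limits of agreement would argue against pooling results from the two methods.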

To assess whether the performance of a new assay is acceptable, analytical goals should be defined. Notably, an unacceptable analytical bias of a new method may produce a systematic shift in marker results, changing clinical decision making and even increasing health costs.⁴⁷ The assessment of the biological variation of the marker may offer an objective and practical approach to deriving analytical goals; however, no such data are currently available for HE4.
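For markers with known biological variation, widely used 'desirable' specifications set imprecision at ≤0.5 CV_I, bias at ≤0.25 √(CV_I² + CV_G²) and total allowable error at ≤1.65 × imprecision + bias, where CV_I and CV_G are the within- and between-subject CVs. Since no HE4 data exist, the Python sketch below, a minimal illustration with a hypothetical function name, uses the CA-125 within-subject (24.7%) and between-subject (54.6%) CVs reported later in this article.

```python
import math


def desirable_specifications(cv_within: float, cv_between: float) -> dict[str, float]:
    """Desirable analytical goals (%) derived from biological variation."""
    imprecision = 0.5 * cv_within
    bias = 0.25 * math.sqrt(cv_within**2 + cv_between**2)
    total_error = 1.65 * imprecision + bias  # 1.65 = one-sided z at 95%
    return {"imprecision": imprecision, "bias": bias, "total_error": total_error}


# CA-125 biological variation data quoted in the text (no HE4 data are available).
specs = desirable_specifications(cv_within=24.7, cv_between=54.6)
for name, limit in specs.items():
    print(f"{name}: <= {limit:.1f}%")
```

This yields desirable limits of about 12.4% for imprecision, 15.0% for bias and 35.4% for total error; corresponding HE4 goals would require HE4-specific biological variation data.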

Selecting adequate interpretation strategies

The optimization of HE4 decision thresholds is another main outcome needed to improve clinical decision making. As emerged from the SR, HE4 cut-offs differed across the available studies even when the same assay was used, and only some authors adopted different limits for pre- and postmenopausal status.¹¹ No consensus has yet been reached in this framework, and there are still clear discrepancies between the reference intervals recommended by various manufacturers in their package inserts, even for assays employing the same antibodies and differing only in the signal detection technology. In particular, despite compelling data on the dependence of HE4 concentrations on age and, specifically, on the two-fold increase in post- vs. premenopausal status,^35,48 no specific recommendations for differentiating threshold levels have been made.

In general, the concept of a fixed marker threshold for clinical decision making in OC diagnosis and relapse monitoring is waning. The failure to detect asymptomatic early-stage OC and early recurrence by resorting to CA-125 threshold levels has been widely emphasized.^49,50 In the screening setting, by contrast, evaluating the changing pattern of CA-125 across serial measurements was found to significantly increase both diagnostic sensitivity and specificity.¹³ In particular, a preclinical rising trend in CA-125 concentrations was strongly associated with malignancy, whereas static or falling patterns indicated a disease-free steady state.⁴⁹ A warning was, however, issued by the UK Medical Research Council/European Organisation for Research and Treatment of Cancer on monitoring OC relapse through CA-125 changes in patients who had completed primary treatment. In this framework, a rising CA-125 trajectory was not shown to reflect resistance to treatment, but rather the administration of platinum-based chemotherapy. Furthermore, deflections in rising trends were associated with the activity of hormonal treatments (e.g., tamoxifen).⁵⁰

In view of these remarks, several efforts are currently directed at developing and validating diagnostic algorithms that incorporate absolute CA-125 and HE4 concentrations or their changes, together with individual data (age, US parameters), for screening or primary/secondary care.^51,52 Since the introduction of HE4, some manufacturers have recommended the Risk of Ovarian Malignancy Algorithm (ROMA) to triage patients with a pelvic mass.⁵² Although ROMA™ has received clearance from the US Food and Drug Administration (FDA), several concerns have been raised about its clinical validation, given conflicting results on its diagnostic accuracy compared with CA-125 and HE4 alone.^41,53,54 A recent SR reported ROMA as a promising predictor of epithelial OC that could replace CA-125, even showing higher sensitivity than the HE4 stand-alone test.⁵⁵ In our opinion, however, a more cautionary message should be shared, accounting for the several limitations of the individual studies considered, which suggest an underestimation of HE4 diagnostic performance. Notably, most of the included studies employed a single decision threshold for HE4 positivity, and only one used specific limits for pre- and postmenopausal women.³⁹ The promising association between marker concentration trends and diagnostic anticipation reported above has encouraged the development of new and more complex algorithms that consider changes over time in biomarker concentrations. Interestingly, algorithms based on longitudinal interpretation of the biomarker course appear to overcome the limits of those incorporating dichotomous results according to the single-threshold rule, thereby improving the early detection of malignancy.⁵¹
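ROMA is a logistic model that combines log-transformed HE4 and CA-125 concentrations with menopausal-status-specific coefficients. The Python sketch below uses the coefficients as widely reported for the original ROMA model (Moore et al., 2009); they should be checked against current manufacturer documentation, and decision cut-offs, which are assay- and status-specific, are deliberately not hard-coded. The function name is hypothetical.

```python
import math

# Coefficients (intercept, ln HE4, ln CA-125) as widely reported for the original
# ROMA model; verify against current manufacturer documentation before any use.
ROMA_COEFFICIENTS = {
    "premenopausal": (-12.0, 2.38, 0.0626),
    "postmenopausal": (-8.09, 1.04, 0.732),
}


def roma_percent(he4_pmol_l: float, ca125_ku_l: float, status: str) -> float:
    """Predicted probability (%) from the ROMA logistic model."""
    if he4_pmol_l <= 0 or ca125_ku_l <= 0:
        raise ValueError("marker concentrations must be positive")
    intercept, b_he4, b_ca125 = ROMA_COEFFICIENTS[status]
    predictive_index = intercept + b_he4 * math.log(he4_pmol_l) + b_ca125 * math.log(ca125_ku_l)
    # exp(PI) / (1 + exp(PI)), written in the equivalent logistic form
    return 100.0 / (1.0 + math.exp(-predictive_index))


print(f"{roma_percent(50.0, 35.0, 'postmenopausal'):.1f}%")
```

The same pair of results gives different risk estimates in pre- and postmenopausal women (about 8% and 19% for HE4 50 pmol/L and CA-125 35 kU/L), which illustrates why status-specific interpretation matters.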

Unbiased interpretation of marker changes

As mentioned above, different approaches (e.g., fixed cut-offs, algorithms, evaluation of marker trends) have been tested to classify patients according to CA-125 and HE4 results in several settings, such as screening symptom-free or symptomatic populations, triaging women with a pelvic mass, staging patients with proven OC, monitoring response to therapy and detecting OC relapses. To date, the evaluation of marker changes, even when integrated into longitudinal algorithms, seems to be the most promising criterion for increasing diagnostic accuracy. However, when interpreting serial marker measurements, some pitfalls should be taken into consideration in establishing generalized diagnostic rules. First, a great heterogeneity of tumour marker patterns (in particular for CA-125) has been observed among women in several frameworks, and this theoretically underscores the need to incorporate individual-specific decision criteria into diagnostic protocols.¹⁷ In addition, a large body of literature has shown that dealing with moderate variations in serum tumour marker concentrations, such as those expected in screening and relapse monitoring, is quite difficult, as changes in serial results are not due only to disease progression/remission. Pre-analytical, analytical and within-subject biological sources of variation account for a variable and relevant share of what constitutes a significant difference between serial results.⁵⁶ All these sources of variability are accounted for in the estimation of the reference change value (RCV).⁵⁷ For the most popular tumour markers, including CA-125, the rationale for adopting the RCV to evaluate changes across serial measurements, in search of a more sensitive and specific means of identifying early disease, is strong. This relies on the availability of data on biological variability.⁵⁸
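The RCV for two serial results is commonly computed as √2 × Z × √(CV_A² + CV_I²), with Z = 1.96 for a two-sided 95% probability. The Python sketch below applies this to CA-125 using the within-subject CV of 24.7% reported in this article; the 5% analytical CV is a hypothetical value, and the helper names are our own. Log-normal (asymmetric) RCVs are an alternative not shown here.

```python
import math


def reference_change_value(cv_analytical: float, cv_within: float, z: float = 1.96) -> float:
    """Symmetric RCV (%) between two serial results."""
    return math.sqrt(2.0) * z * math.sqrt(cv_analytical**2 + cv_within**2)


def is_significant_change(previous: float, current: float, rcv_percent: float) -> bool:
    """True when the percentage change between two results exceeds the RCV."""
    change_percent = 100.0 * (current - previous) / previous
    return abs(change_percent) > rcv_percent


rcv = reference_change_value(cv_analytical=5.0, cv_within=24.7)  # CVa is hypothetical
print(f"RCV = {rcv:.1f}%")
print(is_significant_change(20.0, 30.0, rcv))  # +50% change
print(is_significant_change(20.0, 40.0, rcv))  # +100% change
```

With these inputs the RCV is about 70%, so a rise in CA-125 from 20 to 30 kU/L (+50%) would not exceed it, whereas a doubling would.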

For CA-125, the average intra-individual variation (24.7% as CV) is considerable, but lower than the inter-individual variation (54.6% as CV).⁵⁸ This implies an index of individuality (II) (i.e., the ratio between intra- and inter-individual variation)⁵⁹ of ∼0.45, which makes reference intervals of very limited use for marker interpretation, in agreement with clinical evidence. Accordingly, the hypothesis that clinically significant CA-125 increases can be reliably identified by evaluating serial marker changes against the RCV has a strong biological basis. The biological variability data also provide a well-founded explanation for the clinical failure of fixed cut-off levels for CA-125. Following the approach suggested by Fraser and Petersen,⁶⁰ the number of specimens that should be collected to ensure that the mean marker result is within ±10% of the individual's homeostatic set point can be obtained as n = 1.96² × (CV_A² + CV_I²)/10², where CV_A is the analytical CV and CV_I the within-subject biological CV, both in %. Using previously published imprecision data,⁶¹ it can be estimated that each patient should theoretically undergo up to 24 CA-125 measurements to achieve a sufficiently accurate estimate of the individual set point.
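Both quantities in the paragraph above can be reproduced with a few lines of Python. In this sketch the function names are hypothetical and the 3% analytical CV is a stand-in for the published imprecision data, which are not given here; note that with CV_I = 24.7% the within-subject term alone already gives n ≈ 23.4, so any analytical CV up to about 3.8% rounds up to 24 specimens.

```python
import math


def index_of_individuality(cv_within: float, cv_between: float) -> float:
    """II = within-subject CV / between-subject CV."""
    return cv_within / cv_between


def specimens_for_set_point(cv_analytical: float, cv_within: float,
                            tolerance_percent: float = 10.0, z: float = 1.96) -> int:
    """Specimens needed for the mean to lie within +/- tolerance_percent of the set point."""
    n = z**2 * (cv_analytical**2 + cv_within**2) / tolerance_percent**2
    return math.ceil(n)


print(round(index_of_individuality(24.7, 54.6), 2))
print(specimens_for_set_point(cv_analytical=3.0, cv_within=24.7))  # CVa is hypothetical
```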

When considering the possible replacement of CA-125 with HE4, information on HE4 biological variation is clearly mandatory to understand its biological behaviour and, consequently, how to deal with marker concentrations and changes. For instance, if HE4 biological variability differs with hormonal status, we could expect to manage marker results and changes differently in pre- and postmenopausal women. Although this option has not been considered in practice for CA-125, the available studies estimating the biological variability of this marker have strongly supported this hypothesis.^23,62,63

In evaluating the literature on biological variability, concerns often arise about study results, supporting the need to re-evaluate the available information.^64,65 Basically, a well-defined study protocol is recommended to preclude disturbances of marker homeostasis and to minimize pre-analytical sources of variability. In this framework, the number, duration and frequency of sample collections, as well as the definition of the individual's steady state (e.g., menstrual cycle phase in premenopausal women), may unpredictably affect the estimation of biological variability. In particular, the greatest care should be taken to ensure the same conditions for all enrolled subjects (e.g., blood drawn at mid-cycle) and to define strict selection criteria to avoid spurious marker increases (e.g., due to smoking or menstruation).^66–69 Unfortunately, the few available studies on CA-125 biological variability disregarded these critical issues, collecting serial samples at variable time intervals for several individuals,^23,62,63 blending marker results from subjects in pre- and postmenopausal status,^62,63 including samples drawn during menstruation,^23,62,63 or including samples from subjects with potentially interfering conditions (e.g., smokers, carriers of other malignancies).^23,62,63 In only one study²³ were all samples analyzed in the same analytical batch, as recommended.⁵⁶ CA-125 biological variability data therefore clearly need to be revised and updated using a more rigorous experimental approach.
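Biological variation studies typically estimate CV_I and CV_G by partitioning the variance of serial samples collected from each subject. The Python sketch below is a simplified illustration under stated assumptions: a balanced design (the same number of samples per subject, a single measurement each), a pooled SD expressed relative to the grand mean, CV_A known at that mean concentration and subtracted in quadrature, and no outlier testing or log transformation, which full protocols usually include. The function name and data are hypothetical.

```python
import math
from statistics import mean


def biological_variation(results_by_subject: list[list[float]],
                         cv_analytical: float) -> tuple[float, float]:
    """Estimate (CV_I, CV_G) in % via one-way random-effects ANOVA on a balanced design."""
    n = len(results_by_subject)
    k = len(results_by_subject[0])
    if n < 2 or k < 2 or any(len(r) != k for r in results_by_subject):
        raise ValueError("need >= 2 subjects, each with the same number (>= 2) of samples")
    subject_means = [mean(r) for r in results_by_subject]
    grand_mean = mean(subject_means)  # equals the overall mean in a balanced design
    ss_within = sum((x - m) ** 2
                    for r, m in zip(results_by_subject, subject_means) for x in r)
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ms_within = ss_within / (n * (k - 1))
    ms_between = ss_between / (n - 1)
    # Within-subject CV still contains analytical variation; remove it in quadrature.
    cv_within_total = 100.0 * math.sqrt(ms_within) / grand_mean
    cv_i = math.sqrt(max(cv_within_total**2 - cv_analytical**2, 0.0))
    # Between-subject variance component from the expected mean squares.
    var_between = max((ms_between - ms_within) / k, 0.0)
    cv_g = 100.0 * math.sqrt(var_between) / grand_mean
    return cv_i, cv_g


# Hypothetical serial results for three subjects, four samples each.
data = [[10.0, 11.0, 9.5, 10.5], [20.0, 22.0, 19.0, 21.0], [15.0, 14.0, 16.0, 15.5]]
cv_i, cv_g = biological_variation(data, cv_analytical=3.0)
print(f"CV_I = {cv_i:.1f}%, CV_G = {cv_g:.1f}%")
```

The protocol issues listed above (sampling interval, menopausal status, menstruation, interfering conditions, single analytical batch) all act on the inputs to such a calculation, which is why they can bias the resulting CV_I and CV_G.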

Concluding remarks

Faced with the low incidence of the disease and the lack of reported survival benefit, stringent performance is required for the clinical introduction of effective OC biomarkers. In addition, the histological heterogeneity of OC necessitates a focus on the most life-threatening and aggressive subtype, namely high-grade serous tumour. Although preferentially expressed by this OC subtype, CA-125 employed as a stand-alone marker cannot detect the disease at an early, manageable stage, and its rule-in capability is further hindered by low specificity. The updated evidence confirms the better diagnostic performance of HE4 compared with CA-125,¹¹ even if this is probably not enough to definitively incorporate HE4 into clinical practice guidelines. Further steps are required to prioritize clinically relevant research questions (e.g., biomarker performance in early OC identification) and to provide generalizable study results. Accordingly, we have stressed the need to improve study design as well as to thoroughly investigate the methodological issues that have emerged, which threaten the strength and level of the scientific evidence.

The rule of thumb for introducing a new biomarker is that it should effectively increase OC discrimination and add information to existing biomarkers. HE4 as a stand-alone test is likely to achieve better identification of the malignancy but, owing to their similar expression profiles, its addition to CA-125 does not increase the OC detection rate by more than a few percent.⁵³ It should therefore be clear that the combination of HE4 and CA-125 is unfavourable from a cost-benefit perspective. The replacement of CA-125 with HE4 appears advisable, but more focus is needed on the evaluation of assay performance and comparability and on the definition of appropriate interpretation strategies. An advantage of HE4 over CA-125 is the complete characterization of its protein structure, an overriding prerequisite for overcoming possible methodological limitations (e.g., stability of antibodies, analytical sensitivity, lack of assay harmonization), with a definite benefit for diagnostic performance. In addition, data on the biological variability of HE4 should be generated to better establish which interpretative criteria may aid disease identification. In this process, the laboratory assumes the role of 'gatekeeper' to prevent erroneous results and over-requesting. The deletion of obsolete biomarkers, together with the introduction of new ones, is known to have a relevant impact on both laboratory and total health-care costs, making it fundamental to clinical governance.⁷⁰ The proposal to replace CA-125 with HE4 should be considered from this perspective. Costs may play an important role in this framework, which involves several stakeholders (clinicians, GPs, laboratorians, hospital administrators, patients) in the delivery, payment and receipt of health care.⁷¹ The introduction of a new biomarker has often been reported to change not only clinical practice but also the reimbursement rate. However, sound clinical evidence should have a relevant impact on reimbursement models, which for novel biomarkers have been proposed to shift from a cost-based to a value-based and evidence-based approach.⁷²

Footnotes

Acknowledgements

We would like to thank Federica Braga (Cattedra di Biochimica Clinica e Biologia Molecolare Clinica, Dipartimento di Scienze Biomediche e Cliniche ‘Luigi Sacco’, Università degli Studi, Milano, Italy) and Patrizia La Musta (Campus Cascina Rosa, Fondazione IRCCS Istituto Nazionale Tumori Milano, Italy) for their assistance in retrieving the cited original articles.

Declarations of conflicting interests

None declared.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Ethical approval

Not applicable.

Guarantor

SF and MP.

Contributorship

MP conceived the study and revised the manuscript. SF wrote the first draft of the manuscript and researched literature. SF and MP edited and approved the final version of the manuscript.

References

1. National Collaborating Centre for Cancer (UK). National Institute for Health and Clinical Excellence (NICE): Guidance. Ovarian cancer: the recognition and initial management of ovarian cancer, Cardiff, UK: National Collaborating Centre for Cancer, 2011.

2. Grann, Nørgaard, Blaakær. Survival of patients with ovarian cancer in central and northern Denmark, 1998–2009. Clin Epidemiol 2011; 3: 59–64.

3. Autier, Boniol. Caution needed for country-specific cancer survival. Lancet 2011; 377: 99–101.

4. Maringe, Walters, Butler, ICBP module 1 working group. Stage at diagnosis and ovarian cancer survival: evidence from the international cancer benchmarking partnership. Gynecol Oncol 2012; 127: 75–82.

5. Gulland. Women with symptoms of ovarian cancer should have a CA125 test, says NICE. BMJ 2011; 342: d2695.

6. Sturgeon, Duffy, Walker. The national institute for health and clinical excellence (NICE) guidelines for early detection of ovarian cancer: the pivotal role of the clinical laboratory. Ann Clin Biochem 2011; 48: 295–299.

7. Cave. NICE on ovarian cancer. Please include GPs in developing guidelines. BMJ 2011; 342: d3023.

8. Olaitan. NICE on ovarian cancer. Recommendations for detection in primary care are flawed. BMJ 2011; 342: d3022.

9. Woolas, Jacobs. Elevation of multiple serum markers in patients with stage I ovarian cancer. J Natl Cancer Inst 1993; 85: 1748–1751.

10. Markowska, Manys, Kubaszewska. Value of CA 125 as a marker of ovarian cancer. Eur J Gynaecol Oncol 1992; 13: 360–365.

11. Ferraro S, Braga F, Lanzoni M, et al. Serum human epididymis protein 4 vs carbohydrate antigen 125 for ovarian cancer diagnosis: a systematic review. J Clin Pathol 2013; 6: 273–281.

12. Buys, Partridge, Black, PLCO project team. Effect of screening on ovarian cancer mortality: the prostate, lung, colorectal and ovarian (PLCO) cancer screening randomized controlled trial. JAMA 2011; 305: 2295–2303.

13. Menon, Kalsi, Jacobs. The UKCTOCS experience – reasons for hope? Int J Gynecol Cancer 2012; 22: S18–S20.

14. Rosenthal. Ovarian cancer screening in the high-risk population – the UK familial ovarian cancer screening study (UKFOCSS). Int J Gynecol Cancer 2012; 22: S27–S28.

15. Menon, Jacobs. Ovarian cancer screening in the general population: current status. Int J Gynecol Cancer 2001; 11: 3–6.

16. Hogg, Friedlander. Biology of epithelial ovarian cancer: implications for screening women at high genetic risk. J Clin Oncol 2004; 22: 1315–1327.

17. Crump, McIntosh, Urban. Ovarian cancer tumor marker behavior in asymptomatic healthy women: implications for screening. Cancer Epidemiol Biomarkers Prev 2000; 9: 1107–1111.

18. Evans, Ziebland, McPherson. Minimizing delays in ovarian cancer diagnosis: an expansion of Andersen's model of “total patient delay”. Fam Pract 2007; 24: 48–55.

19. Nagle, Francis, Nelson. Reducing time to diagnosis does not improve outcomes for women with symptomatic ovarian cancer: a report from the Australian ovarian cancer study group. J Clin Oncol 2011; 29: 2253–2258.

20. Holschneider, Berek. Ovarian cancer: epidemiology, biology, and prognostic factors. Semin Surg Oncol 2000; 19: 3–10.

21. Clarke-Pearson. Clinical practice. Screening for ovarian cancer. N Engl J Med 2009; 361: 170–177.

22. Crawford. Tumour markers in common epithelial cancers – practice and potential. Proceedings of UK NEQAS participants’ meeting, Edinburgh, UK, 24–26 September. NEQAS, 2012, p. 15.

23. Tuxen, Sölétormos, Petersen. Assessment of biological variation and analytical imprecision of CA 125, CEA, and TPA in relation to monitoring of ovarian cancer. Gynecol Oncol 1999; 74: 12–22.

24. Graybill, Zhu. Detection and monitoring of ovarian cancer. Clin Chim Acta 2013; 415: 341–345.

25. Anderson, McIntosh. Assessing lead time of selected ovarian cancer biomarkers: a nested case-control study. J Natl Cancer Inst 2010; 102: 26–38.

26. Tseng, Sprance, Carcangiu. CA 125, NB/70K, and lipid-associated sialic acid in monitoring uterine papillary serous carcinoma. Obstet Gynecol 1989; 74: 384–387.

27. Hori, Gambhir. Mathematical model identifies blood biomarker-based early cancer detection strategies and limitations. Sci Transl Med 2011; 3: 109–116.

28. Sturgeon, Hoffman, Chan. National academy of clinical biochemistry laboratory medicine practice guidelines for use of tumor markers in clinical practice: quality requirements. Clin Chem 2008; 54: e1–e10.

29. Duffy, Bonfrer, Kulpa. CA125 in ovarian cancer: European group on tumor markers guidelines for clinical use. Int J Gynecol Cancer 2005; 15: 679–691.

30. Konforte, Diamandis. Is early detection of cancer with circulating biomarkers feasible? Clin Chem 2013; 59: 35–37.

31. Carlson, Skates, Singer. Screening for ovarian cancer. Ann Intern Med 1994; 121: 124–132.

32. Escudero, Auge, Filella. Comparison of serum human epididymis protein 4 with cancer antigen 125 as a tumor marker in patients with malignant and nonmalignant diseases. Clin Chem 2011; 57: 1534–1544.

33. Abdel-Azeez, Labib, Sharaf. HE4 and mesothelin: novel biomarkers of ovarian carcinoma in patients with pelvic masses. Asian Pac J Cancer Prev 2010; 11: 111–116.

34. Nolen, Velikokhatnaya, Marrangoni. Serum biomarker panels for the discrimination of benign from malignant cases in patients with an adnexal mass. Gynecol Oncol 2010; 117: 440–445.

35. Bolstad, Oijordsbakken, Nustad. Human epididymis protein 4 reference limits and natural variation in a Nordic reference population. Tumour Biol 2012; 33: 141–148.

36. Karlsen, Sandhu, Høgdall. Evaluation of HE4, CA125, risk of ovarian malignancy algorithm (ROMA) and risk of malignancy index (RMI) as diagnostic tools of epithelial ovarian cancer in patients with a pelvic mass. Gynecol Oncol 2012; 127: 379–383.

37. Novotny, Presl, Kucera. HE4 and ROMA index in Czech postmenopausal women. Anticancer Res 2012; 32: 4137–4140.

38. Chan, Chen, Nam. The use of HE4 in the prediction of ovarian cancer in Asian women with a pelvic mass. Gynecol Oncol 2013; 128: 239–244.

39. Bandiera, Romani, Specchia. Serum human epididymis protein 4 and risk for ovarian malignancy algorithm as new diagnostic and prognostic tools for epithelial ovarian cancer management. Cancer Epidemiol Biomarkers Prev 2011; 20: 2496–2506.

40. Molina, Escudero, Augé. HE4 a novel tumour marker for ovarian cancer: comparison with CA 125 and ROMA algorithm in patients with gynaecological diseases. Tumour Biol 2011; 32: 1087–1095.

41. Van Gorp, Cadron, Despierre. HE4 and CA-125 as a diagnostic test in ovarian cancer: prospective validation of the risk of ovarian malignancy algorithm. Br J Cancer 2011; 104: 863–870.

42. Schimmel, Zegers, Emons. Standardization of protein biomarker measurements: is it feasible? Scand J Clin Lab Invest 2010; 70: 27–33.

43. Ferraro, Mozzi, Panteghini. Revaluating serum ferritin as a marker of body iron stores in the traceability era. Clin Chem Lab Med 2012; 50: 1911–1916.

44. Klee. Clinical interpretation of reference intervals and reference limits. A plea for assay harmonization. Clin Chem Lab Med 2004; 42: 752–757.

45. Sturgeon. Tumor markers in the laboratory: closing the guideline-practice gap. Clin Biochem 2001; 34: 353–359.

46. Mongia, Rawlins, Owen. Performance characteristics of seven automated CA 125 assays. Am J Clin Pathol 2006; 125: 921–927.

47. Gallagher MP, Mobley LR, Klee GG, et al. The impact of calibration error in medical decision making. NIST Planning Report 04-1. 2004. http://www.rti.org/pubs/Calibration_Error.pdf (accessed 15 March 2013).

48. Galgano, Hampton, Frierson Jr. Comprehensive analysis of HE4 expression in normal and malignant human tissues. Mod Pathol 2006; 19: 847–853.

49. Jacobs, Skates, Davies. Risk of diagnosis of ovarian cancer after raised serum CA 125 concentration: a prospective cohort study. BMJ 1996; 313: 1355–1358.

50. Rustin. Follow-up with CA125 after primary therapy of advanced ovarian cancer has major implications for treatment outcome and trial performances and should not be routinely performed. Ann Oncol 2011; 22: viii45–viii48.

51. Drescher, Shah, Thorpe. Longitudinal screening algorithm that incorporates change over time in CA125 levels identifies ovarian cancer earlier than a single-threshold rule. J Clin Oncol 2013; 31: 387–392.

52. Moore, Brown, Miller. The use of multiple novel tumor biomarkers for the detection of ovarian carcinoma in patients with a pelvic mass. Gynecol Oncol 2008; 108: 402–408.

53. Jacob, Meier, Caduff. No benefit from combining HE4 and CA125 as ovarian tumor markers in a clinical setting. Gynecol Oncol 2011; 121: 487–491.

54. Moore, Miller, Disilvestro. Evaluation of the diagnostic accuracy of the risk of ovarian malignancy algorithm in women with a pelvic mass. Obstet Gynecol 2011; 118: 280–288.

55. Tie, Chang. Does risk for ovarian malignancy algorithm excel human epididymis protein 4 and CA125 in predicting epithelial ovarian cancer: a meta-analysis. BMC Cancer 2012; 12: 258.

56. Fraser, Harris. Generation and application of data on biological variation in clinical chemistry. Crit Rev Clin Lab Sci 1989; 27: 409–437.

57. Fraser. Reference change values. Clin Chem Lab Med 2012; 50: 807–812.

58. Desirable Specifications for Total Error, Imprecision, and Bias, Derived from Intra- and Inter-Individual Biologic Variation. Updated in 2010. http://www.westgard.com/biodatabase1.htm?format=phocapdf (accessed 15 March 2013).

59. Harris. Effects of intra- and inter-individual variation on the appropriate use of normal range. Clin Chem 1974; 20: 1535–1542.

60. Fraser, Petersen. The importance of imprecision. Ann Clin Biochem 1991; 28: 207–211.

61. Dolci, Scapellato, Mozzi. Imprecision of tumour biomarker measurements on Roche modular E170 platform fulfills desirable goals derived from biological variation. Ann Clin Biochem 2010; 47: 171–173.

62. Trapé, Perez de Olaguer, Buxó. Biological variation of tumor markers and its application in the detection of disease progression in patients with non-small cell lung cancer. Clin Chem 2005; 51: 219–222.

63. Browning MCK, McFarlane, Horobin. Objective interpretation of results for tumour marker. J Nucl Med Allied Sci 1990; 34: 89s–91s.

64. Braga, Dolci, Mosca. Biological variability of glycated hemoglobin. Clin Chim Acta 2010; 411: 1606–1610.

65. Braga, Panteghini. Biologic variability of C-reactive protein: is the available information reliable? Clin Chim Acta 2012; 413: 1179–1183.

66. Dehaghani, Ghiam, Hosseini. Factors influencing serum concentration of CA125 and CA15-3 in Iranian healthy postmenopausal women. Pathol Oncol Res 2007; 13: 360–364.

67. Hallamaa, Suvitie, Huhtinen. Serum HE4 concentration is not dependent on menstrual cycle or hormonal treatment among endometriosis patients and healthy premenopausal women. Gynecol Oncol 2012; 125: 667–672.

68. Anastasi, Granato, Marchei. Ovarian tumor marker HE4 is differently expressed during the phases of the menstrual cycle in healthy young women. Tumour Biol 2010; 31: 411–415.

69. Erbağci, Yilmaz, Kutlar. Menstrual cycle dependent variability for serum tumor markers CEA, AFP, CA 19-9, CA 125 and CA 15-3 in healthy women. Dis Markers 1999; 15: 259–267.

70. Dolci, Scapellato, Panteghini. One-year audit of the implementation of recommendations for optimal use of tumor biomarkers in a university hospital. Biochim Clin 2008; 32: 181–185.

71. Cohen, Reynolds. Interpreting the results of cost-effectiveness studies. J Am Coll Cardiol 2008; 52: 2119–2126.

72. Horvath, Kis, Dobos. Guidelines for the use of biomarkers: principles, processes and practical considerations. Scand J Clin Lab Invest 2010; 242: 109–116.