Abstract

Objective

To explore how the definition of the target condition and post hoc exclusion of participants can limit the usefulness of diagnostic accuracy studies.

Methods

We used data from a systematic review, conducted for a NICE diagnostic assessment of risk scores to inform secondary care decisions about specialist referral for women with suspected ovarian cancer, to explore how the definition of the target condition and post hoc exclusion of participants can limit the usefulness of diagnostic accuracy studies to inform clinical practice.

Results

Fourteen of the studies evaluated the ROMA score; nine used Abbott ARCHITECT tumour marker assays and five used Roche Elecsys. The summary sensitivity estimate (Abbott ARCHITECT) was highest, 95.1% (95% CI: 92.4 to 97.1%), where analyses excluded participants with borderline tumours or malignancies other than epithelial ovarian cancer and lowest, 75.0% (95% CI: 60.4 to 86.4%), where all participants were included. Results were similar for Roche Elecsys tumour marker assays. Although the number of patients involved was small, data from studies that reported diagnostic accuracy for both the whole study population and with post hoc exclusion of those with borderline or non-epithelial malignancies suggested that patients with borderline tumours or malignancies other than epithelial ovarian cancer account for between 50 and 85% of false-negative ROMA scores.

Conclusions

Our results illustrate the potential consequences of inappropriate population selection in diagnostic studies; women with non-epithelial ovarian cancers or non-ovarian primaries, and those with borderline tumours, may be disproportionately represented among those with false-negative, ‘low risk’ ROMA scores. These observations highlight the importance of giving careful consideration to how the target condition has been defined when assessing whether the diagnostic accuracy estimates reported in clinical studies will translate into clinical utility in real-world settings.

Keywords

Tumour markers, cancer, statistics

Introduction

The failure to design diagnostic test studies which utilize a spectrum of participants representative of the population for the intended use in clinical practice has been extensively discussed in the diagnostic literature for many years.^1–4 Using healthy volunteers as controls and/or including more individuals with advanced disease and fewer individuals with early stage disease are generally held to be associated with overestimation of test accuracy;^1,3 however, the appropriateness, or otherwise, of the spectrum of participants in a study can be less obvious. A recent review article on the sources of bias in diagnostic accuracy studies describes how the patient spectrum is altered at each referral, illustrating the importance of selecting study participants who correspond with the stage in the clinical pathway (referral stage) at which the test being evaluated would be used in practice. Reporting guidelines for test accuracy studies also emphasize the importance of defining the intended use and clinical role of the test being evaluated.⁵ Despite this, a recent study of 112 systematic reviews of diagnostic accuracy studies found that 46% failed to provide a clear description of the intended role of the test in the clinical pathway and overemphasis on positive conclusions (not supported by the data) was present in 72% of abstracts and 69% of full texts.⁶ Interpretation of the results of test accuracy studies for clinical practice remains inadequate; we present an example of current relevance to clinical biochemistry.

Early diagnosis of ovarian cancer is particularly problematic, with a high proportion of women (58%) still diagnosed at an advanced stage (stage III or IV) and 21% having metastases at diagnosis.¹⁰ Ovarian cancer survival is strongly related to stage at diagnosis; 2012 data showed that the one-year and five-year survival rates for women diagnosed at stage I were 97% and 90% versus 53% and 4% for women diagnosed at stage IV.⁹ Improving early diagnosis is therefore a priority and, when evaluating testing strategies, it is essential to consider possible variation in performance for the detection of different stages of ovarian cancer. Similarly, while the majority of studies about ovarian cancer diagnosis concern epithelial carcinomas, there is some evidence to indicate that the diagnostic performance of tumour markers and risk scores may vary between tumours of different tissue types;¹¹ possible effects of tumour tissue type on estimates of test performance should, therefore, also be considered.

Current guidance, NICE clinical guideline (CG122) Ovarian cancer: recognition and initial management,¹² recommends the calculation of a risk of malignancy index I (RMI I) score (based on serum CA125, ultrasound and menopausal status) followed by referral to a specialist gynaecological oncology multidisciplinary team (SMDT) for people with an RMI I score ≥250. We have recently completed a systematic review to assess the clinical effectiveness of using alternative risk scores (ROMA, IOTA simple ultrasound rules, the IOTA ADNEX model, Overa (MIA2G) and RMI I at thresholds other than 250) to guide referral decisions for women with adnexal mass and suspected ovarian cancer in secondary care. The review was undertaken as part of a diagnostic appraisal to inform the development of new NICE diagnostics guidance (DG31).¹³
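For orientation, the RMI I referral rule can be sketched as follows. The scoring convention used here (an ultrasound score U of 0, 1 or 3, a menopausal score M of 1 or 3, multiplied by serum CA125) is the commonly described RMI I definition and is stated as an assumption, because this article does not spell it out; the function and parameter names are hypothetical.

```python
def rmi_1(ultrasound_features: int, postmenopausal: bool, ca125: float) -> float:
    """RMI I = U x M x CA125, under the scoring convention assumed above."""
    if ultrasound_features < 0 or ca125 < 0:
        raise ValueError("inputs must be non-negative")
    # U: 0 for no suspicious ultrasound features, 1 for one, 3 for two or more
    if ultrasound_features == 0:
        u = 0
    elif ultrasound_features == 1:
        u = 1
    else:
        u = 3
    # M: 1 for premenopausal, 3 for postmenopausal
    m = 3 if postmenopausal else 1
    return u * m * ca125


def refer_to_smdt(score: float, threshold: float = 250.0) -> bool:
    """Referral rule described in the text: RMI I score of 250 or more."""
    return score >= threshold


# Hypothetical patient: one ultrasound feature, postmenopausal, CA125 of 100
example_score = rmi_1(1, True, 100.0)
example_referral = refer_to_smdt(example_score)
```

The threshold is a parameter so that the alternative RMI I thresholds considered in the review could be examined with the same function.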

The ROMA score uses serum HE4 and serum CA125 concentrations, along with menopausal status, to generate an individualized estimate of the risk that a person has ovarian cancer.¹⁴ The objective of this article was to explore how the definition of the target condition and the post hoc exclusion of participants can limit the usefulness of diagnostic accuracy studies. We used studies of the ROMA score, taken from our systematic review, to provide an example of current relevance to Clinical Biochemistry practitioners and researchers. The full results of our systematic review are published elsewhere.¹⁵
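As a rough illustration of how such a score combines the two markers with menopausal status, the sketch below applies a logistic transform to a linear predictive index. The coefficients are those commonly quoted for the ROMA algorithm and are an assumption here: they are not given in this article and should be checked against the assay manufacturer's documentation. The cut-off pair is likewise assumed to be ordered premenopausal/postmenopausal, and all names are hypothetical.

```python
from math import exp, log

# Assumed coefficients (intercept, ln HE4, ln CA125) for the predictive index;
# they are not given in this article and should be verified before any use.
PREMENOPAUSAL = (-12.0, 2.38, 0.0626)
POSTMENOPAUSAL = (-8.09, 1.04, 0.732)


def roma_percent(he4: float, ca125: float, postmenopausal: bool) -> float:
    """Logistic transform of the predictive index, returned as a percentage."""
    b0, b_he4, b_ca125 = POSTMENOPAUSAL if postmenopausal else PREMENOPAUSAL
    index = b0 + b_he4 * log(he4) + b_ca125 * log(ca125)
    return 100.0 / (1.0 + exp(-index))


def is_high_risk(score: float, postmenopausal: bool,
                 cutoffs: tuple = (13.1, 27.7)) -> bool:
    """Compare a score with an assumed (premenopausal, postmenopausal) cut-off pair."""
    return score >= cutoffs[1 if postmenopausal else 0]
```

Because the cut-off depends on menopausal status, the same score can be classified differently in two patients; for example, a score of 20% would be high risk under the premenopausal cut-off of 13.1% but low risk under the postmenopausal cut-off of 27.7%.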

Methods

Systematic review methods followed the principles outlined in the Centre for Reviews and Dissemination guidance for undertaking reviews in healthcare,¹⁶ and the NICE Diagnostic Assessment Programme manual.¹⁷ This article focuses on studies assessing the accuracy of the ROMA score and explores how the spectrum of study participants may affect the usefulness of results for clinical practice.

Data sources

We searched 22 resources up to November 2016, including MEDLINE, EMBASE, clinical trials registers and conference proceedings (Radiological Society of North America, American Society of Clinical Oncology Annual Conference, Society of Gynecologic Oncology, The National Cancer Research Institute, European Society of Radiology). Furthermore, we contacted experts in the field, with the aim of identifying any unpublished studies. Search strategies were based on the specified risk scores and the target condition (ovarian cancer), and did not include any study design terms or filters¹⁸; example search strategies are provided online (web appendix 1). No restrictions on language or publication status were applied to any searches.

Inclusion criteria

Diagnostic cohort studies, which assessed the accuracy of risk scores for identifying those women with suspected ovarian cancer who require referral from secondary care to specialist oncology services, were eligible for inclusion. Studies were required to use histological confirmation as the reference standard.

We included secondary care studies in women of any age with suspected ovarian cancer, who had not previously been treated for ovarian cancer and were not currently receiving chemotherapy; studies were included if the setting was unclear, but the population was described as people with suspected ovarian cancer. For studies of the ROMA score, only studies using tumour marker (CA125 and HE4) assays commercially available in the UK (Abbott ARCHITECT (Abbott Diagnostics, Abbott Park, Illinois, USA), Roche Elecsys (Roche Diagnostics, Rotkreuz, Switzerland) and Fujirebio Lumipulse G (Fujirebio Diagnostics, Göteborg, Sweden)) were included. Only studies of the ROMA score are included in this article.

Included studies were required to report sufficient data to determine the numbers of true positive, false positive, false negative and true negative test results; the primary outcomes were sensitivity and specificity and the data needed to calculate these parameters.

Studies were screened for relevance independently by two reviewers and full text articles of studies considered potentially relevant were assessed for inclusion by one reviewer and checked by a second. Disagreements, at either stage of study selection, were resolved through discussion and consensus, or by consultation with a third reviewer.

Data extraction

One reviewer extracted data using a prepiloted data extraction form and extractions were checked by a second reviewer; any disagreements were resolved through discussion and consensus, or by consultation with a third reviewer.

Quality assessment

The methodological quality of included test accuracy studies was assessed using QUADAS-2,¹⁹ which uses four domains to assess risk of bias and three domains to assess the applicability of the study to the review question. Quality assessment was undertaken by one reviewer and checked by a second reviewer and any disagreements were resolved by consensus or discussion with a third reviewer.

Analysis

Sensitivity and specificity were calculated for each set of 2 × 2 data. All meta-analyses estimated separate pooled estimates of sensitivity and specificity, using random-effects logistic regression.²⁰ The bivariate/hierarchical summary receiver operating characteristic model^21–23 could not be applied because the data-sets for each tumour marker assay manufacturer and target condition were too small and/or homogeneous. Heterogeneity was assessed visually using summary receiver operating characteristic plots or receiver operating characteristic (ROC) space plots. Analyses were performed in MetaDisc.²⁴
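As an illustration of the per-study calculation, the sketch below derives sensitivity and specificity from one set of 2 × 2 counts and attaches an exact (Clopper-Pearson) 95% confidence interval. The choice of interval method is an assumption made for illustration, since the method is not stated here, and the function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target):
    """Bisection on [0, 1] for a function f that decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still above the target, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x events in n trials (assumed method)."""
    lower = 0.0 if x == 0 else _solve(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if x == n else _solve(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


def accuracy(tp, fn, fp, tn):
    """Sensitivity and specificity, each paired with its exact interval."""
    return {
        "sensitivity": (tp / (tp + fn), exact_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), exact_ci(tn, tn + fp)),
    }


# Counts for one study as reported in Table 3 (TP=36, FN=12, FP=20, TN=145)
result = accuracy(36, 12, 20, 145)
```

Sensitivity uses only the diseased column (TP and FN), so any change to which participants count as disease positive, or are dropped altogether, alters the denominator directly.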

Results

Overview of included studies

The searches identified 2456 references; 38 studies, reported in 48 publications, were included in the full systematic review. Figure 1 shows the flow of studies through the review process. Fourteen studies reported data on the accuracy of the ROMA score and were included in this article,^25–38 of which nine used Abbott ARCHITECT tumour marker assays^25–33 and five used Roche Elecsys tumour marker assays.^34–38 None of the included studies used the Fujirebio Lumipulse G system.

Figure 1.

Flow of studies through the review process.

All studies included women with adnexal/ovarian mass. Four of the 14 studies reported analyses which excluded some participants based on their final histopathological diagnosis^25,28,31,35 and two studies included final histopathological diagnosis in their participant selection criteria.^33,36 Because final histopathological diagnosis is information which could not be known at the point in the clinical pathway where the ROMA score would be used, we consider that both of these approaches are, in effect, post hoc exclusions; no study provided a justification for these exclusions. All four of the studies that excluded participants from their analyses on the basis of final histopathological diagnosis appeared to use the terms ‘ovarian cancer’ and ‘epithelial ovarian cancer’ interchangeably; study objectives were framed in terms of differentiating between benign and malignant ovarian masses, whereas accuracy results were reported for epithelial ovarian cancer. Two of these studies also excluded patients with borderline tumours from their analyses.^31,35 In contrast, both of the studies that used histopathological diagnosis as a participant selection criterion clearly reported an objective of evaluating the ROMA score for the detection of epithelial ovarian cancer. A further four studies did not provide a clear definition of the target condition.^26,27,29,30 Details of studies evaluating the ROMA score, their associated references and main target condition are provided in Table 1.

Table 1.

Details of studies evaluating the ROMA score.

Details	Country	n	Reported target condition
Abbott ARCHITECT
Al Musalhi et al.³²	Oman	213	All malignant tumours including borderline
Chan et al.²⁵	Multi-national (Asia)	387	All epithelial ovarian malignancies excluding borderline
Clemente and Benitez²⁶	Philippines	62	Ovarian malignancies (undefined – not clear whether borderline tumours were included)
Karlsen et al.³¹	Denmark	579	All epithelial ovarian malignancies excluding borderline
Li et al.²⁷	China	917	Ovarian malignancies (undefined – not clear whether borderline tumours were included)
Moore et al.²⁸; Moore et al.⁴⁰; Moore et al.⁴¹	USA	450	All epithelial ovarian malignancies including borderline
Novotny et al.²⁹	Czech Republic	277	Ovarian malignancies (undefined – not clear whether borderline tumours were included)
Presl et al.³⁰	Czech Republic	552	Ovarian malignancies (undefined – not clear whether borderline tumours were included)
Winarto et al.³³	Indonesia	128	All epithelial ovarian malignancies including borderline
Roche
Janas et al.³⁴	Poland	259	All malignant tumours including borderline
Shulman et al.³⁸	USA	993	All malignant tumours including borderline
Xu et al.³⁵	China	521	All epithelial ovarian malignancies excluding borderline
Yanaranop et al.³⁷	Thailand	260	All malignant tumours – borderline tumours classified as disease negative
Zhang et al.³⁶; Chen et al.⁴²	China	612	All epithelial ovarian malignancies excluding borderline

Table 2.

QUADAS-2 results for ROMA score.

Study	Risk of bias				Applicability
Study	Patient selection	Index test	Reference standard	Flow and timing	Patient selection	Index test	Reference standard
Al Musalhi et al.³²	Unclear	Low	Unclear	Unclear	Low	Low	Low
Chan et al.²⁵	Low	Low	Low	High	Low	Low	High
Clemente and Benitez²⁶	Unclear	Unclear	Low	Low	Unclear	Low	Unclear
Janas et al.³⁴	Unclear	Low	Unclear	Low	Unclear	Low	Low
Karlsen et al.³¹	Unclear	Low	Unclear	High	Low	Low	High
Li et al.²⁷	Low	Low	Unclear	Unclear	Unclear	Low	Unclear
Moore et al.²⁸	Unclear	Low	Unclear	High	High	Low	High
Novotny et al.²⁹	Unclear	Low	Unclear	Unclear	Low	Low	Low
Presl et al.³⁰	Unclear	Unclear	Unclear	Unclear	Low	Unclear	Unclear
Shulman et al.³⁸	Unclear	Unclear	Unclear	Unclear	Low	Low	Unclear
Winarto et al.³³	Unclear	Low	Unclear	High	Unclear	Low	High
Xu et al.³⁵	High	Low	Unclear	High	Unclear	Low	High
Yanaranop et al.³⁷	Unclear	Low	Low	Low	Low	Low	High
Zhang et al.³⁶	High	Low	Unclear	High	Unclear	Low	High

Methodological quality of studies assessing the ROMA score

None of the studies were rated as ‘low’ risk of bias for all domains of the QUADAS-2 tool and only two were rated as having ‘low’ concerns regarding all applicability domains.^29,32 A summary of the QUADAS-2 assessments for each study is provided in Table 2.

The main potential sources of bias concerned flow and timing. Six (43%) studies were rated as ‘high’ risk of bias on the flow and timing domain, because some participants were excluded from the analyses after their histopathological diagnoses had been established.^25,28,31,33,35,36 Some researchers have taken this approach in order to calculate risk score performance data for specific target conditions that are subsets of ovarian cancer (e.g. epithelial ovarian cancer, excluding borderline tumours), but it was classified as inappropriate exclusion because final histopathological diagnosis is information which could not be known at the point in the clinical pathway where the ROMA score would be used.
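The count of ‘high’ ratings can be checked directly against Table 2. The sketch below simply tallies the flow and timing column as transcribed from that table; the variable names and shortened study labels are illustrative.

```python
# Flow and timing judgements transcribed from Table 2
flow_and_timing = {
    "Al Musalhi": "Unclear",
    "Chan": "High",
    "Clemente and Benitez": "Low",
    "Janas": "Low",
    "Karlsen": "High",
    "Li": "Unclear",
    "Moore": "High",
    "Novotny": "Unclear",
    "Presl": "Unclear",
    "Shulman": "Unclear",
    "Winarto": "High",
    "Xu": "High",
    "Yanaranop": "Low",
    "Zhang": "High",
}

# Studies rated 'High', and their share of all included ROMA studies
high_risk_studies = sorted(s for s, r in flow_and_timing.items() if r == "High")
high_risk_percent = 100 * len(high_risk_studies) / len(flow_and_timing)

print(high_risk_studies)
print(f"{len(high_risk_studies)}/{len(flow_and_timing)} = {high_risk_percent:.0f}%")
```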

These six studies were also classified as having ‘high’ concern regarding applicability, with respect to how the target condition was defined by the reference standard. This is because, in ‘real world’ clinical practice, it is likely that the appropriate target condition, for women presenting with adnexal mass who are being considered for specialist referral, would be considered to be ‘all malignant tumours’. Web appendix 2 lists final histological diagnoses (where reported) of study participants.

Accuracy of the ROMA score using Abbott ARCHITECT tumour marker assays

Nine studies used Abbott ARCHITECT tumour marker assays.^25–33 Only one study included all participants in the analysis, regardless of their final histopathological diagnosis (target condition: all malignant tumours including borderline).³² Two studies excluded women with histopathological diagnoses other than epithelial ovarian cancer, but included women with borderline tumours.^28,33 Two further studies excluded participants with non-epithelial ovarian cancer, participants with non-ovarian cancers and participants with borderline tumours;^25,31 the distribution of positive and negative ROMA score results, in these patients, was not reported. The remaining four studies did not report a clear definition of the target condition; the results for these studies are not reported.^26,27,29,30

The sensitivity estimate for the ROMA score was highest, 95.1% (95% CI: 92.4 to 97.1%), where analyses excluded participants with borderline tumours and those with malignancies other than epithelial ovarian cancer and lowest, 75.0% (95% CI: 60.4 to 86.4%), where all participants were included in the analysis (see Table 3). Conversely, the specificity estimate for the ROMA score was highest, 87.9% (95% CI: 81.9 to 92.4%), in the study which included all participants³² and lowest, 62.5% (95% CI: 59.7 to 65.3%), where analyses excluded participants with borderline tumours and those with malignancies other than epithelial ovarian cancer (see Table 3).

Table 3.

Accuracy of the ROMA by tumour marker manufacturer and target condition.

Study ID	Threshold	TP	FN	FP	TN	Total, n	Sensitivity, %(95% CI)	Specificity, %(95% CI)
Abbott ARCHITECT tumour marker assays
Target condition: All malignant tumours including borderline
Al Musalhi et al.³²	13.1%/27.7%	36	12	20	145	213	75.0 (60.4, 86.4)	87.9 (81.9, 92.4)
Target condition: Epithelial ovarian malignancies including borderline
Moore et al.²⁸	13.1%/27.7%	59	8	96	287	450	88.1 (77.8, 94.7)	74.9 (70.3, 79.2)
Winarto et al.³³	7.4%/25.3%	61	6	35	26	128	91.0 (81.5, 96.6)	42.6 (30.0, 55.9)
Target condition: Epithelial ovarian malignancies excluding borderline
Chan et al.²⁵	7.4%/25.3%	58	7	41	281	387	89.2 (79.1, 95.6)	87.3 (83.1, 90.7)
Karlsen et al.³¹	7.4%/25.3%	244	8	371	438	1061	96.8 (93.8, 98.6)	54.1 (50.6, 57.6)
Winarto et al.³³	7.4%/25.3%	47	3	35	26	111	94.0 (83.5, 98.7)	42.6 (30.0, 55.9)
Summary estimates							95.1 (92.4, 97.1)	62.5 (59.7, 65.3)
Target condition: Epithelial ovarian malignancies (stage III/IV) – borderline and stage I/II tumours excluded
Chan et al.²⁵	7.4%/25.3%	35	3	41	281	360	92.1 (78.6, 98.3)	87.3 (83.1, 90.7)
Moore et al.²⁸	13.1%/27.7%	34	0	96	287	417	100 (89.7, 100)	74.9 (70.3, 79.2)
Target condition: Epithelial ovarian malignancies (stage I/II) – borderline and stage III/IV tumours excluded
Chan et al.²⁵	7.4%/25.3%	19	4	41	281	345	82.6 (61.2, 95.0)	87.3 (83.1, 90.7)
Moore et al.²⁸	13.1%/27.7%	9	3	96	287	395	75.0 (42.8, 94.5)	74.9 (70.3, 79.2)
Target condition: Ovarian borderline tumours – higher stage tumours excluded
Chan et al.²⁵	7.4%/25.3%	9	7	41	281	338	56.3 (29.9, 80.2)	87.3 (83.1, 90.7)
Roche tumour marker assays
Target condition: All malignant tumours including borderline
Janas et al.³⁴	11.4%/29.9%	52	14	39	154	259	78.8 (67.0, 87.9)	79.8 (73.4, 85.2)
Shulman et al.³⁸	11.4%/29.9%	194	51	158	590	993	79.2 (73.7, 83.8)	78.9 (75.8, 81.7)
Summary estimates							79.1 (74.2, 83.5)	79.1 (76.3, 81.6)
Target condition: All malignant tumours – borderline tumours classified as disease negative
Yanaranop et al.³⁷	11.4%/29.9%	62	12	58	128	260	83.8 (73.4, 91.3)	68.8 (61.6, 75.4)
Target condition: All malignant tumours excluding borderline
Janas et al.³⁴	11.4%/29.9%	42	2	39	154	237	95.5 (84.5, 99.4)	79.8 (73.4, 85.2)
Target condition: Epithelial ovarian malignancies excluding borderline
Xu et al.³⁵	11.4%/29.9%	113	97	39	272	521	53.8 (46.8, 60.7)	87.5 (83.3, 90.9)
Target condition: Epithelial ovarian malignancies – borderline tumours classified as disease negative
Yanaranop et al.³⁷	11.4%/29.9%	58	8	58	128	252	87.9 (77.5, 94.6)	68.8 (61.6, 75.4)
Target condition: Epithelial ovarian malignancies (stage I) – borderline tumours classified as disease negative and higher stage tumours excluded
Yanaranop et al.³⁷	11.4%/29.9%	23	7	58	128	216	76.7 (57.7, 90.1)	68.8 (61.6, 75.4)
Target condition: Epithelial ovarian malignancies (II–IV) – borderline tumours classified as disease negative and stage I tumours excluded
Yanaranop et al.³⁷	11.4%/29.9%	35	1	58	128	222	97.2 (85.5, 99.9)	68.8 (61.6, 75.4)
Target condition: Epithelial ovarian malignancies (stage III/IV) – borderline and stage I/II tumours excluded
Zhang et al.³⁶	11.4%/29.9%	143	16	72	276	507	89.9 (84.2, 94.1)	79.3 (74.7, 83.4)
Target condition: Epithelial ovarian malignancies (stage I/II) – borderline and stage III/IV tumours excluded
Zhang et al.³⁶	11.4%/29.9%	49	15	72	276	412	76.6 (64.3, 86.2)	79.3 (74.7, 83.4)

TP: true positive; FP: false positive; FN: false negative; TN: true negative.
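The summary rows in Table 3 come from the random-effects logistic regression described in the Methods. As a simpler cross-check, and not the method used in the review, the sketch below pools the raw counts for the three Abbott ARCHITECT analyses that excluded borderline tumours; for this group the pooled proportions agree with the reported summary point estimates to one decimal place. Names are illustrative.

```python
# (TP, FN, FP, TN) from Table 3, Abbott ARCHITECT,
# target condition: epithelial ovarian malignancies excluding borderline
studies = {
    "Chan": (58, 7, 41, 281),
    "Karlsen": (244, 8, 371, 438),
    "Winarto": (47, 3, 35, 26),
}


def simple_pool(rows):
    """Naive pooling: sum the cells across studies, then take proportions."""
    tp = sum(r[0] for r in rows)
    fn = sum(r[1] for r in rows)
    fp = sum(r[2] for r in rows)
    tn = sum(r[3] for r in rows)
    return tp / (tp + fn), tn / (tn + fp)


pooled_sens, pooled_spec = simple_pool(studies.values())
print(f"sensitivity {100 * pooled_sens:.1f}%, specificity {100 * pooled_spec:.1f}%")
```

Naive pooling ignores between-study variation, so it gives no usable confidence interval and is dominated here by the largest study.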

One study reported test performance estimates calculated both with and without the inclusion of participants with borderline tumours.³³ Although the number of participants involved was small, these data indicated that half of all false-negative risk scores were accounted for by patients with borderline tumours, 3/6 (50%).³³ Approximately 13% (17/128) of the participants in this study had borderline tumours, while 39% (50/128) had malignant tumours, i.e. a higher proportion of patients with borderline tumours had a negative ROMA score, 3/17 (17.6%), than was the case for patients with malignant tumours, 3/50 (6.0%).³³

Two studies, using different thresholds, assessed the variation in the performance of the ROMA score with different stages of epithelial ovarian cancer (see Table 3).^25,28 In both studies, the sensitivity estimate was highest, 92.1% (95% CI: 78.6 to 98.3%) and 100% (95% CI: 89.7 to 100%), where the target condition was stage III/IV epithelial ovarian cancer and patients with stage I/II and borderline disease were excluded from the analysis, and decreased, to 82.6% (95% CI: 61.2 to 95.0%) and 75.0% (95% CI: 42.8 to 94.5%), where the target condition was stage I/II epithelial ovarian cancer and patients with stage III/IV and borderline disease were excluded from the analysis.^25,28 When the target condition was borderline epithelial tumours and all patients with more advanced stage disease were excluded from the analysis, the sensitivity estimate was significantly lower, 56.3% (95% CI: 29.9 to 80.2%).²⁵

Accuracy of the ROMA score using Roche tumour marker assays

Five studies used Roche Elecsys tumour marker assays.^34–38 Two studies included all patients, regardless of final histopathological diagnosis (target condition all malignant tumours). The summary estimates of sensitivity and specificity derived from these studies were 79.1% (95% CI: 74.2 to 83.5%) and 79.1% (95% CI: 76.3 to 81.6%), respectively.^34,38 One of these studies also reported test accuracy when study participants with borderline tumours were excluded from the analysis.³⁴ The exclusion of participants with borderline tumours resulted in increased sensitivity, 95.5% (95% CI: 84.5 to 99.4%), and unchanged specificity, 79.8% (95% CI: 73.4 to 85.2%).³⁴ Data from this study indicated that patients with borderline tumours and those with non-ovarian primaries accounted for a high proportion, 12/14 (86%), of the false-negative risk scores observed.³⁴
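The arithmetic behind this comparison can be made explicit. The sketch below, using TP and FN counts from Table 3 for the study that reported both analyses,³⁴ shows how removing participants after the reference standard result is known shrinks the false-negative count and raises the sensitivity estimate; the function name is illustrative.

```python
def exclusion_effect(tp_all, fn_all, tp_restricted, fn_restricted):
    """Compare sensitivity with all participants and after post hoc exclusion."""
    sens_all = tp_all / (tp_all + fn_all)
    sens_restricted = tp_restricted / (tp_restricted + fn_restricted)
    # False negatives that disappear from the analysis when participants are excluded
    fn_removed = fn_all - fn_restricted
    return sens_all, sens_restricted, fn_removed, fn_removed / fn_all


# All malignant tumours including borderline: TP=52, FN=14
# Same study after exclusion of borderline tumours: TP=42, FN=2
sens_all, sens_restricted, fn_removed, fn_share = exclusion_effect(52, 14, 42, 2)

print(f"sensitivity, all participants: {100 * sens_all:.1f}%")
print(f"sensitivity, after exclusion:  {100 * sens_restricted:.1f}%")
print(f"false negatives removed: {fn_removed}/14 ({100 * fn_share:.0f}%)")
```

The excluded group contributed 10 true positives but 12 false negatives, which is why the sensitivity estimate rises so sharply once it is removed.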

A further study included all participants, but classified those found to have borderline ovarian tumours as disease negative.³⁷ The sensitivity estimate from this study appeared slightly higher than that from the studies where borderline tumours were classified as positive, 83.8% (95% CI: 73.4 to 91.3%), and the specificity estimate appeared slightly lower, 68.8% (95% CI: 61.6 to 75.4%), but neither difference was statistically significant (see Table 3).³⁷ The same study also reported test performance data, where eight (3%) patients with non-epithelial ovarian cancer and non-ovarian primaries were excluded from the analysis; this exclusion did not significantly change the results (see Table 3). Although the numbers involved were small, it should be noted that patients with malignancies other than epithelial ovarian cancer accounted for four (50%) of the false-negative results.³⁷ This study also assessed the variation in the performance of the ROMA score with different stages of epithelial ovarian cancer (see Table 3).³⁷ The sensitivity estimate was highest, for both the ROMA score, 97.2% (95% CI: 85.5 to 99.9%), and the RMI I, 88.9% (95% CI: 73.9 to 96.9%), where the target condition was stage II to IV epithelial ovarian cancer and patients with stage I disease were excluded from the analysis.³⁷ A second study observed a similar pattern when comparing stage III/IV epithelial ovarian cancer (patients with stage I/II disease and borderline tumours excluded) with stage I/II epithelial ovarian cancer (patients with stage III/IV disease and borderline tumours excluded)³⁶ (see Table 3).

Discussion

All studies described in this article were diagnostic cohort studies, taken from our systematic review of ovarian cancer risk scores, which reported data on the diagnostic accuracy of the ROMA score using either Roche Elecsys or Abbott ARCHITECT tumour marker assays. Using either manufacturer’s tumour marker assays, sensitivity estimates for the ROMA were highest, where analyses excluded participants with borderline tumours and those with malignancies other than epithelial ovarian cancer and lowest, where all participants were included in the analysis, regardless of their final histopathological diagnosis. The analysis which included all participants, regardless of their final histopathological diagnosis, is more likely to reflect the performance of the score in a clinical setting since the population in which the risk score is to be used will be defined by presenting characteristics and is likely to include women with a variety of histopathological ovarian tumour types as well as some whose primary cancer is subsequently found to be non-ovarian.

Our results also indicate that the ROMA score is better at identifying women with advanced-stage ovarian tumours (stage III/IV) than early-stage tumours (stage I/II) or borderline tumours, and is better at identifying epithelial ovarian tumours than other histopathological tissue types. This is a potential limitation in the clinical setting, where there is a heterogeneous mix of tumour tissue types and stages. It is also an indication of a fundamental limitation of the diagnostic test accuracy concept as applied to cancer, which assumes a single tissue type and tumour stage that can be established by histopathological examination of an excised sample. The understanding of how cancers evolve is currently the subject of intense research, which shows that cancers most likely do not evolve along a linear pathway from low grade to high grade but are a heterogeneous mix of many subclones which may evolve separately.⁴³ Depending on how the tumour is sampled for histology, intratumour heterogeneity impacts upon our ability to find suitable cancer biomarkers for clinical use. Diagnostic studies of the future will have to give careful consideration to the evolution of cancer and understand that a person diagnosed with cancer may in fact have several tumours, each with a different genetic origin and therefore each with different diagnostic or prognostic consequences. Re-evaluation of the use of single tissue samples per patient, and of the assumption of tumour homogeneity, is required to update diagnostic research in the ongoing evolution of personalized medicine.

Previous systematic reviews of the ROMA score have focused on predicting ovarian cancer (no definition reported) or epithelial ovarian cancer, have combined data from studies using different manufacturers’ tumour marker assays and thresholds and have not clearly described how study participants with borderline tumours and those with non-ovarian primaries were classified.^11,44,45 The resultant summary estimates of test performance have tended to be higher than those described in this article (sensitivity 85% to 87%, specificity 82% to 86%), and the authors’ conclusions about the potential clinical utility of the ROMA score may be over-optimistic.

The definition of the target condition is a crucial consideration when assessing whether the diagnostic accuracy estimates reported in clinical studies will translate into clinical utility in real-world settings. In the current example, to define the target condition as ‘epithelial ovarian cancer’ implies that how women with other malignancies are classified by the ROMA score is not relevant. Clearly, such women form part of the spectrum of those presenting with an adnexal mass (those in whom the ROMA score is intended to be used). Furthermore, post hoc exclusion of study participants based on their final histopathological diagnosis requires information that could not be known at the point of presentation. Studies should therefore include all participants in their analyses. Consideration of the data from those studies that reported accuracy estimates for both the whole study population (target condition all malignant tumours including borderline) and for selected populations (participants found to have borderline tumours and/or those with non-epithelial ovarian cancers or non-ovarian primaries excluded) indicates that patients with borderline tumours and those with non-epithelial ovarian cancers or non-ovarian primaries may be disproportionately represented among those with false-negative ROMA scores; it should be emphasized that these observations are derived from very small numbers of patients and should be viewed as hypothesis generating. The downstream consequences (treatment and prognosis) of a false negative, low risk, classification are likely to differ between patients with different histological cancer types and between those with borderline tumours and those with higher stage malignancies, although all histological cancer types will require referral to an SMDT. 
A more complete exploration of the types of patients who are likely to be misclassified as low risk, as well as an investigation of the downstream clinical consequences for these patients is needed. The potential to detect non-epithelial ovarian cancers by combining other tests (e.g. alpha fetoprotein (AFP) and beta human chorionic gonadotropin (beta-hCG), as recommended in CG122,¹² for women under 40 with suspected ovarian cancer) with the ROMA score is unclear and may warrant exploration in future studies.

There remains a further question, regarding the real-world clinical applicability of studies evaluating the ROMA score. All participants in the identified studies underwent surgery (i.e. histological confirmation of disease status was available). In practice, risk scores may be used, in secondary care, to triage patients to surgery or surveillance/conservative management, as well as to guide decisions about where surgery should be undertaken (referral to a specialist gynaecological oncology unit). This potential mismatch between the study populations and real-world clinical practice is reflected in the relatively high estimate for the prevalence of malignancy (25.1%) derived from the ROMA studies included in our systematic review. It should be noted that a lower prevalence of malignancy may also affect risk score performance in practice. It could be argued that a more realistic estimate of the performance of the ROMA score would be obtained by including both patients undergoing surgery and those who are managed conservatively, and applying a mixed reference standard of histological confirmation or follow-up for a specified minimum period.
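One consequence of a lower prevalence can be shown with simple arithmetic: even if sensitivity and specificity were unchanged, the predictive values of the score would shift. The sketch below uses the Roche summary estimates from Table 3 (79.1% and 79.1%) and the 25.1% prevalence quoted above; the 10% prevalence is a purely hypothetical comparison value. It does not model any change in sensitivity or specificity themselves, which the changed patient spectrum might also produce.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from Bayes' theorem."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Prevalence in the included ROMA studies versus a hypothetical lower value
ppv_study, npv_study = predictive_values(0.791, 0.791, 0.251)
ppv_lower, npv_lower = predictive_values(0.791, 0.791, 0.10)

print(f"prevalence 25.1%: PPV {100 * ppv_study:.1f}%, NPV {100 * npv_study:.1f}%")
print(f"prevalence 10.0%: PPV {100 * ppv_lower:.1f}%, NPV {100 * npv_lower:.1f}%")
```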

Conclusions

Despite the optimistic conclusions presented by some research studies, our results indicate that the ROMA score is unlikely to provide adequate sensitivity to be of use in guiding decisions about referral from secondary care to an SMDT. There are limited data to indicate that patients with borderline tumours and those with non-epithelial ovarian cancers or non-ovarian primaries may be disproportionately represented among those with false-negative ROMA scores. Future studies should include populations that reflect the referral point at which the ROMA score is intended to be used. These observations highlight the importance of giving careful consideration to how the target condition has been defined, and whether particular groups of patients have been inappropriately excluded from the analyses, when assessing whether the diagnostic accuracy estimates reported in clinical studies will translate into clinical utility in real-world settings.

Supplemental Material

Supplemental material for Clinically inappropriate post hoc exclusion of study participants from test accuracy calculations: the ROMA score, an example from a recent NICE diagnostic assessment by Shona Lang, Nigel Armstrong, Sohan Deshpande, Bram Ramaekers, Sabine Grimm, Shelley de Kock, Jos Kleijnen and Marie Westwood in Annals of Clinical Biochemistry

Footnotes

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This report presents independent research funded by the National Institute for Health Research (NIHR). The views and opinions expressed by authors in this publication are those of the authors and do not necessarily reflect those of the NHS, the NIHR, NETSCC, the HTA programme or the Department of Health.

Ethical approval

Not applicable.

Guarantor

MW.

Contributorship

MW and SL planned and drafted this article. All authors contributed to planning and interpretation of the systematic review on which this article is based and all authors provided input to the article. SdK devised and performed the literature searches and provided information support to the project. All parties were involved in drafting and/or commenting on the report.

Supplementary material

Additional supplementary information may be found with the online version of this article.

References

1. Lijmer, Mol, Heisterkamp, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999; 282: 1061–1066.

2. Whiting. Quality of diagnostic accuracy studies: the development, use, and evaluation of QUADAS. PhD thesis. University of Amsterdam, Amsterdam, 2006.

3. Rutjes, Reitsma, Vandenbroucke, et al. Case-control and two-gate designs in diagnostic accuracy studies. Clin Chem 2005; 51: 1335–1341.

4. Schmidt, Factor RE. Understanding sources of bias in diagnostic accuracy studies. Arch Pathol Lab Med 2013; 137: 558–565.

5. Cohen, Korevaar, Altman, et al. STARD 2015 guidelines for reporting diagnostic accuracy studies: explanation and elaboration. BMJ Open 2016; 6: e012799.

6. McGrath, McInnes MDF, van Es, et al. Overinterpretation of research findings: evidence of “spin” in systematic reviews of diagnostic accuracy studies. Clin Chem 2017; 63: 1353–1362.

7. Office for National Statistics. Cancer registration statistics, England: 2014. Cancer diagnoses and age-standardised incidence rates for all cancer sites by age, sex and region. Newport: ONS, 2016, 17 p, www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/cancerregistrationstatisticsengland/2014/pdf (accessed 22 September 2016).

8. Cancer Research UK. Ovarian cancer (C56-C57.4): 2013. Number of new cases, crude and European age-standardised (AS) incidence rates per 100,000 population, females, UK, www.cancerresearchuk.org/sites/default/files/cstream-node/cases_rates_ovary_1.pdf (2015, accessed 22 September 2016).

9. Cancer Research UK. Ovarian cancer (C56-C57.4): 2012–2014. Average number of deaths per year and age-specific mortality rates per 100,000 population, UK. London: Cancer Research UK, www.cancerresearchuk.org/sites/default/files/cstream-node/deaths_crude_ovary_M14.pdf (2015, accessed 22 September 2016).

10. Cancer Research UK. Ovarian cancer (C56): 2014. Proportion of cancers diagnosed at each stage, all ages, England. London: Cancer Research UK, www.cancerresearchuk.org/sites/default/files/cstream-node/inc_stage_ovarian_0.pdf (accessed 22 September 2016).

11. Wang, Gao, Yao, et al. Diagnostic accuracy of serum HE4, CA125 and ROMA in patients with ovarian cancer: a meta-analysis. Tumour Biol 2014; 35: 6127–6138.

12. National Collaborating Centre for Cancer. Ovarian cancer: the recognition and initial management of ovarian cancer. Manchester: NICE, www.nice.org.uk/guidance/cg122/evidence/full-guideline-181688797 (2011, accessed 22 September 2016).

13. National Institute for Health and Care Excellence. Tests in secondary care to identify people at high risk of ovarian cancer: diagnostics guidance 31. Manchester: NICE, nice.org.uk/guidance/dg31 (2017, accessed 23 January 2018).

14. Moore, McMeekin, Brown, et al. A novel multiple marker bioassay utilizing HE4 and CA125 for the prediction of ovarian cancer in patients with a pelvic mass. Gynecol Oncol 2009; 112: 40–46.

15. Westwood, Ramaekers, Lang, et al. Tests in secondary care to identify people at high risk of ovarian cancer: a systematic review and cost effectiveness analysis. A diagnostic assessment report commissioned by the NIHR HTA programme on behalf of the National Institute for Health and Care Excellence [in press]. York: Kleijnen Systematic Reviews, 2017, 318 p.

16. Centre for Reviews and Dissemination. Systematic Reviews: CRD’s guidance for undertaking reviews in health care. York: University of York, www.york.ac.uk/inst/crd/SysRev/!SSL!/WebHelp/SysRev3.htm (2009, accessed 23 March 2011).

17. National Institute for Health and Care Excellence. Diagnostics assessment programme manual. Manchester: NICE, www.nice.org.uk/Media/Default/About/what-we-do/NICE-guidance/NICE-diagnostics-guidance/Diagnostics-assessment-programme-manual.pdf (2011, accessed 4 October 2016).

18. Whiting, Westwood, Beynon, et al. Inclusion of methodological filters in searches for diagnostic test accuracy studies misses relevant studies. J Clin Epidemiol 2011; 64: 602–607.

19. Whiting, Rutjes AWS, Westwood, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011; 155: 529–536.

20. Riley, Abrams, Sutton, et al. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Med Res Methodol 2007; 7: 3.

21. Reitsma, Glas, Rutjes AWS, et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 2005; 58: 982–990.

22. Harbord, Whiting, Sterne, et al. An empirical comparison of methods for meta-analysis of diagnostic accuracy showed hierarchical models are necessary. J Clin Epidemiol 2008; 61: 1095–1103.

23. Harbord, Deeks, Egger, et al. A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics 2007; 8: 239–251.

24. Zamora, Abraira, Nuriel, et al. Meta-DiSc: a software for meta-analysis of test accuracy data. BMC Med Res Methodol 2006; 6.

25. Chan, Chen, Nam, et al. The use of HE4 in the prediction of ovarian cancer in Asian women with a pelvic mass. Gynecol Oncol 2013; 128: 239–244.

26. Clemente, Benitez. A retrospective cohort study to validate CA125 and the combination of CA125 and HE4 using the ROMA in assessing the risk for ovarian malignancy in women diagnosed with an adnexal mass in Makati Medical Centre. BJOG 2015; 122: 133–134.

27. Wan, Cai, et al. Value of serum human epididymis secretory protein 4 as a marker for differential diagnosis of malignant and benign gynecological diseases of patients in southern China. Clin Chim Acta 2016; 459: 170–176.

28. Moore, Miller, Disilvestro, et al. Evaluation of the diagnostic accuracy of the risk of ovarian malignancy algorithm in women with a pelvic mass. Obstet Gynecol 2011; 118: 280–288.

29. Novotny, Presl, Kucera, et al. HE4 and ROMA index in Czech postmenopausal women. Anticancer Res 2012; 32: 4137–4140.

30. Presl, Kucera, Topolcan, et al. HE4 a biomarker of ovarian cancer. Ceska Gynekol 2012; 77: 445–449.

31. Karlsen, Sandhu, Hogdall, et al. Evaluation of HE4, CA125, risk of ovarian malignancy algorithm (ROMA) and risk of malignancy index (RMI) as diagnostic tools of epithelial ovarian cancer in patients with a pelvic mass. Gynecol Oncol 2012; 127: 379–383.

32. Al Musalhi, Al Kindi, Al Aisary, et al. Evaluation of HE4, CA-125, Risk of Ovarian Malignancy Algorithm (ROMA) and Risk of Malignancy Index (RMI) in the preoperative assessment of patients with adnexal mass. Oman Med J 2016; 31: 336–344.

33. Winarto, Laihad, Nuranna. Modification of cutoff values for HE4, CA125, the risk of malignancy index, and the risk of malignancy algorithm for ovarian cancer detection in Jakarta, Indonesia. Asian Pac J Cancer Prev 2014; 15: 1949–1953.

34. Janas, Glowacka, Wilczynski, et al. Evaluation of applicability of HE4 and ROMA in the preoperative diagnosis of adnexal masses. Ginekol Pol 2015; 86: 193–197.

35. Zhong, et al. Modification of cut-off values for HE4, CA125 and the ROMA algorithm for early-stage epithelial ovarian cancer detection: results from 1021 cases in South China. Clin Biochem 2016; 49: 32–40.

36. Zhang, Wang, Cheng, et al. Comparison of HE4, CA125, and ROMA diagnostic accuracy: a prospective and multicenter study for Chinese women with epithelial ovarian cancer. Medicine 2015; 94: e2402.

37. Yanaranop, Anakrat, Siricharoenthai, et al. Is the risk of ovarian malignancy algorithm better than other tests for predicting ovarian malignancy in women with pelvic masses? Gynecol Obstet Invest 2017; 82: 47–53.

38. Shulman, Smith, Pappas, et al. Clinical performance comparison of two ivdmias for pre-surgical assessment of ovarian cancer risk. In: Annual Meeting of American College of Obstetricians and Gynecologists, Washington DC, 14–17 May 2016.

39. Jacobs, Oram, Fairbanks, et al. A risk of malignancy index incorporating CA 125, ultrasound and menopausal status for the accurate preoperative diagnosis of ovarian cancer. BJOG 1990; 97: 922–929.

40. Moore, Miller, DiSilvestro, et al. Evaluation of the risk of ovarian malignancy algorithm in women with a pelvic mass presenting to general gynecologists. Gynecol Oncol 2011; 120: S68.

41. Moore, Miller, MacLaughlan, et al. The use of the Risk of Ovarian Malignancy Algorithm (ROMA) with clinical assessment improves ovarian cancer detection in women with a pelvic mass. Gynecol Oncol 2012; 125: S38.

42. Chen, Zhou, Chen, et al. Development of a multimarker assay for differential diagnosis of benign and malignant pelvic masses. Clin Chim Acta 2015; 440: 57–63.

43. McGranahan, Swanton. Biological and therapeutic impact of intratumor heterogeneity in cancer evolution. Cancer Cell 2015; 27: 15–26.

44. Dayyani, Uhlig, Colson, et al. Diagnostic performance of Risk of Ovarian Malignancy Algorithm against CA125 and HE4 in connection with ovarian cancer: a meta-analysis. Int J Gynecol Cancer 2016; 26: 1586–1593.

45. Tie, Chang, et al. Does risk for ovarian malignancy algorithm excel human epididymis protein 4 and CA125 in predicting epithelial ovarian cancer: a meta-analysis. BMC Cancer 2012; 12: 258.

