Abstract
Ovarian cancer is often fatal and its incidence in the general population is low, underscoring both the need for, and the challenges of, advances in screening and early detection. The goal of this study was to design a serum-based biomarker panel and corresponding multivariate algorithm that can be used to accurately detect ovarian cancer. A combinatorial protein biomarker assay (CPBA) that uses CA-125, HE4, and 3 tumor-associated autoantibodies resulted in an area under the curve of 0.98. The CPBA Ov algorithm was trained using subjects who were suspected to have gynecological cancer and were scheduled for surgery. As a surgical rule-out test, the algorithm achieved 100% sensitivity and 83.7% specificity. Although sample size (n = 60) is a limiting factor, the CPBA Ov algorithm performed better than either CA-125 alone or the Risk of Ovarian Malignancy Algorithm.
Background
Although ovarian cancer (OvCa) is relatively rare in the general population, it is often lethal because most women (60%) are diagnosed with advanced-stage (III or IV) disease, wherein 5-year survival is only 17% to 39%.1 If diagnosed early (stage I or II), survival is 70% to 92%.1 These numbers underscore the urgent need for improvement in OvCa diagnostics.
The field of OvCa screening faces a unique paradoxical hurdle. Low disease prevalence means general population screening trials have difficulty achieving the required high specificity (low false-positive rate), which necessitates the use of an enriched population (such as women with adnexal pelvic masses).2,3 However, serous adenocarcinoma, which accounts for up to 70% of epithelial OvCas, often originates in the fallopian tubes and presents as advanced disease without a clear “pelvic mass” as a diagnostic point.4,5 When OvCa is suspected, a woman undergoes invasive surgery that often results in sterilization. Therefore, a noninvasive OvCa diagnostic must be highly accurate if the goal is to prevent unnecessary invasive surgeries while not overlooking true invasive cancers.
In women with a pelvic mass, transvaginal ultrasound and/or serum CA-125 is often used to determine the likelihood of malignancy. CA-125 is elevated in 85% of advanced epithelial OvCa but is normal in up to 50% of early-stage cancers.6 CA-125 lacks clinical specificity (resulting in a high false-positive rate), which limits its utility as a diagnostic.7,8 More recently, the combination of CA-125 and HE4 using the Risk of Ovarian Malignancy Algorithm (ROMA) in women with pelvic masses planned for surgery imparted better performance characteristics compared with CA-125 alone.9,10 In addition, OVA1, a panel of 5 OvCa serum biomarkers, was shown to improve clinical sensitivity beyond ROMA, but its clinical specificity was lower than that of ROMA.11
Outside of serum proteins, tumor-associated autoantibodies (TAAb) have shown promise in detecting OvCa.12,13 However, the biggest drawback to using TAAb in a diagnostic setting is the low prevalence for any single TAAb; not all subjects with cancer will produce an autoantibody response. Even among patients with a p53 mutation, most will not produce detectable levels of circulating p53 TAAb.14,15 Despite this caveat, TAAb may still impart utility as serum biomarkers due to their high specificity. Although p53 autoantibodies are rare within a given population, it is almost guaranteed that a patient with p53 TAAb does have cancer (although it may not be OvCa, specifically).14 Because of redundancy (meaning that an individual TAAb can be found in multiple cancer types) and low sensitivity, TAAb should be combined into future diagnostic panels, along with other serum biomarkers, to impart optimal clinical performance characteristics.16,17
In the effort to diagnose OvCa at the earliest possible stage, additional advances are necessary to create a serum biomarker diagnostic with high sensitivity and high specificity. Serum protein biomarker (SPB) panels most often result in assays with high sensitivity, but low specificity. Conversely, serum autoantibody panels most often result in assays with low sensitivity, but high specificity. We have demonstrated previously that the combination of SPB and TAAb results in high sensitivity and specificity in the detection of breast cancer.18,19 In this study, we sought to determine whether SPB could be combined with TAAb to create a novel algorithm that can accurately detect OvCa.
Methods
Serum samples
Plasma and serum samples used for assay and biomarker development were purchased from multiple biorepositories (Indivumed, Asterand, Oregon Health & Science University, and The University of Arizona). A total of n = 122 samples were used to develop and refine the SPB and TAAb assays (Table 1). Sample numbers were skewed, with an abundance of OvCa samples, due to the expectation of low TAAb prevalence. The data obtained from these analyses were used in feature selection during algorithm and model development (see “Model development and statistical analysis” section). Samples used for biomarker development were independent of those used for algorithm development.
Table 1. Subjects used for biomarker analytical development and algorithm training.
Abbreviation: NA, not applicable.
Median age, along with minimum and maximum age, for each group is shown in parentheses. Subjects with no evidence of gynecological disease were not included in algorithm training.
The serum specimens used to train the algorithm were collected from women who presented for pelvic surgery at the Catholic Health Initiatives Center for Translational Research (CHI-CTR) and whose physician had indicated a clinical suspicion of gynecological cancer. A total of 196 serum specimens were collected from CHI-CTR for this study. Of these, 17 were collected presurgery from subjects diagnosed with OvCa and 43 were collected presurgery from subjects diagnosed with a benign gynecological condition (eg, ovarian cysts, endometriosis) (Supplementary Table 1). Samples categorized by the site as fallopian tube cancer were recategorized as OvCa due to evidence linking fallopian tubes to OvCa origins.4,5 The remaining 136 samples were excluded from training due to postsurgical collection, prior gynecological cancer diagnosis (ie, recurrence monitoring), borderline tumor status (as defined by pathology), and/or cancer origin other than ovarian or fallopian tube. All specimens were de-identified by the collection site; thus, personal information was not identifiable to the investigator or any other individual associated with this investigation. The study design was granted institutional review board (IRB) exemption under 45 CFR 46.101(b)(4).
Measurement of SPB and TAAb
Protein and hormone biomarkers (Supplementary Table 2) were chosen for analysis based on published literature.20–23 Serum was evaluated for the concentrations of 9 SPB using Abbott Architect i1000SR immunoassays, following manufacturer’s specifications. The Architect assays use chemiluminescent microparticles to determine analyte concentrations. Calibrator and control samples were run for each assay as recommended by the manufacturer. Samples resulting in an upper limit of quantification error flag (analyte concentration above the reportable range) were diluted appropriately and rerun to obtain a valid measurement.
Autoantibody biomarkers were chosen for analysis based on published literature.12,18,24–26 Samples were processed in duplicate and evaluated for the relative presence/absence of 47 TAAb (protein targets listed in Supplementary Table 2), as described previously.18 Associated autoantibodies were detected using an indirect enzyme-linked immunosorbent assay (ELISA), which involves coating standard-bind plates (MSD, Rockville, MD, USA) with recombinant protein. Proteins were diluted in 1× phosphate-buffered saline and coated onto blank plates at a final concentration of 20 ng/well. All recombinant proteins, certified as >80% pure (sodium dodecyl sulfate polyacrylamide gel electrophoresis), were purchased from OriGene (Rockville, MD, USA) or Abnova (Taipei City, Taiwan). OriGene proteins were myc/DDK peptide tagged and produced in HEK-293 cells. Abnova proteins were GST tagged and produced in wheat germ cells. Appropriate controls (TAAb-negative serum spiked with anti-myc/DDK or anti-GST) were included on each plate to monitor assay performance. Electrochemiluminescent signal was detected using a Meso Sector S600 plate reader and MSD Workbench 4.0 software. The TAAb ratio values were determined using the following calculation:
where Target MFI = mean fluorescence intensity (MFI) of sample plus target and True Target MFI = MFI of corresponding target protein without sample (protein background).
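As a minimal sketch only, and assuming the TAAb ratio is the quotient of the two quantities defined above (sample signal divided by protein-only background), the calculation for duplicate wells might look as follows. That quotient is an inference from the word "ratio" and the two defined terms, not a formula confirmed by this article; the function name and the example readings are hypothetical.

```python
from statistics import mean


def taab_ratio(target_mfi_replicates, background_mfi):
    """Return the assumed TAAb ratio for one sample and one target protein.

    ASSUMPTION: ratio = Target MFI / True Target MFI, where Target MFI is the
    mean signal of the replicate sample wells and True Target MFI is the
    signal of the same protein without sample (protein background).
    """
    if background_mfi <= 0:
        raise ValueError("background signal must be positive")
    return mean(target_mfi_replicates) / background_mfi


# Hypothetical duplicate readings for a single sample/target pair.
example_ratio = taab_ratio([1200.0, 1300.0], background_mfi=500.0)
print(f"TAAb ratio: {example_ratio:.2f}")
```

Under this assumption, a ratio near 1 would indicate signal indistinguishable from protein background, and larger values would indicate relatively more target-specific autoantibody.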
Sample run order was randomized and laboratory staff was blinded to subject disease status.
ROMA calculation
The ROMA was calculated for samples included in the training group as described previously,27 with a cutoff of 12.5% used for premenopausal subjects and a cutoff of 14.4% used for postmenopausal subjects. Menopause data were not collected for CHI-CTR samples; therefore, the follicle-stimulating hormone (FSH) value was used to approximate menopause status (with >30 mIU/mL corresponding to postmenopause). When a valid FSH measurement was not available (n = 2), ROMA was calculated for both pre- and postmenopause. For these samples, the outcome (high or low) was the same when using both calculations.
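The ROMA workflow described above can be sketched as follows. The FSH threshold and the two cutoffs come from the text; the predictive-index coefficients are the values commonly published for ROMA and are not stated in this article, so they should be treated as an assumption and checked against reference 27. Treating a score equal to the cutoff as "high" is also an assumption, and all function names are hypothetical.

```python
import math

FSH_POSTMENOPAUSE_MIU_ML = 30.0            # from the text: >30 mIU/mL = postmenopause
CUTOFF_PERCENT = {"pre": 12.5, "post": 14.4}  # cutoffs stated in the text

# ASSUMPTION: commonly published ROMA coefficients (intercept, ln HE4, ln CA-125).
COEFFICIENTS = {
    "pre": (-12.0, 2.38, 0.0626),
    "post": (-8.09, 1.04, 0.732),
}


def menopausal_status(fsh_miu_ml):
    """Approximate menopausal status from FSH, as done for the CHI-CTR samples."""
    return "post" if fsh_miu_ml > FSH_POSTMENOPAUSE_MIU_ML else "pre"


def roma_percent(he4_pmol_l, ca125_u_ml, status):
    """Predicted probability (%) from the logistic predictive index."""
    intercept, b_he4, b_ca125 = COEFFICIENTS[status]
    index = intercept + b_he4 * math.log(he4_pmol_l) + b_ca125 * math.log(ca125_u_ml)
    return 100.0 * math.exp(index) / (1.0 + math.exp(index))


def roma_call(he4_pmol_l, ca125_u_ml, fsh_miu_ml):
    """Return (status, score in %, 'high' or 'low')."""
    status = menopausal_status(fsh_miu_ml)
    score = roma_percent(he4_pmol_l, ca125_u_ml, status)
    # ASSUMPTION: a score at or above the cutoff is called high risk.
    return status, score, "high" if score >= CUTOFF_PERCENT[status] else "low"


# Hypothetical analyte values for one subject.
print(roma_call(he4_pmol_l=70.0, ca125_u_ml=35.0, fsh_miu_ml=10.0))
```

For a subject without a valid FSH value, `roma_percent` can be called once with each status and the two calls compared, mirroring the handling of the 2 such samples in this study.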
Model development and statistical analysis
Feature selection was used to select biomarkers that are either biologically relevant to OvCa or statistically significant in the biomarker development and/or training sample set. Multiple approaches were used, including literature review, logistic modeling (each biomarker was used individually as a predictor in a logistic model with cancer as the response; biomarkers that were significant at the .1 level were recorded), and univariate 2-sample t tests for association (biomarkers that were significant at the .1 level were recorded). In addition, 2 different bootstrap methods were used for feature selection. Elastic net (ELNET) and generalized boosted models (GBMs) were applied to 200 bootstrap samples; biomarkers that were selected in at least 50% of samples for ELNET or 60% of samples for GBM were recorded. Three models (SPB alone, TAAb alone, and SPB + TAAb) were originally built using a logistic boost approach that used a Synthetic Minority Over-sampling Technique (SMOTE) to increase the number of cancer cases.
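The bootstrap selection-frequency step can be illustrated with a short sketch. It shows only the resampling and tallying logic: the selector passed in is a hypothetical stand-in for a fitted ELNET or GBM model, and the toy data, marker names, and the simple mean-difference rule are all assumptions made for illustration, not the procedure used in this study.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_selection_frequency(rows, select_features, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each feature is selected.

    select_features is any callable that takes a resampled list of rows and
    returns the names of the features it selects (hypothetical model wrapper).
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        resample = [rng.choice(rows) for _ in rows]  # sample with replacement
        counts.update(set(select_features(resample)))
    return {name: counts[name] / n_boot for name in counts}


def stable_features(frequencies, threshold):
    """Keep features selected in at least `threshold` of the resamples."""
    return sorted(name for name, freq in frequencies.items() if freq >= threshold)


def toy_selector(resample, markers=("marker_a", "marker_b"), min_gap=1.0):
    """Hypothetical selector: keep markers whose cancer mean exceeds the benign mean."""
    cancers = [r for r in resample if r["cancer"]]
    benign = [r for r in resample if not r["cancer"]]
    if not cancers or not benign:
        return []
    return [m for m in markers
            if mean(r[m] for r in cancers) - mean(r[m] for r in benign) > min_gap]


# Toy data: marker_a separates the groups, marker_b does not.
toy_rows = ([{"cancer": True, "marker_a": 5.0, "marker_b": 2.0} for _ in range(10)]
            + [{"cancer": False, "marker_a": 1.0, "marker_b": 2.0} for _ in range(10)])
frequencies = bootstrap_selection_frequency(toy_rows, toy_selector, n_boot=200)
print(stable_features(frequencies, threshold=0.5))
```

The 200 resamples and the 50% threshold mirror the ELNET setting described above; a 60% threshold would correspond to the GBM setting.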
Receiver operating characteristic (ROC) and area under the curve (AUC) metrics were used to determine algorithm performance regarding sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Model cut points were optimized for sensitivity (rule-out malignancy) and specificity (rule-in malignancy). Confidence intervals (CI) were reported as 2-sided binomial 95% CIs. Logit boost models were created using R (version 3.0.3, March 6, 2014). All analyses were conducted using SAS (version 9.4) and GraphPad Prism (version 6.03).
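For reference, the four performance metrics follow directly from the confusion-matrix counts. The sketch below uses the rule-in counts for postsurgery samples reported in the supplementary table in the Appendix (TP = 27, FN = 9, TN = 41, FP = 1); the function name is hypothetical, and confidence intervals are left out because the exact interval method is not specified here.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Point estimates of sensitivity, specificity, PPV, and NPV from counts."""
    return {
        "sensitivity": tp / (tp + fn),  # cancers correctly called positive
        "specificity": tn / (tn + fp),  # benign cases correctly called negative
        "ppv": tp / (tp + fp),          # positive calls that are cancers
        "npv": tn / (tn + fn),          # negative calls that are benign
    }


# Counts taken from the Appendix table (postsurgery OvCa samples, rule-in model).
metrics = diagnostic_metrics(tp=27, fn=9, tn=41, fp=1)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```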
Results
Biomarker assay development: univariate analysis of serum protein biomarkers
Nine OvCa-specific SPB were analyzed using Abbott Architect assays, as described in the “Methods” section. Individual patient health characteristics (including but not limited to age and menopausal status) and clinical outcomes were extracted from each de-identified patient record where available. Univariate analyses were completed to determine whether individual SPB differed between women diagnosed with OvCa and women with no evidence of ovarian disease (Table 1, biomarker development population). These patient populations were chosen for assay development to assess the limits of detection based on expected population concentration ranges. Samples that did not have adequate volume for SPB analysis were excluded (n = 43). A total of 5/9 SPB (CA-125, CA 15.3, CA19-9, HE4, and Prolactin) were found to be differentially expressed at statistically significant levels (P < .05; Figure 1) between the 2 groups.

Figure 1. Scatter plot distributions of SPB and LH/FSH in subjects with ovarian cancer (OvCa, n = 37) and no evidence of ovarian disease (ND, n = 22).
Biomarker assay development: univariate analysis of tumor-associated autoantibodies
A total of 47 TAAb were analyzed by MSD-indirect ELISA, as described in the “Methods” section. The TAAb ratios indicate the relative presence or absence of target-specific autoantibodies. Based on published literature,12,13 prevalence was expected to be low for each individual TAAb, even in OvCa samples. Because of the expected low prevalence and because prevalence in the healthy/non-disease (ND) population is expected to be at or close to zero, OvCa samples were compared with benign gynecological disease samples (Table 1, biomarker development population). Samples that did not have adequate volume for TAAb analysis were excluded (n = 3). Although individual TAAb were not significantly different between the 2 groups, differences in overall prevalence were noted (Figure 2) and some trends may achieve significance in a larger sample set. Full TAAb comparison data are shown in Supplementary Figure 1.

Figure 2. Representative distributions of select ovarian cancer-specific TAAb in subjects with ovarian cancer (OvCa, n = 77) and benign gynecological disease (BGD, n = 20). Analyte mean and standard deviations are shown for each population. Log-2 or Log-10 scales are used where appropriate to better illustrate TAAb ratio distributions. BGD indicates benign gynecological disease; OvCa, ovarian cancer; TAAb, tumor-associated autoantibodies.
A blood-based multimarker panel detects OvCa
A multivariate algorithm using prospectively collected (presurgery) samples from CHI-CTR was developed to differentiate subjects with benign gynecological conditions from those with OvCa (as described in the “Methods” section). A total of 60 samples were selected for model development; all samples were drawn from subjects prior to undergoing surgery. Feature selection and logistic modeling were conducted as described in the “Methods” section; the final model (combinatorial protein biomarker assay [CPBA] Ov) includes 2 SPB (CA-125 and HE4) and 3 TAAb (ACSBG1, CTAG1B, and DHFR). Clinical performance was evaluated for other biomarker models (CA-125 alone, HE4 alone, and ROMA) using the same sample data. The CPBA Ov ROC is shown in Figure 3, with the CA-125, HE4, and ROMA ROC curves shown for comparison. Although CA-125 and ROMA each performed well individually, the AUC was highest for the CPBA Ov algorithm (0.98).

Figure 3. Receiver operating characteristic (ROC) curve of ovarian cancer algorithm, developed on n = 60 serum samples. Curves are also shown for the same cohort using (A) CA-125 alone and HE4 alone as well as (B) ROMA premenopausal and ROMA postmenopausal. Area under the curve (AUC) is shown for each test in parentheses. ROMA indicates Risk of Ovarian Malignancy Algorithm.
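As an illustration of what the AUC summarizes, it can be computed as the probability that a randomly chosen cancer sample receives a higher model score than a randomly chosen benign sample, with ties counted as one half. The sketch below uses hypothetical scores, not data from this study, and the function name is an assumption.

```python
def auc_from_scores(cancer_scores, benign_scores):
    """AUC via pairwise comparison of cancer and benign scores (ties count 0.5)."""
    wins = 0.0
    for cancer in cancer_scores:
        for benign in benign_scores:
            if cancer > benign:
                wins += 1.0
            elif cancer == benign:
                wins += 0.5
    return wins / (len(cancer_scores) * len(benign_scores))


# Hypothetical model scores (higher = more suspicious for cancer).
hypothetical_cancer = [0.9, 0.8, 0.4]
hypothetical_benign = [0.1, 0.3, 0.5, 0.2]
print(f"AUC = {auc_from_scores(hypothetical_cancer, hypothetical_benign):.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.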
Two model cut points were chosen to represent a surgery rule-in model (optimized for specificity) and a surgery rule-out model (optimized for sensitivity). The training set clinical performance metrics are shown in Table 2 with CA-125 and ROMA metrics shown for comparison. The rule-in model resulted in high specificity (97.7%), which is necessary to appropriately rule in subjects for surgery while maintaining a low number of false positives. However, as the population evaluated in this study consisted of women identified for surgery, a rule-out model would more closely match an intended-use population wherein surgery may be ruled out.
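One way to picture the two cut points is as two different thresholds on the same model score. The sketch below is an illustration under stated assumptions rather than the optimization used in this study: scores at or above the cut are called positive, candidate cuts are the observed scores, the rule-out cut is the highest one that keeps sensitivity at 100%, and the rule-in cut is the lowest one that reaches a chosen specificity target. All scores and function names are hypothetical.

```python
def sens_spec(cancer_scores, benign_scores, cut):
    """Sensitivity and specificity when scores >= cut are called positive."""
    sensitivity = sum(s >= cut for s in cancer_scores) / len(cancer_scores)
    specificity = sum(s < cut for s in benign_scores) / len(benign_scores)
    return sensitivity, specificity


def rule_out_cut(cancer_scores, benign_scores):
    """Highest observed cut that still detects every cancer (100% sensitivity)."""
    candidates = sorted(set(cancer_scores) | set(benign_scores))
    ok = [c for c in candidates
          if sens_spec(cancer_scores, benign_scores, c)[0] == 1.0]
    return max(ok) if ok else None


def rule_in_cut(cancer_scores, benign_scores, min_specificity):
    """Lowest observed cut whose specificity meets the target."""
    candidates = sorted(set(cancer_scores) | set(benign_scores))
    ok = [c for c in candidates
          if sens_spec(cancer_scores, benign_scores, c)[1] >= min_specificity]
    return min(ok) if ok else None


# Hypothetical model scores.
cancer = [0.35, 0.6, 0.8, 0.9]
benign = [0.05, 0.1, 0.2, 0.3, 0.4, 0.7]
low_cut = rule_out_cut(cancer, benign)
high_cut = rule_in_cut(cancer, benign, min_specificity=0.8)
print("rule-out:", low_cut, sens_spec(cancer, benign, low_cut))
print("rule-in:", high_cut, sens_spec(cancer, benign, high_cut))
```

The lower cut trades specificity for sensitivity (few missed cancers), while the higher cut does the reverse (few false positives), which is the distinction between the rule-out and rule-in models.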
Table 2. CPBA Ov algorithm results with clinical performance metrics.
Abbreviations: CPBA, combinatorial protein biomarker assay; FN, false negative; FP, false positive; NPV, negative predictive value; OvCa, ovarian cancer; PPV, positive predictive value; ROMA, Risk of Ovarian Malignancy Algorithm; TN, true negative; TP, true positive.
Assay cut points were optimized to enhance sensitivity in a rule-out model and specificity in a rule-in model. Clinical performance is also shown for CA-125 alone and ROMA (pre- and postmenopausal). Confidence intervals are shown for all calculations.
A total of n = 2 subjects could not be assessed for ROMA due to lack of a valid FSH value to determine menopause status.
Although ROMA (combined results for pre- and postmenopausal subjects) missed only 1/17 cancer subjects, all 17 were correctly identified with the CPBA Ov algorithm (rule-out model) (Table 2). High clinical sensitivity is essential for a physician to recommend against invasive surgery when OvCa is suspected by standard clinical follow-up. Of the 60 subjects in this study who underwent surgery, 43 had benign conditions (a 71.7% false-positive rate among surgical referrals). In contrast, inclusion of the CPBA Ov would have resulted in a false-positive rate of only 11.7%. ROMA and CA-125 are not intended to rule out surgery, and further studies are necessary to determine whether ROMA can be integrated successfully into CPBA Ov to create a screening test that can be optimized for both rule-in and rule-out applications.
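The two false-positive figures above can be reproduced with simple arithmetic. The sketch assumes that 36 of the 43 benign subjects were classified negative by the rule-out model, which is the count implied by the reported 83.7% specificity; both rates are expressed as a fraction of all 60 surgical subjects.

```python
surgeries = 60
benign = 43
cancers = surgeries - benign  # 17 OvCa subjects

# Without the test, every benign subject underwent surgery.
benign_surgery_rate = benign / surgeries

# ASSUMPTION: 36 of 43 benign subjects test negative (83.7% specificity).
true_negatives = 36
false_positives = benign - true_negatives
rate_with_test = false_positives / surgeries

print(f"specificity: {100 * true_negatives / benign:.1f}%")
print(f"benign surgeries without test: {100 * benign_surgery_rate:.1f}%")
print(f"benign surgeries with rule-out test: {100 * rate_with_test:.1f}%")
```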
To assess CPBA Ov algorithm performance on samples collected postsurgery, 36 postsurgical OvCa serum samples were evaluated. This cohort comprised 4 CHI-CTR samples and 32 biomarker development samples. The algorithm did not perform as well on samples collected postsurgery as on samples collected presurgery, correctly predicting 31/36 postsurgery OvCa subjects (sensitivity, 86.1%; NPV, 87.5%) (Supplementary Table 3). This is not unexpected because OvCa surgery frequently involves tumor removal/debulking, which might result in changes in circulating tumor biomarkers.28,29 These results underscore the fundamental requirement for presurgical samples when evaluating OvCa diagnostic biomarkers. Clinical performance was also poor for subjects with recurrent OvCa (Supplementary Table 3). As such, additional studies would be necessary to develop an algorithm that can adequately detect OvCa recurrence.
Discussion
The CPBA Ov algorithm was developed using 60 prospective samples from women scheduled to undergo surgery due to the suspicion of gynecological cancer. Although sample size is limited, the final algorithm was highly accurate in the training set, with 100% sensitivity and 83.7% specificity (Table 2). The negative predictive value of the CPBA Ov algorithm (100%) was greater than that of either CA-125 alone (91.7% NPV) or ROMA (97.2% NPV). This implies improved clinical utility, as high NPV is necessary to ensure subjects are correctly being ruled out for surgery. False negatives are most worrisome in a rule-out test. ROMA performance (which is intended to determine risk of malignancy, not surgery rule out) was high, with only 1/17 subjects being a false negative. The CPBA Ov algorithm, however, detected all OvCa cases (0/17 false negatives) while also improving specificity above the comprehensive ROMA (83.7% vs 81.4%, respectively).
Although ROMA has generally shown good performance, clinical utility is somewhat limited and at least one study reported no improvement over CA-125 alone.30 Moore et al9 had reported higher ROMA specificity in premenopausal as opposed to postmenopausal women, which we found to be true of our sample cohort as well (premenopausal specificity, 91.3% vs postmenopausal specificity, 72.2%). In contrast, the CPBA Ov algorithm resulted in higher specificity in postmenopausal as opposed to premenopausal subjects (94.4% vs 73.9%, respectively). Because most OvCa subjects are postmenopausal, performance metrics are particularly important in this population. These data suggest that the inclusion of additional biomarkers into ROMA (or combination with an independent rule-out model) might result in higher overall specificity and, thus, improved clinical utility. The CPBA Ov algorithm already includes CA-125 and HE4 (the 2 biomarkers included in ROMA), so integrating the 2 models without overfitting is difficult. Sample size is a limiting factor in these analyses; assessment of additional subjects will be necessary to confirm these conclusions.
In considering further development of a blood test to detect OvCa, we must acknowledge the difficulty faced in developing such a test and in identifying the best population of women on which to evaluate its safety and efficacy. Many women with serous OvCa present with advanced disease with peritoneal carcinomatosis and without a dominant pelvic mass.4,5 In addition, prior large screening trials (PLCO and UKCTOCS) using serial CA-125 in a general population of women have yet to find efficacy in improving survival in OvCa.31,32 However, some promise has been shown recently in women at high risk for developing OvCa because of genetic factors and/or family history, using ROMA in a subset evaluation of the UKCTOCS trial.33,34
In our study, CPBA Ov performance was not as strong when applied to samples collected postsurgery (Supplementary Table 3). Tumor debulking often results in a sizable change in circulating biomarkers,28,29 so these results are not unexpected. However, they do underscore the necessity for analyzing presurgery samples when developing a liquid biopsy test for OvCa detection. This is inherently difficult due to low prevalence; the middle ground in addressing this issue has been to use enriched samples (such as high-risk subjects or women with pelvic masses on imaging). Some research studies choose to use postsurgical samples, but the resulting models will likely suffer in terms of clinical performance and may end up being trained more on noise than on signal. Regardless, such models will have to demonstrate diagnostic utility in a presurgery population, and such samples are rare in collection banks. Given the low disease prevalence and high barrier-to-entry for OvCa liquid biopsy tests, sample banks may be better served by collecting samples with these considerations in mind.
For these reasons, the greatest limitation in developing new OvCa diagnostics is sample size and availability. Although many were not statistically significant, the trends noted in biomarker distributions may achieve statistical significance in a larger sample set. Also, it will be necessary to test the CPBA Ov algorithm in an independent cohort to assess whether clinical validation performance is consistent with the results obtained from the training cohort.
Conclusions
The CPBA Ov algorithm is a novel liquid biopsy test that accurately detects OvCa in a presurgical population. With 100% sensitivity and 83.7% specificity, the test would provide assurance that a subject may avoid invasive surgery without the concern that invasive OvCa would be missed (false negative). Although these results are impressive, the study size is small (n = 60). It will be necessary to analyze additional samples obtained from an independent, presurgery population before true clinical performance can be reported. Additional studies are being conducted to establish the clinical validity of CPBA Ov in an independent sample cohort.
Footnotes
Appendix
Supplementary Table 3. CPBA Ov clinical performance metrics for subjects where sample was collected postsurgery or from subjects with recurrent OvCa.
| Metric | OvCa, postsurgery: Rule-In | OvCa, postsurgery: Rule-Out | OvCa, recurrence: Rule-In | OvCa, recurrence: Rule-Out |
|---|---|---|---|---|
| TN | 41 | 35 | 41 | 35 |
| FP | 1 | 7 | 1 | 7 |
| TP | 27 | 31 | 7 | 10 |
| FN | 9 | 5 | 6 | 3 |
| Sensitivity | 75.0% (57.4–87.2%) | 86.1% (69.7–94.8%) | 53.8% (26.1–79.6%) | 76.9% (46.0–93.8%) |
| Specificity | 97.6% (85.9–99.9%) | 83.3% (68.0–92.5%) | 97.6% (85.9–99.9%) | 83.3% (68.0–92.5%) |
| NPV | 82.0% (68.1–91.0%) | 87.5% (72.4–95.3%) | 87.2% (73.6–94.7%) | 92.1% (77.5–97.9%) |
| PPV | 96.4% (79.8–99.8%) | 81.6% (65.1–91.7%) | 87.5% (46.7–99.3%) | 58.8% (33.5–80.6%) |
Description: A total of 36 OvCa subjects had samples drawn postsurgery and a total of 13 subjects were diagnosed with recurrent OvCa. Benign training samples (n = 42) were included due to a lack of post-biopsy benign samples. For sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV), 95% confidence intervals (CIs) are shown in parentheses.
Acknowledgements
The authors wish to thank the research staff members at CHI-CTR for helping conduct the study. The authors would also like to thank all participants for their valuable contribution to this work, BioStat Solutions, Inc. (BSSI), for their assistance in building the CPBA Ov algorithm, and Tn Consulting, LLC, for their assistance in preparing the manuscript.
Funding:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded entirely by Provista Diagnostics Inc.
Declaration of conflicting interests:
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Author Contributions
MCH, MS, and JKW conceived and designed the experiments. MCH, MS, and EL analyzed the data. MCH wrote the first draft of the manuscript. MCH, SB, QT, RM, JKW, and DER contributed to the writing of the manuscript. MCH, RM, and DER jointly developed the structure and arguments for the paper. MCH, JKW, and DER made critical revisions and approved final version. All authors agree with manuscript results and conclusions, reviewed, and approved the final manuscript.
Disclosures and Ethics
Institutional review board exemption was granted under 45 CFR 46.101(b)(4) by Western IRB. As a requirement of publication, authors have provided to the publisher signed confirmation of compliance with legal and ethical obligations including but not limited to the following: authorship and contributorship, conflicts of interest, privacy and confidentiality, and (where applicable) protection of human and animal research subjects. The authors have read and confirmed their agreement with the ICMJE authorship and conflict of interest criteria. The authors have also confirmed that this article is unique and not under consideration or published in any other publication, and that they have permission from rights holders to reproduce any copyrighted material. Any disclosures are made in this section. The external blind peer reviewers report no conflicts of interest.
