Abstract
Daemen and colleagues reported on the second phase of the International Ovarian Tumor Analysis study (IOTA Phase 2). The authors’ primary objective was to determine the appropriate use of a second stage diagnostic test when a primary test yielded an uncertain classification of an adnexal mass as either benign or malignant [1]. The benefits of an accurate preoperative classification of an adnexal mass include:
The opportunity for women with a high risk of ovarian cancer to be referred to a gynecologic oncologist for appropriate surgical staging and optimal cytoreductive surgery;
The avoidance of surgery by women with ovarian tumors at minimal risk for malignancy;
Reduction in psychological stress for patients and their families by providing a more certain diagnosis prior to surgery.
In 1994, the NIH released a consensus statement recommending that all women suspected of having ovarian cancer be given the option of having their surgery performed by a gynecologic oncologist [2]. Geide and colleagues performed a meta-analysis of 18 studies and concluded that women with early-stage ovarian cancer were more likely to have comprehensive surgical staging when operated on by a gynecologic oncologist. In addition, women with advanced-stage ovarian cancer were more likely to have optimal cytoreductive surgery and improved median and overall 5-year survival when treated by a gynecologic oncologist [3]. In patients with early-stage disease, accurate surgical staging provided useful prognostic information and helped determine which patients would benefit most from platinum-based chemotherapy. Numerous studies have confirmed that maximal cytoreductive surgery is the most important treatment-related prognostic factor in patients with advanced-stage ovarian cancer [4–6].
Modalities for the preoperative assessment of an adnexal mass include history, physical examination, imaging studies and serum biomarker testing [7–10]. No single modality has proven adequate for accurate preoperative classification of an adnexal mass. This has resulted in the development of mathematical models combining multiple variables to improve the accuracy of preoperative assessment of ovarian tumors. Sonography is generally considered to be more valuable than other imaging techniques in evaluating adnexal masses owing to its ability to accurately characterize the structural components of ovarian neoplasms. In addition, Color/Power Doppler can be used to evaluate blood flow to these structures. In a recent publication, McDonald and colleagues reported results from 395 patients who had been referred with an adnexal mass [10]. Patients were evaluated with transvaginal sonography and serum CA-125 testing prior to surgery. The findings of this study indicated that the combination of tumor morphology generated from transvaginal sonography and serum CA-125 could predict the risk of malignancy in ovarian tumors prior to surgery. Specifically, postmenopausal women with ovarian tumors having solid or complex morphology on ultrasound and an elevated serum CA-125 (>35 U/ml) were at high risk for ovarian malignancy. This definition had a positive predictive value of 84.7%, a negative predictive value of 92.4% and correctly identified 77.2% of patients with stage I and II ovarian cancer and 98.9% of patients with stage III and IV ovarian cancer.
Accurate preoperative classification of adnexal masses may become even more important as an increasing number of patients with a specific symptom profile for ovarian cancer are referred for evaluation. Symptoms including bloating, pelvic or abdominal pain, difficulty eating or early satiety and urinary urgency or frequency have been shown to be associated with ovarian cancer, particularly when they occur more than 12 times per month and are of sudden onset [11,12]. As expected, a large majority of patients referred with these symptoms will not have ovarian cancer; however, those who do will need appropriate referral to cancer specialists.
Methods & results
The IOTA Phase 2 study included 1938 patients with an adnexal mass evaluated at 19 centers in eight European countries. All patients underwent transvaginal gray scale and Color/Power Doppler ultrasound. Data collection was carried out prospectively and included more than 40 clinical and ultrasound variables, as well as the ultrasound examiner's subjective classification of the mass as either benign or malignant. In addition, the ultrasonographers were asked how certain they were of their classification (certainly benign, probably benign, uncertain, probably malignant or certainly malignant). The information was then used retrospectively to calculate the risk of malignancy using 11 models developed in the IOTA Phase 1 study. The performance of the sonographer's subjective classification and the risk of malignancy using the different models were evaluated based on the histologic diagnosis after surgery. Subjective evaluation had a larger area under the receiver operating characteristic curve (0.953) and a higher specificity (92.7%) than any other method studied. Of the mathematical models evaluated, LR1 [13] had the highest sensitivity (92.4%). In the 115 cases (6%) where the ultrasound examiner was uncertain about the diagnosis, the sensitivity of subjective evaluation was 81.1% and the specificity was 47.4%. Therefore, a hypothetical second stage test would need to have a sensitivity and specificity above these values to improve the diagnostic performance of subjective evaluation. When the mathematical models were tested in those 115 tumors where the ultrasound examiner was uncertain, two models had a sensitivity of more than 81% (LR1 and LR2 [13]) and two had a specificity of more than 47% (LS-SVM-1 [14] and BMLP-1 [15]). As the initial test, the logistic regression model, LR1, performed the best of all the models evaluated.
Nevertheless, there were 194 tumors (10%) in which this model had a lower than optimal ability to discriminate benign from malignant ovarian tumors (area under the receiver operating characteristic curve: 0.59; specificity: 21%). The other 10 statistical models were evaluated as a second stage test in these cases and there was no evidence of an improvement in diagnostic accuracy. When subjective evaluation was used as the second stage test in these tumors, the sensitivity was improved from 90 to 91.3% and specificity increased from 21 to 90.3%.
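The two-stage logic described above can be illustrated with a minimal Python sketch. It is not the IOTA investigators' code: the function names, the label strings and the rule that a candidate must exceed the baseline on both sensitivity and specificity are assumptions made here to mirror the criterion stated in the text.

```python
from typing import Callable, Iterable, Tuple

# Label strings are hypothetical; the study used five certainty levels.
BENIGN, MALIGNANT, UNCERTAIN = "benign", "malignant", "uncertain"


def two_stage_classify(case, first_stage: Callable, second_stage: Callable) -> str:
    """Return the first-stage label, deferring to the second stage only
    when the first stage reports an uncertain classification."""
    label = first_stage(case)
    return second_stage(case) if label == UNCERTAIN else label


def sensitivity_specificity(predicted: Iterable[str],
                            actual: Iterable[str]) -> Tuple[float, float]:
    """Compute sensitivity and specificity against the histologic diagnosis.

    Assumes at least one malignant and one benign case in `actual`.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == MALIGNANT and a == MALIGNANT)
    fn = sum(1 for p, a in pairs if p != MALIGNANT and a == MALIGNANT)
    tn = sum(1 for p, a in pairs if p == BENIGN and a == BENIGN)
    fp = sum(1 for p, a in pairs if p != BENIGN and a == BENIGN)
    return tp / (tp + fn), tn / (tn + fp)


def improves_on(baseline: Tuple[float, float],
                candidate: Tuple[float, float]) -> bool:
    """True if the candidate exceeds the baseline on both sensitivity
    and specificity, the criterion applied to a second stage test."""
    return candidate[0] > baseline[0] and candidate[1] > baseline[1]
```

For the 115 subjectively uncertain tumors, the baseline would be `(0.811, 0.474)`; a candidate second stage test would be compared against it with `improves_on`.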
Significance
As the authors indicate, the strength of this study is that it evaluated a large number of patients (~2000 cases) and that it was carried out at centers in different countries where the equipment used varied. The majority of the centers had not participated in the prior IOTA Phase 1 study, indicating that there may be multiple centers in each country that could perform these tests. This is important since the models in this study incorporated many variables in a mathematical formula to construct a risk of malignancy score. Despite the use of 11 different mathematical formulas, subjective evaluation was still the most accurate method of defining risk of malignancy in an adnexal mass. Furthermore, the addition of a mathematical model as a second stage test did not improve diagnostic accuracy in those patients where preoperative classification was subjectively uncertain.
This study confirms the limitations of mathematical models in improving the accuracy of preoperative diagnosis in those ovarian tumors in which subjective evaluation by an experienced ultrasonographer is uncertain. The authors acknowledge the potential value of novel biomarkers in this setting and express a need for an inexpensive test that is widely available, so that diagnosis is not delayed.
Future perspective
The preoperative classification of adnexal masses as benign or malignant often involves the use of a multimodality approach, including patient demographics, transvaginal sonography and biomarker testing. An ideal algorithm should be simple, cost effective, time efficient and offer limited inconvenience to patients. In the present investigation, Daemen and colleagues report that both the subjective interpretation of sonographic findings by an experienced ultrasonographer and a logistic regression model based on sonographically generated data, patient history and demographics (LR1) are accurate in discriminating benign from malignant ovarian tumors in more than 90% of cases. However, an additional second stage test is required in those cases where subjective evaluation of sonographic findings is uncertain. Such a test may involve a simple biomarker, such as CA-125, or a panel of biomarkers [10,16].
OVA1 (Quest Diagnostics, Inc., Madison, NJ, USA) is a biomarker panel approved by the US FDA in 2009 to facilitate clinical decision making in women with a pelvic mass who are undergoing surgery. The panel consists of five immunoassays (CA-125 II, transthyretin [prealbumin], apolipoprotein A1, β2-microglobulin and transferrin), which are used to generate an OVA1 score. OVA1 was tested in 516 women who had been evaluated preoperatively by a physician, either a nongynecologic oncologist or a gynecologic oncologist, and scheduled for surgery. The test had a sensitivity of 92.5%, a specificity of 42.8%, a positive predictive value of 42.3% and a negative predictive value of 92.7%, and significantly improved the clinician's presurgical assessment of risk for malignancy [16]. While OVA1 may be valuable in the triage setting, a five-test biomarker panel is not inexpensive, and further studies are needed to determine whether OVA1 would be beneficial in cases where interpretation of ovarian ultrasound findings is uncertain.
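Predictive values such as those quoted for OVA1 depend on the prevalence of malignancy in the population tested, as well as on sensitivity and specificity. The following Python sketch shows the standard Bayes relationship; the function name is hypothetical, and the prevalence used in the example call is a placeholder rather than a figure reported in the text.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity and prevalence,
    all expressed as proportions strictly between 0 and 1."""
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 < value < 1.0:
            raise ValueError("inputs must be proportions between 0 and 1")
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Reported OVA1 sensitivity and specificity; the prevalence of 0.30 is an
# assumed placeholder, not a value given in the text.
ppv, npv = predictive_values(0.925, 0.428, 0.30)
```

With the sensitivity and specificity held fixed, a lower prevalence of malignancy reduces the positive predictive value, which is relevant when a test validated in a surgical population is considered for a broader referral population.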
Executive summary
Preoperative discrimination of benign from malignant ovarian tumors allows proper referral of patients at high risk of malignancy.
In a large European study evaluating 1938 patients with an adnexal mass from 19 centers in eight countries, subjective evaluation by an experienced ultrasonographer was the most accurate method of defining risk of malignancy.
Subjective evaluation of sonographic findings by an experienced ultrasonographer accurately predicted malignancy in over 90% of cases.
Mathematical models did not improve diagnostic accuracy in cases where subjective evaluation by an experienced sonographer was uncertain.
Footnotes
The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
No writing assistance was utilized in the production of this manuscript.
