Abstract
Postmenopausal bleeding is associated with an elevated risk of endometrial cancer. The aim of this review is to give an overview of existing prediction models for endometrial cancer in women with postmenopausal bleeding. In a systematic search of the literature, we identified nine prognostic studies and assessed their quality, phase of development and performance. From these data, we identified the most important predictor variables. None of the identified models had completed external validation or impact analysis. Models including power Doppler showed the best performance in internal validation, but Doppler is not easily accessible in general gynecological practice. We conclude that there are indications that the first step in the approach to women with postmenopausal bleeding should be to distinguish between women with a low risk and women with a high risk of endometrial carcinoma, and that the next step should be to refer patients for further (invasive) testing.
Introduction
Endometrial carcinoma is the most common gynecologic malignancy. Approximately 95% of women with endometrial carcinoma present with postmenopausal bleeding (PMB) [1,2]. PMB may signal endometrial carcinoma, which is present in approximately 10% of cases [3,4], or a less serious condition, such as benign endometrial polyps or endometrial atrophy [3,5–7].
To reduce invasive procedures in women with PMB, measurement of the endometrial thickness is used to stratify women into low versus high risk of having endometrial carcinoma. Measurement of endometrial thickness has been shown to be accurate in excluding endometrial cancer, although the risk of endometrial carcinoma after a negative test is still 0.7–3.5%, depending on the cut-off point used [8,9].
In women with PMB there is considerable variability in endometrial thickness and the likelihood of endometrial carcinoma [10]. Individual patient characteristics, including age, time since menopause, obesity, hypertension, diabetes mellitus and reproductive factors, are associated with a higher risk of endometrial carcinoma [10–16]. While the probability of PMB decreases with increasing age [17], the probability of endometrial cancer in women with PMB increases significantly with increasing age. The probability rises from 1% in women younger than 50 years of age to 24% in women older than 80 years of age [18].
In clinical practice, tests are commonly combined in diagnostic sequences and disease probabilities are usually estimated in a hierarchical manner: first combining information from history and examination, followed by additional information obtained from diagnostic tests. The post-test probability is not only dependent on test characteristics, but also on the pretest probability, which is altered by patient characteristics. However, current diagnostic policy in women with PMB is not based on these patient-specific risk factors, but only on one fixed cut-off point for endometrial thickness [2,19–21].
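The dependence of the post-test probability on the pretest probability can be illustrated with the standard odds form of Bayes' theorem. The sketch below is only an illustration: the function name and the negative likelihood ratio of 0.1 are hypothetical and are not taken from any of the reviewed models. The two pretest probabilities are the age-related figures of 1% and 24% quoted above.

```python
def post_test_probability(pretest_prob: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability into a post-test probability.

    Uses the odds form of Bayes' theorem:
    post-test odds = pretest odds * likelihood ratio.
    """
    if not 0.0 < pretest_prob < 1.0:
        raise ValueError("pretest_prob must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical negative likelihood ratio, chosen only for illustration.
ASSUMED_NEGATIVE_LR = 0.1

# Pretest probabilities quoted in the text for the youngest and oldest groups.
for pretest in (0.01, 0.24):
    post = post_test_probability(pretest, ASSUMED_NEGATIVE_LR)
    print(f"pretest {pretest:.2f} -> post-test {post:.4f}")
```

Under these assumptions, the same negative test leaves a clearly higher residual risk in the older group than in the younger group, which is the argument for not relying on one fixed cut-off alone.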
Clinicians want to identify which women presenting with PMB have a high risk of endometrial cancer. Several studies have addressed this question and developed models to estimate the individual chance of endometrial carcinoma in women presenting with PMB. The purpose of this review is to give an overview of the existing prediction models for endometrial carcinoma in women with PMB, to assess their quality and to identify important predictor variables.
Methods
Study identification
We performed a computerized MEDLINE and EMBASE search to identify all studies on prediction models in women with PMB published from inception to June 2011. The search was limited to human studies; no restrictions were applied concerning publication year or language. We included articles reporting on multivariable models predicting endometrial cancer in women with PMB. We checked references cited in the selected articles for further relevant prediction models not identified by the electronic searches. We used all known synonyms for the terms ‘PMB’ and ‘endometrial cancer’ and we used a search filter for prediction models [22]. The search strategy can be found in the Appendix.
Study selection
This review focused on articles that report on a prediction model for endometrial carcinoma in women with PMB. In this review, a prediction model was defined as a multivariable model that expresses the chance of endometrial carcinoma as a function of two or more predictor variables. PMB was defined as vaginal bleeding after more than 1 year of amenorrhea after the age of 40 years, or persistent (>3 months) unscheduled bleeding on hormone replacement therapy (HRT).
Two independently working reviewers (N van Hanegem and MC Breijer) selected the articles by assessing titles and abstracts. If there were any doubts about eligibility after reading the title and abstract, we read the full text version to make sure no articles were missed. In case of a disagreement, the article was included for full text reading and assessed by a third reviewer (A Timmermans).
Study quality assessment
A framework for quality assessment was developed based on the recommendations of Hayden et al. [23] and on a quality assessment framework for prediction models in subfertile women to predict the chance of pregnancy [24]. The framework was divided into four sections: study participation, predictor variables, outcome measurement and analysis. Each item in the different sections was scored with yes, no or unclear.
Predictor variables
All predictor variables were collected for each prediction model. These comprise both the potential predictors tested during model development and the predictors retained in the final model. The original articles selected multiple variables or risk factors that are thought to be associated with an increased risk of endometrial cancer. These variables were tested in the original articles for univariate association and, if they contributed sufficiently to predictive accuracy in multivariable regression analysis, combined to construct a clinical prediction model. We collected all different predictor variables from the original articles, together with their significance, to identify the most important predictor variables for endometrial cancer. We defined the most important predictor variables as those that were statistically significant input variables in three or more studies, or that were statistically significant in two studies and had not been tested in other studies.
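The selection rule for the most important predictor variables can be written down explicitly. The following is a minimal sketch, assuming that each variable is represented by one true/false entry per study that tested it (true meaning statistically significant). The function names and the example data are hypothetical placeholders and do not reproduce the results of the included studies.

```python
from typing import Dict, List


def is_important(results: List[bool]) -> bool:
    """Apply the selection rule to one variable.

    `results` holds one entry per study that tested the variable;
    True means the variable was statistically significant in that study.
    """
    n_significant = sum(results)
    n_tested = len(results)
    # Rule 1: significant in three or more studies.
    # Rule 2: significant in two studies and not tested in any other study.
    return n_significant >= 3 or (n_significant == 2 and n_tested == 2)


def important_predictors(evidence: Dict[str, List[bool]]) -> List[str]:
    """Return the names of all variables that satisfy the rule, sorted."""
    return sorted(name for name, results in evidence.items() if is_important(results))


# Hypothetical evidence table, for illustration only.
hypothetical_evidence = {
    "variable_a": [True, True, True, False],  # significant in 3 of 4 studies
    "variable_b": [True, True],               # significant in both studies that tested it
    "variable_c": [True, True, False],        # significant in 2, but tested in a third
    "variable_d": [True],                     # significant in a single study
}
selected = important_predictors(hypothetical_evidence)
print(selected)
```

The second clause is what keeps promising but little-studied variables on the list: a variable tested only twice and significant both times is retained, whereas one that failed in a third study is not.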
Model development assessment
The development of a prediction model consists of three phases: model derivation, model validation and impact analysis [25]. In the first phase, model derivation, predictor variables are identified by logistic regression. Model validation, the second phase, consists of an internal and an external validation phase [24]. In internally validated models, the performance of the model is tested in the same data set in which the model was developed, or in a group of subsequent patients within the same center. In external validation, the goal is to demonstrate generalizability and reproducibility in patients different from the patients used for derivation of the original model. Therefore, the prediction model is evaluated on new data collected from an appropriate patient population in a different center [26]. The final phase of model development is impact analysis, in which prediction models are tested for their ability to change clinicians' decisions and to change patient outcomes [27]. All prediction models identified in this review were classified into the different phases of model development. We emailed the authors of all identified articles to ask whether their models were undergoing external validation that had not yet been published.
Model performance
Performance measures (calibration, discrimination and clinical usefulness) and the range of probabilities given by the different prediction models were recorded. Calibration refers to the agreement between observed probabilities and predicted probabilities for groups of patients; this is usually reported as a calibration plot or a Hosmer–Lemeshow statistic (test for ‘goodness-of-fit’) [28]. Discrimination is commonly reported as the c-statistic (concordance), also referred to as the area under the receiver-operating characteristic curve (AUC). It measures the ability of a prediction model to separate patients with endometrial cancer from patients without endometrial cancer. An AUC of 0.5 describes a non-informative test, whereas an AUC of 1.0 represents a test that discriminates perfectly between presence and absence of a disease [29]. Clinical usefulness measures how close a prediction for an individual patient is to their actual outcome. This is mostly reported as accuracy (percentage of patients correctly classified), sensitivity or specificity, positive or negative predictive value (NPV), or likelihood ratios (LR) of a prediction model [30]. As we are interested in identifying a group of patients with a high risk of endometrial cancer, we are most interested in a high sensitivity, a high NPV and a low negative LR.
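The discrimination and clinical usefulness measures described above follow directly from a 2×2 table and from the predicted risks. The sketch below uses only the standard definitions. The function names, the counts and the risk scores are hypothetical and serve only as an illustration; they are not data from the reviewed studies.

```python
from typing import Dict, Sequence


def classification_measures(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Clinical usefulness measures from a 2x2 table.

    Assumes every denominator below is non-zero (that is, the table contains
    diseased and non-diseased patients and both test results occur).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_negative": (1.0 - sensitivity) / specificity,
    }


def auc(scores_with_cancer: Sequence[float], scores_without_cancer: Sequence[float]) -> float:
    """c-statistic: the probability that a randomly chosen patient with cancer
    has a higher predicted risk than a randomly chosen patient without cancer.
    Ties count as one half."""
    concordant = 0.0
    for with_cancer in scores_with_cancer:
        for without_cancer in scores_without_cancer:
            if with_cancer > without_cancer:
                concordant += 1.0
            elif with_cancer == without_cancer:
                concordant += 0.5
    return concordant / (len(scores_with_cancer) * len(scores_without_cancer))


# Hypothetical 2x2 table: 10 women with cancer, 90 without.
measures = classification_measures(tp=9, fp=30, fn=1, tn=60)
# Hypothetical predicted risks for a handful of patients.
example_auc = auc([0.8, 0.6, 0.3], [0.5, 0.2, 0.1, 0.1])
```

The pairwise definition of the AUC used here is quadratic in the number of patients, which is acceptable for a small illustration; larger data sets are usually handled with a rank-based calculation.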
Results
Study identification & selection
Of 754 articles identified by the MEDLINE and EMBASE search, a total of nine articles met the inclusion criteria of our review [31–39]. We identified another three articles by scanning the reference lists of included articles [40–42]; however, none of these met our inclusion criteria after reading the abstract and full text version of these articles.

Study selection diagram.
Study characteristics
Study characteristics are shown in the table of study characteristics.
Table: Study characteristics of included articles.
Table footnote: Not well described.
Abbreviations: CH: Cohort study; ET: Endometrial thickness; Hyst: Hysteroscopy (positive/negative); PMP: Postmenopausal; TVS: Transvaginal sonography.
Patient selection and inclusion criteria were not the same in all articles. All nine articles included women with PMB, but three of these articles studied a population of women with a high-risk profile for endometrial cancer, based on an endometrial thickness of ≥5 mm [31,36,39].
Study quality
The results of the quality assessment are reported in the table of study quality.
Table: Quality of included studies.
Predictor variables
The nine included articles investigated 27 different potential predictor variables.
Table: Predictor variables evaluated and used in the prediction models.
Table categories: statistically significant in multivariate analysis and included in prediction model; statistically significant in univariate analysis and not included in prediction model; not statistically significant and not included in prediction model.
Abbreviations: ET: Endometrial thickness; FOB: Frequency of bleeding; MIP: Mean intensity of pixels; TM: Time since menopause; VAS: Visual analog scale.
Phases of model development
All articles selected in this review addressed the first phase of developing a prediction model: model derivation [24]. Of the nine articles on predicting endometrial cancer in women with PMB, eight reported internal validation, but none of the models had passed the external validation phase. We asked all six research groups that developed the nine different prediction models whether their models were undergoing external validation, and we received a response from all of them. The two prediction models of Opolskiene et al. [36,39] are undergoing temporal validation (internal validation in a newly recruited patient group) and external validation in an international multicenter study by Valentin et al. [Valentin L, Sladkevicius P, Pers. Comm.]. No results are available yet, since patients are still being recruited for these studies. The two prediction models developed by Burbos et al. [37,38] were recently compared with each other in terms of their performance in internal validation [43]. This group is working on external validation. Finally, the prediction model of Opmeer et al. [35] is currently being externally validated in two cohorts: one cohort in three different hospitals in The Netherlands and one in Skåne University Hospital Malmö (Scania, Sweden) in collaboration with the group of Valentin et al. [Valentin L, Sladkevicius P, Pers. Comm.], but this external validation has not been published yet. There were no impact analysis studies (i.e., studies that showed that the prediction model indeed improved patient outcome or was cost-effective in clinical practice).
Performance of the prediction models
The performance of the eight articles that internally validated their models [31–33,35–39] is presented in the table on model development and performance.
Table: Evaluation of model development and model performance.
Table footnote: Discrimination is reported as AUC.
Table footnote: As many models are described, we selected the model with the best performance.
Table footnote: Including Doppler.
Abbreviations: AUC: Area under the receiver-operating characteristic curve; DEFAB: Diabetes, endometrial thickness, frequency of bleeding, age and BMI; ET: Endometrial thickness; FAD 31: Frequency of bleeding, age, diabetes, BMI cut-off 31; LR-: Negative likelihood ratio; NPV: Negative predictive value; PH: Patient history; Prob: Probability; Sens: Sensitivity; Spec: Specificity; US: Ultrasound; VAS: Visual analog scale; VI: Vascularity index.
Calibration was described in only one article: Randelzhofer et al. report the estimated probability of malignancy alongside the observed proportion of patients with endometrial carcinoma [32]. Calibration is generally reported as a calibration plot, but none of the studies presented one.
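The data behind a calibration plot are simple to tabulate: patients are grouped by predicted risk, and the mean predicted probability in each group is set against the observed proportion with the outcome. The sketch below shows one possible way of doing this. The function name, the equal-sized grouping and the example data are hypothetical and are not taken from any of the reviewed articles.

```python
from typing import List, Sequence, Tuple


def calibration_table(
    predicted: Sequence[float], observed: Sequence[int], n_groups: int
) -> List[Tuple[int, float, float]]:
    """Group patients by predicted risk and compare prediction with outcome.

    Returns one row per non-empty group:
    (group size, mean predicted probability, observed proportion).
    `observed` holds 1 for patients with the outcome and 0 for those without.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    # Sort patients from lowest to highest predicted risk.
    pairs = sorted(zip(predicted, observed))
    n_patients = len(pairs)
    rows = []
    for group in range(n_groups):
        start = group * n_patients // n_groups
        stop = (group + 1) * n_patients // n_groups
        chunk = pairs[start:stop]
        if not chunk:
            continue
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_rate = sum(o for _, o in chunk) / len(chunk)
        rows.append((len(chunk), mean_predicted, observed_rate))
    return rows


# Hypothetical predictions and outcomes for eight patients.
example_rows = calibration_table(
    predicted=[0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9],
    observed=[0, 0, 0, 1, 1, 1, 1, 0],
    n_groups=2,
)
```

Plotting the second column of each row against the third gives the calibration plot; points on the diagonal indicate that predicted and observed risks agree.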
Discrimination was studied in seven out of eight articles by calculating an AUC. The AUC varied from 0.66 to 0.92 for different prediction models, with the highest AUC for a model combining Doppler and gray-scale transvaginal sonography (TVS) [36].
Clinical usefulness was described in all internally validated studies, with the highest sensitivity and the lowest negative LR for a combined model with patient characteristics, gray-scale TVS and Doppler [39]. The highest NPV found for a model was 0.996, for a model that combined patient history, endometrial thickness and histology in a sequential strategy [35]. Of the four models using only patient characteristics, two showed a high sensitivity or a high NPV [35,39] and one showed a low negative LR [39].
All three studies that examined Doppler as a predictor variable reported that it contributed to the prediction of endometrial carcinoma in women with PMB [31,36,39]. Endometrial thickness was used as a variable in eight prediction models, and seven found that incorporating endometrial thickness may improve the diagnostic accuracy of a model.
Discussion
We systematically reviewed existing prediction models for endometrial carcinoma in women with PMB and identified the most important predictor variables. We found nine studies reporting on the development of prediction models for endometrial carcinoma in women with PMB. Eight of these studies described at least one aspect of internal validation and, to date, none of the prediction models has been externally validated.
The different predictor variables can roughly be divided into four categories: patient characteristics, gray-scale TVS variables, Doppler TVS variables and hysteroscopy variables. Most prediction models used a combination of these categories to predict the chance of endometrial carcinoma. We chose to limit our list of the most important predictor variables to those that had been considered as statistically significant input variables in three or more studies and to those that were significant input variables in two studies and had not been tested in other studies. By doing this, we identified the most important variables without missing potentially important variables that have not yet been extensively studied. Using these limits, we identified 11 important input variables for predicting endometrial cancer in women with PMB.
Almost all articles reported performance in terms of discrimination and/or clinical usefulness, whereas calibration was reported only incidentally. In this study, we identified five articles describing a prediction model with good discrimination (AUC: >0.8) [31,33,35,36,39]. As only one study described data on calibration, there are insufficient data available to draw conclusions on calibration.
Two studies showed best performance regarding discrimination and clinical usefulness: Opolskiene et al. and Opmeer et al. [35,39]. In the model by Opolskiene et al., a combination of patient characteristics, gray-scale TVS and Doppler was used. They concluded that their model excludes endometrial cancer reasonably well when power Doppler is added. Furthermore, in all three studies that used Doppler, Doppler was found to contribute to the prediction of endometrial carcinoma in women with PMB [31,36,39]. Based on this, we could conclude that the best model in predicting endometrial cancer is a model that uses a combination of patient characteristics, endometrial thickness and power Doppler. However, power Doppler cannot be used in all patients. All three Doppler models excluded patients based on different reasons: Doppler artifacts, incorrect processing of TVS image, fluid in the cavity and absence of Doppler signals or large myomas. Another limitation in the use of power Doppler is that these studies do not give information on the interobserver variability and learning curve in measuring Doppler variables. For application of results found in Doppler studies it is important to use the same ultrasound system, as the color content of a power-Doppler scan depends heavily on Doppler sensitivity [39].
Although the performance of the models using Doppler seems reasonable, a model using patient characteristics and endometrial thickness may be more useful in daily practice. In a healthcare system with general practitioners referring patients with a high risk of malignant disease to a specialist, the best model would be a model that can distinguish women with a high risk of endometrial cancer from women with a low risk based on patient characteristics only. Such a model would also be useful in situations where TVS is not directly available. Only women with a high risk could be referred for TVS or to the gynecologist for further evaluation and women with a low risk could be reassured and referred only at recurrent bleeding. Based on this review we could not identify a model with a good performance in internal validation based on patient characteristics only. However, two of four models based on patient characteristics showed good performance in clinical usefulness with a high sensitivity, a high NPV and/or a low LR for a negative outcome [35,39]. Based on these results we can conclude that although these models do not show a high AUC, they could be useful in clinical practice. These models were found to discriminate women with a high risk for endometrial cancer from women with low risk and to select women for further (invasive) testing.
The above conclusions are based on reported model performance based on internal validation only. To implement a prediction model into clinical practice, external validation is essential. McGinn et al. describe three reasons [25]. First, a prediction model may reflect associations between given predictors and outcomes that are primarily due to chance. Second, the predictor variables used in a model may be idiosyncratic to that specific population, which suggests that the prediction model may fail in a new setting. Third, clinicians may fail to implement the model comprehensively or accurately in their clinical practice. The result would be that a model succeeds in theory, but fails in practice. For a successful implementation, a model should be validated both internally and externally and finally go through the phase of impact analysis in the same population from which a model is derived. As none of the prediction models have completed the phase of external validation, they cannot be used in clinical practice yet.
When evaluating these prediction models by external validation or finally in impact analysis, one should keep in mind that these models were developed in different patient populations. The target population in which a model is derived should be the same as the population in which a model is tested or clinically used. Selecting a high-risk population (e.g., a population with an endometrial thickness of ≥5 mm) will result in a different performance and possibly in the selection of different predictor variables compared with an unselected population of women with PMB. Furthermore, implicit selection mechanisms could occur within a population, for example within a general population or a population within a gynecological practice, or within health systems in different countries. Different populations have a different prevalence of endometrial cancer, which could explain the differences found in the performance of the models.
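The effect of prevalence on reported performance can be made concrete for the predictive values. The sketch below holds sensitivity and specificity fixed and varies only the prevalence. The function name and the sensitivity and specificity values are hypothetical; the three prevalences are the figures of 1%, approximately 10% and 24% quoted earlier in this review.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics, chosen only for illustration.
ASSUMED_SENSITIVITY = 0.9
ASSUMED_SPECIFICITY = 0.6

for prevalence in (0.01, 0.10, 0.24):
    ppv, npv = predictive_values(ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

Under these assumptions, the NPV falls and the PPV rises as prevalence increases, so an identical model would report different predictive values in a selected high-risk cohort than in an unselected population of women with PMB.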
A consensus has not been found in systematic reviews or in international guidelines regarding the best sequence of diagnostic procedures for women with PMB [8]. Considering the performance of the existing prediction models, we can conclude that we have indications that the first step in the approach of women with PMB should be to distinguish between women with low versus increased risk of having endometrial carcinoma, and the next step would be to refer patients for TVS or further invasive testing.
Future perspective
The prediction models that have been developed for women with PMB showed good performance, but have only reached the phase of internal validation. Future research should focus on external validation and impact analysis of these prediction models. We hope that these will confirm their prognostic abilities, so that in the next few years prediction models can be implemented in general gynecological practice. Based on this review, we conclude that clinical prediction models show promising results, but further external validation is required as well as impact analysis to maximize diagnostic accuracy of the models at an acceptable patient burden and for acceptable healthcare costs.
Executive summary
Postmenopausal bleeding (PMB) is associated with an elevated risk of having endometrial cancer. Clinical and ultrasound characteristics influence this risk in the individual patient.
We systematically reviewed nine prediction models for women with PMB.
The most important predictor variables in women with PMB are: age, BMI, diabetes, frequency of bleeding and use of anticoagulants and HRT (patient characteristics), endometrial thickness, endometrial color score and endometrial border (gray-scale ultrasound) and endometrial color score and vascularity index (Doppler).
Models including power Doppler showed the best performance in internal validation, but because Doppler is difficult to use in general gynecological practice, we concluded that the best models to date are those combining patient characteristics with endometrial thickness.
Eight models were internally validated, with best performance in a study combining patient characteristics and measurement of endometrial thickness. Doppler is found to contribute to the prediction of endometrial carcinoma, but cannot be used in all patients.
The first step in the approach of women with PMB should be to distinguish between low- and high-risk patients for having endometrial carcinoma and the next step would be to refer patients for transvaginal sonography or further invasive testing.
Footnotes
The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
No writing assistance was utilized in the production of this manuscript.
