Abstract
Advances in knowledge
Morphological features of uterine masses on MRI are not sufficient to distinguish leiomyosarcoma from degenerating fibroids.
Keywords
Introduction
Uterine fibroids (leiomyomas) are the commonest benign uterine tumour affecting 20%–40% women of reproductive age.1–3 Leiomyosarcoma (LMS) is a rare but highly aggressive myometrial malignancy with an incidence highest in the 50-54 year age group. 4 Symptomatic fibroids can be managed by a range of conservative, minimally invasive or surgical management options. However, LMS requires management at a high-volume sarcoma and gynaecology oncology centre and in the absence of metastatic disease usually involves complete surgical resection with total abdominal hysterectomy and bilateral salpingo-oophorectomy. A systematic analysis by the US Food and Drug Administration estimated that the prevalence of unsuspected uterine leiomyosarcoma in women undergoing hysterectomy or myomectomy for presumed benign leiomyoma is 1 in 498 women. 5 A retrospective series of women undergoing morcellation for suspected fibroids, but with undiagnosed uterine LMS, reported worse outcomes. 6 Uterine artery embolization followed by curettage can also result in disease dissemination. 7 Accurate pre-operative diagnosis is therefore essential to ensure safe and effective management. The myometrial location of indeterminate masses makes biopsy challenging, 8 and therefore imaging is of increasing importance.
Degenerating fibroids (DFs) can have a heterogeneous appearance on MRI mimicking LMS. 9 However, there are very few published studies on the effectiveness of imaging in distinguishing DFs from LMS. Cohorts prior to this study are limited to fewer than 20 cases of LMS and often include a range of sarcoma subtypes other than LMS.10–17 Other studies propose age as an important risk factor as during resection for a presumed fibroid, patients aged 75–79 years have a 1 in 98 chance of having an unexpected LMS resected, compared with a 1 in 547 risk in women aged 25–29.18–20
The aim of this study was to construct a diagnostic model of MRI features to distinguish DF from LMS on a training set which could then be applied to an independent validation data set. The secondary objective was to establish inter-observer variability for each MRI feature.
Material and Methods
This was a retrospective 1:1 case-control study. The study protocol was approved by the local committee for clinical research and by the NHS (National Health Service) Health Research Authority Research Ethics Committee (Yorkshire and The Humber-South Yorkshire Research Ethics Committee 19/YH/0134 and patient consent was waived as the study used fully anonymized retrospective data.
Inclusion and exclusion criteria
Inclusion criteria: • Histologically proven DF or LMS. • Minimum pre-treatment MRI dataset - axial T1W and axial T2W MRI pelvis with either sagittal and/or coronal T2W MRI pelvis. • Complex DF defined on MRI by; >20% high T2 signal and/or two or more areas of intermediate T2 signal.
Exclusion criteria: • Lack of histological confirmation • Non-leiomyosarcoma uterine sarcoma including (smooth muscle tumours of uncertain malignant potential) STUMP • Incomplete MRI dataset • Fibroids not meeting MRI inclusion criteria.
Subjects
Patient’s age, source of histology and lactate dehydrogenase (LDH) level11,13 were recorded.
DF cohort: A list of consecutive patients with histologically proven benign fibroids either at hysterectomy or myomectomy were identified from a large gynaecology unit at Imperial College Healthcare Trust and inclusion and exclusion criteria were applied.
LMS cohort: Consecutive patients with histologically proven uterine LMS were identified from the prospectively maintained database of the soft tissue sarcoma unit at The Royal Marsden Hospital, (2008-2018). Inclusion and exclusion criteria were applied.
All cases had been reviewed by specialist pathologists at the tertiary referral centres.
All anonymised MRI scans were randomly assigned on the PACS (picture archiving and communication system) system.
Image Analysis
Two gynaecology-oncology radiologists, each with >10 years’ experience (AS and TG) independently performed image scoring unaware of the histology.
The case report form (CRF) (See Supplemental) was devised following a literature review of published studies describing characteristics reported to distinguish LMS from DFs (see Supplemental for reporting lexicon).
The CRF included lesion size and location (submucosal, intramural or subserosal); 3 perpendicular measurements; lesion morphology (protrusion into the myometrium, nodular border, organ invasion, feeding/internal vessels, multifocality and volume of intermediate T2, low T2 or high T1 signal). Presence of abnormal lymph nodes, ascites and peritoneal disease were also recorded. Where post contrast and DWI sequences were available, enhancement and volume of restricting components were documented. The radiologists own level of suspicion for LMS was scored on a Likert scale 21 (very unlikely 0; unlikely 1; unsure 3; likely 4; very likely 5).
Data Analysis
Using the rated MRI imaging features, a scoring system was built and tested. Sensitivity and specificity thresholds of the scoring system were pre-specified at 85% and 80% respectively to decide if a validation study would be justified.
Primary Endpoint
The sensitivity and specificity of the proposed MRI diagnostic classification model was generated by using the consensus of two radiologists.
Secondary Endpoint
The inter-observer variability for each MRI feature, expressed as Kappa and Bland-Altman limits of agreement (LoA) between radiologists.
Statistical Analysis
Sample Size
As this is a non-trial study design and all available data was used, the sample size was pragmatic and restricted to around 70 patients for the training study, with 1:1 consecutive matching between LMS and DF. If the training set met sensitivity/specificity thresholds, a separate sample would be used for the validation study.
Construction of the Diagnostic Classification Model
Step 1
Initial descriptive analysis of the dataset was performed to review the distributions, missingness, and collinearity between all MRI features, by investigating summary statistics, correlation matrices, and histograms by histological subtype (LMS or DF), where applicable.
Variables were deemed to have insufficient data for further analysis if there was: a large proportion (>25%) missing or not applicable; perfect prediction and/or a correlation coefficient >0.9 with another variable; or inadequate/highly skewed subgroup numbers (e.g. 95% yes and 5% no).
For the variables that were included for further analysis (Step 2), descriptive statistics have been reported in the results by radiologist and histological subtype, where applicable.
Step 2
To evaluate the univariate associations with outcome (histological subtype) for all remaining potential predictor variables, simple logit models were used to test each separately. All models included radiologist as a fixed covariate to account for the data being paired. As age of patient was not an MRI feature rated by the radiologists, the association between age and outcome was assessed using a t test. Age was included for further analysis regardless of the t test significance level because of the clinical rationale that it is expected to be related to outcome.
Possible predictors were taken forward to Step 3 if the logit models demonstrated univariate associations at the p < .1 level.
Step 3
The inter-observer variability between the two radiologists for each MRI feature was tested using the following parameters and acceptable ranges:
Discrete variables
Spearman’s rank coefficient>0.5 and <10% outside the Bland Altman 95% LoA
Continuous variables
Pearson’s correlation coefficient>0.5 and <10% outside the Bland Altman 95% LoA
Categorical variables
Kappa Coefficient of agreement>0.4 And/or >50% overall Agreement
For MRI features that did not fall within the above criteria, they were deemed unacceptable for consideration into the multivariable model (Step 4), as this demonstrates little or no agreement between radiologists, which implies the variable is too subjective or unreliable.
Step 4
All variables carried forward from the previous steps, including age, were included in the final multivariable model step, where radiologist was set as a fixed covariate and histological subtype was the outcome.
Using a manual backwards stepwise selection process, the covariate with the largest p-value was removed in an iterative procedure until all remaining covariates had a p-value of <0.1.
The remaining covariates made up the final diagnostic classification model to be tested for sensitivity and specificity.
Sensitivity analysis
Two checks were run on the final diagnostic classification model. Firstly, it was confirmed that the coefficients were not meaningfully altered by removing the fixed covariate radiologist; and secondly, it was tested that the same model was chosen when using Stata’s automated Stepwise regression command, rather than manual selection.
Sensitivity and specificity of the diagnostic classification model
The sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated for the diagnostic classification model. The ROC (receiver operator characteristic) area under the curve will also be presented as an overall measure of the performance of the model.
Sensitivity and specificity of radiologist suspicion score as a diagnostic tool
A score of 1-3 is considered low suspicion with a predicted diagnosis of DF. A suspicion score of 4-5 is considered a high suspicion score with a predicted diagnosis of LMS. Using these cut offs sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated from the radiologist suspicion score. The ROC area under the curve will also be presented as an overall measure of the performance of the model.
Sensitivity analysis
The diagnostics above were also produced using a cut-off from the suspicion score of 3 for diagnosis of LMS. Those with a score of 3-5 were considered a high suspicion score with a predicted diagnosis of LMS.
All analyses were performed using Stata version 17 (StataCorp, Texas).
Results
Analysis population
The mean (SD – standard deviation) age in the DF group was 54.0 years (13.8) and in the LMS group was 56.2 years (10.7). LDH was only recorded in 6 patients and has therefore not been presented. DWI, T1W high signal and contrast enhanced imaging were also excluded due to paucity of data (56%, 51% and 54% respectively missing).
The total number of readings from the original dataset was 194: 99 (51%) DFs, 94 (48%) LMSs, and one missing. Of these, 18 were excluded as they were only scored by one radiologist, which may bias the results, or if histological subtype was missing.
Therefore, the total number of readings included in the analysis was 176, which accounted for 88 pairs of images from 72 patients: 84 (48%) DFs and 92 (52%) LMSs.
For the initial analysis, each radiologist assessed 42 cases of DF and 46 of LMS.
Descriptive statistics
Descriptive statistics of the MRI features for the diagnostic classification model deemed to have suitable data for univariate and inter-observer analysis, by histological subtype and by radiologist.
Descriptive statistics of the Level of suspicion score as a diagnostic tool, by histological subtype and by radiologist.
Construction of the diagnostic classification model
Univariate associations with outcome and inter-observer agreement or correlation.
Coef: coefficient; LoA: limits of agreement; NA: not applicable as the main effect significance value has been reported, rather than for each subgroup; ref: reference category; r = Spearman’s rank correlation coefficient; ρ: Pearson’s correlation coefficient; κ = Kappa coefficient of interrater agreement. IT2: LT2.
Out of the 19 variables reported in Table 1, 14 were taken forward to the manual backwards stepwise selection process, plus age of patient. The five that were dropped during Steps 2-4 were: location of mass in myometrium, feeding vessel, internal vessels, enlarged nodes, and peritoneal disease.
Final diagnostic classification model
Final diagnostic classification model (n = 174).

MRI scans of DF and LMS patients with the corresponding level of suspicion score.
Sensitivity and specificity of the diagnostic classification model (n = 176).
Sensitivity and specificity of radiologist suspicion score as a diagnostic tool (n = 172).
Discussion
To our knowledge, this is the largest imaging study attempting to create a diagnostic model distinguishing DF from uterine LMS. Recognizing the real-world challenges, we did not include simple fibroids in the control group which are more easily distinguished from LMS. Additionally, only LMS was included in the malignant cohort. We also embedded assessment of reproducibility into our study design in order to inform the choice of imaging features to be taken forward into the model. The sensitivity of the diagnostic classification model from the training set was unfortunately, deemed too low to progress to validation.
There are several possible reasons why a diagnostic model could not be generated with sufficient sensitivity or specificity to distinguish DF from LMS. This may be due the fact that objective MRI features overlap significantly between DF and LMS. The control group may have been affected by selection bias as not all DFs undergo surgery and therefore lacked histological confirmation for study inclusion. Although intralesional haemorrhage/T1 bright components and lack of central enhancement have previously been identified as showing a stronger association with LMS 12 we were unable to explore this due to insufficient data. From the univariate analysis, five features were not taken forward to multi-variate analysis and diagnostic modelling. These were location of mass in myometrium, feeding vessel, internal vessels, enlarged nodes, and peritoneal disease. The fact that location in the myometrium was not a predictor for sarcomatous change suggests that both DF and LMS can arise from anywhere in the myometrium. Assessing nodal involvement on MRI was based on size criteria and using morphological features of nodes (including size) on anatomical sequences has limitations. Feeding and internal vessels have been reported as features of sarcomatous change, however our analysis shows poor inter-observer agreement between the radiologists suggesting these features may be too subjective to apply to a diagnostic model.
Since completion of this study, a consensus statement for MRI evaluation of uterine masses for risk of leiomyosarcoma has been published. 22 Unfortunately, we had insufficient data to include DWI and ADC (apparent diffusion coefficient). A further recent consensus from the International Society of Gynecologic Endoscopy also recommends contrast enhanced MRI. 23 Contrast enhanced examinations were also lacking from our data. However, the limited size and inconsistency between some of the referenced studies is such that the relative contributions of DWI and contrast enhanced MRI have not been confirmed. For example, the number of patients with LMS in one study proposing use of DWI was 5 and another 16 proposing contrast enhanced MRI included 10 patients with LMS. 13 The heterogeneity in the imaging data available highlights inconsistencies in clinical practice. Ultrasound is often utilised as the first line investigation for pelvic symptoms and a recent ultrasound study has showed that both LMS and STUMP tumours showed a high percentage of circumferential and intra-lesional vascularisation which contributes to the evidence that vascularity is of relevance to the differential diagnosis. 24 It is possible that some of the masses in this study were incidental findings not interrogated further with specialist uterine MRI. This reinforces the need for standardised MRI protocols. In our ongoing experience as a tertiary referral centre for soft tissue sarcoma, baseline imaging of patients with subsequent diagnosis of uterine LMS often lacks contrast enhanced or DW imaging as it is not suspected at the time of imaging.
Although intermediate T2 signal was confirmed to be useful in this study, neither peritoneal metastases nor malignant nodes which were included in the recommendations, were present in enough cases of LMS to be useful distinguishing features. After completion of data collection, a study has been published which retrospectively reviewed qualitative MRI features distinguishing LMS (n = 19) and leiomyomas (n = 25). 25 This identified seven features associated with LMS; T2 hyperintensity, enhancing finger-like projections, necrosis, intratumoral haemorrhage, T2 heterogeneity, heterogeneity on post contrast T1 and an ill-defined border on post contrast T1. Using these features in an MRI predictive score, they found a score of 0–3 to have a 100% NPV for LMS and a score of 6-7 to have a 100% PPV for LMS. Our results were less promising which is likely due to inclusion only of degenerating fibroids which is a more challenging diagnostic scenario and closer to the diagnostic challenge faced by clinical radiologists. 26
Assessment of inter-observer variability in this study provides some valuable insights which have been lacking in the literature. The size of the lesion in all dimensions had a strong kappa agreement (all >0.75). However morphological features showed only poor or moderate kappa agreement at best. This analysis suggests that morphological assessment of suspicious myometrial masses (albeit with a reporting lexicon) has significant inter-observer variability contributing to the challenge of constructing an effective objective diagnostic classification model.
Overall the diagnostic model which incorporated AP (anteroposterior) length, intermediate T2 signal and a nodular border did not confer a diagnostic advantage over the radiologists suspicion score showing similar sensitivity, specificity, NPV and PPV (70.7%, 76.5%, 70.3% and 76.5% compared with 74.7%, 70.4%, 71.3% and 73.9% respectively).
Although this MRI study has analysed the largest number of histologically proven cases of LMS:DF, it is possible that the sample size was not large enough, potentially leading to sampling error. We did not perform a formal sample size calculation, instead choosing a pragmatic approach in this rare cancer, to use all available data. We also acknowledge that stage and grade of tumours may be influential to imaging appearances but the limited size of the dataset and data missingness did not allow for sub-analysis. This study indicates that morphology alone is unlikely to be a robust discriminator between DF and LMS. Other imaging parameters such as ADC and metrics derived from contrast enhanced MRI should ideally be explored but data from standardized acquisition are unlikely to be widely available in this rare cancer. Given the lack of distinguishing MRI parameters, future work should aspire to assess combined clinic-radiological models for example by considering peri/post menopausal status, rate of growth of a mass and post menopausal bleeding.
In the absence of a robust diagnostic tool, current recommendations that laparoscopic power morcellators should not be used in women aged 50 years and over remain relevant. 25 We would suggest that all patients with complex myometrial masses should be counselled about the risks when offered minimally invasive treatments which could increase tumour dissemination if LMS is the underlying pathology. This has been highlighted in a joint statement from the Royal College of Obstetricians and Gynecologists, Royal College of Radiologists and Sarcoma UK. 27
Conclusion
Morphological MRI imaging features alone are not sufficient to obviate the need for pathological confirmation prior to non-surgical management of complex uterine mass lesions. All patients with complex myometrial masses should be counselled regarding risks when offered minimally invasive interventions which could result in tumour dissemination if LMS is confirmed.
Supplemental Material
Supplemental Material - Comparison of MRI imaging features to differentiate degenerating fibroids from uterine leiomyosarcomas
Supplemental Material for Comparison of MRI imaging features to differentiate degenerating fibroids from uterine leiomyosarcomas by William W Loughborough, Andrea G Rockall, Tanja Gagliardi T, Laura Satchwell, Emily Greenlay, Piers Osborne, Nishat Bharwani, Thomas Ind, Ayoma Attygalle, Dione Lother, Georgina Hopkinson, Robin Jones, Charlotte Benson, Aisha Miah, Aslam Sohaib A, and Christina Messiou C in Rare Tumors
Footnotes
Acknowledgements
This study represents independent research supported by the National Institute for Health and Care Research (NIHR) Biomedical Research Centre at The Royal Marsden NHS Foundation Trust and The Institute of Cancer Research, London, and by the Royal Marsden Cancer Charity, and Cancer Research UK (CRUK) National Cancer Imaging Trials Accelerator (NCITA). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. We acknowledge the support of the Imperial College London NIHR BRC.
Author contributions
W.L.: literature review, study design, acquisition of data, drafted manuscript, approval of final manuscript A.R.: interpretation of data, editing, contribution to and approval of final manuscript T.G.: reader of cases, approval of final manuscript L.S.: study design, analysis and interpretation of data, approval of final manuscript E.G.: study design, analysis of data, approval of final manuscript P.O.: acquisition of data, approval of final manuscript NB.: acquisition of data, approval of final manuscript T.I.: interpretation of data, editing, contribution to and approval of final manuscript A.A.: acquisition of data, approval of final manuscript D.L.: acquisition of data, approval of final manuscript G.H.: acquisition of data, approval of final manuscript R.J.: interpretation of data, editing, contribution to and approval of final manuscript C.B.: contribution to and approval of final manuscript A.M.: contribution to and approval of final manuscript A.S.: conception of study, study design, reader of cases, drafted manuscript, substantial revision of manuscript, approval of final manuscript C.M.: conception of study, study design, reader of cases, drafted manuscript, substantial revision of manuscript, approval of final manuscript. Chief Investigator.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Institute for Health and Care Research (NIHR), Biomedical Research Centre at The Royal Marsden NHS Foundation Trust The Institute of Cancer Research, London, Royal Marsden Cancer Charity, and Cancer Research UK (CRUK), National Cancer Imaging Trials Accelerator (NCITA).
Ethical statement
Supplemental Material
Supplemental material for this article is available online.
Appendix
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
