Abstract
Background:
Gluteal tendinopathy is commonly reported in the literature, but there is a need for a validated magnetic resonance imaging (MRI)-based scoring system to grade the severity of the tendinopathy.
Purpose:
To use intra- and interobserver reliability to validate a new scoring system, the Melbourne Hip MRI (MHIP) score, for assessing the severity of gluteal tendinopathy.
Study Design:
Cohort study (diagnosis); Level of evidence, 3.
Methods:
The MHIP score assesses gluteal tendinopathy across 5 categories: (1) extent of tendon pathology (maximum 5 points); (2) muscle atrophy (maximum 4 points); (3) trochanteric bursitis (maximum 4 points); (4) cortical irregularity (maximum 3 points); and (5) bone marrow edema (maximum 1 point), with an overall range of 0 to 17 (most severe). A total of 41 deidentified MRI scans from 40 patients diagnosed with gluteal tendinopathy (mean baseline age, 57.44 ± 25.26 years; 4 male, 36 female) were read and graded according to MHIP criteria by 2 experienced musculoskeletal radiologists. The radiologists were blinded to previous reports, and the scans were read twice within a 2-month period. The intraclass correlation coefficient (ICC) was used to determine intra- and interobserver reliability, and the mean and range of the MHIP scores were calculated.
Results:
Of a total of 123 readings, the mean MHIP score (±SD) was 3.93 ± 2.24 (range, 0-11 points). The MHIP score demonstrated excellent reliability for determining the severity of gluteal tendinopathy on MRI. The ICC for intra- and interobserver reliability was 0.81 (95% CI, 0.67-0.89) and 0.78 (95% CI, 0.62-0.87), respectively.
Conclusion:
The MHIP score had excellent intra- and interobserver reliability in scoring gluteal tendinopathy. This score allows gluteal tendon pathology to be graded prior to treatment and to be used for standardized comparisons between results in future research undertaking radiological review of gluteal tendinopathy.
Tendinopathies are common overuse injuries prevalent in both the athletic and nonathletic population and are responsible for up to 30% of all musculoskeletal consultations and nearly half of all sporting injuries. 16 Gluteal tendinopathies, including gluteus medius and minimus tendinopathies, are the most common lower limb tendinopathy and a major cause of lateral hip pain. 1 Gluteal tendinopathy is up to 4 times more common in female patients than males and results in very high levels of hip dysfunction 22 with a decreased quality of life and earning potential as well as lower activity levels. 6
The diagnosis of gluteal tendinopathy is important for the management of the disease, and yet, research suggests that 20% of gluteal tears are missed clinically prior to a patient’s undergoing total hip arthroplasty. 13 The diagnosis of symptomatic disease of the gluteal tendons is based on both clinical and radiological signs. 7 While the clinical signs of gluteal tendinopathy are well-documented, more research is still needed to study the radiological signs. Magnetic resonance imaging (MRI) is currently considered the gold standard for detecting the presence of gluteal tendinopathy. 17,19 The MRI findings suggestive of gluteal tendinopathy are the presence of tendinosis, intrasubstance signal abnormality, atrophy or absence of tendon fibers, cortical irregularity, and bone marrow edema (BME). 3,4,17,19,21 Beyond a diagnostic capacity, MRI also plays a role in assessing severity at follow-up, with a decrease in tendon thickness, and a normalized tendon structure indicating signs of tendon healing. 20
Despite radiological signs of gluteal tendinopathy being well-recognized, a validated grading system for determining gluteal tendinopathy severity on MRI has not been well-established. This makes it difficult for clinicians or researchers to compare progress before and after treatment. Importantly, in guiding future research, a grading score for gluteal tendinopathy would allow researchers to use a standardized and reproducible measure to easily compare radiological outcomes between patient cohorts and across multiple studies. The first step in determining a scoring system is to test reliability. Once the score has been shown to be reliable, studies can be performed to look at the prognostic and treatment value in specific clinical conditions. The aim of this study was to validate a new scoring system for grading the severity of gluteal tendinopathy. The hypothesis was that the Melbourne Hip MRI (MHIP) score would be a reliable tool for grading the severity of gluteal tendinopathy.
Methods
Participants
This study used 41 MRI scans from 40 patients (mean baseline age, 57.44 ± 25.26 years; age range, 18-80 years; 4 male, 36 female) who were diagnosed with gluteal tendinopathy from 2012 to 2016. The participants had a history of >15 months of lateral hip pain; pain with such activities as walking, running, and stair climbing; and pain while lying on the affected hip at night. All patients had tenderness on palpation over the greater trochanter. Participants in this study underwent pretreatment MRI before receiving a blinded injection of either corticosteroid or leukocyte-rich platelet-rich plasma. Before undergoing MRI, each patient provided informed consent to participate in the imaging investigation. Ethics approval was obtained for this study.
Study Design
Two experienced musculoskeletal radiologists, radiologist 1 (R.O.) and radiologist 2 (K.B.), interpreted the scans for this study. Both radiologists have more than 30 years of experience in musculoskeletal MRI. To test intraobserver reliability, each image was read twice on separate days by radiologist 1. For the interobserver trial, both radiologists independently read the same set of 41 scans. To reduce bias, the first set of readings from radiologist 1, obtained from the intraobserver trial, was used in the calculation of the interobserver reliability. This ensured that only the initial readings from both radiologists were compared with each other.
MRI Unit
Multiplanar sagittal, axial, and coronal proton density and fat-saturated T2-weighted images were obtained using a Skyra 3-T superconducting MRI unit with Numaris/4 Syngo MR 11 software (Siemens), a slew rate of 200 T/m/s, and 45 mT/m gradient amplitude with a high-resolution 18-channel surface coil anteriorly and the 32-channel body coil posteriorly (Siemens). Scans were performed using standard protocols based on the European Society of Skeletal Radiology guidelines. 15 The majority of patients had their scans taken by the same MRI unit, and any other scans were performed at 3-T strength using the same protocols.
MHIP Score Grading System
The MHIP score includes 5 categories: (1) extent of gluteal tendinopathy (GT); (2) muscle fatty atrophy (FA); (3) trochanteric bursitis (TB); (4) cortical irregularity (CI); and (5) BME. These parameters were chosen based on the prevalence seen on MRI in lateral hip pain. Grading within each category is determined as shown in Table 1, with a maximum of 5 points for GT, 4 for FA, 4 for TB, 3 for CI, and 1 for BME. Scores from each category are then added together to achieve an overall MHIP score (range, 0-17 points).
Table 1. The MHIP Grading Scale for Assessing Severity of Gluteal Tendinopathy on MRI
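The summation described above can be illustrated with a minimal Python sketch. It is not part of the study's methods. The category maxima come from the text, while the assumptions that each grade is an integer and that each category starts at 0 are ours (the per-grade definitions are given in Table 1 and are not encoded here). The names `MHIP_MAX` and `mhip_total` are hypothetical.

```python
# Category maxima as stated in the text (GT 5, FA 4, TB 4, CI 3, BME 1).
# Assumption: grades are integers starting at 0; Table 1 defines each grade.
MHIP_MAX = {"GT": 5, "FA": 4, "TB": 4, "CI": 3, "BME": 1}


def mhip_total(grades):
    """Sum the five category grades after checking each is within its range."""
    if set(grades) != set(MHIP_MAX):
        raise ValueError("expected exactly these categories: " + ", ".join(MHIP_MAX))
    for name, value in grades.items():
        if not isinstance(value, int) or not 0 <= value <= MHIP_MAX[name]:
            raise ValueError(f"{name} must be an integer from 0 to {MHIP_MAX[name]}")
    return sum(grades.values())


# Category grades taken from the two example figure captions in this article.
mild = mhip_total({"GT": 1, "FA": 0, "TB": 1, "CI": 0, "BME": 0})
severe = mhip_total({"GT": 3, "FA": 0, "TB": 3, "CI": 1, "BME": 1})
print(mild, severe)  # 2 8
```

The maxima sum to 17, which matches the stated overall range of 0 to 17 points.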
Interobserver Reliability
Radiologist 1 and radiologist 2 both interpreted the same set of 41 scans for the interobserver reliability trial. For each reading, the 2 radiologists recorded their MHIP scores using the standardized table (Table 1). Both radiologists, blinded to previous reports, interpreted deidentified scans as many as 2 times within a 2-month period. A total of 41 results from radiologist 1 were compared against 41 results from radiologist 2 in this trial.
Intraobserver Reliability
The intraobserver reliability trial was based on the same 41 scans. Radiologist 1 read all 41 deidentified scans twice on separate occasions without access to previous readings. Each scan was graded using the standardized MHIP score (Table 1). Altogether, there were 82 results, with 41 results from the first reading compared against 41 results from the second reading.
Statistical Analysis
Intra- and interobserver reliability were determined with the intraclass correlation coefficient (ICC), calculated in Stata Version 14 (StataCorp) using a 2-way random-effects model with absolute agreement. Fleiss 9 criteria were used to interpret ICC values, with ICC >0.75 representing excellent reliability, ICC between 0.4 and 0.75 representing fair to good reliability, and ICC <0.4 representing poor reliability.
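For readers unfamiliar with this form of the ICC, the following Python sketch shows one way a 2-way random-effects, absolute-agreement coefficient can be computed from a subjects-by-readings table, together with the Fleiss cutoffs quoted above. It is an illustration only and not the authors' Stata analysis. The text does not state whether single or average measures were reported, so the single-measure form (often written ICC(2,1)) is an assumption here. The function names and the toy scores are hypothetical and are not study data.

```python
from statistics import mean


def icc_2_1(ratings):
    """Single-measure, 2-way random-effects, absolute-agreement ICC.

    ratings: one row per scan, one column per reading (rater or occasion).
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: scans (rows), readings (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def fleiss_label(icc):
    """Map an ICC value to the Fleiss categories used in this study."""
    if icc > 0.75:
        return "excellent"
    if icc >= 0.4:
        return "fair to good"
    return "poor"


# Toy data (hypothetical): the second reading is always 1 point higher.
toy = [[1, 2], [3, 4], [5, 6]]
value = icc_2_1(toy)
print(round(value, 3), fleiss_label(value))  # 0.889 excellent
```

In the toy data the two readings rank the scans identically, yet the coefficient is below 1 because absolute agreement penalizes a systematic offset between readings.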
Results
Reliability of the MHIP Score
The intraobserver reliability for radiologist 1 was 0.81 (95% CI, 0.67-0.89), and the interobserver reliability between the 2 radiologists was 0.78 (95% CI, 0.62-0.87). Following the Fleiss 9 criteria, the intra- and interobserver reliability was excellent for the MHIP score in determining the severity of gluteal tendinopathy.
MHIP Score Distribution
The 41 scans were interpreted a total of 3 times: twice by the first radiologist and once by the second radiologist. Of a total of 123 readings, the mean MHIP score (±SD) was 3.93 ± 2.24, and the most common score was 3 out of 17 (19.51% of total patients). There was a wide distribution of results, with 0 and 11 being the lowest and highest scores achieved, respectively, out of a possible 17 (Figure 1). There was a strong correlation between the grades by each radiologist. Figure 2 shows the distribution of scores: radiologist 1 graded 65.85% of scans <5, and radiologist 2 graded 63.41% of scans <5. Similarly, 31.71% of scans were graded between 5 and 9 by radiologist 1 compared with 36.59% of scans by radiologist 2. Figures 3 and 4 show examples of the MRI findings for mild (MHIP score of 2) and severe (MHIP score of 8) gluteal tendinopathy.

Figure 1. Distribution of Melbourne Hip MRI scores across all readings.

Figure 2. Categorization of Melbourne Hip MRI scores based on readings by each radiologist. GT, gluteal tendinopathy.

Figure 3. MRI scans of a patient with gluteal tendinopathy and an MHIP score of 2 (GT = 1, FA = 0, TB = 1, CI = 0, and BME = 0). (A) Coronal fat-saturated T2 image showing minimal edema in trochanteric bursa with no cortical irregularity or marrow edema. (B) Axial fat-saturated T2 image showing minimal hyperintensity in the insertional fibers of the gluteus minimus and minor trochanteric bursitis. BME, bone marrow edema; CI, cortical irregularity; FA, fatty atrophy; GT, gluteus medius/minimus tendinopathy; MHIP, Melbourne Hip MRI score; MRI, magnetic resonance imaging; TB, trochanteric bursitis.

Figure 4. MRI scans of a patient with gluteal tendinopathy and an MHIP score of 8 (GT = 3, FA = 0, TB = 3, CI = 1, and BME = 1). (A) Coronal fat-saturated T2 image showing moderate trochanteric bursitis with minor cortical irregularity. (B) Axial fat-saturated T2 image showing ill-defined increased signal intensity and size of gluteus medius tendon and surrounding soft tissue with discontinuity of tendon. BME, bone marrow edema; CI, cortical irregularity; FA, fatty atrophy; GT, gluteus medius/minimus tendinopathy; MHIP, Melbourne Hip MRI score; MRI, magnetic resonance imaging; TB, trochanteric bursitis.
Elements of the MHIP Score
Table 2 lists the full scores for all MHIP score readings. Trochanteric bursitis and tendon pathology were the most common MRI signs, seen in 120 of 123 (97.56%) and 101 of 123 (82.11%) scans, respectively. Radiologist 1 reported some extent of trochanteric bursitis in all 82 of his readings. Partial and full-thickness tears were uncommon: a total of 22 of 123 (17.89%) scans reported low-grade partial-thickness tears and 1 of 123 (0.81%) scans reported high-grade partial-thickness tears. No scans were reported to have full-thickness tears of the gluteal tendons. Cortical irregularity was seen less commonly, in 55 of 123 (44.72%) scans. Muscle atrophy and BME were relatively uncommon, seen in 16 of 123 (13.01%) and 8 of 123 (6.50%) scans, respectively.
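The percentages in the preceding paragraph follow directly from the reported counts over 123 readings (41 scans read 3 times). As a quick arithmetic check, and not as part of the original analysis, the short Python sketch below recomputes them. The dictionary keys and the helper name `pct` are hypothetical labels chosen for illustration.

```python
SCANS = 41
READINGS_PER_SCAN = 3  # twice by radiologist 1, once by radiologist 2
TOTAL_READINGS = SCANS * READINGS_PER_SCAN  # 123

# Counts as reported in the text, each out of 123 readings.
counts = {
    "trochanteric bursitis": 120,
    "tendon pathology": 101,
    "low-grade partial-thickness tear": 22,
    "high-grade partial-thickness tear": 1,
    "cortical irregularity": 55,
    "muscle atrophy": 16,
    "bone marrow edema": 8,
}


def pct(count, total=TOTAL_READINGS):
    """Percentage of readings, rounded to 2 decimal places as in the text."""
    return round(100 * count / total, 2)


for finding, count in counts.items():
    print(f"{finding}: {count}/{TOTAL_READINGS} = {pct(count):.2f}%")
```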
Table 2. Mean Elements for MHIP Score per Reading a
a Dashes indicate no result in this field. Reading 1 = first reading by radiologist 1; Reading 2 = second reading by radiologist 1; and Reading 3 = first reading by radiologist 2. BME, bone marrow edema; CI, cortical irregularity; FA, fatty atrophy; GT, gluteus medius/minimus tendinopathy; MHIP, Melbourne Hip MRI score; MRI, magnetic resonance imaging; TB, trochanteric bursitis.
Discussion
Interpretation of Results
The sensitivity and specificity of MRI in the diagnosis of gluteal tendinopathy have been well-documented. 12 This study assessed the reliability of a novel scoring system, the MHIP score, which is used to determine the severity of gluteal tendinopathy. The results showed excellent intra- and interobserver reliability across 2 highly experienced radiologists interpreting 41 scans over a 2-month period.
Our patient cohort consisted of 90% female participants, and this is in keeping with the higher prevalence of gluteal tendinopathy associated with women. 22 The high incidence of trochanteric bursitis and tendon pathology seen in our study is also consistent with the radiological literature to date, indicating that these are 2 of the most common signs of gluteal tendinopathy. 19 However, the low incidence of severe tendon pathology, particularly full-thickness tears of the gluteal tendon, is likely due to the reason for presentation of our patient cohort. The participants in our study were receiving MRI scans in preparation for noninvasive injection therapy in a randomized controlled clinical trial that excluded participants with full-thickness tears. 8,18 There was also a low incidence of BME, muscle atrophy, and cortical irregularity. Particularly, muscle atrophy was seen only in patients who received a total score of ≥4 out of 17. Given the mean score of 3.93, this could signify that these 3 radiological signs were indicators of moderate to severe gluteal tendinopathy.
Previous Studies Evaluating Radiological Outcome of Gluteal Tendinopathy
To date, many studies have evaluated the progression of clinical symptoms in response to treatment for gluteal tendinopathy, but very few studies have focused on radiological outcomes. One study has analyzed the effects of treatment on gluteal tendinopathy using MRI, 2 but no studies have analyzed the radiological outcomes of treatment using ultrasonography. Bucher et al 2 indicated that there was no recognized, objective radiological scoring system for gluteal tendinopathy. They used a previously described but nonvalidated MRI scoring system to measure the safety and effectiveness of autologous tenocyte injections in gluteal tendinopathy. 21
What Sets This Study Apart?
The MHIP score aims to address the gap in the literature highlighted by Bucher et al 2 by providing a validated MRI scoring tool so that clinicians can have a standardized and reliable means to assess the severity of gluteal tendinopathy. No previous studies have formed a validated scoring system for this purpose. What sets our study apart from previous radiological reviews of the gluteal tendon is that the MHIP score is the first to combine isolated findings across multiple studies. The criteria of the MHIP score were based on 5 studies. Kong et al 17 reported both direct and indirect MRI signs of gluteal tendinopathy, including tendon pathology, bursal fluid, bony changes, and fatty atrophy. Lequesne et al 19 noted that 100% of patients with clinical signs of gluteal tendinopathy had tendon pathology and/or bursitis on MRI as well. This is consistent with other studies that have found gluteal tendon changes are the most prevalent pathology in those presenting with pain and tenderness over the greater trochanter. 11 Chi et al 3 reported similar findings on MRI, with a focus on the importance of finer classification of greater trochanteric bursitis and gluteus medius/minimus tendon pathology in the evaluation of gluteal tendinopathy. The severity of bursitis was based on amount of fluid, margins, and effect on adjacent structures. Tendon pathology was assessed based on percentage of fluid signal intensity. Connell et al 4 assessed patients with gluteal tendinopathy on sonography and found that the most prevalent sign was tendon pathology. Trochanteric bursitis and bony abnormalities were the other less prevalent signs recorded in that study. Pfirrmann et al 21 found that tendon defects were significantly more common in symptomatic patients (P < .001). Fatty atrophy and bursal fluid collections were also significantly different in symptomatic versus asymptomatic patients.
Classification of fatty atrophy was based on the ratio of fat to muscle tissue, a grading system similar to one that has proven to be reliable for use in rotator cuff muscles. 5,10 The MHIP score also included marrow edema, a radiological sign commonly associated with mechanical stress. 14 Based on these findings, we were able to collate the results into 5 categories within the MHIP score: tendon pathology, fatty atrophy, trochanteric bursitis, cortical irregularity, and bone marrow edema.
While Bucher et al 2 also assessed response to treatment in recalcitrant gluteal tendinopathy, their scoring was qualitative in nature and did not provide an easily comparable result. Their adapted scoring tool was based on the findings from 1 study and analyzed the extent of tendon signal intensity (normal/abnormal), osseous attachment (absent/present), tendon diameter (thinning/normal/thickening), and bursal fluid collection (absent/present). The gluteus minimus tendon and the lateral and posterior components of the gluteus medius tendon were analyzed separately. The limited and general subcategorization of these criteria may have made it difficult to appreciate the finer differences of gluteal tendinopathy on MRI. Further, while their scoring tool had excellent reliability, the results were underpowered, with only 24 scans (12 pretreatment and 12 posttreatment) available for the radiologists. The patients initially used to validate the scoring tool were also exclusively surgical candidates, meaning that the severity of their disease may have been worse, potentially overlooking mild to moderate presentations and skewing results. This is of particular importance, given the need to develop a nonsurgical treatment with proven long-term benefits for patients with early stages of gluteal tendinopathy. Our study addressed some of these limitations by having a greater number of scans and finer stratifications in our subcategories. These changes meant that we could create a more robust scoring system, reflected in our wide distribution of results, while still maintaining excellent intraobserver and interobserver reliability.
Limitations
All the MRI scans in this study were performed using 3-T magnets. This is a limitation for those who have access to only 1.5-T MRI scanners. A further limitation relates to the observers. The group consisted solely of radiologists, whose experience may be different from that of orthopaedic surgeons or sports physicians. Additionally, the use of 1 observer to calculate intraobserver reliability as well as the absence of a control group should be noted. Further studies are required to demonstrate the validity of the MHIP scoring tool in clinical settings (for example, to identify whether the MHIP score can identify response to treatment, define specific pathology, or confirm surgical findings).
Conclusion
We found that the MHIP score had excellent intra- and interobserver reliability in scoring gluteal tendinopathy. This allows gluteal tendon pathology to be graded prior to treatment and to be used for standardized comparisons between results in future research undertaking radiological review of gluteal tendinopathy.
Footnotes
Acknowledgment
All authors extend their gratitude toward Ms Sally Boyd and acknowledge her assistance in the administrative organization of patient records and retrieval of scans for this research. The authors also acknowledge Dr Carl Blecher for his contributions toward grading scans for this study.
Final revision submitted October 22, 2020; accepted November 24, 2020.
The authors declared that there are no conflicts of interest in the authorship and publication of this contribution. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
Ethical approval for this study was obtained from the Psychology, Health and Applied Sciences Human Ethics Subcommittee, University of Melbourne (1852900).
