Introduction
Shoulder pain is the third most common musculoskeletal-related reason for seeking medical attention in the United States.1 While the underlying cause of shoulder pain can be highly variable, an association with bursitis has been found; Draghi et al2 found that in people presenting with shoulder pain, regardless of the cause, there was a common association between pain and the presence of bursitis.
Bursae are synovial-lined structures that minimize friction between two or more structures moving against each other.2 The bursa is considered a potential space, seen on ultrasonography (US) as hypoechoic tissue between hyperechoic peribursal fat.2,3 Bursitis refers to swelling or inflammation of the bursa. The term bursitis is often a misnomer, however, as not all cases arise primarily from an inflammatory process; some instead represent non-inflammatory swelling of the bursa.4 In cases of bursitis on US, the bursa appears fluid-filled and is lined with a hyperechoic wall.5
The normal shoulder joint comprises multiple bursae, including the subacromial-subdeltoid (SA-SD) bursa. The SA-SD bursa is composed of two bursae that lie under the deltoid muscle and acromioclavicular joint and overlie the rotator cuff and bicipital groove.6,7 In people presenting with shoulder pain, there is often an association with SA-SD bursitis, regardless of the aetiology.2 Many pathologies may cause SA-SD bursitis, including repetitive stress or overuse, rotator cuff injury, trauma, rheumatoid arthritis, infection and pigmented villonodular synovitis.2,5 The treatment for bursitis is usually conservative, including activity modification, physiotherapy, non-steroidal anti-inflammatory drugs and corticosteroid injections. Surgical resection of the bursa is reserved for treatment-resistant cases. Thus, correctly diagnosing bursitis is important for choosing the correct management for patients with shoulder pain.
Although the findings of bursitis are relatively straightforward, there are no guidelines or classification systems that allow for standardized grading of bursitis. This leads to subjective assessments, which can produce both intraobserver and interobserver variability. For example, Naredo et al8 found that interobserver variability exists between experts in musculoskeletal (MSK) US (a combination of radiologists and rheumatologists), with 84% agreement for diagnosing bursitis on shoulder US. This is most likely attributable to differences in opinion about what constitutes bursitis, mainly whether the presence of inflammation is necessary for the diagnosis. While controlling for these differing opinions in defining bursitis, our aim was to determine whether intraobserver variability exists among fellowship-trained MSK radiologists at our institution. This could provide healthcare providers with information to more confidently choose the correct treatment for their patients presenting with shoulder pain.
Methods
We conducted a retrospective study of patients who were diagnosed with bursitis on shoulder US at our institution between January 1, 2019 and December 31, 2020. Research ethics board approval was obtained. Included patients were between 18 and 69 years of age. Patients with incomplete imaging and full-thickness rotator cuff tendon tears were excluded. A total of 70 patients were analysed, including a small subset of control cases. We collected single sonographic images, with standard window presets, of the SA-SD bursa from our institution's Picture Archiving and Communication System for each patient. Images were acquired by MSK-trained sonographers. These single images were randomized to form a 'test-bank' of varying degrees of shoulder bursitis.
The test-bank was administered to all participating fellowship-trained MSK radiologists (N = 10) within Hamilton in the form of an electronic document (Microsoft PowerPoint presentation). The participants were asked to grade each case as: within normal limits, mild, moderate, or severe. Given that no gold standard exists for grading bursitis, the present study did not seek to provide objective measurements for determining each grade. Thus, the participants were asked to grade based on their prior training and experience. The bursa was measured at the widest thickness between the peribursal fat and the superficial margin of the supraspinatus muscle, in a plane parallel to the transducer beam.9 Following the first administration, the test-bank was randomly reordered and readministered 3 months later. The participants were then asked to grade each case again, without knowing the grading previously assigned to each case. The participants were also asked how many years they had been practising MSK radiology.
Data were collected and analysed in Microsoft Excel. Cohen's kappa coefficient was calculated to determine intraobserver variability between the four categorical variables (within normal limits, mild, moderate and severe). A linear regression model was used to assess for correlation between the kappa coefficient as the dependent variable and the radiologist's years of experience as the independent variable. The corresponding P-value and Pearson correlation coefficient (r) were calculated. Statistical significance was declared when P < .05. The Pearson correlation coefficient range was defined as between −1.0 and +1.0, with the sign of the correlation coefficient representing the direction of the relationship. The strength of the correlation was defined based on the absolute value of r as perfect (r = 1.0), strong (.8 ≤ r ≤ 1.0), moderate (.5 ≤ r ≤ .8), weak (.2 ≤ r ≤ .5), very weak (.0 < r ≤ .2) or no association (r = .0).10 Data analysis was performed using SPSS version 28.0 (SPSS Inc., Chicago, IL).
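To illustrate the agreement statistic used above, the following sketch computes an unweighted Cohen's kappa for two hypothetical grading sessions. The case labels and counts are invented for demonstration only and do not come from the study data:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa between two sets of categorical ratings."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed agreement: proportion of cases graded identically both times.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement by chance, from the marginal category frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical first and second readings of 10 cases.
first  = ["normal", "mild", "mild", "moderate", "severe",
          "mild", "moderate", "normal", "severe", "moderate"]
second = ["normal", "mild", "moderate", "moderate", "severe",
          "mild", "mild", "normal", "severe", "moderate"]

print(round(cohens_kappa(first, second), 2))  # 8/10 observed agreement -> 0.73
```

Note that kappa discounts the agreement expected by chance from the marginal frequencies, which is why 80% raw agreement here yields a kappa of only about .73.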
Results
Table 1. Patient characteristics, level of disagreements and intraobserver variability. Intraobserver variability was measured using the kappa coefficient. Number of disagreements >1 category refers to disagreements between two noncontiguous categories (between within normal limits and moderate, between within normal limits and severe, and between mild and severe). CI, confidence interval; *significant difference (P < .001).
The kappa coefficient with standard error for each participant is demonstrated in Figure 1. To demonstrate the trend of variability with level of experience, the x-axis is ordered by increasing years of experience. There was a moderate positive correlation between years of experience and improved variability (r = .69, P = .026).

Figure 1. Kappa coefficient (measure of intraobserver variability) for each participant, in order of increasing years of experience. Values are expressed as a single kappa value with error bars representing the 95% confidence interval.
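The correlation reported above can be reproduced in principle with a plain Pearson calculation. The sketch below uses invented experience/kappa pairs purely to show the computation; the numbers are not the study's data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: years of MSK experience vs. each reader's kappa.
years = [1, 3, 5, 8, 10]
kappas = [0.55, 0.58, 0.60, 0.70, 0.73]
print(round(pearson_r(years, kappas), 2))  # 0.98 for these invented values
```

A positive r indicates that kappa tends to rise with experience; the study's scale would label an r between .5 and .8 (such as the reported .69) a moderate correlation.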
Figure 2 displays the distribution of the grades of severity of bursitis assigned by each radiologist. Although interobserver variability was not measured in this study, this figure further illustrates the known existence of interobserver variability between the different radiologists. Representative cases of patients with each grade of bursitis that were unanimously agreed upon amongst all radiologists are demonstrated in Figure 3.

Figure 2. Distribution of the grades of shoulder bursitis for each participant.

Figure 3. Representative cases of each severity grade of shoulder bursitis.

Discussion
The present study retrospectively examined patients at our institution diagnosed with shoulder bursitis on US. Fellowship-trained MSK radiologists were asked to grade each case of bursitis using a single US image of the SA-SD bursa. Three months after each case was graded, the cases were randomly reordered and re-presented to the radiologists for regrading. This allowed for the assessment of intraobserver variability. This study demonstrated that intraobserver variability exists amongst the radiologists, with a moderate positive correlation of improved variability (increasing reliability) with increasing experience.
At the time of publication, intraobserver variability in grading shoulder bursitis on US had not previously been measured. The present study reports relatively good agreement in grading shoulder bursitis on US, regardless of years of experience. The kappa values ranged from .53 to .91 (Table 1). Although no researchers have previously measured intraobserver variability in grading shoulder bursitis on US, many have reported similar intraobserver variability for different MSK-related pathologies, in different joints, and amongst different sonographic experts (both radiologists and non-radiologists).12-15
Cohen’s kappa is a useful measure of intraobserver reliability. Values range from −1 to +1, where 0 represents the amount of agreement that can be expected from chance and 1 represents perfect agreement between two tests.16 Table 1 displays the individual kappa values for each participant and Figure 1 highlights the trend of increasing kappa with increasing experience. There was a moderate positive correlation of improved variability with increasing years of experience. The most experienced radiologist had the highest kappa of .91, representing disagreement in only 4 of 70 cases (6%). The kappa values of the remaining participants were between .53 and .73, which is similar to previous studies assessing intraobserver variability for different sonographic findings in various joints.12-15
To interpret the strength of agreement for given kappa values, we can separate kappa values into descriptive categories. Landis and Koch11 proposed the following standard: <.00 = poor, .00–.20 = slight, .21–.40 = fair, .41–.60 = moderate, .61–.80 = substantial and .81–1.00 = almost perfect. Similar standards have been proposed,17 albeit with slightly different descriptors. However, the cut-off for each tier is relatively arbitrary, and this must be considered when interpreting the results. The radiologists in the present study ranged from moderate to almost perfect (Table 1), with a moderate positive correlation of improved variability with increasing experience. In the first group of radiologists, with experience ranging from 1 to 5 years (N = 3), there was moderate agreement between tests; whereas in the most experienced group (16–30 years; N = 4), there was one moderate, two substantial and one almost perfect agreement. Although the difference between moderate and almost perfect appears categorically significant, Figure 1 better illustrates how similar the radiologists are in terms of numerical kappa values.
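The tiered interpretation above is simple enough to express as a small helper function. This is only an illustrative sketch; the band boundaries follow the Landis and Koch convention cited in the text:

```python
def landis_koch(kappa):
    """Map a kappa value to its Landis & Koch descriptive category."""
    if kappa < 0.0:
        return "poor"
    # Upper bound of each band, checked in ascending order.
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial"),
                         (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa must lie in [-1, 1]")

# The study's extreme kappa values span two non-adjacent tiers.
print(landis_koch(0.53), "/", landis_koch(0.91))  # moderate / almost perfect
```

Applied to the study's range of .53 to .91, the function returns "moderate" through "almost perfect", matching the categorical spread described above while the underlying numerical values remain fairly close.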
The magnitude of kappa is influenced by additional factors, including the number of categories and the application of weighting factors to kappa.18 The greater the number of categories, the greater the potential for disagreement between tests. In this case, there were four categories (within normal limits, mild, moderate and severe). In a clinical setting, a disagreement between within normal limits and severe should be more significant than a disagreement between mild and moderate, for example. However, in the present study, only 6 of the 10 radiologists had a disagreement spanning two noncontiguous categories (between within normal limits and moderate, between within normal limits and severe, or between mild and severe), and each of those six radiologists made such a disagreement only once (representing 1.4% of cases).
Although the present study did not measure interobserver variability directly, Figure 2 illustrates the varying distribution of the different grades of shoulder bursitis between the different radiologists. Example cases of each grade of severity that were unanimously agreed upon amongst all participants are provided in Figure 3. We propose that interobserver variability would exist if it were measured in this cohort, as many researchers have shown interobserver variability in diagnosing and grading different MSK pathologies on US imaging.8,12-15,19-23 This is most likely attributable to differences in opinion amongst clinicians about what constitutes bursitis,8 as there is no gold-standard definition of bursitis on US. A clinician's definition and grading of shoulder bursitis is likely influenced by a combination of their prior training and clinical experience. The present study did not seek to establish a consensus definition of bursitis or a grading criterion for shoulder bursitis on US. The current researchers' chief interest is how clinicians would use this information to adjust their clinical practice. Most importantly, what impact does the radiologic impression of bursitis have in comparison with the clinical exam, and how much emphasis is placed on the radiologist's grading of bursitis? Furthermore, do the answers to these questions differ between clinicians (orthopaedic surgeons, rheumatologists, physiatrists, physiotherapists, etc.)?
The present study measured bursal distension on US as the distance between the peribursal fat and the superficial margin of the supraspinatus muscle, along a plane parallel to the transducer beam. We acknowledge that interobserver variability may exist when measuring the SA-SD bursa; future work should therefore assess this prospectively.
Our study has several limitations. It is a single-centre study based on the retrospective assessment of saved US images of the SA-SD bursae, without Doppler assessment. Additional clinical context, including patient characteristics, presenting symptomatology and comorbidities, was not available to the interpreting radiologists. Ultrasonography is an inherently operator-dependent imaging modality; operator dependence was not a primary objective of our study and was minimized by using cases performed by MSK-trained sonographers at a single centre.
Conclusion
This study demonstrates good intraobserver reliability in grading shoulder bursitis on US for all MSK-trained radiologists. Furthermore, there was a moderate positive correlation of improved variability with increasing years of experience. Thus, understanding the inherent intraobserver variability of shoulder US may help clinicians more confidently choose the correct treatment for their patients presenting with shoulder pain.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
