Abstract
Aims:
The aim of this study was to evaluate the implication of doubtful joint swelling on clinical examination with respect to objective markers of synovitis by ultrasound (US) in patients with rheumatoid arthritis (RA).
Methods:
Two independent observers performed a modified 28 swollen joint assessment (28SJC), in which joints could be graded as either definitely swollen, non-swollen, or doubtfully swollen. A sonographer blinded to clinical information performed US assessment of the hands. We used descriptive statistics and logistic regression models to analyse the links between clinical assessment and objective markers of inflammation.
Results:
A total of 1204 joints were evaluated in 43 RA patients; 93% (40/43) of patients had ⩾1 joint with doubtful swelling (range: 0–4/patient). Inter-reader reliability for the modified 28SJC was good (0.74). Generally, both grey scale (GS) and power Doppler (PD) discriminated across not swollen, doubtful, and swollen joints. GS signals discriminated better than PD between doubtful swelling and no swelling [odds ratio (OR) for GS: 5.2; 95% confidence interval (CI) 1.2–23.3 versus OR for PD 1.7; 95% CI 0.2–13.0], whereas PD discriminated better than GS between swelling and doubtful swelling (OR for PD: 28.7; 95% CI 3.6–228.2 versus GS: 1.7; 95% CI 0.3–8.4). Joint osteophytes did not increase the degree of doubtfulness.
Conclusion:
Clinical doubt in the assessment of joint swelling constitutes an intermediate state between unequivocal swelling and its absence, also with regard to the objectively quantified level of inflammation. In order to increase sensitivity for joint inflammation, the historical clinical approach of treating doubtful swelling as the absence of swelling should be revisited, and clinical doubtfulness should instead be interpreted as an indication of swelling.
Introduction
Clinical evaluation of joint involvement, specifically the assessment of tender and swollen joints, remains the hallmark of both the diagnosis and the monitoring of arthritis. Joint damage and impairment of physical function are the most important adverse outcomes of rheumatoid arthritis (RA), a consequence of clinical disease activity over time. Joint damage is particularly associated with swollen joint counts (SJC) and acute phase reactant levels as well as composite measures of disease activity that comprise these variables.1–3 Grading the extent of clinical joint swelling or tenderness does not improve the performance of a score, 4 and, therefore, joint swelling is recommended to be assessed in an ungraded fashion. 5 To facilitate assessment in clinical practice, it has been recommended that a joint should be classified as clinically swollen only if the swelling is beyond doubt. 6 However, it may be difficult to evaluate and confirm joint swelling by clinical assessment in certain situations. In patients with pudgy or oedematous fingers, swelling due to synovitis may be difficult to distinguish from mere extra-articular soft-tissue changes on clinical examination. Indeed, a recent study found that clinical joint swelling is less likely to represent synovitis [using power Doppler (PD) ultrasound (US) as reference] in obese RA patients as compared with those with a normal body mass index (BMI). 7 Further, certain joints may be more difficult to assess than others, for example, the shoulder or foot joints. 8 These reasons may lead the examiner to ‘doubt’ clinical swelling of a joint and thus potentially underestimate the level of disease activity, in particular because the current standards adhere to the view of ‘when in doubt indicate as non-swollen’. 6 In the present study we granted the joint examiner the option to label a joint as doubtfully swollen and investigated whether such joints would show increased US signs of subclinical activity as compared with non-swollen joints, and less than joints that are clinically swollen beyond doubt.
Patients and methods
Patients
Inclusion criteria comprised fulfilment of the 2010 American College of Rheumatology/European League against Rheumatism (ACR/EULAR) classification criteria for RA, 9 the presence of at least one clinically unequivocally swollen joint, and age of >18 years. Enrollment of patients was consecutive. We aimed to recruit a sample size of 43 patients, corresponding to 1204 joints. This provided a sufficient sample size to use grey-scale (GS) or PD signs of synovitis as outcome variables for assessment of the potential ambiguity of clinical joint swelling in a logistic regression analysis. The sample size calculation was based on an assumed sampling ratio of 0.1 between doubtfully swollen and non-swollen joints, 10 and an estimated odds of 0.3 and 0.1, respectively, for the event of detecting PD. 11 The sample size calculation was based on the joint as the unit of analysis, and not the patient. In a sensitivity analysis, we addressed the impact of non-independence of the data units (i.e. the fact that multiple joints stem from the same patient) on the results, by randomly selecting one doubtful (by any examiner), one non-swollen and one swollen joint per individual patient. The ethics committee of the Medical University of Vienna approved the study, which was conducted according to the guidelines of the Declaration of Helsinki. All patients provided written consent upon inclusion into the study.
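The per-patient random selection used in the sensitivity analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the record layout, the category labels, the function name and the fixed seed are all hypothetical, and the toy records are not study data.

```python
import random


def sample_one_joint_per_category(joints, seed=0):
    """Pick at most one joint per (patient, category) at random.

    `joints` is a list of (patient_id, joint_name, category) tuples, where
    category is "non-swollen", "doubtful" or "swollen" (hypothetical labels).
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    grouped = {}
    for patient_id, joint_name, category in joints:
        grouped.setdefault((patient_id, category), []).append(joint_name)
    # One random draw for every patient/category combination that is present.
    return {key: rng.choice(names) for key, names in sorted(grouped.items())}


# Illustrative toy records only (not study data).
toy_joints = [
    ("P01", "wrist_R", "swollen"),
    ("P01", "MCP2_R", "doubtful"),
    ("P01", "PIP3_L", "doubtful"),
    ("P01", "knee_L", "non-swollen"),
    ("P02", "MCP3_L", "swollen"),
    ("P02", "elbow_R", "non-swollen"),
]
selected = sample_one_joint_per_category(toy_joints)
```

Restricting each patient to a single joint per category removes the within-patient clustering, at the cost of discarding most of the joint-level observations.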
Assessments
Clinical assessment
Swollen joint counts were performed on the 28-joint count scale (28SJC) (bilateral wrist, metacarpophalangeal joints 1–5, interphalangeal joint of the thumb, proximal interphalangeal joints 2–5, elbow, shoulder, and knee joints).7,12 In 43 patients with RA, two independent biometricians, health professionals with more than 5 years of experience performing daily joint counts in patients with arthritides, performed a modified 28SJC. Each was blinded to the examination of the co-evaluator as well as to the sonographic data. They assigned one of the following labels to each joint with regard to swelling: (a) definitely swollen; (b) doubtfully swollen; and (c) definitely not swollen. Doubtful swelling was defined as a state in which the examiner was unable to clinically rule out or confirm synovial swelling. For the purpose of the primary analysis, a joint was labelled as doubtfully swollen when swelling was doubtful to both examiners.
US and radiographic assessment
Following clinical evaluation by the biometricians, a standardised US evaluation of the 43 patients was performed by a sonographer blinded to the clinical data (G.S.). A systematic, multiplanar US examination was carried out using both GS and PD with a real-time scanner (General Electric Logiq E9) with a multifrequency linear transducer (10–15 MHz). Both GS and PD examinations were recorded for each of the 28 joints assessed. The sonographic evaluation of each joint was carried out according to the standardised scanning technique described in the EULAR guidelines and included both dorsal and volar scans. 13 Presence and absence of GS signs of synovitis and intraarticular PD signal as well as presence/absence of bone erosions and of osteophytes as defined by the OMERACT Working Group were recorded for each joint on a pre-designed form. 14 Conventional anterior-posterior radiographs of the hands collected annually as part of the routine follow up of RA patients at our department were evaluated for the presence/absence of osteophytes using the Interphalangeal Osteoarthritis Radiographic Simplified (iOARS) score. 15
Statistical analysis
Agreement between examiners on the modified 28 swollen joint assessment was assessed at the joint level by weighted Fleiss kappa (agreement beyond chance quantified between 0 and 1). Intrareader reliability of the sonographer for assessing synovitis, using the same US machine and the same settings, was found to be good to excellent (0.665 and 0.972 for PD and GS US, respectively) in a previous study. 16 To evaluate the association between clinical assessment of joint swelling and the presence of sonographic signs of synovitis, we applied logistic regression analyses: we used positivity for GS and PD as dependent variables, and doubtfully swollen versus non-swollen or swollen as explanatory variables, followed by a chi-square test. To avoid relying on averaged results only, we performed the same logistic regression analyses for each examiner individually.
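As an illustration of chance-corrected agreement on a three-level ordinal rating, the sketch below computes a linear-weighted Cohen's kappa for two raters. This is a stand-in chosen for illustration: the text names a weighted Fleiss kappa without specifying the weights, so the linear weighting, the function name and the toy ratings are assumptions.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories=3):
    """Linear-weighted Cohen's kappa for two raters on an ordinal scale.

    Ratings are integers 0..n_categories-1, here assumed to mean
    0 = non-swollen, 1 = doubtfully swollen, 2 = definitely swollen.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    k = n_categories
    # Joint distribution of the two raters' labels.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Disagreement weight grows linearly with the distance between labels.
    weight = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    observed_dis = sum(weight[i][j] * observed[i][j]
                       for i in range(k) for j in range(k))
    expected_dis = sum(weight[i][j] * row[i] * col[j]
                       for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed_dis / expected_dis


# Illustrative toy ratings for six joints (not study data).
examiner_1 = [0, 0, 1, 2, 2, 0]
examiner_2 = [0, 1, 1, 2, 1, 0]
kappa = weighted_kappa(examiner_1, examiner_2)
```

With linear weights, a non-swollen versus swollen disagreement is penalised twice as heavily as a disagreement involving the doubtful category.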
The influence of demographic variables such as age, gender, disease duration and BMI as well as descriptive, clinical and laboratory characteristics: rheumatoid factor (RF), anti-citrullinated antibodies (ACPA), C-reactive protein (CRP), evaluators global assessment (EGA), erythrocyte sedimentation rate (ESR), fatigue (visual analogue scale 0–10), health assessment questionnaire (HAQ), pain (visual analogue scale 0–10), stiffness (visual analogue scale 0–10), patient global assessment (PGA), disease activity score with CRP or ESR (DAS28-CRP and DAS28-ESR), swollen joint count (SJC), tender joint count (TJC), simplified disease activity index (SDAI) and clinical disease activity index (CDAI) were then evaluated using linear regression. In order to see whether patients with high counts of doubtfully swollen joints differ from those with low counts, we divided them into tertiles based on the number of doubtfully swollen joints, and calculated trends over these tertiles of differences in descriptive, clinical and laboratory characteristics using the Jonckheere-Terpstra test.
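The trend test over tertiles can be sketched as below. This is a minimal pure-Python version of the Jonckheere-Terpstra statistic with a normal approximation, assuming the groups are already formed and ordered from the lowest to the highest tertile. The variance formula ignores ties, so the p value is approximate when ties occur. The toy values are hypothetical and are not study data.

```python
import math


def jonckheere_terpstra(groups):
    """Jonckheere-Terpstra trend test for ordered groups.

    Returns (J, z, two_sided_p). J counts, over all pairs of groups in the
    hypothesised order, how often a value in the earlier group is smaller
    than a value in the later group (ties count one half).
    """
    j_stat = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            for x in groups[i]:
                for y in groups[j]:
                    if x < y:
                        j_stat += 1.0
                    elif x == y:
                        j_stat += 0.5
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    # Null mean and (no-ties) variance of J.
    mean = (n * n - sum(s * s for s in sizes)) / 4.0
    variance = (n * n * (2 * n + 3)
                - sum(s * s * (2 * s + 3) for s in sizes)) / 72.0
    z = (j_stat - mean) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return j_stat, z, p_two_sided


# Hypothetical index values per tertile of doubtful-joint count.
tertiles = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
j_stat, z_score, p_value = jonckheere_terpstra(tertiles)
```

Unlike a test for any difference between groups, this statistic is sensitive specifically to a monotonic shift across the ordered tertiles.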
We then examined the influence of potential osteoarthritis on ambiguity of joint assessment, by evaluating the number of osteophytes in each of the different swelling categories. Additionally, we reran the above-mentioned model omitting joints that showed signs of osteophytes by US.
In a sensitivity analysis, we calculated a mean numerical swelling status for each joint that took the evaluation of both examiners into account: each examiner's rating was scored from 0 to 2 (0 = non-swollen, 1 = doubtfully swollen, and 2 = definitely swollen), and the two scores were summed and divided by 2 (for the two examiners). These mean scores ranged from 0 to 2 in steps of 0.5 and were used in the logistic regression model. The dependent variables in this logistic regression analysis were GS signs of synovitis, PD signal and presence of osteophytes on US.
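The mean swelling status described above can be written out as a short sketch. The label strings and function name are hypothetical. The scoring (0, 1, 2) and the averaging over two examiners follow the text.

```python
# Numerical score per clinical label, as defined in the text.
SWELLING_SCORE = {"non-swollen": 0, "doubtful": 1, "swollen": 2}


def mean_swelling_score(rating_examiner_1, rating_examiner_2):
    """Average the two examiners' scores for one joint (0 to 2, steps of 0.5)."""
    total = SWELLING_SCORE[rating_examiner_1] + SWELLING_SCORE[rating_examiner_2]
    return total / 2


# Every attainable value of the combined metric.
possible_scores = sorted(
    {mean_swelling_score(a, b) for a in SWELLING_SCORE for b in SWELLING_SCORE}
)
```

A joint rated doubtful by one examiner and swollen by the other therefore receives 1.5, which places it between concordantly doubtful (1.0) and concordantly swollen (2.0) joints.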
Data analyses were carried out using SPSS®, Version 25 (SPSS, Chicago, IL, USA) and STATA (StataCorp. 2017. Stata: Release 15. Statistical Software. StataCorp LLC, College Station, TX, USA). Significance level was 0.05.
Results
Frequency of doubtful joint swelling and respective agreement
A total of 1204 joints were evaluated in 43 RA patients (Table 1); 93% (40/43) of patients had ⩾1 doubtfully swollen joint (DSJ), with a maximum number of 4 DSJ/patient. Joints rated as doubtfully swollen by one examiner were rated the same way by the other examiner in only 17% of cases; in the majority of cases they were classified differently, in comparable proportions as non-swollen (45% of cases) and swollen (38%).
Descriptive characteristics of study patients (n = 43).
Values are displayed as means (SD), unless indicated otherwise.
CDAI, clinical disease activity index; DAS28, disease activity score 28; HAQ, health assessment questionnaire; SD, standard deviation; SDAI, simplified disease activity index; IU, international unit.
The joints that were most frequently found to be doubtfully swollen by at least one examiner were the wrist, proximal interphalangeal (PIP) 3, PIP 5 and metacarpophalangeal (MCP) 2 joints, with the elbow and the shoulder joints least commonly doubtful (Figure 1), but also least commonly affected by swelling. Inter-examiner reliability for the modified (allowing for doubtful swelling) 28 swollen joint assessment was good (0.74; 95% CI: 0.70–0.79).

Frequency of doubtful joints. Numbers represent the percentage of patients (n = 43) in whom the respective joint was classified as doubtful by at least one observer; joints included in the 28SJC are shown.
Sonographic characterization of doubtfully swollen joints
The prevalence of sonographic signs of synovitis in joints according to swelling status is shown in Table 2. Based on logistic regression modelling, the comparative risk for sonographic signs of inflammation for the different states by clinical assessment was calculated.
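For a comparison between two swelling categories with a binary US outcome, the odds ratio from a single-predictor logistic regression equals the cross-product ratio of the corresponding 2×2 table, with a Wald confidence interval on the log scale. The sketch below illustrates that calculation only. The counts are hypothetical, the function name is an assumption, and the study's models may differ in detail.

```python
import math


def odds_ratio_with_ci(index_pos, index_neg, reference_pos, reference_neg, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    index_*: joints in the category of interest with/without the US finding.
    reference_*: joints in the reference category with/without the finding.
    All four counts must be positive.
    """
    cells = (index_pos, index_neg, reference_pos, reference_neg)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (index_pos * reference_neg) / (index_neg * reference_pos)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 12 of 20 doubtful joints and 30 of 100 non-swollen
# joints show the US finding (illustrative only, not study data).
or_value, ci_low, ci_high = odds_ratio_with_ci(12, 8, 30, 70)
```

Small cell counts widen the interval sharply, which is one way to read the very wide confidence intervals reported for the concordantly doubtful joints.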
Prevalence of GS signs of synovitis, PD signal and osteophytes according to swelling status.
DSJ/DSJ, joints rated as doubtful by both observers; DSJ/SJ, joints rated as doubtful by one observer and swollen by the other observer; GS, grey scale; NSJ/DSJ, joints rated as doubtful by one observer and non-swollen by the other observer; NSJ/NSJ, joints rated non-swollen by both observers; PD, power Doppler; SJ/SJ, joints rated swollen by both observers.
Results are cross-tabulated and summarized for GS and PD findings in Figure 2. Generally, for both GS and PD, ORs to the left of and below the diagonal are <1, whereas ORs to the right of and above the diagonal are >1, indicating an inflammatory continuum from no swelling to doubtful swelling to definite swelling. GS signs of synovitis discriminated better between no swelling and the various grades of doubtful swelling as assessed by two observers: for NSJ/DSJ or DSJ/DSJ versus NSJ/NSJ, the ORs were 2.4 (95% CI: 1.2–4.9) and 5.2 (95% CI: 1.2–23.2), respectively, p < 0.05 for both (Figure 2A). The corresponding ORs for PD signal were 2.1 (95% CI: 1.0–4.5), p = 0.056, and 1.7 (95% CI: 0.2–13.0), p = 0.618, respectively (Figure 2B). In contrast, in more active joints, PD signal discriminated better than GS: for the comparison of SJ/SJ versus DSJ/SJ or DSJ/DSJ, the ORs for PD signal were 3.1 (95% CI: 1.4–6.7) and 28.7 (95% CI: 3.6–228.2), respectively, p < 0.01 for both (Figure 2B), while the ORs for GS signs of synovitis were 0.9 (95% CI: 0.2–3.5), p = 0.878, and 1.6 (95% CI: 0.3–8.4), p = 0.548, respectively (Figure 2A). Evaluation of the association between doubtfulness and US findings at the patient level confirmed the results of the analysis at the joint level.

OR and 95% CI (in brackets) of joints calculated by logistic regression, for the detection of (A) GS signs of synovitis and (B) PD signal as compared with the reference swelling status.
Sensitivity analyses
We performed two sensitivity analyses. First, we repeated the above analysis separately for each examiner, and the results confirmed the overall findings. When compared with non-swollen joints, doubtfully swollen joints were more likely to exhibit both GS signs of synovitis: OR examiner 1 was 5.4 (95% CI: 2.3–13.0) p < 0.001; and OR examiner 2 was 3.4 (95% CI: 1.8–6.4) p < 0.001; and PD signal: OR examiner 1 was 3.1 (95% CI: 1.6–6.1) p < 0.001; and OR examiner 2 was 1.9 (95% CI: 1.0–3.7) p < 0.05. Similarly, when compared with joints with doubtful swelling, joints with definite swelling were even more likely to exhibit PD signal: OR examiner 1 was 3.6 (95% CI: 1.8–7.4) p < 0.001; and OR examiner 2 was 5.6 (95% CI: 2.8–11.5) p < 0.001 (Table 3). GS signs of synovitis alone were found to be less discriminative between doubtful swelling and definite swelling: OR examiner 1 of 1.3 (95% CI: 0.5–3.6) p = 0.590 and OR examiner 2 of 2.2 (95% CI: 0.9–5.0) p = 0.074.
OR (95% CI) for the detection of GS signs of synovitis and PD signal in relation to swelling status for each examiner.
CI, confidence interval; DSJ, joint rated as doubtful; GS, grey scale; NSJ, joint rated non-swollen; OR, odds ratio; PD, power Doppler; SJ, joints rated as swollen.
*p ⩽ 0.05; **p ⩽ 0.01; ***p ⩽ 0.001.
In another sensitivity analysis, we used a modified swelling metric (ranging from 0 to 2 in steps of 0.5), which integrated the ratings of both examiners and thus reduced noise due to inter-rater variability. Joints with higher mean scores were more likely to exhibit GS signs of synovitis: OR 3.3 (95% CI: 2.5–4.5) p < 0.001; and PD signal: OR 3.8 (95% CI: 3.1–4.7) p < 0.001 (Figure 3), indicating again that doubtful swelling is characterized by more frequent GS signs of synovitis and PD signal as compared with the absence of swelling, and less frequent signal as compared with definitely swollen joints. Finally, we could not demonstrate any association between radiographic or sonographic osteophytes and doubtfully swollen joints (labelled doubtful by one or both examiners) (Figure 3C; other data not shown).
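To show how an odds ratio per unit of the mean swelling score translates into a predicted-probability curve, the sketch below evaluates a logistic model at each attainable score. The OR of 3.3 for GS signs of synovitis is taken from the text. The intercept is not reported there, so the value used here is purely hypothetical and the resulting probabilities are illustrative only.

```python
import math

GS_ODDS_RATIO_PER_UNIT = 3.3   # reported in the text for GS signs of synovitis
HYPOTHETICAL_INTERCEPT = -2.0  # assumption: the model intercept is not reported


def predicted_probability(score, intercept, odds_ratio_per_unit):
    """Logistic-model probability of a positive US finding at a given score."""
    log_odds = intercept + math.log(odds_ratio_per_unit) * score
    return 1.0 / (1.0 + math.exp(-log_odds))


# Probability at each attainable mean swelling score (0 to 2, steps of 0.5).
curve = [
    (s, predicted_probability(s, HYPOTHETICAL_INTERCEPT, GS_ODDS_RATIO_PER_UNIT))
    for s in (0.0, 0.5, 1.0, 1.5, 2.0)
]
```

Whatever the intercept, each one-unit step in the score multiplies the odds by the same factor, so the odds for a definitely swollen joint (score 2) are the squared OR times those for a non-swollen joint (score 0).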

Predicted probability of GS signs of synovitis (A), PD signal (B), presence of osteophytes (C) on US in relation to mean swelling status (ranging from 0 to 2, with 0 non-swollen, 1 doubtfully swollen and 2 definitely swollen, taking both examiners into account as mean).
Patient factors associated with doubtfulness
We found no correlation between age, gender or BMI and the presence of doubtful swelling. After dividing the patients into tertiles based on the number of doubtfully swollen joints, patients with more doubtfully swollen joints tended to have a shorter disease duration (p = 0.045) and symptom duration (p = 0.029). Patients exhibiting a higher number of doubtfully swollen joints had significantly higher disease activity as indicated by the SDAI and CDAI (p = 0.006 and 0.008, respectively), but not by the DAS28-CRP or DAS28-ESR; this was driven mainly by a higher overall 28SJC (p < 0.001) and EGA (p < 0.001). We found no significant association between the number of doubtfully swollen joints and presence or levels of RF, ACPA, CRP, ESR, fatigue, HAQ, pain, PGA, stiffness or tender joint count (Supplemental Table S1).
Discussion
GS signs of synovitis and PD signal on US were more common in doubtfully swollen joints than non-swollen joints, and less common than in definitely swollen joints, suggesting that ‘doubtful’ swelling may indeed represent not only an intermediate state of clinical ambiguity, but also underlying inflammatory joint activity.
At the same time, the clinical state of ‘doubtfulness’ as a grey zone between clear presence and absence of joint swelling is very subjective. This is also reflected by the low agreement between examiners for doubtful swelling. They agreed on ‘doubtfulness’ in only 17% of the joints that were labelled as doubtfully swollen by any one of them, and the remaining 83% of these joints were split by the respective investigator about 50–50% into non-swollen and swollen joints, indicating that these non-concordantly judged joints were at least highly controversial with respect to their swelling status. This is in line with previous observations,17–19 and in fact supports former conclusions on the appropriateness of the ungraded evaluation of joint swelling. 5 However, when the individual assessments of the two examiners were compared separately to the US findings, results for both GS signs of synovitis and PD signal were confirmatory of the main analysis (which classified joints based on the concordant adjudication).
Of interest, our results using the modified swelling metric suggest that synovitis as assessed by GS may be more sensitive in discriminating differences in inflammation at the lower end of the scale, whereas with higher levels of inflammation GS signals may not be able to increase further. For the increased vascularity as assessed by PD, the opposite may hold, with less discriminatory capacity in low-level inflammation and better discrimination in more highly inflamed joints, in line with recent studies that imply that PD signal may be a better tool to detect active joint inflammation.20–22 This different range of discriminatory sensitivity is supported by comparison of panels A and B of Figure 2, and even more so by comparison of the slopes of the regression curves in Figure 3A and B.
The fact that patients with shorter disease duration and higher disease activity more often exhibited doubtful swelling suggests an association between ambiguity of swelling and earlier, more active disease. The number of doubtful joints was also higher in patients with a higher number of swollen joints, which is not surprising if one considers doubtfulness to be an intermediary state of inflammatory activity. The fact that we demonstrated a correlation between the number of doubtful joints and disease activity as assessed by the SDAI and CDAI, but not by the DAS28, might be explained by the lower weight of the SJC in the latter index. The lack of association between the number of doubtful joints and many other patient and disease activity factors is likely related to the weaker association of these factors with inflammatory joint activity as such.
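The different weight of the SJC in these indices can be made concrete with the commonly published formulas, which are not restated in this article and are used here as an assumption: the CDAI and SDAI add the swollen joint count linearly, whereas the DAS28 uses 0.28 times its square root. The patient values below are hypothetical.

```python
import math


def cdai(tjc28, sjc28, pga_cm, ega_cm):
    """Clinical disease activity index: plain sum of its four components."""
    return tjc28 + sjc28 + pga_cm + ega_cm


def sdai(tjc28, sjc28, pga_cm, ega_cm, crp_mg_dl):
    """Simplified disease activity index: CDAI plus CRP in mg/dl."""
    return cdai(tjc28, sjc28, pga_cm, ega_cm) + crp_mg_dl


def das28_esr(tjc28, sjc28, esr_mm_h, global_health_mm):
    """DAS28 using ESR (mm/h) and patient global health on a 0-100 mm scale."""
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr_mm_h) + 0.014 * global_health_mm)


# Hypothetical patient: 4 tender and 4 swollen joints, plus 2 doubtful joints
# that are either ignored or counted as swollen.
cdai_change = cdai(4, 6, 5.0, 4.0) - cdai(4, 4, 5.0, 4.0)
das28_change = das28_esr(4, 6, 30.0, 50.0) - das28_esr(4, 4, 30.0, 50.0)
```

In this example, counting the two doubtful joints raises the CDAI by exactly 2 points but the DAS28-ESR by only about 0.13, which illustrates why doubtful joints may relate to the SDAI and CDAI more visibly than to the DAS28.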
It is noteworthy to mention that the phenomenon of doubtful swelling may also occur during the clinical assessment of other rheumatic and musculoskeletal diseases such as psoriatic arthritis and should thus be further investigated in other conditions and cohorts. Performing a semiquantitative assessment of sonographic findings as well as other forms of imaging such as magnetic resonance imaging in future studies may allow us to further substantiate the state of doubtful joint swelling.
Although more than 1000 joints were assessed, enabling us to arrive at these conclusions, our study may be regarded as limited due to a relatively small sample size. In addition, the frequency of unequivocal DSJ sites was low; however, a number of sensitivity analyses and separate assessment of the two examiners revealed fully confirmatory results.
When designing the sonographic protocol for our study, our goal was to simulate the clinical examination as far as possible, which is why we chose binary grading performed by the sonographer during the examination rather than post hoc scoring of static images. While binary grading is less suitable than semiquantitative scoring for monitoring treatment in longitudinal studies, our study was not designed to monitor therapeutic response and, in addition, previous studies have shown that binary grading is more reliable than semiquantitative scoring.23,24 However, the use of binary rather than semiquantitative grading, and the difficulties in defining a threshold of GS and PD indicative of active synovitis, highlighted by a number of recent studies, may be considered a limitation of our study.25,26 In this study, we used US, a technique that has been shown to be more sensitive than clinical examination, not as a gold standard but as a surrogate for inflammation. The goal of the study was to evaluate the ambiguity of clinical joint swelling, and not to draw conclusions on the use of US for monitoring disease activity. Furthermore, we focused exclusively on articular rather than extraarticular structures. Evaluation of extraarticular structures including tendons might have provided valuable information on clinical joint ambiguity.
Our data support the notion that doubtful joints on clinical examination represent an intermediate state between swelling and absence of swelling, when compared with more sensitive imaging techniques, such as sonography. In principle, this challenges the current logic of ‘when in doubt indicate as non-swollen’ with the potential conversion to ‘when in doubt indicate as swollen’. 6 Changing this paradigm in clinical practice as stated above may lead to higher swollen joint counts and consequent categorization in higher disease activity states, as was the case in one of our earlier studies investigating multimodal disease activity indices utilising both sonographic and clinical data. 27 Such practice might have therapeutic consequences, and may run the risk of overtreatment. At the same time, the opposite (and current practice) runs the risk of undertreatment.
In summary, our data might indicate that the sensitivity of clinical joint assessment for inflammation can be increased when doubtful joints are considered as swollen and included into joint counts and composite scores.
Supplemental Material
Supplementary_File_1_1 – Supplemental material for Doubtful swelling on clinical examination reflects synovitis in rheumatoid arthritis
Supplemental material, Supplementary_File_1_1 for Doubtful swelling on clinical examination reflects synovitis in rheumatoid arthritis by Peter Mandl, Paul Studenic, Gabriela Supp, Martina Durechova, Stefanie Haider, Michaela Lehner, Tanja Stamm, Josef S. Smolen and Daniel Aletaha in Therapeutic Advances in Musculoskeletal Disease
Footnotes
Conflict of interest statement
P.M. reports grants and personal fees from AbbVie, BMS, Chugai, MSD, Janssen, Lilly, Novartis, Pfizer, Roche, outside the submitted work; T.S. reports personal fees from AbbVie, Janssen, MSD, Novartis and Roche, outside the submitted work; J.S. received grants to his institution from Abbvie, AstraZeneca, Janssen, Lilly, Merck Sharpe & Dohme, Pfizer, and Roche and provided expert advice for, or had symposia speaking engagements with, AbbVie, Amgen, AstraZeneca, Astro, Bristol-Myers Squibb, Celgene, Celltrion, Chugai, Gilead, Glaxo, ILTOO Pharma, Janssen, Lilly, Merck Sharp & Dohme, Novartis- Sandoz, Pfizer, Roche, Samsung, Sanofi, and UCB; P.S., G.S., M.D., S.H., M.L. and D.A. have nothing to disclose.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References
