Abstract
Background:
Previous systematic reviews and meta-analyses on the diagnostic accuracy of shoulder clinical tests do not reach conclusions regarding subscapularis tears.
Purpose:
To compare the diagnostic accuracy of commonly used clinical tests for subscapularis tears.
Study Design:
Systematic review; Level of evidence, 3.
Methods:
An electronic literature search was conducted using Medline, Embase, and the Cochrane Library/Central. Eligibility criteria were original clinical studies reporting the diagnostic accuracy of clinical tests to diagnose the presence of rotator cuff tears involving the subscapularis.
Results:
The electronic literature search returned 2212 records, of which 13 articles were eligible. Among 8 tests included in the systematic review, the lift-off test was most frequently reported (12 studies). Four tests were eligible for meta-analysis: bear-hug test, belly-press test, internal rotation lag sign (IRLS), and lift-off test. The highest pooled sensitivity was 0.55 (95% CI, 0.28-0.79) for the bear-hug test, while the lowest pooled sensitivity was 0.32 (95% CI, 0.13-0.61) for the IRLS. In all tests, pooled specificity was >0.90.
Conclusion:
Among the 4 clinical tests eligible for meta-analysis (bear-hug test, belly-press test, IRLS, and lift-off test), all had pooled specificity >0.90 but pooled sensitivity <0.60. No single clinical test is sufficiently reliable to diagnose subscapularis tears.
Registration:
PROSPERO (CRD42019137019).
Assessment of history and physical examination are the first steps in diagnosing patients presenting with shoulder pain, which is often the result of degenerative rotator cuff disease. 33 Primary physical examination includes clinical tests that aim to reproduce symptoms to identify which tendons are torn.
More than 180 shoulder clinical tests have been described in the literature. 28 In some instances, the same test is used to diagnose different tendons; in others, the same test may simply have a different name. This heterogeneity of clinical tests, in purpose and terminology, renders the assessment of their diagnostic accuracy difficult and leads clinicians to question their usefulness altogether. 12 Tests commonly used to diagnose subscapularis tears, whether isolated or concomitant with supraspinatus tears, involve active internal shoulder rotation at different flexion angles. 27 The lift-off test 11 was the first test designed to evaluate the integrity of the subscapularis, followed by the internal rotation lag sign (IRLS), 15 the belly-press test, 10 and a variant of the latter, the Napoleon test. 35 The belly-off sign and bear-hug tests were later described by Scheibel et al 35 and Barth et al, 4 respectively.
Previous systematic reviews 6,14,17 and meta-analyses 12,13 on the diagnostic accuracy of shoulder clinical tests, while providing well-designed analysis of more general shoulder tests, have not yielded conclusions on the reliable detection of subscapularis tears. While a number of recent studies 5,21,39 investigated newer clinical tests used for the diagnosis of subscapularis tears, none compared their diagnostic accuracy across the spectrum of clinical tests available. This systematic review and meta-analysis therefore aims to collect, synthesize, and critically evaluate the literature on the diagnostic accuracy of the clinical tests most commonly utilized for assessing the presence of subscapularis tears and determine any gaps in the literature and directions for future research.
Methods
This systematic review and meta-analysis adhered to the principles outlined in the handbook of the Cochrane Collaboration 16 and the established guidelines from PRISMA-DTA (Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies). 29 The study protocol, including the search strategy, was registered on PROSPERO (CRD42019137019).
Search Strategy
We conducted an electronic literature search using Medline (1946–), Embase (1980–), and the Cochrane Library/Central on July 7, 2020, using the following search strategy: (“rotator cuff” OR “subscapularis” OR “supraspinatus”) AND (“disease” OR “rupture” OR “tear” OR “pathology”) AND (“clinical test” OR “clinical examination” OR “physical test” OR “physical examination”) (Table 1). The electronic literature search returned 2212 records, of which 710 were duplicates.
Keyword Search Terms
The titles and abstracts of the remaining 1502 records were screened by 2 independent reviewers (A.L. and M.S.) to determine relevance according to the following eligibility criteria.
Inclusion Criteria
Each original clinical study had to report at least 1 of the following: (1) true and false positives and true and false negatives; (2) sensitivity and specificity; and/or (3) positive predictive value (PPV) and negative predictive value (NPV) of individual clinical tests (physical examination) against radiographic, arthroscopic, or intraoperative observations. Diagnoses had to focus on the presence of rotator cuff tears involving the subscapularis: either isolated subscapularis tears or anterosuperior tears of the subscapularis and supraspinatus. Patients had to present with shoulder pain, functional impairment, or other evidence of rotator cuff disease.
Exclusion Criteria
Cohorts were excluded if they had patients with shoulder injury <6 weeks, history of shoulder instability, dislocation, rheumatoid arthritis, fracture, fibromyalgia, labral lesion, adhesive capsulitis, tumor, complex regional pain syndrome, or stroke-related disorder. Articles written in languages other than English, French, German, Spanish, or Italian were also excluded.
Study Selection and Data Extraction
Two reviewers (A.L. and M.S.) independently performed the search. The reference lists of all selected publications were checked. Gray literature, systematic reviews, meta-analyses, and guidelines on shoulder clinical tests were searched to retrieve relevant publications not identified in the electronic search. Selection of relevant articles was first performed through titles and then abstracts. Full-text articles were retrieved if the abstract provided insufficient information to establish eligibility or the article passed the first eligibility screening. Disagreements between the reviewers were discussed and resolved by a third independent reviewer (P.C.).
The 2 reviewers independently extracted study characteristics (year of publication, journal, level of evidence, prevalence of subscapularis tears, age, eligibility, reference diagnostic method) and data (true and false positives, true and false negatives, sensitivity, specificity, diagnostic odds ratio, PPV, and NPV). For each finding, the sensitivity, specificity, PPV, NPV, and diagnostic odds ratio with their 95% CIs were recalculated from data in the article, using a continuity correction of 0.5 if applicable. 7
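The recalculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the continuity correction of 0.5 is added to all four cells only when at least one cell is zero, and it uses a Wilson score interval for the 95% CI, neither of which is specified in the text. The function names and the example counts are hypothetical.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn, correction=0.5):
    """Return accuracy measures for one 2x2 table.

    Assumption: the correction is added to all four cells only when
    at least one cell is zero (a common convention; the review does
    not spell out its exact rule).
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "dor": (tp * tn) / (fp * fn),  # diagnostic odds ratio
    }


def wilson_interval(successes, total, z=1.96):
    """95% Wilson score interval for a proportion (assumed CI method)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Hypothetical counts, not taken from any included study.
example = diagnostic_metrics(tp=10, fp=2, fn=10, tn=78)
# A table with an empty cell triggers the 0.5 correction.
sparse = diagnostic_metrics(tp=5, fp=0, fn=5, tn=10)
# CI for the sensitivity of the first table (10 of 20 tears detected).
sens_low, sens_high = wilson_interval(10, 20)
```

Without the correction, the second table would give an undefined diagnostic odds ratio because of the division by zero false positives.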
The 2 reviewers assessed risk of bias on eligible studies using the QUADAS-2 criteria (Quality Assessment of Diagnostic Accuracy Studies). 29 In line with recommendations, the original 14 questions and scoring system were adapted to this study.
Statistical Analysis
Clinical tests were described in summary format when (1) true and false positives and true and false negatives could not be retrieved or (2) tests were reported in only 1 study. A meta-analysis was performed on clinical tests reported in at least 3 studies, for which true and false positives and true and false negatives were described or could be retrieved from corresponding authors. A bivariate random effects approach was taken for the meta-analysis of the pairs of sensitivity and specificity 32 and pairs of PPV and NPV. 24 The main outcomes of interest were the sensitivity/specificity and PPV/NPV for each test, presented with their 95% CIs in forest plots, as well as summary receiver operating characteristic curves, constructed for pairs of sensitivity and specificity. Heterogeneity was investigated visually by examining forest plots. Publication bias could not be evaluated statistically because none of the tests were represented by at least 10 studies. 8 Statistical analyses were performed using R Version 3.5.0 (R Foundation for Statistical Computing) with the
Results
Systematic Review
A total of 1439 articles were excluded by reading their titles or abstracts, and a further 50 were excluded by reading their full text, leaving 13 from which data were extracted for this review (Figure 1). No additional relevant articles were identified from citations in selected studies, gray literature, systematic reviews, meta-analyses, or guidelines.
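The selection counts reported here and in the Search Strategy can be checked with simple arithmetic. The sketch below uses only the figures stated in the text; the variable names are illustrative, and the intermediate count of full-text articles assessed is derived here rather than quoted from the article.

```python
# Counts as stated in the Search Strategy and Results sections.
records_identified = 2212
duplicates = 710
excluded_title_abstract = 1439
excluded_full_text = 50

# Records left for title/abstract screening after deduplication.
screened = records_identified - duplicates
# Derived, not quoted: articles that reached full-text assessment.
full_text_assessed = screened - excluded_title_abstract
# Articles from which data were extracted.
included = full_text_assessed - excluded_full_text
```

The result agrees with the 1502 screened records and the 13 included articles reported in the text.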

PRISMA diagram. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-analyses.
The 13 eligible studies (Figure 1), all published between 2006 and 2018, reported diagnostic accuracy for 8 clinical tests: bear-hug test, belly-off sign, belly-press test, IRLS, internal rotation resistance test (IRRT), lift-off test, Napoleon test, and supine Napoleon test. The most frequently cited test was the lift-off test (12 studies), while the least cited tests were the IRRT and supine Napoleon test (1 study each).
The study design was prospective in 11 studies and retrospective in 2 (Table 2). The reference diagnostic method was arthroscopy in 7 studies, ultrasound in 4, and magnetic resonance imaging (MRI) or magnetic resonance arthrography (MRA) in 2. Quality assessment using QUADAS-2 revealed that the risk of bias was low in 3 studies, moderate in 7, and high in 4, owing to flaws regarding patient selection in 5 studies, reference standard in 8, and flow and timing in 1 (Table 3).
Description of Included Studies
Quality Assessment of Studies Using the QUADAS-2
Meta-analysis
Of the 8 clinical tests, 4 were evaluated in ≥3 studies, which reported true and false positives as well as true and false negatives for any subscapularis tear, and were therefore eligible for meta-analysis: bear-hug test (Figure 2), belly-press test (Figure 3), IRLS (Figure 4), and lift-off test (Figure 5). The most frequently represented test was the lift-off test (8 studies), while the least represented were the bear-hug and IRLS tests (4 studies each). The level of evidence was 1 or 2 in 5 studies and 3 or 4 in 3 studies (Table 2). The reference diagnostic method was arthroscopy in 6 studies and MRI or MRA in 2 studies. According to QUADAS-2 criteria, the risk of bias was low in 2 studies, moderate in 4, and high in 2 (Table 3).

Forest plot representing the diagnostic accuracy of the bear-hug test. FN, false negative; FP, false positive; NPV, negative predictive value; PPV, positive predictive value; Se, sensitivity; Sp, specificity; TN, true negative; TP, true positive.

Forest plot representing the diagnostic accuracy of the belly-press test. See Figure 2 for abbreviations.

Forest plot representing the diagnostic accuracy of the internal rotation lag sign. See Figure 2 for abbreviations.

Forest plot representing the diagnostic accuracy of the lift-off test. See Figure 2 for abbreviations.
The highest pooled sensitivity was 0.55 (95% CI, 0.28-0.79) for the bear-hug test, while the lowest pooled sensitivity was 0.32 (95% CI, 0.13-0.61) for the IRLS. There was considerable variation in reported sensitivity; for each clinical test, there was no overlap in the 95% CIs cited by ≥2 studies. The highest pooled specificity was 0.94, achieved by the bear-hug (95% CI, 0.80-0.99), belly-press (95% CI, 0.77-0.99), and lift-off (95% CI, 0.81-0.98) tests, while the lowest pooled specificity was 0.92 (95% CI, 0.73-0.98) for the IRLS. In all tests, pooled specificity was >0.90. By setting the threshold for sensitivity and specificity at >0.80, none of the tests met both criteria.
The highest pooled PPV was 0.82 (95% CI, 0.63-0.93) for the bear-hug test, while the lowest pooled PPV was 0.58 (95% CI, 0.31-0.82) for the IRLS. The highest pooled NPV was 0.80 (95% CI, 0.70-0.87) for the belly-press test, while the lowest pooled NPV was 0.75 (95% CI, 0.62-0.85) for the IRLS. When the threshold for PPV and NPV was set at >0.80, only the bear-hug test met both criteria (Table 4). Clinical and methodological heterogeneities were considerable for all tests (Figures 2-5).
Diagnostic Accuracy of Clinical Tests for Subscapularis Tears vs Reference Observations
Unpooled Data
There were insufficient data on the belly-off sign, IRRT at 0° and 90°, the Napoleon test, and the supine Napoleon test to be included in the meta-analysis. For the belly-off sign, Bartsch et al 5 reported sensitivity and specificity to be >0.80, while Kappe et al 21 noted a sensitivity of 0.31 and a specificity of 0.97. For the IRRT at 0° and 90°, Lin et al 25 cited sensitivity of 0.62 and 0.77, respectively, and specificity of 0.76 and 0.81, respectively. For the Napoleon test, Barth et al 4 indicated sensitivity to be 0.25 and specificity to be 0.98, while Takeda et al 37 reported sensitivity to be 0.63 and specificity to be 0.90. For the supine Napoleon test, Takeda et al cited sensitivity and specificity as >0.80.
Discussion
The most important finding of this study was that no single clinical test is sufficiently reliable to diagnose subscapularis tears. It is possible that using several in combination could reduce reliance on costly or lengthy radiologic assessments, 26 but this would need well-evidenced studies to establish. The present systematic search yielded 13 articles reporting the diagnostic accuracy of 8 clinical tests for subscapularis tears, of which 4 tests were eligible for meta-analysis: bear-hug test, belly-press test, IRLS, and lift-off test. All 4 tests had pooled specificity >0.90 but pooled sensitivity <0.60, suggesting that none are individually reliable to diagnose subscapularis tears. These tests are commonly used to diagnose subscapularis tears by inducing active internal rotation of the shoulder at different flexion angles. 27 The lift-off test 11 was the first test designed to evaluate the integrity of the subscapularis, followed by the IRLS 15 and the belly-press test, 10 of which the Napoleon test 35 is a modified version. The belly-off sign and bear-hug test were later described by Scheibel et al 35 and Barth et al, 4 respectively.
The bear-hug test, designed by Barth et al, 4 is the newest of all tests in the meta-analysis. The belly-off sign, Napoleon test, and supine Napoleon test are more recent but lacked sufficient data to be included in the meta-analysis. The bear-hug test appears to be the most promising, based on pooled results from 4 series (598 patients), with the best sensitivity (0.55), specificity (0.94), PPV (0.82), and NPV (0.80). The Napoleon test also had promising accuracy. As with the other pooled tests, sensitivity is the diagnostic weakness of the bear-hug test, so the test cannot be used alone to diagnose the presence of subscapularis tears.
Existing studies reporting the diagnostic accuracy of clinical tests for combined IRRT and belly-press test 1 and combined belly-press, bear-hug, and lift-off tests 9 yielded mixed results, with sensitivity of 0.46 and 0.81, respectively. An electromyographic study 31 found that the belly-press, bear-hug, and lift-off tests all activate the subscapularis and concluded that these 3 tests can be used interchangeably. A comprehensive meta-analysis on shoulder clinical tests published in 2012 concluded that a combination of clinical tests marginally improves test accuracy. 13 Although medical history and physical examination have limited diagnostic accuracy, they can give useful indications in interpreting clinical tests. 2,14,18,20
The IRLS constitutes the passive version of the lift-off test (also known as the Gerber test). The 2 tests had equivalent pooled sensitivity (0.32 vs 0.33), specificity (0.92 vs 0.94), and NPV (0.75 vs 0.76), but the IRLS had lower PPV (0.58) than the lift-off test (0.70). This could be explained by a greater familiarity with the lift-off test, which was the most frequently reported. Unlike the lift-off test, the belly-press test and its modified versions (also known as the Napoleon test and the supine Napoleon test) can be performed in the presence of pain or stiffness. 3 Data on the supine Napoleon test from a single study are very promising, with a diagnostic accuracy of 0.84 for sensitivity, 0.96 for specificity, 0.94 for PPV, and 0.90 for NPV, although the risk of bias for this study 37 was high.
Publication bias could not be evaluated statistically; however, studies on clinical tests do not involve medical devices or treatments, which makes them less prone to publication bias. In fact, the wide range of sensitivity (0.0-1.00), with the rather symmetrical distribution of data, suggests that publication bias was low.
Clinical heterogeneity was low for mean patient age, ranging from 45 to 65 years, but considerable for patient selection, as some series comprised patients who were diagnosed with rotator cuff disease or scheduled to undergo surgery, 4,5,19,23,25,37,39 while others included patients consulting for shoulder pain. 34,36,38 The prevalence of subscapularis tears was higher in series on patients who were diagnosed with rotator cuff disease or scheduled to undergo surgery (mean, 34%; range, 5%-43%) than on patients presenting with shoulder pain (mean, 15%; range, 6%-23%).
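The practical consequence of this difference in prevalence can be illustrated with Bayes' theorem. The sketch below is an illustration only: it applies the pooled bear-hug sensitivity (0.55) and specificity (0.94) to the 2 mean prevalences reported above, assuming that sensitivity and specificity do not vary between settings. It is not the bivariate model used to pool PPV and NPV in this review, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by Bayes' theorem for a given prevalence."""
    tp = sensitivity * prevalence              # true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Pooled bear-hug estimates from this review; the fixed-accuracy
# assumption across settings is made here for illustration only.
SENSITIVITY, SPECIFICITY = 0.55, 0.94

settings = [
    ("diagnosed or surgical series", 0.34),
    ("shoulder pain series", 0.15),
]
for label, prevalence in settings:
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"{label}: prevalence={prevalence:.2f} PPV={ppv:.2f} NPV={npv:.2f}")
```

Under these assumptions, the PPV falls from roughly 0.83 at a prevalence of 34% to roughly 0.62 at 15%, while the NPV rises from roughly 0.80 to roughly 0.92, which is why predictive values from surgical series may not transfer to patients first presenting with shoulder pain.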
Methodological heterogeneity was considerable, given the use of 4 reference diagnostic methods (arthroscopy, MRI, MRA, ultrasound) (Table 2), missing information regarding blinding and/or timing of surgery relative to clinical testing (5 of 10 studies) (Table 3), and subjective thresholds in assessing muscle weakness in clinical tests, which could explain the high variability in sensitivity for all 4 pooled clinical tests. Itoi et al 19 drew attention to the issue of intraobserver repeatability, which the authors assessed in a previous work (correlation coefficient, 0.71). Given that sensitivity was the diagnostic weakness of all pooled tests, combining tests may not improve diagnostic accuracy. Comparing the performance of the painful shoulder with the contralateral shoulder could, however, help circumvent subjectivity in clinical testing. 31
The quality of any meta-analysis relies on the quality of available studies. Of the 8 studies in the meta-analysis, 5 had a level of evidence of 1 or 2. Furthermore, quality assessment using QUADAS-2 revealed that most studies presented flaws regarding patient selection or diagnostic reference standard or failed to specify blinding and time to surgery, rendering the risk of bias moderate to high in 8 of the 10 studies. Given the small number of primary studies available for pooling, heterogeneity could not be evaluated by hierarchical or bivariate random effects modeling. Other limitations include the high prevalence of rotator cuff disease and comorbidities, as well as the lack of intra- and interobserver repeatability. We therefore recommend that future studies on diagnostic accuracy of clinical tests evaluate repeatability and take into account surgeon experience. Despite these limitations, this study adhered to the standard methodology for systematic reviews and diagnostic meta-analysis outlined in the handbooks of the Cochrane Collaboration 16 and the established guidelines from the PRISMA-DTA. 29
All tests displayed poor sensitivity, demonstrating that the diagnostic accuracy of clinical tests in evaluating the presence of subscapularis tears is limited, and radiographic assessment remains necessary. Four of the 8 tests—belly-off sign, IRRT, Napoleon test, and supine Napoleon test—could not be pooled for statistical analysis, as too few studies were identified. These tests, which show early promise in the identification of subscapularis tears, would be better understood through future well-designed research.
Conclusion
Only 4 tests were eligible for meta-analysis: bear-hug test, belly-press test, IRLS, and lift-off test. All 4 tests had pooled specificity >0.90 but pooled sensitivity <0.60, suggesting that none are individually reliable in diagnosing subscapularis tears. Well-designed studies assessing combinations of tests and less expensive imaging solutions could lead to more reliable clinical diagnosis of subscapularis tears and reduce the reliance on costly or lengthy radiologic assessments.
Footnotes
Notes
Final revision submitted June 4, 2021; accepted June 9, 2021.
One or more of the authors has declared the following potential conflict of interest or source of funding: A.L. has received consulting fees from Wright, Arthrex, and Medacta and royalties from Wright. P.C. has received consulting fees from Arthrex and Wright and royalties from Wright. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
