The legitimacy of manual muscle testing (MMT) is dependent in part on the reliability of assessments obtained using the procedure.
OBJECTIVE:
The purpose of this review, therefore, was to consolidate findings regarding the test-retest and inter-rater reliability of MMT from studies meeting inclusion and exclusion criteria.
METHODS:
An electronic search of PubMed, Scopus, and CINAHL databases and a hand search were conducted to identify articles addressing the test-retest or inter-rater reliability of MMT. Data on participants, testing specifics, and findings regarding reliability were extracted.
RESULTS:
Of 189 unique articles identified, 9 were found to meet inclusion/exclusion criteria. The studies were highly variable in regard to the population tested, MMT procedure and scoring, and findings. Nevertheless, based on pairwise comparisons, substantial or almost perfect test-retest and inter-rater agreement was demonstrated for most muscle actions tested.
CONCLUSIONS:
Reliable assessments of strength may be obtained by MMT, but reliability cannot be assumed. Further research is required to address the reliability of MMT across pathologies, muscle groups, and test procedures.
Manual muscle testing (MMT) has a long history as a clinical procedure for grading muscle strength [1]. Individuals conducting the procedure use observation of muscles’ ability to create movement and respond to manual resistance to assign ordinal scores. The validity of MMT is well established relative to other measures of muscle strength [2, 3, 4] and function [4, 5], even though it is notably lacking in sensitivity [6, 7, 8].
The reliability of MMT has been examined extensively but not adequately summarized [9]. The purpose of this systematic review, therefore, was to remedy this shortcoming by consolidating the findings of studies describing the test-retest and inter-rater reliability of MMT.
Table 1. Quality checklist applied to 9 articles included in systematic review*

| Study | Study period and setting (2) | Inclusion/Exclusion (2) | MMT procedures described (1) | MMT grading scale described (1) | Blinding addressed (1) | Reliability % agreement and κ (2) | Total (9) |
|---|---|---|---|---|---|---|---|
| Brandsma et al. | 1 | 2 | 1 | 1 | 1 | 1 | 7 |
| Florence et al. | 1 | 2 | 0 | 1 | 1 | 1 | 6 |
| Frese et al. | 1 | 2 | 1 | 1 | 1 | 2 | 8 |
| Hough et al. | 2 | 2 | 0 | 1 | 1 | 2 | 8 |
| Paternostro-Sluga et al. | 1 | 2 | 1 | 1 | 0 | 1 | 6 |
| Personius et al. | 1 | 1 | 1 | 1 | 1 | 1 | 6 |
| Pfister et al. | 2 | 2 | 1 | 1 | 0 | 1 | 7 |
| Savic et al. | 1 | 1 | 1 | 1 | 0 | 2 | 6 |
| Tan et al. | 1 | 2 | 1 | 1 | 1 | 1 | 7 |

*Study period: time over which study was conducted (e.g., January–April, 2017); Setting: actual location of testing (e.g., “single academic county hospital”); Inclusion and exclusion criteria: one or both explicitly noted; MMT procedures described: reference to specific procedures (e.g., Daniels and Worthingham); MMT grading scale: number of levels with scores for each; Blinding addressed: explicitly stated or clearly implied; Reliability: description of % agreement and weighted kappa.
Fig. 1. PRISMA flowchart illustrating how the final sample of relevant articles was determined.
Method
Relevant research was identified by searches of the PubMed (since 1950), Scopus (since 1950), and CINAHL (since 1986) databases on April 19, 2018. The search string used for PubMed was: “manual muscle test*” AND (reliability or reproducibility). A hand search followed. Inclusion required that an article addressed the test-retest or inter-rater reliability of MMT and was written in English. Articles were excluded if they were reviews [9], addressed only composite scores rather than scores for individual muscles or actions [10], used MMT to simply identify the presence of weakness rather than gradations of strength [11], used inappropriate statistics to characterize reliability (e.g., Pearson or intraclass correlation coefficients) [12], or focused on tests scored in a unique manner (e.g., heel-raise test) [13]. For studies describing test-retest reliability, papers were excluded that addressed reliability determined during proximate sessions on the same day [14].
Articles retained on the basis of inclusion and exclusion criteria were abstracted by the author for information on participants (number, country of residence, medical condition), muscle test method, grading scale and range of grades assessed (based on Medical Research Council grades), muscles/actions tested, and findings relative to reliability (% agreement or weighted kappa (κ)). For studies addressing test-retest reliability, the time between tests was delineated. For studies addressing inter-rater reliability, the number and profession (e.g., therapists) of testers were recorded.
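As an illustration of the two statistics abstracted here, the following minimal Python sketch computes percentage agreement and a weighted kappa for paired ordinal grades. It is not taken from any of the reviewed studies: the function names, the example grades, and the choice of linear disagreement weights are assumptions made for illustration, and the included studies may have used other weighting schemes.

```python
from collections import Counter


def percent_agreement(first, second):
    """Percentage of paired grades that are identical."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty lists of grades")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


def weighted_kappa(first, second, categories, power=1):
    """Cohen's weighted kappa for paired ordinal grades.

    categories lists every possible grade in rank order.
    power=1 gives linear weights, power=2 quadratic weights
    (both are illustrative options, not the reviewed studies' stated choice).
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty lists of grades")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two grade categories")
    rank = {grade: i for i, grade in enumerate(categories)}
    n = len(first)

    def disagreement(i, j):
        # 0 for identical grades, 1 for grades at opposite ends of the scale
        return (abs(i - j) / (k - 1)) ** power

    pairs = Counter((rank[a], rank[b]) for a, b in zip(first, second))
    rows = Counter(rank[a] for a in first)
    cols = Counter(rank[b] for b in second)

    observed = sum(disagreement(i, j) * c for (i, j), c in pairs.items()) / n
    expected = sum(
        disagreement(i, j) * rows[i] * cols[j]
        for i in range(k)
        for j in range(k)
    ) / n ** 2
    if expected == 0:
        raise ValueError("kappa is undefined when both sets use one grade only")
    return 1.0 - observed / expected


# Hypothetical grades (0-5 scale) for 8 muscle actions on two occasions.
test = [3, 4, 4, 5, 2, 3, 5, 4]
retest = [3, 4, 5, 5, 2, 2, 5, 4]
print(percent_agreement(test, retest))
print(weighted_kappa(test, retest, categories=[0, 1, 2, 3, 4, 5]))
```

For these hypothetical grades, agreement is 75% and the linearly weighted kappa is roughly 0.79. Weighting matters for ordinal scales because a disagreement of one grade is penalized less than a disagreement of several grades.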
The quality of articles in this systematic review was scored using a 6-item, 9-point assessment (Table 1) similar to one described by Bohannon and Glenney [15]. Scores depended on the identification of the study period and setting, inclusion and exclusion criteria, MMT procedures, MMT grading scale, blinding, and reliability using % agreement and weighted kappa. Quality was not a criterion for inclusion.
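The scoring arithmetic can be sketched as follows. The item maxima are taken from the column headings of Table 1; the dictionary keys and function name are hypothetical labels introduced only for this example.

```python
# Maximum points per checklist item, as given in the Table 1 column headings.
MAX_POINTS = {
    "study_period_and_setting": 2,
    "inclusion_exclusion": 2,
    "mmt_procedures": 1,
    "mmt_grading_scale": 1,
    "blinding": 1,
    "reliability_agreement_and_kappa": 2,
}


def quality_score(item_scores):
    """Sum the item scores after checking each against its maximum."""
    if set(item_scores) != set(MAX_POINTS):
        raise ValueError("scores must cover exactly the six checklist items")
    for item, score in item_scores.items():
        if not 0 <= score <= MAX_POINTS[item]:
            raise ValueError(f"score for {item} is out of range")
    return sum(item_scores.values())


# Item scores for Frese et al. as listed in Table 1.
frese = {
    "study_period_and_setting": 1,
    "inclusion_exclusion": 2,
    "mmt_procedures": 1,
    "mmt_grading_scale": 1,
    "blinding": 1,
    "reliability_agreement_and_kappa": 2,
}
print(quality_score(frese), "of", sum(MAX_POINTS.values()))
```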
Results
The database search identified 189 unique articles (Fig. 1). Three additional articles were identified by a hand search. After application of inclusionary and exclusionary criteria, 9 relevant articles [16, 17, 18, 19, 20, 21, 22, 23, 24] remained to contribute data to this systematic review.
Table 2 summarizes information abstracted from 5 articles addressing test-retest reliability. The studies involved residents of 4 different countries with 5 different neuromuscular conditions. Test-retest intervals ranged from 1 day to 1 week. The MMT test methods were either not stated, self-described, or those of the Medical Research Council [25], Daniels and Worthingham [26], or MMT 8 [27]. Grading scales consisted of 6 to 13 levels. Based on the Medical Research Council grading scale, studies indicating the range of grades assigned typically included muscles with grades throughout most of the 0 to 5 scale. The number of muscles/actions tested ranged from 3 to 18. The reliability of 31 different muscles/actions, described using κ, ranged from 0.35 to 0.98. Of the 59 κ values reported, 28.8% were between 0.61 and 0.80 (substantial) and 66.1% were between 0.81 and 1.00 (almost perfect) [28]. The κ for patients with myopathy (0.35 to 0.69) was lowest [21].
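The descriptors applied to kappa values above follow Landis and Koch [28]. A minimal sketch of that labeling is given below. Only the "substantial" and "almost perfect" bands are stated in this review; the lower bands are taken from the Landis and Koch benchmarks, and the function name is hypothetical.

```python
def landis_koch_label(kappa):
    """Return the Landis and Koch descriptor for a kappa value."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Lowest and highest test-retest values reported above, plus one mid-range value.
for value in (0.35, 0.72, 0.98):
    print(value, landis_koch_label(value))
```

The bands are defined on kappa values reported to two decimal places, so the cut-offs here assume values rounded in that way.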
Table 2. Summary of 5 articles describing the test-retest reliability of manual muscle testing

| Study | Participants and testing hiatus | Muscle test method and grading (grade range) | Muscles/actions and findings |
|---|---|---|---|
| Brandsma et al. (1995) | 28 Thai patients with leprosy neuropathy; tested within 3 days | Self-described; 6-level scale (not indicated) | Abduction of little finger: 0.96; Adduction of little finger: 0.72; Abduction of index finger: 0.80; Abduction of thumb: 0.96; Opposition of thumb: 0.90; Intrinsic plus of index finger: 0.80; Intrinsic plus of middle finger: 0.85; Intrinsic plus of ring finger: 0.71; Intrinsic plus of little finger: 0.83 |
| Florence et al. (1992) | 102 American boys with Duchenne dystrophy; tested 5 days apart | Not stated; 11-level ordinal scale (0–5) | Neck flexors: 0.85; Neck extensors: 0.84; Left and right shoulder abductors: 0.87; Left and right shoulder external rotators: 0.89; Left and right elbow flexors: 0.82; Left and right elbow extensors: 0.86; Left and right wrist flexors: 0.65; Left and right wrist extensors: 0.69; Left and right thumb abductors: 0.71; Left and right hip flexors: 0.90; Left and right hip extensors: 0.88; Left and right hip abductors: 0.89; Left and right knee flexors: 0.79; Left and right knee extensors: 0.84; Left and right ankle dorsiflexors: 0.81; Left and right ankle plantarflexors: 0.71; Left and right ankle everters: 0.73; Left and right ankle inverters: 0.72 |
| Paternostro-Sluga et al. (2008) | 22 Austrian patients with paresis of radially innervated muscles; tested 7 days (median) apart | | |
Table 3. Summary of 8 articles describing the inter-rater reliability of manual muscle testing

| Study | Participants and testers | Muscle test method and grading (grade range) | Muscles/actions and findings |
|---|---|---|---|
| Brandsma et al. (1995) | 28 Thai patients with leprosy and neuropathy; 2 therapists | Self-described; 6-level scale (not indicated) | Abduction of little finger: Pairwise 0.79; Adduction of little finger: Pairwise 0.72; Abduction of index finger: Pairwise 0.74; Abduction of thumb: Pairwise 0.80; Opposition of thumb: Pairwise 0.81; Intrinsic plus of index finger: Pairwise 0.78; Intrinsic plus of middle finger: Pairwise 0.77; Intrinsic plus of ring finger: Pairwise 0.75; Intrinsic plus of little finger: Pairwise 0.93 |
| Frese et al. (1987) | 110 American patients referred for therapy for musculoskeletal or neurological disorders; 11 therapists | Kendall and McCreary or Daniels and Worthingham; 13-level scale (2–5) | |

*Of every therapist with every other therapist, †Of each therapist with each other therapist.
Table 3 summarizes information gleaned from 8 articles addressing inter-rater reliability. The studies involved residents of 6 different countries with 8 different conditions. Most studies used 2 raters, but 5, 7, and 11 raters were each used in 1 study. The MMT test methods were either not stated, self-described, or those of the Medical Research Council [25], Daniels and Worthingham [26], MMT 8 [27], ASIA [29], or Kendall [30]. The MMT grading scales consisted of 6 to 13 levels. Based on the Medical Research Council grading scale, studies indicating the range of grades assigned typically included muscles with grades throughout most of the 0 to 5 scale. The number of muscles/actions tested ranged from 3 to 20. The reliability of 34 different muscles/actions was described using percentage agreement or κ. In some cases the description related to overall (or mean) percentage agreement or κ. In other cases the description related to pairwise percentage agreement or κ. Pairwise agreement ranged from 40% to 95.5%. Pairwise κ ranged from 0.04 to 1.00. Of 75 specific pairwise κ values reported, 30.7% were between 0.61 and 0.80 (substantial) and 41.3% were between 0.81 and 1.00 (almost perfect) [28]. The κ for patients with myopathy (0.08 to 0.54) was lowest [21].
Scores on the quality checklist ranged from 6/9 to 8/9. The factors most often contributing to reduced scores were a failure to identify the period of time over which the study was conducted or a failure to describe both percentage agreement and κ.
Discussion
Considerable research has been published that focuses on the test-retest or inter-rater reliability of MMT. The majority of such research was excluded from this review because MMT was only used to dichotomously characterize individuals (e.g., weak versus not weak), the reliability of MMT was described using correlations (e.g., intraclass correlation coefficients) not suited for characterizing agreement between ordinal scores, or reliability of MMT was focused on composite scores of multiple muscle actions rather than individual muscle actions.
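One reason correlation coefficients can mislead when the question is agreement can be shown with a small sketch. The grades below are hypothetical and the helper names are introduced only for this example; it illustrates a single limitation (insensitivity to a systematic offset between testers) and is not a summary of the excluded studies.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def exact_agreement(x, y):
    """Percentage of pairs in which both testers assigned the same grade."""
    return 100.0 * sum(a == b for a, b in zip(x, y)) / len(x)


# Hypothetical grades: tester B always scores one grade higher than tester A.
tester_a = [1, 2, 3, 4, 2, 3]
tester_b = [grade + 1 for grade in tester_a]

print(pearson_r(tester_a, tester_b))        # correlation is perfect
print(exact_agreement(tester_a, tester_b))  # yet no pair of grades matches
```

Here the two testers never assign the same grade, yet their scores are perfectly correlated, which is why agreement statistics are preferred for ordinal MMT grades.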
The results of studies summarized in this review were variable in how they were reported. For the sake of cogency, results discussed hereafter are focused on simple pairwise comparisons (e.g., test-retest by a single tester or single tests by one pair of testers). The κ for the majority (94.9%) of pairwise test-retest comparisons was either substantial or almost perfect. The κ values for the inter-rater comparisons were lower, but the majority (72.0%) were still either substantial or almost perfect. Together these findings show that assessments of muscle strength obtained by MMT can be reliable, but that acceptable reliability cannot be assumed. The reliability of testers, therefore, should be assessed before their assessments are used to make clinical judgements regarding status or change. The findings of this review also suggest that, when possible, the same tester should be responsible for obtaining repeated MMT measures over time.
Factors limiting the reliability of MMT are well established. Chief among such factors is the subjectivity of force application by testers [31]. While this factor is not a problem when weakness is so great that the application of manual force is not necessary, it is a potential issue with higher MMT grades. Another major factor is tester strength. Weaker testers are able to apply less force. This is particularly problematic when testing muscle actions such as knee extension, which can produce particularly high forces [32]. It is interesting to note that the two studies reporting the lowest inter-rater reliability had a large proportion of participants with maximum or near maximum strength scores. Over 65 percent of the patients whose left and right gluteus medius were tested by Frese et al. received a Medical Research Council score of 5/5 [18]; the median Medical Research Council score assigned to all muscles in the study of Hough et al. was 4/5 or more [19].
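The combined percentages cited above follow from the band percentages given in the Results. The sketch below reproduces that arithmetic. The counts it derives (for example, 17 of 59) are inferred from the reported percentages and totals; they are not stated in the review, and the function names are hypothetical.

```python
def implied_count(percent, total):
    """Whole-number count implied by a reported percentage of a total."""
    return round(percent / 100 * total)


def share(count, total):
    """Percentage of total, rounded to one decimal place as in the text."""
    return round(100 * count / total, 1)


# Test-retest: 59 kappa values; 28.8% substantial, 66.1% almost perfect.
tr_substantial = implied_count(28.8, 59)
tr_almost_perfect = implied_count(66.1, 59)
print(share(tr_substantial + tr_almost_perfect, 59))

# Inter-rater: 75 pairwise kappa values; 30.7% substantial, 41.3% almost perfect.
ir_substantial = implied_count(30.7, 75)
ir_almost_perfect = implied_count(41.3, 75)
print(share(ir_substantial + ir_almost_perfect, 75))
```

Under these inferred counts the combined shares come to 94.9% for test-retest comparisons and 72.0% for inter-rater comparisons, matching the figures in the text.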
This study had several limitations. First, only one individual was involved in selecting and abstracting articles. Thus, while any reader is free to conduct the same searches and examination of the literature described herein, there is no internal confirmation of findings. Second, the consolidation of findings was limited by the very small number of studies included, different populations tested, array of MMT procedures and grading scales used, different muscles tested, and inconsistency in how reliability was reported. Further research addressing these variables is clearly needed.
Conclusion
Research reviewed herein indicates that it is possible to obtain reliable assessments of strength with MMT in the cohorts studied. Test-retest reliability tends to be greater than inter-rater reliability. Nevertheless, the reliability of measures obtained in specific clinical and research settings cannot be assumed; rather, it should be confirmed before testing is conducted repeatedly over time or by different testers.
Footnotes
Conflict of interest
None to report.
References
1. Lovett RW, Martin EG. Certain aspects of infantile paralysis with a description of a method of muscle testing. JAMA 1916; LXVI(10): 729-733.
2. Andres PL, Skerry LM, Thornell B, Portney LG, Finison LJ, Munsat TL. A comparison of three measures of disease progression in ALS. J Neurol Sci 1996; 139(Suppl): 64-70.
3. Bohannon RW. Measuring knee extensor muscle strength. Am J Phys Med Rehabil 2001; 80(1): 13-18.
4. Bohannon RW. How informative are manual muscle test scores obtained from home-care patients? Isokinet Exerc Sci 2009; 17(1): 15-17.
5. Eriksrud O, Bohannon RW. Relationship of knee extension force to sit-to-stand performance in patients receiving acute rehabilitation. Phys Ther 2003; 83(6): 544-551.
6. Beasley WC. Influence of method on estimates of normal knee extensor force among normal and post polio children. Phys Ther Rev 1956; 36(1): 21-41.
7. Bohannon RW. Manual muscle testing: Does it meet the standards of an adequate screening test? Clin Rehabil 2005; 19(6): 662-667.
8. Dvir Z. Grade 4 in manual muscle testing: The problem with submaximal strength assessment. Clin Rehabil 1997; 11(1): 36-41.
9. Cuthbert SC, Goodheart GJ. On the reliability and validity of manual muscle testing: A literature review. Chiropract Osteopathy 2007; 15: 4.
10. Parry SM, Berney S, Granger CL, Dunlop DL, Murphy L, El-Ansary D, et al. A new two-tier strength assessment approach to the diagnosis of weakness in intensive care: An observational study. Critical Care 2016; 19: 52.
11. Jepsen JR. Can testing of six individual muscles represent a screening approach to upper limb neuropathic conditions? BMC Neurology 2014; 14: 90.
12. Klingels K, De Cock P, Molenaers G, Desloovere K, Huenaerts C, Jaspers E, et al. Upper limb motor and sensory impairments in children with hemiplegic cerebral palsy. Can they be measured reliably? Disabil Rehabil 2010; 32(5): 409-416.
13. Harris-Love MO, Shrader JA, Davenport TE, Joe G, Rakocevic G, McElroy B, et al. Are repeated single-limb heel raises and manual muscle testing associated with peak plantar-flexor force in people with inclusion body myositis? Phys Ther 2014; 94(4): 543-552.
14. Connolly AM, Malkus EC, Mendell JR, Flanagan KM, Miller JP, Schierbecker JR, et al. Outcome reliability in nonambulatory boys/men with Duchenne muscular dystrophy. Muscle Nerve 2015; 51(4): 522-532.
15. Bohannon RW, Glenney SS. Minimal clinically important difference for change in comfortable gait speed of adults with pathology: A systematic review. J Eval Clin Pract 2014; 20(4): 295-300.
16. Brandsma JW, Schreuders TQR, Birke JA, Piefer A, Oostendorp R. Manual muscle strength testing: Intraobserver and interobserver reliabilities for the intrinsic muscles of the hand. J Hand Ther 1995; 8(3): 185-190.
17. Florence JM, Pandya S, King WM, Robison JD, Baty J, Miller JP, et al. Intrarater reliability and manual muscle test (Medical Research Council Scale) grades in Duchenne’s muscular dystrophy. Phys Ther 1992; 72(2): 115-122.
18. Frese E, Brown M, Norton BJ. Clinical reliability of manual muscle testing. Middle trapezius and gluteus medius muscles. Phys Ther 1987; 67(7): 1072-1076.
19. Hough CL, Lieu BK, Caldwell ES. Manual muscle strength testing of critically ill patients: Feasibility and interobserver agreement. Critical Care 2011; 15: R43.
20. Paternostro-Sluga T, Grim-Stieger M, Posch M, Schuhfried O, Vacariu G, Mittermaier C, et al. Reliability and validity of the Medical Research Council (MRC) scale and a modified scale for testing muscle strength in patients with radial palsy. J Rehabil Med 2008; 40(8): 665-671.
21. Personius KE, Pandya S, King WM, Tawil R, McDermott MP. Facioscapulohumeral dystrophy natural history study: Standardization of testing procedures and reliability of assessments. Phys Ther 1994; 74(3): 253-263.
22. Pfister PB, de Bruin ED, Sterkele I, Maurer B, de Bie RA, Knols RH. Manual muscle testing and hand-held dynamometry in people with inflammatory myopathy: An intra- and interrater reliability and validity study. PLOS One 2018; 13(3).
23. Savic G, Bergström EMK, Frankel HL, Jamous MA, Jones PW. Inter-rater reliability of motor and sensory examinations performed according to American Spinal Injury Association standards. Spinal Cord 2007; 45(6): 444-451.
24. Tan JL, Thomas NM, Johnston LM. Reproducibility of muscle strength testing for children with spina bifida. Phys Occup Ther Pediatr 2017; 37(4): 362-373.
25. O’Brien MO. Aids to the Examination of the Peripheral Nervous System. Edinburgh: Saunders; 2010.
26. Avers D, Brown M. Daniels and Worthingham’s Muscle Testing: Techniques of Manual Examination and Performance Testing. 10th edition. St Louis: Elsevier; 2018.
27. Manual Muscle Testing Procedures for MMT8 Testing (June 18, 2007). https://www.google.com/search?q=MMT+8+procedures&rlz=1C1GGRV_enUS751US751&oq=MMT+8+procedures&aqs=chrome..69i57.6535j1j8&sourceid=chrome&ie=UTF-8 Accessed July 23, 2018.
28. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33(1): 159-174.
29. International Standards for the Classification of Spinal Cord Injury. Motor Exam Guide (June 2008). http://asia-spinalinjury.org/wp-content/uploads/2016/02/Motor_Exam_Guide.pdf Accessed July 23, 2018.
30. Kendall FP, McCreary EK, Provance PG, Rodgers MM, Romani WA. Muscles: Testing and Function, with Posture and Pain. 5th edition. Philadelphia: Lippincott Williams and Wilkins; 2005.
31. Knepler C, Bohannon RW. Subjectivity of forces associated with manual-muscle test grades of 3+, 4-, and 4. Percept Mot Skills 1998; 87(3): 1123-1128.
32. Mulroy SJ, Lassen KD, Chambers SH, Perry J. The ability of male and female clinicians to effectively test knee extension strength using manual muscle testing. J Orthop Sports Phys Ther 1997; 26(4): 192-197.