
Abstract

Background. To investigate the effectiveness of upper limb rehabilitation, sound measures of upper limb function, capacity, and performance are paramount. Objectives. This systematic review investigates reliability and responsiveness of upper limb measurement tools used in pediatric neurorehabilitation. Methods. A 2-tiered search was conducted up to July 2014. The first search identified upper limb motor assessments for 1- to 18-year-old children with neuromotor disorders. The second search examined the psychometric properties of the tools. Methodological quality was rated according to COSMIN guidelines, and results for each tool were assembled in a “best evidence synthesis.” Furthermore, we delineated whether tools were unimanual or bimanual tests and if they measured recovery or did not distinguish between physiological and compensatory movements. Results. The first search delivered 2546 hits. Of these, 110 articles on 51 upper limb assessment tools were included. The second search resulted in 58 studies on reliability, 11 on measurement error, and 10 on responsiveness. Best evidence synthesis revealed only 2 assessments with moderate positive evidence for reliability, whereas no evidence on measurement error and responsiveness was found. The Melbourne Assessment showed moderate positive evidence for interrater and a fair positive level of evidence for intrarater reliability. The Pediatric Motor Activity Log Revised revealed moderate positive evidence for test–retest reliability. Conclusions. There is a lack of high-quality studies about psychometric properties of upper limb measurement tools in children with neuromotor disorders. To date, upper limb rehabilitation trials in children and adolescents risk being biased by insensitive measurement tools lacking reliability.

Keywords

pediatrics, rehabilitation, cerebral palsy, brain injury, neuromuscular diseases, psychometric properties, COSMIN, best evidence synthesis, International Classification of Functioning, Disability and Health

Introduction

The importance of adequate assessments is well known. They can provide information about various health care–related topics, such as the effectiveness of rehabilitation interventions; the course of a disease; the planning and adjustment of treatments; the objective reporting of the patient’s progress to health care specialists, patients, and their families; and the justification of treatments to health insurance companies.¹

While many studies established the psychometric properties of measurement tools, most of these studies focused on healthy or disabled adults. However, for those working in a pediatric setting, the results of these studies, and even the assessment tools themselves, often cannot directly be applied to children.² Especially in pediatrics, assessments that take a long time to complete are not well tolerated by young patients, and therapists do not have the time to familiarize themselves with many assessment tools for different patient populations.

Over the past decades, the development of assessment tools for this younger population has advanced, and studies have investigated the psychometric properties of child-friendly adaptations of existing measurement tools. Recently, a number of reviews have addressed the psychometrics of assessments for children.^3-5 Yet most of them focused on a specific patient and/or age group, and it remains unclear whether results can be transferred to children with other diagnoses and of other ages. Moreover, none of these reviews differentiated between assessments concentrating on the evaluation of true recovery and those allowing compensatory strategies. However, to understand rehabilitation processes and improve therapeutic decision-making, it is crucial to know the mechanisms underlying changes in motor outcomes. Last, many reviews focused on only a single component of the International Classification of Functioning, Disability and Health (ICF), whereas for a comprehensive evaluation, measures at the ICF component levels Body Function and Activity and Participation should be investigated.⁶ The latter is further divided into the ICF qualifiers Capacity, reflecting the best the child can do under ideal circumstances, and Performance, reflecting what the child actually does in his or her natural environment.⁶

Therefore, the objective of this study was to systematically review the literature for all measurement tools available to assess the upper extremity at the level of the ICF qualifiers Body Function, Capacity, and Performance in children and adolescents with a wide range of central motor disorders including cerebral palsy (CP), stroke, traumatic brain injury (TBI), or myelomeningocele (MMC). We thereby provide an overview of the current level of evidence for reliability, measurement error, and responsiveness of those measures for specific diagnoses. While reliability is defined as the proportion of total variance in the measurements derived from “true” differences among patients, measurement error stands for the systematic and random error of a patient’s score that is not attributed to true changes in the construct to be measured.⁷ Responsiveness is defined as the ability to detect change over time in the construct to be measured.⁷
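As a hedged illustration of the definition of reliability given above, the variance ratio can be written out as follows. The notation is an assumption introduced here and is not taken from the cited source: an observed score is assumed to decompose into a "true" between-patient component with variance \sigma_p^2 and an error component with variance \sigma_e^2, and SD denotes the standard deviation of the observed scores. The second expression is the commonly used relation between the standard error of measurement (SEM) and the reliability coefficient; which variance components enter \sigma_e^2 depends on the reliability model chosen.

```latex
\text{Reliability} = \frac{\sigma_p^2}{\sigma_p^2 + \sigma_e^2},
\qquad
\text{SEM} = \sigma_e = \mathrm{SD}\,\sqrt{1 - \text{Reliability}},
\qquad
\mathrm{SD}^2 = \sigma_p^2 + \sigma_e^2
```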

Additionally, we differentiated whether measurement tools (a) assess only physiologically desired movements; (b) assess both physiological and compensatory movements but account for compensation in the scoring process; or (c) do not distinguish between physiological and compensatory movements. Furthermore, we report whether task execution and scoring are performed and evaluated unimanually or bimanually. We hope this review can help clinicians and researchers decide which measurement tool is most appropriate for their therapeutic or experimental intervention.

Methods

Search Strategy

The standardized protocol consisted of a 2-tiered search⁸ conducted between December 2012 and June 2013; the search was updated in July 2014.

Search 1: Identification of Measurement Tools

We comprehensively searched through 6 electronic databases (CINAHL, Medline, Cochrane, OTSeeker, PEDro, Embase) to find all available upper limb assessments. The search term included keywords of (a) diagnosis, (b) age, (c) upper limb, (d) assessment tools, and (e) psychometric properties. An example of the search term for Medline is given in Appendix A.

The results were imported to a reference management system (Mendeley, Mendeley Ltd, London, UK). Duplicates were merged. Two reviewers independently screened titles, abstracts, and, if necessary, full texts for inclusion or exclusion according to predefined criteria. In case of disagreement, the reviewers discussed until consensus was reached. If consensus could not be reached, the opinion of an independent third reviewer was taken into account. A hand search of reference lists of all articles that met the inclusion criteria was then conducted.

Articles were included if they met the following criteria: (a) participants were children or adolescents (1-18 years) and (b) had diagnosed central motor disorders; (c) the body part of interest was (part of) the upper limb; (d) measurement tools assessed Body Function and/or Activity and/or Participation; (e) the evaluation of at least one psychometric property of the given tool was accomplished (irrespective of the main aim of the article); and (f) articles were peer-reviewed full texts (g) written in English or German.

Articles were excluded if (a) any of the participants exceeded the age of 20 years or more than 10% of the participants were between 18 and 20 years; (b) the assessment tool was invasive (eg, needle electromyography), (c) was used only for diagnosis, or (d) measured sensory function; or (e) they were case studies with fewer than 5 subjects.
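The sample-related exclusion rules, (a) and (e), can be sketched as a simple check. This is a minimal illustration, not the reviewers' actual screening procedure. The function name is hypothetical, the boundary handling ("between 18 and 20" read as older than 18 and up to 20) is an assumption, and the 5-subject minimum is applied to every study rather than only to case studies.

```python
def passes_sample_criteria(ages_years):
    """Return True if a study sample is not excluded by rules (a) and (e).

    ages_years: list of participant ages in years (hypothetical input format).
    """
    n = len(ages_years)
    # Rule (e): fewer than 5 subjects (assumption: applied to every study).
    if n < 5:
        return False
    # Rule (a), first part: no participant may be older than 20 years.
    if any(age > 20 for age in ages_years):
        return False
    # Rule (a), second part: at most 10% of participants aged 18 to 20
    # (assumption: interval read as 18 < age <= 20).
    share_18_to_20 = sum(1 for age in ages_years if 18 < age <= 20) / n
    return share_18_to_20 <= 0.10


print(passes_sample_criteria([6] * 9 + [19]))      # one of ten is 19 years old
print(passes_sample_criteria([6] * 8 + [19, 19]))  # two of ten are 19 years old
```

The remaining criteria (diagnosis, body part, invasiveness, language) require reading the article and are not captured by this sketch.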

Search 2: Selecting Psychometric Studies for Determining the Level of Evidence for Reliability, Measurement Error, and Responsiveness

For each measurement tool identified from the first search, a second search was conducted in the 6 databases mentioned above to find all articles investigating the psychometric properties of the tools. To this end, the name of the tool and its abbreviations and variations were added to the search term used in the first literature search. The search result of each database was exported to the reference management system, and the same 2 reviewers screened titles, abstracts, and full texts for inclusion or exclusion of articles. In case of disagreement, the reviewers discussed until consensus was reached. If consensus could not be reached, the opinion of an independent third reviewer was taken into account. In addition to the exclusion criteria applied in Search 1, studies that only considered validity of a measurement tool, studies focusing on lower extremities, and review articles were excluded. Last, reference lists of all articles that met the inclusion criteria were screened for additional literature.

Quality Assessment

Quality assessment was administered at 2 levels.

Level 1: Quality Rating of Individual Studies

The quality of articles was rated according to the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) checklist.⁹ It is a rating scale to evaluate the methodological quality of studies on measurement properties of health status instruments. In 9 boxes with 5 to 18 items, methodological aspects, such as applied statistics or independence of test administration, are rated for each psychometric property outlined in the article. Each item can be rated as “poor,” “fair,” “good,” or “excellent.”¹⁰

In the current review, the quality of studies on reliability, measurement error, and responsiveness was rated. Data were entered in a previously developed Microsoft Access COSMIN database,¹¹ which also included a Generalizability box to extract data on the characteristics of the study population and sampling procedure.¹² Two reviewers independently rated every article with the COSMIN checklist and discussed any differing ratings until consensus was reached. If consensus could not be reached, the opinion of an independent third reviewer was taken into account. An overall score per box was defined by taking the lowest rating of all items in that box.

One item in each box addresses sample size. Based on the COSMIN checklist, the sample size must be at least 30 to be rated as “fair.” Consequently, many studies would be rated as “poor” even though all other items would be scored at least as “fair.” Therefore, in line with previous studies,^11,13,14 we accounted for the score of this item at the level of the best evidence synthesis, where only studies of “fair,” “good,” or “excellent” methodological quality were included.
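The "lowest rating counts" rule and the effect of setting aside the sample size item can be illustrated with a short sketch. The item names and the example ratings below are hypothetical and are not taken from the COSMIN checklist; only the four rating labels and the rule itself come from the text.

```python
# Ordered from worst to best, as used for the item ratings.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def box_score(item_ratings, ignore_items=()):
    """Overall score of one box: the lowest rating among its items.

    item_ratings: dict mapping a (hypothetical) item name to its rating.
    ignore_items: item names to leave out, e.g. the sample size item.
    """
    kept = [rating for name, rating in item_ratings.items()
            if name not in ignore_items]
    return min(kept, key=RATING_ORDER.index)


# Hypothetical reliability box for a study with a small sample.
example_box = {
    "sample_size": "poor",
    "independent_administrations": "good",
    "statistics": "fair",
}

score_with_n = box_score(example_box)
score_without_n = box_score(example_box, ignore_items=("sample_size",))
print(score_with_n, score_without_n)
```

In this hypothetical case the study would be rated "poor" when the sample size item is counted and "fair" when it is set aside, which is the situation the paragraph above describes.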

Level 2: Best Evidence Synthesis for Each Measurement Tool

Results of each study were rated as positive, indeterminate, or negative¹⁵ (Table 1). If multiple studies on the same assessment were homogeneous enough, an overall rating was performed as proposed by van Tulder et al.¹⁶ To facilitate the choice of an adequate assessment, a “best evidence synthesis” based on the strategy of the Cochrane Back Review Group was performed.¹⁶ The level of overall evidence was rated as “strong” if findings were consistent in multiple studies of good OR in one study of excellent methodological quality; as “moderate” if there were consistent findings in multiple studies of fair OR in one study of good methodological quality; as “limited” if there was only one study of fair methodological quality; as “conflicting” whenever findings throughout studies were conflicting; or as “unknown” when only studies of poor methodological quality were available¹⁶ (Appendix B). To account for sample size, the level of evidence was rated as “strong,” when total sample size of combined studies was ≥100, “moderate” for a total sample size between 50 and 99, “limited” for a total sample size between 25 and 49, and “unknown,” when sample size was less than 25.^11,14
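The rules in this paragraph can be read as a small decision procedure, sketched below. Several points are assumptions made for illustration and are not stated in the text: the names `Study`, `methodological_level`, `sample_size_level`, and `best_evidence` are hypothetical; the quality rating used is the one without the sample size item; any mix of differing result ratings is treated as "conflicting"; and the quality-based and sample-size-based levels are combined by taking the lower of the two.

```python
from typing import NamedTuple

LEVELS = ["unknown", "limited", "moderate", "strong"]
QUALITY = ["poor", "fair", "good", "excellent"]


class Study(NamedTuple):
    quality: str  # methodological quality, sample size item set aside
    rating: str   # result rating: "+", "?" or "-"
    n: int        # sample size


def methodological_level(studies):
    """Level implied by number and quality of (non-poor) studies."""
    n_excellent = sum(s.quality == "excellent" for s in studies)
    n_good_plus = sum(QUALITY.index(s.quality) >= QUALITY.index("good")
                      for s in studies)
    if n_excellent >= 1 or n_good_plus >= 2:
        return "strong"
    if n_good_plus >= 1 or len(studies) >= 2:
        return "moderate"
    return "limited"


def sample_size_level(total_n):
    """Level implied by the total sample size of the combined studies."""
    if total_n >= 100:
        return "strong"
    if total_n >= 50:
        return "moderate"
    if total_n >= 25:
        return "limited"
    return "unknown"


def best_evidence(studies):
    usable = [s for s in studies if s.quality != "poor"]
    if not usable:
        return "unknown"
    if len({s.rating for s in usable}) > 1:
        return "conflicting"  # assumption: any disagreement counts
    by_quality = methodological_level(usable)
    by_size = sample_size_level(sum(s.n for s in usable))
    # Assumption: the overall level is the lower of the two levels.
    return min(by_quality, by_size, key=LEVELS.index)


# Four positive studies: one excellent, one good, two fair (total n = 61).
example = [Study("excellent", "+", 20), Study("good", "+", 9),
           Study("fair", "+", 11), Study("fair", "+", 21)]
print(best_evidence(example))
```

In the example, the study quality alone would support a "strong" level, but the combined sample size of 61 limits the result to "moderate" under the assumed combination rule.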

Table 1.

Quality Criteria for Measurement Properties^a.

Property	Rating	Quality Criteria
Reliability	+	ICC/weighted kappa ≥0.70 OR Pearson’s r ≥ 0.80
	?	Neither ICC/weighted kappa, nor Pearson’s r determined
	−	ICC/weighted kappa <0.70 OR Pearson’s r < 0.80
Measurement error	+	MIC > SDC OR MIC outside the LoA
	?	MIC not defined
	−	MIC ≤ SDC OR MIC equals or inside LoA
Responsiveness	+	Correlation with an instrument measuring the same construct ≥0.50 OR at least 75% of the results are in accordance with the hypotheses OR AUC ≥0.70 AND correlation with related constructs is higher than with unrelated constructs
	?	Solely correlations determined with unrelated constructs
	−	Correlation with an instrument measuring the same construct <0.50 OR <75% of the results are in accordance with the hypotheses OR AUC <0.70 OR correlation with related constructs is lower than with unrelated constructs

Abbreviations: ICC, intraclass correlation coefficient; MIC, minimal important change; SDC, smallest detectable change; LoA, limits of agreement; AUC, area under the receiver operating characteristics curve; +, positive rating; ?, indeterminate rating; −, negative rating.

^a Adapted from Terwee et al.¹⁵
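The reliability and measurement error rows of Table 1 can be expressed as two small rating functions. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the limits-of-agreement variant of the measurement error criterion is omitted, and when both an ICC/kappa and a Pearson correlation are supplied, meeting either threshold is taken to be sufficient for a positive rating.

```python
def rate_reliability(icc_or_kappa=None, pearson_r=None):
    """Rate a reliability result as "+", "?" or "-" following Table 1."""
    if icc_or_kappa is None and pearson_r is None:
        return "?"  # neither ICC/weighted kappa nor Pearson's r determined
    meets_icc = icc_or_kappa is not None and icc_or_kappa >= 0.70
    meets_r = pearson_r is not None and pearson_r >= 0.80
    return "+" if (meets_icc or meets_r) else "-"


def rate_measurement_error(sdc, mic=None):
    """Rate measurement error from the smallest detectable change (SDC)
    and the minimal important change (MIC), if one has been defined."""
    if mic is None:
        return "?"  # MIC not defined
    return "+" if mic > sdc else "-"


print(rate_reliability(icc_or_kappa=0.95))   # above the 0.70 threshold
print(rate_reliability(icc_or_kappa=0.58))   # below the 0.70 threshold
print(rate_measurement_error(sdc=3.89))      # no MIC available
```

Because a positive or negative measurement error rating requires a defined MIC, a study that reports only an SDC receives an indeterminate rating under these criteria.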

Results

Identification of Measurement Tools

As illustrated in Figure 1, 83 articles met the inclusion criteria. After reference searching, an additional 27 articles were included. Screening of retrieved articles revealed 51 upper limb assessment tools.

Figure 1.

Flowchart of the literature search and the selection of the studies.

Evidence for Reliability, Measurement Error, and Responsiveness

The search for psychometric properties of each tool resulted in 1731 articles. For the quality assessment, 62 articles covering 34 measurement tools met the inclusion criteria (see Figure 1).

In 58 articles, data about reliability were reported (Table 2), measurement error was outlined in 11 studies (Table 3), whereas 10 studies provided information about responsiveness (Table 4). Please note that kinematic measures are not included in these tables.

Table 2.

Characteristics of Included Studies (Reliability).

ICF
Function	Capacity	Performance	Tool	Article	Reliability	N	Age	Diagnoses	Results	COSMIN	COSMIN*
✓			Dynamometry	Effgen and Brown (1992)¹⁷	Test–retest	12	9.8-17.4	MMC	ICC 0.75-0.99	Poor	Fair
				Klingels et al (2010)¹⁸	Interrater	30	5-15	CP	ICC 0.95 (0.89-0.97)	Fair	Good
				Klingels et al (2010)¹⁸	Test–retest	23	5-15	CP	ICC 0.96 (0.90-0.98)	Poor	Fair
✓			Modified Ashworth Scale (MAS)	Clopton et al (2005)¹⁹	Interrater	17	2-17	Mixed	Elbow flexors: ICC 0.79 (0.67-0.88)	Poor	Poor
				Clopton et al (2005)¹⁹	Test–retest	17	2-17	Mixed	Elbow flexors: ICC 0.58	Poor	Fair
				Numanoğlu and Günel (2012)²⁰	Test–retest	37	2-16	CP	Elbow flexors: ICC 0.66 (0.48-0.79)	Fair	Fair
				Numanoğlu and Günel (2012)²⁰					Wrist flexors: ICC 0.57 (0.35-0.73)
				Klingels et al (2010)¹⁸	Interrater	30	5-15	CP	Total score: ICC 0.88 (0.76-0.94)	Fair	Good
				Klingels et al (2010)¹⁸					Elbow flexors: ICC 0.72 (0.50-0.86)
									Wrist flexors: ICC 0.65 (0.39-0.82)
					Test–retest	23	5-15	CP	Total score: ICC 0.90 (0.78-0.96)	Poor	Fair
									Elbow flexors: ICC 0.85 (0.69-0.93)
									Wrist flexors: ICC 0.80 (0.60-0.91)
✓			Manual Muscle Testing (MMT)	Klingels et al (2010)¹⁸	Interrater	30	5-15	CP	ICC 0.90 (0.80-0.95)	Fair	Good
				Klingels et al (2010)¹⁸	Test–retest	23	5-15	CP	ICC 0.96 (0.91-0.98)	Poor	Fair
✓			Mowery-Classification Videotaped Evaluation	Waters et al (2004)²¹	Interrater	10	12-16	CP	Kappa: 0.30 (0.21-0.46)	Poor	Fair
			Mowery-Classification Videotaped Evaluation	Waters et al (2004)²¹	Intrarater	10	12-16	CP	Kappa 0.45 (0.10-0.84)	Poor	Fair
✓			Reach/pinch/grip function Videotaped Evaluation	Waters et al (2004)²¹	Interrater	10	12-16	CP	Reach: ICC 0.73 (0.55-1.00)	Poor	Fair
				Waters et al (2004)²¹					Grip and Release: ICC 0.67 (0.43-1.00)
									Pinch: ICC 0.40 (0.23-0.69)
					Intrarater	10	12-16	CP	Reach: ICC 0.87 (0.62-1.00)	Poor	Fair
									Grip and Release: ICC 0.92 (0.75-1.00)
									Pinch: ICC 0.61 (0.36-0.85)
✓			Passive Range of Motion (pROM)	Klingels et al (2010)¹⁸	Interrater	30	5-15	CP	Elbow Extension: ICC 0.69 (0.44-0.84)	Fair	Good
									Wrist: Supination: ICC 0.73 (0.51-0.86)
									Extension: ICC 0.48 (0.15-0.71)
					Test–retest	23	5-15	CP	Elbow Extension: ICC 0.94 (0.86-0.97)	Poor	Fair
									Wrist: Supination: ICC 0.81 (0.61-0.91)
									Extension: ICC 0.88 (0.74-0.95)
✓			Active and Passive Range of Motion (a&pROM) Videotaped Evaluation	Waters et al (2004)²¹	Interrater	10	12-16	CP	aROM: Kappa 0.16-0.45	Poor	Fair
				Waters et al (2004)²¹					pROM: Kappa 0.40-0.85
					Intrarater	10	12-16	CP	aROM: Kappa 0.50-0.75	Poor	Fair
									pROM: Kappa 0.63-0.92
✓			Tardieu	Gracies et al (2010)²²	Interrater	20	6-17	CP	Without formal training: 67% to 95% agreement	Poor	Poor
							4-15		With formal training: 82% to 96% agreement
					Test–retest	20	6-17	CP	Without formal training: 72% to 97% agreement	Poor	Poor
							4-15		With formal training: 87% to 100% agreement
✓			Modified Tardieu (mTardieu)	Numanoğlu and Günel (2012)²⁰	Test–retest	37	2-16	CP	Elbow flexors	Fair	Fair
				Numanoğlu and Günel (2012)²⁰					Velocities: ICC > 0.62
									Angles: ICC > 0.76
									Wrist flexors
									Velocities: ICC > 0.75
									Angles: ICC > 0.85
✓			Tonic stretch reflex threshold	Jobin and Levin (2000)²³	Test–retest	14	6-18	CP	Lambda: ICC 0.73	Poor	Fair
				Jobin and Levin (2000)²³					Clinical spasticity score: ICC 0.60
✓			Upper Extremity Rating Scale (UERS)	Koman et al (2008)²⁴	Interrater	65	3-18	CP	Right side total score: Kappa 0.94	Fair	Fair
				Koman et al (2008)²⁴					Left side total score: Kappa 0.96
					Test–retest	62	3-18	CP	Right side total score: Kappa 0.93	Fair	Fair
									Left side total score: Kappa 0.96
✓			Accelerometry	Reddihough et al (1987)²⁵	Test–retest	8	4-5	CP	Number of turns: α 0.978	Poor	Poor
				Reddihough et al (1987)²⁵					Number of baseline crossings: α 0.983
				Reddihough et al (1991)²⁶	Test–retest	20	6-9	CP	Kendall: w 0.72	Poor	Poor
✓			Accelerometry (given task; pick up and eat a raisin)	Reddihough et al (1990)²⁷	Test–retest	32	?	CP	FRQMAX: Spearman 0.70	Fair	Fair
✓	(✓)		Quality of Upper Extremities Skills Test (QUEST)	Haga et al (2007)²⁸	Interrater	11	2-4.5	CP	Total score: Spearman > 0.72	Poor	Fair
			Quality of Upper Extremities Skills Test (QUEST)	Haga et al (2007)²⁸	Test–retest	21	2-4.5	CP	Total score: Spearman 0.92	Poor	Fair
					Intrarater	10	2-4.5	CP	Total score: Spearman > 0.63	Poor	Fair
				Sakzewski et al (2001)²⁹	Interrater	16	6.83-16.08	ABI	Total score: ICC > 0.91	Poor	Fair
				Sakzewski et al (2001)²⁹	Test–retest	16	6.83-16.08	ABI	Total score: ICC 0.93	Poor	Poor
				Sorsdahl et al (2008)³⁰	Interrater	25	2-13	CP	Total score: ICC 0.91 (0.80-0.96)	Poor	Good
				Sorsdahl et al (2008)³⁰	Intrarater	25	2-13	CP	Total score Rater A: ICC 0.69	Poor	Good
									Total score Rater B: ICC 0.89
				Thorley et al (2012)³¹	Interrater	31	2.5-12.58	CP	Total score: ICC 0.86 (0.73-0.93)	Poor	Poor
				Thorley et al (2012)³¹	Intrarater	31	2.5-12.58	CP	Total score: ICC 0.96 (0.93-0.98)	Poor	Poor
				DeMatteo et al (1993)³²	Interrater	17	1.5-8	CP	Total score: ICC 0.90	Poor	Fair
				DeMatteo et al (1993)³²	Test–retest	17	1.5-8	CP	Total score: ICC 0.95	Poor	Fair
				Klingels et al (2008)³³	Interrater	21	5-8	CP	Total score: ICC 0.96 (0.90-0.98)	Poor	Fair
✓	✓		Melbourne Assessment of Unilateral Upper Limb Function	Bard et al (2009)³⁴	Interrater	11	5.5-15.5	CP	ICC > 0.80	Poor	Fair
			Melbourne Assessment of Unilateral Upper Limb Function	Bard et al (2009)³⁴	Intrarater	11	5.5-15.5	CP	Kappa > 0.80	Poor	Fair
				Cusick et al (2005)³⁵	Interrater	9	5.42-12.0	CP	Kappa > 0.80	Poor	Good
				Johnson et al (1994)³⁶	Interrater	20	Incl. 6-12	CP	Kappa 0.65	Poor	Poor
				Johnson et al (1994)³⁶	Intrarater	20	Incl. 6-12	CP	Kappa 0.72	Poor	Poor
				Randall et al (2001)³⁷	Interrater	20	5.92-15.08	CP	ICC 0.95	Poor	Excellent
				Randall et al (2001)³⁷	Test–retest	19	5.92-15.08	CP	CCC > 0.97	Poor	Good
					Intrarater	20	5.92-15.08	CP	ICC 0.97	Poor	Excellent
				Spirtos et al (2011)³⁸	Interrater	11	6.08-14.42	CP	ICC 0.96 (0.93-0.98)	Poor	Poor
				Jayaraman and Puckree (2009)³⁹	Interrater	5	5-15	CP	Kappa 0.72	Poor	Poor
				Jayaraman and Puckree (2009)³⁹	Test–retest	5	5-15	CP	Kappa 0.82	Poor	Poor
				Klingels et al (2008)³³	Interrater	21	5-8	CP	ICC 0.97 (0.93-0.99)	Poor	Fair
	✓		Modified House Functional Classification (MHC)	Koman et al (2008)²⁴	Interrater	58	3-18	CP	Original House: ICC 0.92	Fair	Fair
				Koman et al (2008)²⁴					Sum of descriptors: ICC 0.94
					Test–retest	62	3-18	CP	Original House: ICC 0.94	Fair	Fair
									Sum of descriptors: ICC 0.96
	✓		House-Classification Videotaped Evaluation	Waters et al (2004)²¹	Interrater	10	12-16	CP	Kappa 0.54 (0.32-1.00)	Poor	Fair
				Waters et al (2004)²¹	Intrarater	10	12-16	CP	Kappa 0.80 (0.56-1.00)	Poor	Fair
	✓		Pediatric Arm Function Test (PAFT)	Uswatte et al (2012)⁴⁰	Test–retest	21	2-8	CP	ICC 0.74	Poor	Poor
	✓		Peabody Developmental Motor Scales 2 Fine Motor (PDMS-2 FM)	Wang et al (2006)⁴¹	Test–retest	32	2.25-5.33	CP	ICC 0.99	Fair	Fair
	✓		Peabody Developmental Motor Scales Fine Motor (PDMS-FM)	Russell et al (1994)⁴²	Test–retest	18	1.75-7.92	CP	Total score: ICC 0.99	Poor	Fair
	✓	(✓)	Video Observation Aarts and Aarts module: Determine Developmental Disregard (VOAA-DDD)	Aarts et al (2009)⁴³	Interrater	2 × 10	Group 1: 2.5-5	CP	Capacity	Poor	Fair
							Group 2: 5-8		Group 1: ICC 0.97 (0.90-0.99)
									Group 2: ICC 0.95 (0.86-0.99)
									Performance:
									Group 1: ICC 0.99 (0.97-1.00)
									Group 2: ICC 1.00 (0.99-1.00)
									Developmental disregard:
									Group 1: ICC 0.97 (0.92-0.99)
									Group 2: ICC 0.99 (0.98-1.00)
					Test–retest	2 × 10	Group 1: 2.5-5	CP	Capacity	Poor	Fair
							Group 2: 5-8		Group 1: ICC 0.87 (0.42-0.97)
									Group 2: ICC 0.90 (0.58-0.98)
									Performance:
									Group 1: ICC 0.97 (0.89-0.99)
									Group 2: ICC 0.99 (0.95-1.00)
									Developmental disregard:
									Group 1: ICC 0.88 (0.55-0.97)
									Group 2: ICC 0.98 (0.90-0.99)
					Intrarater	2 × 10	Group 1: 2.5-5	CP	Capacity	Poor	Fair
							Group 2: 5-8		Group 1: ICC > 0.96
									Group 2: ICC > 0.95
									Performance:
									Group 1: ICC > 0.99
									Group 2: ICC > 0.99
									Developmental disregard:
									Group 1: ICC > 0.97
									Group 2: ICC > 0.96
	✓	(✓)	Video Observation Aarts and Aarts module: Determine Developmental Disregard Revised (VOAA-DDD-R)	Houwink et al (2013)⁴⁴	Interrater	25	2.9-8.0	CP	Capacity: ICC 0.98 (0.95-0.99)	Poor	Fair
				Houwink et al (2013)⁴⁴					Performance: ICC 0.99 (0.98-1.00)
									Developmental disregard: ICC 0.95 (0.90-0.98)
					Test–retest	25	2.9-8.0	CP	Capacity: ICC 0.98 (0.96-0.99)	Poor	Fair
									Performance: ICC 0.99 (0.97-0.99)
									Developmental disregard: ICC 0.79 (0.57-0.90)
					Intrarater	25	2.9-8.0	CP	Capacity: ICC 1.00 (1.00-1.00)	Poor	Fair
									Performance: ICC 1.00 (1.00-1.00)
									Developmental disregard: ICC 0.98 (0.96-0.99)
	✓	(✓)	Assisting Hand Assessment (AHA)	Holmefur et al (2009)⁴⁵	Test–retest	18	2.25-4.92	CP	ICC 0.99 (0.79-0.99)	Poor	Fair
				Davis et al (2010)⁴⁶	Interrater	26	1.76-16.62	ABI	ICC: 0.97 (0.93-0.99)	Poor	Fair
					Intrarater	26	1.76-16.62	ABI	ICC: 0.99 (0.97-0.99)	Poor	Fair
	✓	(✓)	Observational Skills Assessment Score (OSAS)	Speth et al (2013)⁴⁷	Interrater	16	2.5-6	CP	% of use of both hands: ICC 0.932-0.950	Poor	Fair
			Observational Skills Assessment Score (OSAS)						Quality of use of affected hand: ICC 0.078-0.826
						16	12-16	CP	% of use of both hands: ICC 0.785-0.877	Poor	Fair
									Quality of use of affected hand: ICC 0.019-0.967
					Test–retest	10	2.5-6	CP	% of use of both hands: ICC 0.689-0.775	Poor	Fair
									Quality of use of affected hand: ICC 0.116-0.934
						16	12-16	CP	% of use of both hands: ICC 0.038-0.446	Poor	Fair
									Quality of use of affected hand: ICC 0.064-0.979
					Intrarater	16	2.5-6	CP	% of use of both hands: ICC 0.888-0.989	Poor	Fair
									Quality of use of affected hand: ICC 0.106-0.933
						16	12-16	CP	% of use of both hands: ICC 0.857-0.960	Poor	Fair
									Quality of use of affected hand: ICC 0.749-0.999
	✓	✓	Besta-Scale	Rosa-Rizzotto et al (2014)⁴⁸	Interrater	39	1.5-8	CP	Kendall’s K: 0.35-0.64	Fair	Fair
					Intrarater		1.5-8	CP	ICC: 0.813-0.971	Fair	Fair
(✓)	(✓)	(✓)	Shriners Hospitals for Children Cerebral Palsy Computer-Adapted Testing Battery Upper Extremities (UE-CAT)	Haley et al (2010)⁴⁹	Test–retest	27	2-19	CP	ICC 0.86 (0.71-0.94)	Poor	Fair
		✓	ABILHAND-Kids	Arnould et al (2004)⁵⁰	Test–retest	113	6-15	CP	r = 0.91	Fair	Fair
		✓	Bimanual Fine Motor Function (BFMF)	Randall et al (2013)⁵¹	Interrater	20	4-11	CP	Kappa: 0.98 (0.94-1)	Poor	Fair
		✓	Münchner ADL-Fragebogen (M-ADL) “Hand skill in everyday life” subscale	Blank (2007)⁵²	Interrater	48	3.00-6.11	CP	Hand global: Spearman 0.79	Fair	Fair
									Hand scale: Spearman 0.86
					Test–retest	28	3.00-6.11	CP	Hand global: Spearman 0.90	Poor	Fair
									Hand scale: Spearman 0.83
		✓	Pediatric Motor Activity Log Revised (PMAL-R)	Wallen et al (2009)⁵³	Test–retest	31	1.58-7.92	CP	How often scale: ICC 0.94	Fair	Fair
			Pediatric Motor Activity Log Revised (PMAL-R)	Wallen et al (2009)⁵³					How well scale: ICC 0.93
		✓	Pediatric Motor Activity Log Revised (PMAL-R) How well scale	Uswatte et al (2012)⁵⁴	Test–retest	31	2-8	CP	6-step rating scale: ICC 0.89	Poor	Fair
				Uswatte et al (2012)⁵⁴					3-step rating scale: ICC 0.90
		✓	Pediatric Outcomes Data Collection Instrument Upper Extremity (PODCI UE)	Gates et al (2010)⁵⁵	Interrater	139	11-18	CP	Self-parent: ICC 0.68 (0.35-0.84)	Fair	Fair
✓	✓	✓	Shriners Hospital for Children Upper Extremity Evaluation (SHUEE)	Davids et al (2006)⁵⁶	Interrater	11	6.92-13.75	CP	Spontaneous functional analysis: ICC 0.90	Poor	Poor
									Dynamic positional analysis: ICC 0.89
									Grasp-and-release score: Kappa 1.00
					Intrarater	11	6.92-13.75	CP	Spontaneous functional analysis: ICC 0.99	Poor	Poor
									Dynamic positional analysis: ICC 0.98
									Grasp-and-release score: Kappa 1.00

Abbreviations: ICF, International Classification of Functioning, Disability and Health; N, sample size; MMC, myelomeningocele; CP, cerebral palsy; ABI, acquired brain injury; ICC, intraclass correlation coefficient; FRQMAX, maximal frequency; COSMIN, COSMIN score with sample size item; COSMIN*, COSMIN score without sample size item.

Table 3.

Characteristics of Included Studies (Measurement Error).

ICF
Function	Capacity	Performance	Tool	Article	N	Age	Diagnoses	Results	COSMIN	COSMIN*
✓	(✓)		Quality of Upper Extremities Skills Test (QUEST)	Klingels et al (2008)³³	21	5-8	CP	SEM QUEST: 3.2%	Poor	Fair
			Quality of Upper Extremities Skills Test (QUEST)	Klingels et al (2008)³³	21	5-8	CP	SDD QUEST: 7.1%	Poor	Fair
✓	✓		Melbourne Assessment of Unilateral Upper Limb Function	Cusick et al (2005)³⁵	9	5.42-12.0	CP	SEM < 2.53	Poor	Good
				Klingels et al (2008)³³	21	5-8	CP	SEM Melbourne: 2.6%	Poor	Fair
				Klingels et al (2008)³³	21	5-8	CP	SDD Melbourne: 8.9%	Poor	Fair
	✓		Peabody Developmental Motor Scales 2 Fine Motor (PDMS-2 FM)	Wang et al (2006)⁴¹	32	2.25-5.33	CP	SEM: 1.3%	Fair	Fair
	✓	(✓)	Video Observation Aarts and Aarts module: Determine Developmental Disregard Revised (VOAA-DDD-R)	Houwink et al (2013)⁴⁴	25	2.9-8.0	CP	Capacity: SEM 5.1%; SDD 14.0%	Poor	Fair
								Performance: SEM 4.5%; SDD 12.5%
								Developmental disregard: SEM 6.8%; SDD 19.0%
	✓	(✓)	Assisting Hand Assessment (AHA)	Holmefur et al (2009)⁴⁵	18	2.25-4.92	CP	SEM: sum scores 1.40/logits 0.35	Poor	Fair
			Assisting Hand Assessment (AHA)	Holmefur et al (2009)⁴⁵	18	2.25-4.92	CP	SDD: sum scores 3.89/logits 0.97	Poor	Fair
	✓	(✓)	Observational Skills Assessment Score (OSAS)	Speth et al (2013)⁴⁷	10	2.5-6	CP	% use of both hands: SDD 22.65-30.82	Poor	Fair
								Quality of use affected hand: SDD 0.37-1.11
					16	12-16	CP	% use of both hands: SDD 11.30-14.50
								Quality of use affected hand: SDD 0.10-0.85
		✓	Pediatric Motor Activity Log (PMAL)	Lin et al (2012)⁵⁷	41	2-10	CP	MDC: score 0.66-0.67	Poor	Poor
			Pediatric Motor Activity Log (PMAL)	Lin et al (2012)⁵⁷	41	2-10	CP	Percent 39.02-46.34	Poor	Poor
		✓	Pediatric Motor Activity Log Revised (PMAL-R) How well scale	Uswatte et al (2012)⁵⁴	28	2-8	CP	SEM: 0.15; MDC: 0.42	Poor	Fair
		✓	Pediatric Outcomes Data Collection Instrument Upper Extremity (PODCI UE)	Oeffinger et al (2008)⁵⁸	381	4.25-18.33	CP	?	Poor	Poor

Abbreviations: ICF, International Classification of Functioning, Disability and Health; N, sample size; CP, cerebral palsy; SEM, standard error of measurement; SDD, smallest detectable difference; MDC, minimal detectable change; COSMIN, COSMIN score with sample size item; COSMIN*, COSMIN score without sample size item.
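The review does not state how the individual studies derived their SDD or MDC values. A conventional relation at the 95% confidence level is SDD = 1.96 × √2 × SEM, and the sketch below applies it to three SEM values from Table 3 for comparison with the reported figures. The function name is hypothetical, and the use of this particular formula by the cited studies is an assumption.

```python
import math


def smallest_detectable_difference(sem, z=1.96):
    """Conventional SDD from the standard error of measurement (SEM).

    z = 1.96 corresponds to a 95% confidence level; the factor sqrt(2)
    reflects that a difference between two measurements is assessed.
    """
    return z * math.sqrt(2) * sem


# (label, SEM from Table 3, SDD or MDC reported in Table 3)
rows = [
    ("VOAA-DDD-R capacity (%)", 5.1, 14.0),
    ("AHA sum score", 1.40, 3.89),
    ("PMAL-R How well scale", 0.15, 0.42),
]

for label, sem, reported in rows:
    computed = smallest_detectable_difference(sem)
    print(f"{label}: computed {computed:.2f}, reported {reported}")
```

For these three rows the computed values agree with the reported ones to within rounding, which is consistent with (but does not prove) the use of this relation in the underlying studies.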

Table 4.

Characteristics of Included Studies (Responsiveness).

ICF
Function	Capacity	Performance	Tool	Article	N	Age	Diagnoses	Results	COSMIN	COSMIN*
✓			Accelerometry	Reddihough et al (1991)²⁶	20	6-9	CP	F: 2.08 (P = .139)	Poor	Poor
✓	(✓)		Quality of Upper Extremities Skills Test (QUEST)	Wright et al (2005)⁵⁹	9	Year 1: mean 6.5, SD 0.8	CP	SRM: subscales 0.13-0.63/total 0.72	Poor	Poor
			Quality of Upper Extremities Skills Test (QUEST)	Wright et al (2005)⁵⁹		Year 2: mean 4.6, SD 1.0
	✓		Pediatric Arm Function Test (PAFT)	Uswatte et al (2012)⁴⁰	29	2-6	CP	SRM: 0.73	Poor	Poor
	✓		Peabody Developmental Motor Scales 2 Fine Motor (PDMS-2 FM)	Wang et al (2006)⁴¹	32	2.25-5.33	CP	GRI: 2.3	Poor	Poor
	✓		Peabody Developmental Motor Scales Fine Motor (PDMS-FM)	Wright et al (2005)⁵⁹	9	Year 1: mean 6.5, SD 0.8	CP	SRM: 0.42	Poor	Poor
			Peabody Developmental Motor Scales Fine Motor (PDMS-FM)	Wright et al (2005)⁵⁹		Year 2: mean 4.6, SD 1.0
	✓	(✓)	Besta-Scale	Rosa-Rizzotto et al (2014)⁴⁸	105	1.5-8	CP	?	Poor	Poor
		✓	Pediatric Motor Activity Log (PMAL)	Lin et al (2012)⁵⁷	41	2-10	CP	SRM: 0.89-0.99	Poor	Poor
		✓	Pediatric Motor Activity Log Revised (PMAL-R) How well scale	Uswatte et al (2012)⁵⁴	29	2-6	CP	SRM: 4.4	Poor	Poor
		✓	Pediatric Outcomes Data Collection Instrument Upper Extremity (PODCI UE)	Allen et al (2008)⁶⁰	91	4-19	CP	Surgical intervention group: Upper Extremity: t = −2.38, P = .01	Poor	Poor
✓	✓	✓	Shriners Hospital for Children Upper Extremity Evaluation (SHUEE)	Davids et al (2006)⁵⁶	18	6.33-14.67	CP	Differences pre to post:	Poor	Poor
				Davids et al (2006)⁵⁶				Spontaneous functional analysis: 3.4 ± 5.4 (−5, 13)
								Dynamic positional analysis wrist: 6.4 ± 1.6 (4, 8)
								Grasp and release: 0.07 ± 1.1 (−2, 2)

Abbreviations: ICF, International Classification of Functioning, Disability and Health; N, sample size; CP, cerebral palsy; SRM, standardized response mean; GRI, Guyatt Responsiveness Index; SD, standard deviation; COSMIN, COSMIN score with sample size item; COSMIN*, COSMIN score without sample size item.

An overview of the levels of evidence for reliability, measurement error, and responsiveness of each measurement tool is shown in Table 5, which also indicates (a) whether the measurement tool assesses recovery, allows compensatory strategies, or allows compensatory strategies but accounts for them in the scoring process, and (b) whether task execution and scoring are performed unimanually or bimanually. Results for the Manual Ability Classification System (MACS) are tabulated separately (Table 6).

Table 5.

Best Evidence Synthesis.

ICF			Measurement Tool	Interrater Reliability		Intrarater Reliability		Test–Retest Reliability		Measurement Error		Responsiveness		Recovery vs Compensation			Task		Scoring
Function	Capacity	Performance	Measurement Tool	CP	OD	CP	OD	CP	OD	CP	OD	CP	OD	Recovery	Adj. Scores	No Diff.	Unimanual	Bimanual	Unimanual	Bimanual
✓			Dynamometry						?^b					✓			✓		✓
✓			Dynamometry (grip strength)	+				?						✓			✓		✓
✓			Modified Ashworth Scale (MAS)	±	?^c			±	?^c					✓			✓		✓
✓			Manual Muscle Testing (MMT)	+				?						✓			✓		✓
✓			Mowery-Classification Videotaped Evaluation	?		?										✓	✓		✓
✓			Reach/pinch/grip function Videotaped Evaluation	?		?								✓			✓		✓
✓			Passive Range of Motion (pROM)	±				?						✓			✓		✓
✓			Active and Passive Range of Motion (a&pROM) Videotaped Evaluation	?		?								✓			✓		✓
✓			Modified Tardieu					±						✓			✓		✓
✓			Tonic stretch reflex threshold					?						✓			✓		✓
✓			Upper Extremity Rating Scale (UERS)	+				+						✓		✓	✓		✓
✓			Accelerometry					-				?				✓	✓		✓
✓	(✓)		Quality of Upper Extremities Skills Test (QUEST)	±	?^a	±		+	?^a	?		?		✓	✓		✓	✓	✓
✓	✓		Melbourne Assessment of Unilateral Upper Limb Function	++		+		?		?					✓		✓		✓
	✓		House Functional Classification (modified)	+				+								✓	✓	✓	✓
	✓		Consolidated House-Classification (videotaped evaluation)	?		?										✓	✓	✓	✓
	✓		Peabody Developmental Motor Scales 2 Fine Motor (PDMS-2 FM)					+		?		?				✓		✓		✓
	✓		Peabody Developmental Motor Scales Fine Motor (PDMS-FM)					?				?				✓		✓		✓
	✓	(✓)	Video Observation Aarts and Aarts module: Determine Developmental Disregard (VOAA-DDD)	?		?		?								✓		✓	✓
	✓	(✓)	Video Observation Aarts and Aarts module: Determine Developmental Disregard Revised (VOAA-DDD-R)	+		+		+		?						✓		✓	✓
	✓	(✓)	Assisting Hand Assessment (AHA)		+^a		+^a	?		?						✓		✓	✓
	✓	(✓)	Observational Skills Assessment Score (OSAS)	?		?		?		?					✓	✓		✓	✓
	✓	✓	Besta Scale	-		+						?				✓	✓	✓	✓
		✓	ABILHAND-Kids					+								✓		✓		✓
		✓	Bimanual Fine Motor Function (BFMF)	?												✓		✓		✓
		✓	Münchner ADL-Fragebogen (M-ADL) “Hand skill in everyday life” subscale	±				+								✓		✓		✓
		✓	Pediatric Motor Activity Log Revised (PMAL-R) How often scale					+		?		?				✓	✓	✓	✓
		✓	Pediatric Motor Activity Log Revised (PMAL-R) How well scale					++		?		?				✓	✓	✓	✓
		✓	Pediatric Outcomes Data Collection Instrument Upper Extremity (PODCI UE)	-						?		?				✓	✓	✓	✓	✓
(✓)	(✓)	(✓)	Shriners Hospitals for Children Cerebral Palsy Computer-Adapted Testing battery Upper Extremities (UE-CAT)	+										(✓)	(✓)	(✓)	(✓)	(✓)	(✓)	(✓)

Abbreviations: ICF, International Classification of Functioning, Disability and Health; CP, cerebral palsy; OD, other diagnoses—(a) acquired brain injury, (b) myelomeningocele, (c) mixed; UL, upper limb; ADL, activities of daily living.

(✓), only a few items or minor aspects address this qualifier, or items at this qualifier depend on previously given answers; Adj. scores, adjusted scores: the tool allows compensation but accounts for it in the scoring process; No diff., no differentiation between physiological and compensatory movements.

Table 6.

Study Characteristics and Best Evidence Synthesis of the Manual Ability Classification System (MACS).

Reliability	Rater	Article	N	Age	Diagnoses	Results	COSMIN	COSMIN*	BES
Interrater	Therapists	Eliasson et al (2006)⁶¹	168	4-18	CP	Therapist–therapist: ICC 0.97 (0.96-0.98)	Fair	Fair	++
		Morris et al (2006)⁶²	14	Mean 9.92, SD 1.92	CP	PTs–OTs: Kappa 1.0	Fair	Fair
		Akpinar et al (2010)⁶³	44	4-18	CP	PTs–PTs: 0.97 (0.95-0.98)	Fair	Fair
	Physicians	Akpinar et al (2010)⁶³	36	4-18	CP	ICC 0.98 (0.97-0.99)	Fair	Fair	+
	Therapists–Parents	Eliasson et al (2006)⁶¹	25	8-12	CP	Therapist–parent: ICC 0.96 (0.89-0.98)	Poor	Fair	+/−
		Jang et al (2013)⁶⁴	69	4-14	CP	OTs–parents: ICC 0.96 (0.94-0.97)	Poor	Poor
		Morris et al (2006)⁶²	59	Mean 9.92, SD 1.92	CP	Parents–PTs: Kappa 0.38	Fair	Fair
			21			Parents–OTs: Kappa 0.32
		Mutlu et al (2011)⁶⁵	100	4-18	CP	Therapist–parents: ICC 0.96 (0.95-0.98)	Fair	Fair
		Riyahi et al (2013)⁶⁶	100	4-18	CP	Parents–OTs: Kappa > 0.85	Good	Good
		Akpinar et al (2010)⁶³	117	4-18	CP	Parents–PTs: 0.89 (0.85-0.92)	Fair	Fair
	Physicians–Parents	Jang et al (2013)⁶⁴	69	4-14	CP	ICC 0.93 (0.90-0.95)	Poor	Poor	+/−
		Morris et al (2006)⁶²	52	Mean 9.92, SD 1.92	CP	Kappa 0.46	Fair	Fair
		Akpinar et al (2010)⁶³	117	4-18	CP	ICC 0.90 (0.86-0.93)	Fair	Fair
	Physicians–Therapists	Jang et al (2013)⁶⁴	69	4-14	CP	Physicians–OTs: ICC 0.96 (0.94-0.97)	Poor	Poor	+/−
		Kuijper et al (2010)⁶⁷	61	5-14	CP	Physician–OT: Kappa 0.86 (0.78-0.94)	Good	Good
		Morris et al (2006)⁶²	40	Mean 9.92, SD 1.92	CP	PTs–pediatricians: Kappa 0.43	Fair	Fair
			13			Pediatricians–OTs: Kappa 0.30
		Akpinar et al (2010)⁶³	117	4-18	CP	Physician–PTs: 0.96 (0.95-0.97)	Fair	Fair
		Plasschaert et al (2009)⁶⁸	30	1-5	CP	Pediatrician–PT: Kappa 0.62 (0.49-0.76)	Fair	Excellent
Test–Retest	Parents	Akpinar et al (2010)⁶³	87	4-18	CP	ICC 0.91 (0.86-0.94)	Fair	Fair	++
		Imms et al (2010)⁶⁹	86	11-13	CP	ICC 0.92 (0.87-0.95)	Fair	Fair
		Jang et al (2013)⁶⁴	69	4-14	CP	ICC 0.97 (0.95-0.98)	Fair	Fair
		Riyahi et al (2013)⁶⁶	100	4-18	CP	Kappa 0.87	Fair	Fair
	Therapists	Akpinar et al (2010)⁶³	104	4-18	CP	PTs: ICC > 0.96	Fair	Fair	++
		Riyahi et al (2013)⁶⁶	100	4-18	CP	OTs: Kappa 0.87	Good	Good
		Jang et al (2013)⁶⁴	69	4-14	CP	OTs: ICC 0.98 (0.97-0.99)	Fair	Fair
		Öhrvall et al (2014)⁷⁰	1267	4-17	CP	Different raters!: ICC 0.97 (0.97-0.97)	Fair	Fair
		Öhrvall et al (2014)⁷⁰	445	4-17	CP	Different raters!: ICC 0.96 (0.95-0.97)	Fair	Fair
	Physicians	Akpinar et al (2010)⁶³	87	4-18	CP	ICC > 0.97	Fair	Fair	++
	Physicians	Jang et al (2013)⁶⁴	69	4-14	CP	ICC 0.99 (0.98-0.99)	Fair	Fair	++

Abbreviations: N, sample size; SD, standard deviation; CP, cerebral palsy; ICC, intraclass correlation coefficient; PT, physiotherapist; OT, occupational therapist.

Reliability

Of the 58 studies dealing with reliability, 32 were of fair, 3 of good, and 1 of excellent methodological quality (Table 2). Thirteen articles were rated as being of poor methodological quality because of insufficient study description,^25,26,38,56 inappropriate statistical methods,^22,36,71 questionable independence of test administration,^22,31,39,40,72-74 or unsuitable methodology.²⁶ In the remaining 9 studies, methodological quality varied between different kinds of reliability, so they could not be classified as a whole (Table 2). The best evidence synthesis did not reveal any tool with strong positive evidence for reliability (Table 5). For the Melbourne Assessment (Melbourne),^33-39 there was moderate positive evidence for interrater reliability. The MACS^61-70 as well as the Pediatric Motor Activity Log (PMAL) “how well” scale^53,54 showed moderate positive evidence for test–retest reliability. Limited positive evidence for interrater reliability was found for dynamometry,^17,18 Modified House Classification (MHC),²⁴ Manual Muscle Testing,¹⁸ Upper Extremity Rating Scale (UERS),²⁴ and the Video Observation Aarts and Aarts module: Determine Developmental Disregard Revised (VOAA-DDD-R).⁴⁴ The Melbourne, the VOAA-DDD-R, the Assisting Hand Assessment (AHA),^45,46 and the Besta Scale⁴⁸ showed limited positive evidence for intrarater reliability. For test–retest reliability, the MHC, Quality of Upper Extremities Skills Test (QUEST),^28-33 UERS, Peabody Developmental Motor Scales-2–Fine Motor abilities (PDMS-2 FM),⁴¹ VOAA-DDD-R, ABILHAND-Kids,⁵⁰ Münchner ADL-Fragebogen (M-ADL),⁵² the PMAL “how often” scale,⁵³ and the Shriners Hospital for Children CP Computer-Adapted Testing battery–Upper Extremities⁴⁹ showed limited positive evidence.
All other assessments (House-Classification,²¹ Modified Ashworth Scale [MAS],^18-20 Mowery Classification,²¹ reach/pinch/grip function,²¹ range of motion,^18,21 [modified] Tardieu,^20,22 tonic stretch reflex threshold,²³ Pediatric Arm Function Test,⁴⁰ PDMS-FM,⁴² VOAA-DDD,⁴³ Bimanual Fine Motor Function [BFMF],⁵¹ Pediatric Outcomes Data Collection Instrument Upper Extremity [PODCI UE],⁵⁵ Observational Skills Assessment Score [OSAS],⁴⁷ and Accelerometry^25-27) were found to have unknown, conflicting, or negative levels of evidence for reliability (Table 2).

For the MACS, many different comparisons were drawn to analyze reliability (Table 6). Interrater reliability within a professional category (ie, therapists or physicians) showed a positive level of evidence, whereas analysis of interrater reliability between professional categories or with parents revealed conflicting results. For test–retest reliability, a positive level of evidence was found for parents, therapists, and physicians.

Reliability of kinematic measures was examined in 8 studies.^71-78 However, very different tasks and measurement techniques were used. Therefore, a best evidence synthesis could not be performed.

Measurement Error

Six studies about measurement error were rated as being of fair and 2 of good methodological quality. Three studies were rated as being of poor methodological quality because independence of test administration was doubtful,^58,73 the statistical analysis was inappropriate,⁵⁸ or the description of the study population was missing⁵⁷ (Table 3). None of the studies dealing with measurement error defined a minimal important change. Consequently, no study about measurement error achieved positive or negative ratings as proposed by Terwee et al,¹⁵ and the level of evidence remains unknown. Measurement error of kinematic measures was examined in 2 studies.^73,75 The study of Jaspers et al⁷⁵ was of good methodological quality, but it included only 12 participants and thus the evidence for measurement error of this tool remains unknown.

Responsiveness

With one exception, the methodological quality of all articles about responsiveness was rated as poor. In 6 of 10 studies, statistical methods used to prove responsiveness of the tool were not appropriate. In the remaining studies, essential methodological details were missing^26,60 or no hypotheses about changes in scores were formulated a priori⁵⁹ (Table 4). Hence, no best evidence synthesis could be performed for responsiveness.

Mackey et al⁷⁹ conducted a study of fair methodological quality about the responsiveness of kinematics. However, as they only included 10 patients, the evidence for responsiveness of this measurement remains unclear.

Discussion

The objective of this systematic review was to screen peer-reviewed literature for measurement tools used to assess upper extremity motor function and activities in children with a wide range of central motor disorders. The aim was to offer clinicians as well as researchers an overview of the evidence concerning reliability, measurement error, and responsiveness of measurement tools.

In comparison with previous reviews,^3-5,8,80,81 the current article covered a broader age range and several diagnoses. This ensured a comprehensive overview of the evidence for assessment tools used in everyday clinical practice. Furthermore, the COSMIN checklist⁹ was applied to systematically rate the methodological quality of articles, and a best evidence synthesis made it possible to pool the information available for each measurement tool.^15,16

In 62 eligible studies, reliability, measurement error, and/or responsiveness was reported for 34 measurement tools. The methodological quality of most of these studies was rated as “fair” using the COSMIN checklist without rating sample size. All but 4 studies comprised exclusively children with CP. One study including 12 participants with MMC investigated the test–retest reliability of dynamometry.¹⁷ Clopton et al¹⁹ examined the interrater and test–retest reliability of the MAS in a mixed patient group. The QUEST was assessed for its interrater and test–retest reliability in a group of patients with ABI,²⁹ and in the same population, Davis et al⁴⁶ studied interrater and intrarater reliability for the AHA. Yet even when only children with CP were considered, no assessment showed at least moderate positive evidence for all psychometric properties examined in this review.

The search revealed tools that measure upper extremity Function and Activity but no tool was found at the ICF Participation level. Participation hardly ever depends solely on upper extremity motor ability but also on many other factors and therefore no Participation measure complied with our inclusion criteria.

Reliability

At the Body Function level, the Melbourne is the most comprehensively studied upper limb assessment for children. It showed moderate positive evidence for interrater reliability and limited positive evidence for intrarater reliability.

As the Melbourne covers the component Body Function, and only partially Capacity, there is no comprehensive Capacity measurement tool with at least moderate positive evidence for reliability.

In line with a previous study by McConnell et al,⁸¹ the MACS was found to be the tool with the most published evidence. It shows moderate positive levels of evidence for interrater reliability (when only considering comparisons between therapists) and test–retest reliability. The PMAL “how well” scale likewise shows moderate positive evidence for test–retest reliability. Thus, the MACS and the PMAL seem to be the most promising tools to measure Performance with respect to the level of evidence for reliability. However, when measuring rehabilitation outcomes, it has to be considered that the MACS is a classification system and is not designed to measure changes over time.⁸²

Measurement Error

While reliability parameters are highly dependent on the variation in the population sample, agreement parameters, based on measurement error, are more a characteristic of the measurement instrument itself.⁸³ Thus, agreement parameters are more stable over different population samples. Furthermore, as measurement error is expressed on the actual scale of the measurement, clinical interpretation of a change score is straightforward.⁸³

Based on the quality criteria proposed by Terwee et al,¹⁵ results for measurement error can only be rated as positive when a minimal important change (MIC) is defined and its magnitude exceeds the smallest detectable change. The MIC (also called the minimally important clinical change) is defined as the minimal change that patients perceive as beneficial,⁸⁴ but there is currently no agreement on the definition of “minimal” and “important.”⁸⁵ Hence, its scientific acceptance is controversial. This might be one of the reasons why none of the included studies determined an MIC.
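The rating rule for measurement error can be illustrated with a short calculation. The following Python sketch is only an illustration under stated assumptions, not the procedure of any included study: it derives the standard error of measurement (SEM) from a sample standard deviation and an ICC with the commonly used formula SEM = SD × √(1 − ICC), computes the smallest detectable change (SDC) for an individual as 1.96 × √2 × SEM, and returns a positive rating only when an MIC is supplied and exceeds the SDC. The function names, the simplified "−" rule, and all numeric values are hypothetical.

```python
import math
from typing import Optional


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from the sample SD and an ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem: float) -> float:
    """SDC at the 95% level for an individual: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


def rate_measurement_error(sdc: float, mic: Optional[float]) -> str:
    """Simplified rating: '?' without an MIC, '+' if MIC > SDC, else '-'."""
    if mic is None:
        # No MIC defined: the result cannot be rated, as in the included studies.
        return "?"
    return "+" if mic > sdc else "-"


# Hypothetical values: SD of 10 scale points and a test-retest ICC of 0.91.
sem = sem_from_icc(sd=10.0, icc=0.91)
sdc = smallest_detectable_change(sem)
rating_without_mic = rate_measurement_error(sdc, mic=None)
rating_with_mic = rate_measurement_error(sdc, mic=10.0)
```

Because the SEM and SDC are expressed in scale points, a change score can be compared with them directly, which is what makes agreement parameters easy to interpret clinically.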

Responsiveness

According to COSMIN guidelines, the appropriate way to quantify responsiveness is to correlate changes in scores of the assessment with changes in scores of a “gold standard” or “external criterion.” As there is a lack of “gold standard” assessments in rehabilitation, an alternative would be to formulate hypotheses about the direction and magnitude of the correlation a priori. Reports about responsiveness were mostly rated as “poor” because no “gold standard” was available and no hypotheses were formulated. Recently, the approach to responsiveness proposed in the COSMIN guidelines has been called into question because it departs from traditional methods.^11,86
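A minimal Python sketch of this hypothesis-based approach follows. It assumes paired pre and post scores for the tool under study and for an external criterion, computes the change scores, and checks their Pearson correlation against a threshold fixed before the data are inspected. The data, the threshold of 0.5, and the function names are hypothetical and serve only to show the order of the steps.

```python
import math
from typing import List, Sequence


def change_scores(pre: Sequence[float], post: Sequence[float]) -> List[float]:
    """Post minus pre score for each child."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    return [b - a for a, b in zip(pre, post)]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def hypothesis_confirmed(r: float, expected_min_r: float) -> bool:
    """True if the observed correlation reaches the a priori expectation."""
    return r >= expected_min_r


# Hypothesis stated BEFORE looking at the data (hypothetical threshold).
EXPECTED_MIN_R = 0.5

tool_change = change_scores([10, 12, 15, 20, 22], [14, 13, 21, 22, 30])
criterion_change = change_scores([30, 28, 35, 40, 41], [36, 30, 44, 43, 52])
r = pearson_r(tool_change, criterion_change)
confirmed = hypothesis_confirmed(r, EXPECTED_MIN_R)
```

The essential design choice is that `EXPECTED_MIN_R` is set in advance; computing the correlation first and choosing the threshold afterwards would not count as hypothesis testing in this sense.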

Recovery Versus Compensation

Depending on the severity of impairment, patients might improve their independence in daily life either by using compensatory movement patterns or by reducing impairment through the (re-)appearance of motor patterns present prior to the injury of the central nervous system (ie, recovery).⁸⁷ To avoid confusion in the interpretation of the efficacy of different treatment interventions, Levin et al⁸⁷ proposed a distinction of recovery/compensation at the Body Function/Structure and Activity levels. Consequently, motor measures should be selected so that they differentiate between desirable physiological motor patterns and compensatory strategies.

At the ICF Body Function level, the focus lies more on quality of movement than on movement outcome or task performance. Accordingly, several tools at this level measure physiologically desirable movements and thus recovery. The Mowery Classification, Accelerometry, and one UERS item do not exclude compensatory movements. Neither does the reach/pinch/grip function Videotaped Evaluation but its scoring system accounts for compensation.

Because the emphasis at the ICF Activity level lies more on task accomplishment, most of the evaluations at this level do not determine how the task is completed. Accordingly, all included assessments at the Activity level permit patients to use compensatory strategies. Only the OSAS accounts for compensation.

The Melbourne and the QUEST cannot be exclusively classified as either a Body Function or Activity measure. Scoring systems of both assessments account for compensatory movements and the QUEST additionally comprises several items measuring recovery.

Compensatory strategies help patients in their activities of daily life but may also be associated with long-term problems and potentially lead to learned nonuse.⁸⁸ Therefore, therapeutic interventions should exploit the full potential of recovery, and efforts need to be made to provide the appropriate assessments. To date, there is no upper extremity tool with at least moderate positive evidence for reliability, measurement error, or responsiveness that measures recovery.

Unilateral Versus Bimanual Motor Measures

Interestingly, all measures at the Body Function level are measured and scored unilaterally, whereas at the Activity level all include bimanual items. This can be explained by the fact that strength, range of motion, spasticity, and so on, are Body Functions that are often measured unilaterally. In contrast, the focus of the Activity level lies on task execution, and assessments at this ICF component therefore usually comprise bimanual elements.

The original and second edition of the PDMS-FM, the ABILHAND-Kids, BFMF, M-ADL, and PODCI UE focus on general age-appropriate activities of daily living and do not distinguish between the left arm and the right arm. The 2 House Classifications, the original and revised VOAA-DDD, the AHA, OSAS, Besta Scale, and PMAL-R “how often” and “how well” scales are bimanual tests, but as they were developed for hemiplegic patients, only the affected upper extremity is scored. Even though most of these assessments have been developed for a specific patient group (eg, unilateral CP), we believe that after validation, they could also be used for other patients with similar functional deficits (eg, unilateral stroke). For instance, the AHA was originally developed for children with hemiplegic CP but has recently been used and psychometrically analyzed in children with acquired brain injuries.⁴⁶

The Melbourne and the QUEST are tested unilaterally (the latter also includes a few bimanual items), and the scoring is performed for each side separately.

Activity measures should be chosen depending on whether the focus lies on general changes of task execution or on performance alteration of the assisting hand. However, so far only the PMAL-R “how well” scale, which focuses on the weaker arm/hand, shows moderate positive evidence for one of the psychometric properties (ie, test–retest reliability) examined in this study.

Methodological Considerations

The COSMIN checklist was developed for the quality scoring of self-reported questionnaires. However, it is also suitable for other assessments. Recently, several reviews about assessments in the pediatric field used the COSMIN checklist.^11,89-91 Nevertheless, despite the extensive guidelines, there were some items where subjective interpretation was inevitable. Consider the following example about the MACS, which is a 5-level classification system: When scored twice by the same therapist at an interval of approximately 2 weeks, the ratings probably are independent and the time interval can be rated as appropriate, because therapists see several children per day, which impedes recall bias. In contrast, if parents score their own child twice, 2 weeks apart, they will most likely remember their rating of time point one because the MACS is a classification system with only one single score. In this case, the time interval would have to be rated “not appropriate.”

Furthermore, studies investigated the long-term stability of the MACS^69,70 and of dynamometry.¹⁷ As long-term stability of an assessment tool is not acknowledged as a psychometric property, neither the reliability box (the intermediate time between ratings would be scored as poor) nor the box for responsiveness (most of the patients should not show improvements) can satisfactorily cope with such studies.

Another difficulty of the COSMIN scoring is its concept of responsiveness, which differs from traditional methods.⁸⁶ Consequently, statistical methods were rated inappropriate in some included studies because they used the standardized response mean to measure responsiveness without formulating hypotheses about its expected size. The advised alternative (comparing improvements with improvements in other non–gold standard assessments) might also not be appropriate, as the alternative outcomes could be even less responsive.
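For readers less familiar with the standardized response mean (SRM), the Python sketch below shows how it is usually computed (mean of the change scores divided by the sample standard deviation of the change scores) and how an expected size could be stated beforehand. The scores, the expected minimum of 0.8, and the function names are hypothetical and are not taken from any included study.

```python
import statistics
from typing import Sequence


def standardized_response_mean(pre: Sequence[float], post: Sequence[float]) -> float:
    """SRM = mean of the change scores / sample SD of the change scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need paired scores from at least two children")
    changes = [b - a for a, b in zip(pre, post)]
    sd_change = statistics.stdev(changes)
    if sd_change == 0:
        raise ValueError("SRM is undefined when all change scores are identical")
    return statistics.mean(changes) / sd_change


def srm_hypothesis_met(srm: float, expected_min_srm: float) -> bool:
    """True if the observed SRM reaches the size expected a priori."""
    return srm >= expected_min_srm


# Expected size fixed before data collection (hypothetical value).
EXPECTED_MIN_SRM = 0.8

pre_scores = [10, 12, 15, 20, 22]
post_scores = [14, 13, 21, 22, 30]
srm = standardized_response_mean(pre_scores, post_scores)
hypothesis_met = srm_hypothesis_met(srm, EXPECTED_MIN_SRM)
```

Reporting the SRM alone corresponds to the first function only; it is the missing second step, the pre-specified expectation, that led to the "inappropriate" rating described above.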

As our first aim was to obtain an overview of available assessment tools, we chose a 2-tiered search⁸ rather than a validated search filter (eg, as proposed by Terwee et al⁹²). To evaluate the influence of the search strategies, we compared their results for one database (ie, Medline) of a randomly chosen year (ie, 2008). No substantial differences were found comparing the 2 strategies, and we therefore conclude that our comprehensive 2-tiered search did not miss any relevant measurement tools.

In contrast to the COSMIN guidelines, we did not rate the sample size item in the quality assessment but accounted for it in the best evidence synthesis. Because sample sizes in neuropediatric studies are usually rather small, this approach allowed us to augment the available evidence by more than 60%.

For some measurement tools, the psychometric properties of different versions or subscales have been investigated separately. In these cases, a best evidence synthesis was not applicable.

Conclusion

On the one hand, due to the lack of articles about psychometric properties in the population of interest, poor-quality studies, and small sample sizes, we cannot give substantial recommendations about which upper limb measurement tools should be used. On the other hand, no tool was found to have distinct negative evidence for reliability, measurement error, or responsiveness. Thus, we cannot conclude that the existing measurement tools are unsound. Furthermore, we only evaluated reliability, measurement error, and responsiveness, whereas information about validity, clinical utility, practicability, and costs is needed as well for a thorough evaluation. The most promising assessments with respect to reliability are the Melbourne at the ICF Body Function level (with items at Capacity level) and the PMAL “how well” scale at the Performance level. The MACS appears to be the most reliable upper limb classification system (ICF Performance level).

Summing up, before investing in randomized controlled trials, researchers in the pediatric field dealing with upper extremities should work on high-quality psychometric studies. To date, trials on upper limb rehabilitation in children and adolescents risk being biased by insensitive measurement tools lacking reliability.

Footnotes

Appendix A

Appendix B

Levels of Evidence for the Overall Quality of the Measurement Properties.

Level	Rating^a	Criteria
Strong	+++ or −−−	Consistent findings in multiple studies of good methodological quality OR in one study of excellent methodological quality
Moderate	++ or −−	Consistent findings in multiple studies of fair methodological quality OR in one study of good methodological quality
Limited	+ or −	One study of fair methodological quality
Conflicting	±	Conflicting findings
Unknown	?	Only studies of poor methodological quality

^a + = Positive results; − = Negative results.

Adapted from Van Tulder et al.¹⁶
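The criteria in this table can be read as a small decision rule. The Python sketch below is one possible reading, not the authors' actual procedure: each study is given as a pair of methodological quality and direction of result, poor-quality studies are ignored whenever better studies exist (an assumption), and sample size, which the review also considered at this stage, is left out. The function name and input format are hypothetical, and an ASCII hyphen stands in for the minus sign.

```python
from typing import List, Tuple

# (methodological quality, direction of result); direction is "+" or "-".
Study = Tuple[str, str]

QUALITIES = {"poor", "fair", "good", "excellent"}


def evidence_level(studies: List[Study]) -> str:
    """Return '+++', '++', '+', the negative counterparts, '±', or '?'."""
    for quality, direction in studies:
        if quality not in QUALITIES or direction not in {"+", "-"}:
            raise ValueError(f"unexpected study entry: {(quality, direction)}")

    # Assumption: studies of poor quality do not contribute to the rating.
    rated = [(q, d) for q, d in studies if q != "poor"]
    if not rated:
        return "?"  # only studies of poor methodological quality

    directions = {d for _, d in rated}
    if len(directions) > 1:
        return "±"  # conflicting findings

    sign = directions.pop()
    qualities = [q for q, _ in rated]
    if qualities.count("excellent") >= 1 or qualities.count("good") >= 2:
        return sign * 3  # strong
    if qualities.count("good") == 1 or qualities.count("fair") >= 2:
        return sign * 2  # moderate
    return sign  # limited: one study of fair quality


example = evidence_level([("fair", "+"), ("fair", "+"), ("poor", "-")])
```

In this example two consistent studies of fair quality yield a moderate positive level, and the poor-quality study with a negative result does not change the outcome.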

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Clinical Research Priority Program (CRPP) Neuro-Rehabilitation of the University of Zurich (Switzerland); the Fondation Gaydoul (Zurich, Switzerland); the Swiss National Science Foundation (Grant 32003B_156646); and the Mäxi-Foundation (Zurich, Switzerland).

References

1. Kirshner, Guyatt. A methodological framework for assessing health indices. J Chronic Dis. 1985;38:27-36.

2. Stille, Turchi, Antonelli. The family-centered medical home: specific considerations for child health research and policy. Acad Pediatr. 2010;10:211-217.

3. Klingels, Jaspers, Van de Winckel, De Cock, Molenaers, Feys. A systematic review of arm activity measures for children with hemiplegic cerebral palsy. Clin Rehabil. 2010;24:887-900.

4. Debuse, Brace. Outcome measures of activity for children with cerebral palsy: a systematic review. Pediatr Phys Ther. 2011;23:221-231.

5. Wagner, Davids. Assessment tools and classification systems used for the upper extremity in children with cerebral palsy. Clin Orthop Relat Res. 2012;470:1257-1271.

6. World Health Organization. International Classification of Functioning, Disability and Health (ICF). http://www.who.int/classifications/icf/en/. Accessed February 2, 2015.

7. Mokkink, Terwee, Patrick. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63:737-745.

8. Greaves, Imms, Dodd, Krumlinde-Sundholm. Assessing bimanual performance in young children with hemiplegic cerebral palsy: a systematic review. Dev Med Child Neurol. 2010;52:413-421.

9. Mokkink, Terwee, Patrick. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19:539-549.

10. Terwee, Mokkink, Knol, Ostelo RWJG, Bouter, de Vet HCW. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res. 2012;21:651-657.

11. Ammann-Reiffer, Bastiaenen CHG, de Bie, van Hedel HJA. Measurement properties of gait-related outcomes in youth with neuromuscular diagnoses: a systematic review. Phys Ther. 2014;94:1067-1082.

12. Mokkink, Terwee, Patrick. The COSMIN checklist manual. http://cosmin.nl. Accessed February 2, 2015.

13. Bartels, de Groot, Terwee. The six-minute walk test in chronic pediatric conditions: a systematic review of measurement properties. Phys Ther. 2013;93:529-541.

14. Dobson, Hinman, Hall, Terwee, Roos, Bennell. Measurement properties of performance-based measures to assess physical function in hip and knee osteoarthritis: a systematic review. Osteoarthritis Cartilage. 2012;20:1548-1562.

15. Terwee, Bot SDM, de Boer. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60:34-42.

16. van Tulder, Furlan, Bombardier, Bouter. Updated method guidelines for systematic reviews in the Cochrane collaboration back review group. Spine (Phila Pa 1976). 2003;28:1290-1299.

17. Effgen, Brown. Long-term stability of hand-held dynamometric measurements in children who have myelomeningocele. Phys Ther. 1992;72:458-465.

18. Klingels, De Cock, Molenaers. Upper limb motor and sensory impairments in children with hemiplegic cerebral palsy. Can they be measured reliably? Disabil Rehabil. 2010;32:409-416.

19. Clopton, Dutton, Featherston, Grigsby, Mobley, Melvin. Interrater and intrarater reliability of the Modified Ashworth Scale in children with hypertonia. Pediatr Phys Ther. 2005;17:268-274.

20. Numanoğlu, Günel. Intraobserver reliability of modified Ashworth scale and modified Tardieu scale in the assessment of spasticity in children with cerebral palsy. Acta Orthop Traumatol Turc. 2012;46:196-200.

21. Waters, Zurakowski, Patterson, Bae, Nimec. Interobserver and intraobserver reliability of therapist-assisted videotaped evaluations of upper-limb hemiplegia. J Hand Surg Am. 2004;29:328-334.

22. Gracies, Burke, Clegg. Reliability of the Tardieu Scale for assessing spasticity in children with cerebral palsy. Arch Phys Med Rehabil. 2010;91:421-428.

23. Jobin, Levin. Regulation of stretch reflex threshold in elbow flexors in children with cerebral palsy: a new measure of spasticity. Dev Med Child Neurol. 2000;42:531-540.

24. Koman, Williams RMM, Evans. Quantification of upper extremity function and range of motion in children with cerebral palsy. Dev Med Child Neurol. 2008;50:910-917.

25. Reddihough, Court, Evans, Hudson. Objective assessment of limb movement in children with cerebral palsy. Aust Paediatr J. 1987;23:289-291.

26. Reddihough, Bach, Burgess, Oke, Hudson. Comparison of subjective and objective measures of movement performance of children with cerebral palsy. Dev Med Child Neurol. 1991;33:578-584.

27. Reddihough, Bach, Burgess, Oke, Hudson. Objective test of the quality of motor function of children with cerebral palsy: preliminary study. Dev Med Child Neurol. 1990;32:902-909.

28. Haga, van der Heijden-Maessen, van Hoorn, Boonstra, Hadders-Algra. Test-retest and inter- and intrareliability of the quality of the upper-extremity skills test in preschool-age children with cerebral palsy. Arch Phys Med Rehabil. 2007;88:1686-1689.

29. Sakzewski, Ziviani, Van Eldik. Test/retest reliability and inter-rater agreement of the Quality of Upper Extremities Skills Test (QUEST) for older children with acquired brain injuries. Phys Occup Ther Pediatr. 2001;21:59-67.

30. Sorsdahl, Moe-Nilssen, Strand. Observer reliability of the Gross Motor Performance Measure and the Quality of Upper Extremity Skills Test, based on video recordings. Dev Med Child Neurol. 2008;50:146-151.

31. Thorley, Lannin, Cusick, Novak, Boyd. Reliability of the quality of upper extremity skills test for children with cerebral palsy aged 2 to 12 years. Phys Occup Ther Pediatr. 2012;32:4-21.

32. DeMatteo C, Law M, Russell D, et al. The reliability and validity of the Quality of Upper Extremity Skills Test. Phys Occup Ther Pediatr. 1993;13:1-18.

33. Klingels, De Cock, Desloovere. Comparison of the Melbourne Assessment of Unilateral Upper Limb Function and the Quality of Upper Extremity Skills Test in hemiplegic CP. Dev Med Child Neurol. 2008;50:904-909.

34. Bard, Chaléat-Valayer, Combey, Bleu, Perretant, Bernard. Upper limb assessment in children with cerebral palsy: translation and reliability of the French version for the Melbourne Unilateral Upper Limb Assessment (test de Melbourne). Ann Phys Rehabil Med. 2009;52:297-310.

35. Cusick, Vasquez, Knowles, Wallen. Effect of rater training on reliability of Melbourne Assessment of Unilateral Upper Limb Function scores. Dev Med Child Neurol. 2005;47:39-45.

36. Johnson, Randall, Reddihough, Oke, Byrt, Bach. Development of a clinical assessment of quality of movement for unilateral upper-limb function. Dev Med Child Neurol. 1994;36:965-973.

37. Randall, Carlin, Chondros, Reddihough. Reliability of the Melbourne Assessment of Unilateral Upper Limb Function. Dev Med Child Neurol. 2001;43:761-767.

38. Spirtos, O’Mahony, Malone. Interrater reliability of the Melbourne Assessment of Unilateral Upper Limb Function for children with hemiplegic cerebral palsy. Am J Occup Ther. 2011;65:378-383.

39. Jayaraman, Puckree. A pilot study on the test re-test and the inter-rater reliability of the Melbourne Assessment of Unilateral Upper Limb Function. South African J Physiother. 2009;65:17-20.

40. Uswatte, Taub, Griffin, Rowe, Vogtle, Barman. Pediatric Arm Function Test: reliability and validity for assessing more-affected arm motor capacity in children with cerebral palsy. Am J Phys Med Rehabil. 2012;91:1060-1069.

41. Wang, Liao, Hsieh. Reliability, sensitivity to change, and responsiveness of the Peabody Developmental Motor Scales-Second Edition for children with cerebral palsy. Phys Ther. 2006;86:1351-1359.

42. Russell, Ward, Law. Test-retest reliability of the fine motor scale of the Peabody developmental motor scales in children with cerebral palsy. Occup Ther J Res. 1994;14:178-182.

43. Aarts PBM, Jongerius, Geerdink, Geurts. Validity and reliability of the VOAA-DDD to assess spontaneous hand use with a video observation tool in children with spastic unilateral cerebral palsy. BMC Musculoskelet Disord. 2009;10:145.

44. Houwink, Geerdink, Steenbergen, Geurts ACH, Aarts PBM. Assessment of upper-limb capacity, performance, and developmental disregard in children with cerebral palsy: validity and reliability of the revised Video-Observation Aarts and Aarts module: Determine Developmental Disregard (VOAA-DDD-R). Dev Med Child Neurol. 2013;55:76-82.

45. Holmefur, Aarts, Hoare, Krumlinde-Sundholm. Test-retest and alternate forms reliability of the assisting hand assessment. J Rehabil Med. 2009;41:886-891.

46. Davis, Galvin, Soo. Reliability of the assisting hand assessment (AHA) for children and youth with acquired brain injury. Brain Impair. 2010;11:113-124.

47. Speth, Janssen-Potten, Leffers. Observational skills assessment score: reliability in measuring amount and quality of use of the affected hand in unilateral cerebral palsy. BMC Neurol. 2013;13:152.

48. Rosa-Rizzotto, Visonà Dalla Pozza, Corlatti. A new scale for the assessment of performance and capacity of hand function in children with hemiplegic cerebral palsy: reliability and validity studies. Eur J Phys Rehabil Med. 2014;50:543-556.

49. Haley, Chafetz, Tian. Validity and reliability of physical functioning computer-adaptive tests for children with cerebral palsy. J Pediatr Orthop. 2010;30:71-75.

50. Arnould, Penta, Renders, Thonnard. ABILHAND-Kids: a measure of manual ability in children with cerebral palsy. Neurology. 2004;63:1045-1052.

51. Randall, Harvey, Imms, Reid, Lee, Reddihough. Reliable classification of functional profiles and movement disorders of children with cerebral palsy. Phys Occup Ther Pediatr. 2013;33:342-352.

52. Blank. Measurement of activities of daily living in children—standardisation of a screening questionnaire. Klin Pädiatr. 2007;219:32-36.

53.

Wallen

Bundy

Pont

Ziviani

. Psychometric properties of the Pediatric Motor Activity Log used for children with cerebral palsy. Dev Med Child Neurol. 2009;51:200-208.

54.

Uswatte

Taub

Griffin

Vogtle

Rowe

Barman

. The pediatric motor activity log-revised: assessing real-world arm use in children with cerebral palsy. Rehabil Psychol. 2012;57:149-158.

55.

Gates

Otsuka

Sanders

McGee-Brown

. Functioning and health-related quality of life of adolescents with cerebral palsy: self versus parent perspectives. Dev Med Child Neurol. 2010;52:843-849.

56.

Davids

Peace

Wagner

Gidewall

Blackhurst

Roberson

. Validation of the Shriners Hospital for Children Upper Extremity Evaluation (SHUEE) for children with hemiplegic cerebral palsy. J Bone Joint Surg Am. 2006;88:326-333.

57.

Lin

Chen

. Validity, responsiveness, minimal detectable change, and minimal clinically important change of the Pediatric Motor Activity Log in children with cerebral palsy. Res Dev Disabil. 2012;33:570-577.

58.

Oeffinger

Bagley

Rogers

. Outcome tools used for ambulatory children with cerebral palsy: responsiveness and minimum clinically important differences. Dev Med Child Neurol. 2008;50:918-925.

59.

Wright

Boschen

Jutai

. Exploring the comparative responsiveness of a core set of outcome measures in a school-based conductive education programme. Child Care Health Dev. 2005;31:291-302.

60.

Allen

Gorton

Oeffinger

Tylkowski

Tucker

Haley

. Analysis of the pediatric outcomes data collection instrument in ambulatory children with cerebral palsy using confirmatory factor analysis and item response theory methods. J Pediatr Orthop. 2008;28:192-198.

61.

Eliasson

Krumlinde-Sundholm

Rösblad

. The Manual Ability Classification System (MACS) for children with cerebral palsy: scale development and evidence of validity and reliability. Dev Med Child Neurol. 2006;48:549-554.

62.

Morris

Kurinczuk

Fitzpatrick

Rosenbaum

. Reliability of the manual ability classification system for children with cerebral palsy. Dev Med Child Neurol. 2006;48:950-953.

63.

Akpinar

Tezel

Eliasson

Icagasioglu

. Reliability and cross-cultural validation of the Turkish version of Manual Ability Classification System (MACS) for children with cerebral palsy. Disabil Rehabil. 2010;32:1910-1916.

64.

Jang

Sung

Kang

. Reliability and validity of the Korean version of the manual ability classification system for children with cerebral palsy. Child Care Health Dev. 2013;39:90-93.

65.

Mutlu

Kara

Gunel

Karahan

Livanelioglu

. Agreement between parents and clinicians for the motor functional classification systems of children with cerebral palsy. Disabil Rehabil. 2011;33:927-932.

66.

Riyahi

Rassafiani

Akbar Fahimi

Sahaf

Yazdani

. Cross-cultural validation of the Persian version of the Manual Ability Classification System for children with cerebral palsy. Int J Ther Rehabil. 2013;20:19-25.

67.

Kuijper

van der Wilden

Ketelaar

Gorter

. Manual ability classification system for children with cerebral palsy in a school setting and its relationship to home self-care activities. Am J Occup Ther. 2010;64:614-620.

68.

Plasschaert

VFP

Ketelaar

Nijnuis

Enkelaar

Gorter

. Classification of manual abilities in children with cerebral palsy under 5 years of age: how reliable is the Manual Ability Classification System? Clin Rehabil. 2009;23:164-170.

69.

Imms

Carlin

Eliasson

. Stability of caregiver-reported manual ability and gross motor function classifications of cerebral palsy. Dev Med Child Neurol. 2010;52:153-159.

70.

Öhrvall

Krumlinde-Sundholm

Eliasson

. The stability of the Manual Ability Classification System over time. Dev Med Child Neurol. 2014;56:185-189.

71.

Mackey

Walt

Lobb

Stott

. Intraobserver reliability of the modified Tardieu scale in the upper limb of children with hemiplegia. Dev Med Child Neurol. 2004;46:267-272.

72.

Kawamura

Klejman

Fehlings

. Reliability and validity of the kinematic dystonia measure for children with upper extremity dystonia. J Child Neurol. 2012;27:907-913.

73.

Lempereur

Brochard

Mao

Rémy-Néris

. Validity and reliability of shoulder kinematics in typically developing children and children with hemiplegic cerebral palsy. J Biomech. 2012;45:2028-2034.

74.

Butler

Rose

. The pediatric upper limb motion index and a temporal-spatial logistic regression: quantitative analysis of upper limb movement disorders during the Reach & Grasp Cycle. J Biomech. 2012;45:945-951.

75.

Jaspers

Feys

Bruyninckx

. The reliability of upper limb kinematics in children with hemiplegic cerebral palsy. Gait Posture. 2011;33:568-575.

76.

Mackey

Walt

Lobb

Stott

. Reliability of upper and lower limb three-dimensional kinematics in children with hemiplegia. Gait Posture. 2005;22:1-9.

77.

Reid

Elliott

Alderson

Lloyd

Elliott

. Repeatability of upper limb kinematics for children with and without cerebral palsy. Gait Posture. 2010;32:10-17.

78.

Schneiberg

McKinley

Gisel

Sveistrup

Levin

. Reliability of kinematic measures of functional reaching in children with cerebral palsy. Dev Med Child Neurol. 2010;52:e167-e173.

79.

Mackey

Miller

Walt

Waugh

Stott

. Use of three-dimensional kinematic analysis following upper limb botulinum toxin A for children with hemiplegia. Eur J Neurol. 2008;15:1191-1198.

80.

Gilmore

Sakzewski

Boyd

. Upper limb activity measures for 5- to 16-year-old children with congenital hemiplegia: a systematic review. Dev Med Child Neurol. 2010;52:14-21.

81.

McConnell

Johnston

Kerr

. Upper limb function and deformity in cerebral palsy: a review of classification systems. Dev Med Child Neurol. 2011;53:799-805.

82.

Harvey

. Stability of parent-reported manual ability and gross motor function classification of cerebral palsy. Dev Med Child Neurol. 2010;52:114-115.

83.

De Vet

HCW

Terwee

Knol

Bouter

. When to use agreement versus reliability measures. J Clin Epidemiol. 2006;59:1033-1039.

84.

Jaeschke

Singer

Guyatt

. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10:407-415.

85.

Gatchel

Lurie

Mayer

. Minimal clinically important difference. Spine (Phila Pa 1976). 2010;35:1739-1743.

86.

Angst

. The new COSMIN guidelines confront traditional concepts of responsiveness. BMC Med Res Methodol. 2011;11:152.

87.

Levin

Kleim

Wolf

. What do motor “recovery” and “compensation” mean in patients following stroke? Neurorehabil Neural Repair. 2009;23:313-319.

88.

Taub

Uswatte

Elbert

. New treatments in neurorehabilitation founded on basic research. Nat Rev Neurosci. 2002;3:228-236.

89.

Dekkers

KJFM

Rameckers

EAA

Smeets

RJEM

Janssen-Potten

YJM

. Upper extremity strength measurement for children with cerebral palsy: a systematic review of available instruments. Phys Ther. 2014;94:609-622.

90.

Balemans

Fragala-Pinkham

Lennon

. Systematic review of the clinimetric properties of laboratory- and field-based aerobic and anaerobic fitness measures in children with cerebral palsy. Arch Phys Med Rehabil. 2013;94:287-301.

91.

Bar-On

Aertbeliën

Molenaers

Dan

Desloovere

. Manually controlled instrumented spasticity assessments: a systematic review of psychometric properties. Dev Med Child Neurol. 2014;56:932-950.

92.

Terwee

Jansma

Riphagen

de Vet

HCW

. Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments. Qual Life Res. 2009;18:1115-1123.
