Abstract
Background:
There is limited research on responsiveness of prosthetic rehabilitation outcome measures.
Objectives:
To examine responsiveness of the Box and Block test, Jebsen–Taylor Hand Function tests, Upper Extremity Functional Scale, University of New Brunswick skill and spontaneity tests, Activity Measure for Upper Limb Amputation, and the Patient-Specific Functional Scale.
Study design:
This was a quasi-experimental study with repeated measurements in a convenience sample of upper limb amputees.
Methods:
Measures were collected before, during, and after training with the DEKA Arm.
Results:
Largest effect sizes were observed for Patient-Specific Functional Scale (effect size: 1.59, confidence interval: 1.00, 2.14), Activity Measure for Upper Limb Amputation (effect size: 1.33, confidence interval: 0.73, 1.90), and University of New Brunswick skill test (effect size: 1.18, confidence interval: 0.61, 1.73). Other measures that were responsive to change were Box and Block test, Jebsen–Taylor Hand Function light and heavy can tests, and University of New Brunswick spontaneity test. Responsiveness and pattern of responsiveness varied by prosthetic level.
Conclusions:
The Box and Block test, Jebsen–Taylor Hand Function light and heavy can tests, University of New Brunswick skill and spontaneity tests, Activities Measure for Upper Limb Amputation, and the Patient-Specific Functional Scale were responsive to change during prosthetic training. These findings have implications for choice of measures for research and practice and inform clinicians about the amount of training necessary to maximize outcomes with the DEKA Arm.
Clinical relevance
Findings on responsiveness of outcome measures have implications for the choice of measures for clinical trials and practice. Findings regarding the responsiveness to change over the course of training can inform clinicians about the amount of training that may be necessary to maximize specific outcomes with the DEKA Arm.
Background
Systematic tracking and evaluation of outcome measures can be used to assess treatment effectiveness, monitor function, and assess value of rehabilitation services for persons with upper limb amputation. Measurement of functional status during rehabilitation is increasingly valued by health-care providers, researchers, policy makers, and the payer community. Evaluation of treatment outcomes is a basic tenet of evidence-based care. 1 Studies of treatment efficacy and comparative effectiveness are often predicated on use of measures that assess important domains in a scientifically sound way. Measurement of outcomes is needed to evaluate effectiveness of interventions as well as the necessary dosage and timing of treatment.
In 2013, the Centers for Medicare and Medicaid Services (CMS) implemented a mandatory functional status reporting system for outpatient therapy services for Medicare patients. Functional outcomes assessment is also included as part of the Physician Quality Reporting System (PQRS) for Medicare patients. 2 Unlike in inpatient rehabilitation settings, which use CMS-prescribed data elements for prospective payment, providers under these Medicare reporting systems choose the measure of function that they feel is most appropriate for their patients.
For these reasons, it is critical that research be conducted to provide clinicians and researchers with the data to guide selection and interpretation of outcome measures. Leaders in upper limb prosthetic rehabilitation understand the importance of choosing outcome measures that have been evaluated for persons with upper limb amputation and that are reliable, valid, and responsive to change.3–6 However, limited research on measurement properties of existing measures and lack of measures that assess important domains have created challenges for the field.6–10
Our group has conducted several studies to examine measurement properties of the measures utilized in the Department of Veterans Affairs (VA) Study to Optimize the DEKA Arm.11,12 We also designed and tested a new measure to fill the gap in activity performance measures for adults with upper limb amputation. 13 We reported test–retest reliability of the Box and Block Test of Manual Dexterity (BB),14–16 the Jebsen–Taylor Hand Function (JTHF) test, 17 the Activities Measure for Adults with Upper Limb Amputation (AM-ULA), 11 the University of New Brunswick (UNB) measure of prosthetic skill and spontaneity, 11 and two self-report measures of function, the Upper Extremity Functional Scale (UEFS),9,18 and the Patient-Specific Functional Scale (PSFS). 19 We also examined inter-rater reliability of all performance measures rated by clinicians, 11 and known group validity of the UNB 11 measure and AM-ULA. 11
However, we have not yet reported on measure responsiveness to change during prosthetic training. Therefore, the purpose of this study was to examine responsiveness of outcome measures utilized in the VA Study to Optimize the DEKA Arm. We aimed to (1) compare responsiveness of measures, (2) determine whether responsiveness varied by level of prosthesis, and (3) examine whether measures plateaued or continued to improve over the course of training. We hypothesized that the majority of measures would be responsive to training, and that responsiveness would vary by prosthetic level and amount of training.
Methods
This was a multi-site study with repeated measurements of subjects. Data were collected at four sites: Department of Veterans Affairs New York Harbor Health Care System (VA NYHHS), James Haley VA, Long Beach VA, and Center for the Intrepid. The study received Institutional Review Board approval at all study sites.
Sample
Subjects were a convenience sample of upper limb amputees participating in the VA Study to Optimize the DEKA Arm. 20 Subjects were eligible if they were at least 18 years old and had single or bilateral transradial, transhumeral, shoulder disarticulation or forequarter level amputation. Subjects were excluded if they had significant uncorrectable visual deficits, major communication or neurocognitive deficits, skin conditions prohibiting prosthetic wear, or had an electrically controlled medical device. Subjects were enrolled based upon their availability, availability of a DEKA Arm at the necessary level, and the desire to balance the sample by amputation level and gender.
Subjects were recruited by clinicians, emails, press releases, flyers, and brochures. All were trained to utilize the DEKA Arm, a pre-commercial upper limb prosthetic prototype, funded by the Defense Advanced Research Projects Agency (DARPA) Revolutionizing Prosthetics Program. 21 The DEKA Arm comes in three configurations, or levels: the radial configuration (RC), used for transradial amputees; the humeral configuration (HC), used for transhumeral amputees; and the shoulder configuration (SC), used for persons with forequarter amputation, shoulder disarticulation, or very short transhumeral amputation. The SC DEKA Arm has 10 degrees of powered movement. All levels have six pre-programmed handgrips and are operated by a combination of methods that may include foot controls, optional EMGs, pressure switches, or other commonly available prosthetic input elements. A detailed description of the DEKA Arm and its features can be found elsewhere. 21
Training to use the DEKA Arm
Subjects were oriented to the device features and controls through an interactive Virtual Reality Environment (VRE) program 22 and then trained by the study therapists using a standardized protocol, described in detail elsewhere. 23 An overview of the training protocol is shown in Table 1. The first step in the training process was prosthetic fitting and basic controls set-up. During this phase, the prosthetist and user determined the initial control scheme that the prosthetist would use to configure the DEKA Arm. The subject then practiced activating each control and identifying the associated action until he or she clearly understood the control for each Arm action. A picture of the controls was created and given to the user, who was instructed to review the controls handout each night until he or she demonstrated consistent recall of the device controls. The next segment of the training protocol was pre-prosthetic training, which included instruction about the features of the device and simulated use of the DEKA Arm within a VRE. When VRE training was complete and the user was comfortable with basic operations, training with the activated DEKA Arm began.
Comparison of training protocol by level of DEKA Arm.
OT: Occupational Therapist; RC: radial configuration; SC: shoulder configuration; IMU: inertial measurement unit; VRE: virtual reality environment; VEP: voluntary elbow positioning; ADL: activities of daily living; HC: humeral configuration; ROM: range of motion; EMG: myoelectrode.
Training with the DEKA Arm itself began with reinforcement of prosthetic control patterns of motions, control mechanisms, and safety features. Training progressed from simple movement activation drills and grasp and release activities to performance of increasingly complex unilateral and bilateral activities such as opening a door with a knob, cutting meat with knife and fork, folding a bath towel, and reaching overhead to grasp an object. Training sessions also included practice of a required list of activities which were identical to items in the UEFS measure. As training progressed, less time was spent on controls training and grasp and release activities and more focus was directed to activities of daily living (ADL) as well as advanced unilateral and bimanual activities. Advanced training included performance of short-term projects, such as preparing a simple meal, specific recreational tasks or games of the subject’s choosing such as putting a golf ball, or completing a model-building project. Training sessions also included time to use the DEKA Arm without any instruction, but still under close supervision by the therapist.
For most subjects, this progression of training took place over the course of ten 2-h sessions. Partway through the study, the training protocol was adapted for SC subjects by adding five additional training visits (10 h) to allow this group of amputees more time to master the more complex SC device. Of the 14 people using an SC Arm, eight completed 15 training visits.
Data collection
Outcome measures were administered at the onset of training, after five training visits (10 h of training) and at final testing (18 h of training or more). All performance-based tests were administered by Occupational Therapists (OTs). The dexterity tests were administered by OTs and timed by research assistants. OTs were trained in the test administration methods by the first author. Performance measures were scored at the time of data collection analysis.11,13 Subjects completed self-report measures by paper and pencil.
Measures
The following measures were utilized:
BB.14–16 It consists of a box with a center partition. Small wooden blocks were placed in one side of the box and the subject was asked to use the device to grasp one block at a time, transport it over the partition, and release it. The number of blocks transported to the other side in 60 s was counted. Therefore, transporting more blocks indicated better dexterity. Test–retest reliability of the BB was 0.91. 12
JTHF test. 17 It is a seven-part dexterity test that evaluates the time needed to perform seven hand-related tasks including (1) printing a 24-letter, third-grade reading difficulty sentence, (2) turning over 7.6 × 12.7 cm (3 × 5) cards in simulated page turning, (3) picking up small common objects (including pennies, paper clips, bottle caps) and placing them in a container, (4) stacking checkers, (5) simulated feeding, (6) moving large empty cans, and (7) moving large 1 lb cans. Each subtest is scored separately. We modified the test administration and scoring method by capping the maximal allowable time for each subtask at 2 min. The score was calculated as the number of items completed per second. Therefore, completing more items per second indicated better function. Test–retest reliability for JTHF test components was 0.68–0.92, with the number of checkers/second test having the lowest intraclass correlation coefficient (ICC) of 0.68 (confidence interval (CI): 0.49–0.80). 12
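The modified JTHF scoring described above can be illustrated with a minimal sketch. It assumes that, when a subject reaches the 2-min cap, the items completed up to that point are counted and the elapsed time is set to 120 s; the function name `items_per_second` and the example numbers are hypothetical and are not taken from the study data.

```python
TIME_CAP_S = 120.0  # 2-min cap per subtest, as described in the text


def items_per_second(items_completed, elapsed_s, cap_s=TIME_CAP_S):
    """Rate score for one subtest; elapsed time is capped at cap_s.

    Assumption: items_completed is the count reached when the subject
    finished or when the cap was hit, whichever came first.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return items_completed / min(elapsed_s, cap_s)


# Hypothetical subject who finished 5 items in 50 s
fast = items_per_second(5, 50)
# Hypothetical subject stopped at the cap with 3 items completed
capped = items_per_second(3, 150)
```

Under this reading, a higher value always indicates better function, and a subject who runs out of time still receives a nonzero score for the items completed.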
AM-ULA. It is an 18-item measure that assesses functional performance with a prosthesis: the ability of the amputee to complete daily activities, the speed of the performance, the movement quality, skillfulness of prosthetic use, and independence. 13 Higher AM-ULA scores indicate better performance. The AM-ULA has excellent test–retest reliability, inter-rater reliability, internal consistency, and demonstrated known group validity. 13
UNB Test of Prosthetic Function for Unilateral Amputees. It includes a spontaneity of prosthetic use (Spontaneity) and a skillfulness of prosthetic use (Skill) scale. 24 Higher scores indicate better performance. We used a subtest of the UNB designed for 11–13 year olds 11 that included wrapping a parcel, sewing a button on cloth, cutting meat, drying dishes, and sweeping floors. Analyses of the UNB test found that the subtests had acceptable internal consistency, test–retest and inter-rater reliability, and preliminary evidence of validity. 11
UEFS. It is from the Orthotics and Prosthetics Users Survey (OPUS).9,18 The UEFS items ask clients to evaluate the ease of performing 23 activities, including self-care and instrumental daily living tasks, using a 5-point scale from “1” very easy to “5” cannot perform. We used a modified 22-item version of the UEFS, omitting the one item related to washing, and recalibrated the scores using WINSTEPS. Lower scores indicated higher functioning. Test–retest reliability of the modified UEFS was 0.80. 12
The UEFS questionnaire also asks respondents to indicate whether or not they usually perform each of the activities using their prosthesis (or orthosis). We scored the UEFS Use scale by calculating the proportion of activities that the subject indicated that they performed using the prosthesis. To increase the validity of self-report, we collected the UEFS after subjects had attempted to perform all activities in the measure. Thus, we did not gather it at baseline testing with the DEKA Arm, only at 10 h of training and final testing.
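As a rough illustration of the UEFS Use score, the sketch below computes the proportion of activities a respondent reports performing with the prosthesis. The handling of unanswered items (dropping them from the denominator) is an assumption, since the text does not say how missing responses were treated, and the function name `uefs_use_score` is hypothetical.

```python
def uefs_use_score(uses_prosthesis):
    """Proportion of answered activities usually performed with the prosthesis.

    uses_prosthesis: list of True/False per activity, or None if unanswered.
    Assumption: unanswered items are excluded from the denominator.
    """
    answered = [u for u in uses_prosthesis if u is not None]
    if not answered:
        raise ValueError("no activities were answered")
    return sum(answered) / len(answered)


# Hypothetical responses for four activities
score = uefs_use_score([True, False, True, True])
```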
PSFS. It asks subjects to identify up to five activities that they have difficulty performing due to their condition. 19 Subjects then rated the amount of limitation they have in performing these activities on a scale of 0–10, with “0” being unable to perform the activity and “10” being able to perform the activity with no problem. Therefore, higher scores indicate better functioning. Individual items were scored separately.
Statistical analyses
Descriptive analyses examined the mean and standard deviation (SD) of scores for each outcome measure using matched subject data for each testing period. For each outcome measure, we calculated the effect size (ES), defined as the difference between two means divided by an estimate of the population SD (here, the pooled SD), together with its 95% CI. In our calculations, we used data from the entire sample and then repeated the calculations, stratified by level of amputation. To determine responsiveness of measures over the course of prosthetic training, we examined ES from baseline to 20-h training, from baseline to 10-h training, and from 10-h training to final testing. An ES of 0.80 was considered large, 0.50 moderate, and <0.50 small. 25
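The ES calculation described above can be sketched as follows. The pooled-SD formula is the standard one; the CI uses a common large-sample approximation for the standard error of a standardized mean difference, which is an assumption because the text does not state how the CIs were derived. The size labels read the stated thresholds as 0.80 or more for large and 0.50 up to 0.80 for moderate. All function names and example numbers are hypothetical.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two sets of scores."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def effect_size(mean1, sd1, n1, mean2, sd2, n2):
    """Difference of two means (later minus earlier) divided by the pooled SD."""
    return (mean2 - mean1) / pooled_sd(sd1, n1, sd2, n2)


def approx_ci(es, n1, n2, z=1.96):
    """Approximate 95% CI using a large-sample standard error (assumed method)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + es**2 / (2 * (n1 + n2)))
    return es - z * se, es + z * se


def size_label(es):
    """Label an ES using the thresholds stated in the text."""
    magnitude = abs(es)
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "moderate"
    return "small"


# Hypothetical scores: mean 10 (SD 4) at baseline, mean 14 (SD 4) at final, n = 30 each
es = effect_size(10, 4, 30, 14, 4, 30)
low, high = approx_ci(es, 30, 30)
```

For measures where lower scores indicate better function, such as the UEFS, the sign of the difference would need to be interpreted accordingly.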
Results
A total of 39 subjects were fit with a DEKA Arm (Table 2). Of these, 12 were fit with an RC DEKA Arm, 13 were fit with an HC, and 14 were fit with an SC. All 39 completed some baseline testing. Thirty-eight subjects completed some measures after 10 h of training, and 32 subjects completed some of the measures at final testing. On occasion, a subject did not complete every outcome measure at a testing session because of fatigue or scheduling constraints. Therefore, the numbers of subjects included in each comparison are specified in Table 3.
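Because not every subject completed every measure at every session, each comparison rests on matched subject data. A minimal sketch of that selection step is shown below, assuming scores are stored per session in a dictionary keyed by subject ID with `None` marking a measure that was not completed; the data layout, the function name `matched_subjects`, and the example values are all hypothetical.

```python
def matched_subjects(session_a, session_b):
    """IDs of subjects with a recorded score at both sessions."""
    return sorted(
        sid
        for sid in session_a.keys() & session_b.keys()
        if session_a[sid] is not None and session_b[sid] is not None
    )


# Hypothetical scores for one measure at two testing sessions
baseline = {"S01": 12, "S02": 8, "S03": None}
final = {"S01": 20, "S03": 15, "S04": 9}

ids = matched_subjects(baseline, final)
pairs = [(baseline[s], final[s]) for s in ids]
```

The length of `ids` corresponds to the per-comparison sample size of the kind reported in Table 3.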
Characteristics of subjects trained with the DEKA Arm N = 39.
SD: standard deviation; RC: radial configuration; HC: humeral configuration; SC: shoulder configuration.
Means and SD of outcome measures used in calculation of effect size, all levels of amputation.
SD: standard deviation; UNB: University of New Brunswick; AM-ULA: Activities Measure for Upper Limb Amputation; UEFS: Upper Extremity Functional Scale; PSFS: Patient-Specific Functional Scale.
Means and SDs of scores at each testing period are shown in Table 3. ESs and CIs for change between baseline and final testing for the entire sample are shown in Figure 1. Dexterity, as measured by the BB test and the JTHF light and heavy can tests, improved from baseline to final testing, as did UNB skill, UNB spontaneity, AM-ULA, and PSFS. The largest ESs were observed for the PSFS (ES: 1.59, CI: 1.00, 2.14), the AM-ULA (ES: 1.33, CI: 0.73, 1.90), and the UNB prosthetic skill test (ES: 1.18, CI: 0.61, 1.73).

Effect size and 95% confidence intervals for change between baseline and final testing: all subjects.
Figures 2 and 3 show that responsiveness of measures between baseline and final testing varied by level of DEKA Arm. The JTHF page turning had a large ES for SC users (ES: 1.02, CI: 0.21, 1.78) but a moderate ES (0.65, 0.60), with CIs that crossed 0, for RC and HC subjects, respectively. All JTHF tests (ES: −0.12 to 0.67), the UNB spontaneity test (ES: 0.98), and the AM-ULA (ES: 0.78) had CIs that crossed 0 for RC subjects. The JTHF heavy can test showed a large effect for HC subjects (ES: 1.35, CI: 0.12, 2.41). The AM-ULA (ES: 1.96, CI: 0.88, 2.89) and the PSFS (ES: 1.63, CI: 0.73, 2.43) had the largest ES for SC users.

Effect sizes and 95% confidence intervals for Box and Block and JTHF: Comparisons by prosthetic level, baseline–final testing, baseline to 10-h training, and 10 h to final testing.

Effect sizes and 95% confidence intervals for JTHF, UNB, AM-ULA, and PSFS: Comparisons by prosthetic level, baseline–final testing, baseline to 10-h training, and 10 h to final testing.
Figure 4 shows the ES and CIs for all measures calculated from data from all subjects at baseline and 10 h of training, and Figure 5 shows the ES and CIs between 10 h of training and final testing. For all subjects combined, the BB was moderately responsive to change from baseline to 10 h training (ES: 0.74, CI: 0.26, 1.20), but not responsive from 10 h to final testing (ES: 0.12, CI: −0.38, 0.60). A moderate ES for UNB skill was observed in the first 10-h training (ES: 0.68, CI: 0.17, 1.17) and between 10-h and final testing (ES: 0.53, CI: 0.01, 1.04). A moderate ES for PSFS was observed in the first 10 h of training (ES: 0.64, CI: 0.16, 1.11) and a large ES was observed in the second half of training (ES: 0.89, CI: 0.39, 1.37). The UEFS (ES: −0.24, CI: −0.73, 0.25) and UEFS Use scales (ES: 0.41, CI: −0.09, 0.90) were not responsive to change between 10 h of training and final testing for all levels combined (Figure 5).
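The way these results are read can be made explicit with a short sketch. It assumes, based on how the text describes CIs that cross 0, that a measure is treated as responsive for a given interval when its 95% CI excludes zero. The ES and CI values are the ones reported above for all subjects combined; the function and variable names are hypothetical.

```python
def ci_excludes_zero(ci_low, ci_high):
    """True when the confidence interval lies entirely above or below zero."""
    return ci_low > 0 or ci_high < 0


# (ES, CI low, CI high) as reported in the text, all subjects combined
reported = {
    "BB, baseline to 10 h": (0.74, 0.26, 1.20),
    "BB, 10 h to final": (0.12, -0.38, 0.60),
    "PSFS, 10 h to final": (0.89, 0.39, 1.37),
    "UEFS, 10 h to final": (-0.24, -0.73, 0.25),
}

responsive = {
    name: ci_excludes_zero(low, high) for name, (_es, low, high) in reported.items()
}
```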

Effect size and 95% confidence intervals for change between baseline and 10 h of training: all subjects.

Effect size and 95% confidence intervals for change between 10 h of training and final testing: all subjects.

Effect size and 95% confidence intervals for change of the UEFS and UEFS Use scales between 10 h of training and final testing by level.
The pattern of measure responsiveness over the course of training varied by level of DEKA Arm configuration (Figures 2 and 3). For SC users, the JTHF heavy can test had an ES of 0.44 (CI: −0.32, 1.18) between baseline and 10 h of training, and an ES of 0.89 (CI: 0.09, 1.64) between 10 h and final testing. UNB skill showed a large ES for SC users between baseline and 10 h of training (ES: 0.90, CI: 0.10, 1.65), and an ES of 0.43 (CI: −0.33, 1.17) after 10 h of training. The AM-ULA had a large ES between baseline and 10 h of training for SC users (ES: 1.51, CI: 0.51, 2.39) and an ES of 0.59 (CI: −0.28, 1.42) after 10 h of training. The PSFS had a large ES between baseline and 10 h of training for RC users (ES: 1.03, CI: 0.11, 1.88) but not for other levels, and a large ES between 10 h and final testing for SC users (ES: 1.17, CI: 0.34, 1.93) but not for other levels.
Discussion
We examined the ability of a set of outcome measures to measure functional change in upper limb amputees after occupational therapy training to utilize the DEKA Arm. Ours is the first study of its kind to evaluate responsiveness of measures of upper limb prosthetic rehabilitation. We found that for all prosthetic levels combined, the BB, JTHF light and heavy can tests, UNB skill and spontaneity tests, AM-ULA, and PSFS were responsive to change. However, the remaining JTHF tests and the UEFS were not responsive.
Some of the differences we observed in responsiveness of measures by type of DEKA Arm may be attributable to the differing complexity of learning to use each configuration. Users of the RC DEKA Arm must memorize and utilize controls for 6 degrees of freedom (hand open/close, wrist flexion/extension, and wrist pronation/supination) in addition to the control for grip selection. In contrast, users of the HC Arm must learn to utilize these same controls plus controls for elbow flexion/extension, humeral internal/external rotation, as well as a mode selection to switch between hand/wrist movements and arm movements. Finally, users of the SC Arm have the greatest number of controls to master, as they must learn to control the hand/wrist movements, elbow in and out, as well as six Endpoint control movements (up/down, forward/backward, left/right).
Generally speaking, dexterity measures were more responsive to change in SC users than in RC and HC users. This may be because RC and HC users were able to master the basic controls necessary to perform the dexterity tests during VRE and brief exposure to live training, which happened prior to initial testing. Therefore, training had little to no impact on dexterity. However, SC users, who had to learn to operate a more complex set of controls, required additional training to maximize dexterity. There may be a similar explanation for the finding that prosthetic spontaneity and activity performance, as measured by the UNB spontaneity and AM-ULA tests, did not significantly improve after baseline testing for RC subjects. Another explanation is that RC users had higher scores on tests of spontaneity and activity performance with their existing prostheses as compared to SC and HC users (findings reported elsewhere). 26 Because they were already accustomed to utilizing a prosthesis in bilateral and everyday tasks, there was less room for improvement.
Our study provided information about the timing of gains in dexterity and activity performance for each level of DEKA Arm user. Users of all levels achieved greater change in dexterity (Box and Block), skill, and activity performance in the first 10 h of training than they did in the second 10 h (as indicated by larger ESs). This finding suggests that the greatest gains in dexterity, skill, and activity performance are made in this early training period. However, perceived difficulty in patient-specific activity performance (PSFS) improved the most during the first 10 h for RC users and during the second 10 h for SC users, demonstrating the added value of more hours of training for patients with the most proximal level amputation.
Our findings have implications for the choice of measures for clinical trials and practice. They can also inform clinicians about the amount of training that may be necessary to maximize specific outcomes with the DEKA Arm. For the DEKA Arm, it appears that dexterity is maximized very early in training, whereas the performance of self-selected functional activities continues to improve with more training visits. However, this finding is not surprising, and is, in fact, consistent with the emphasis used in our training protocols and those recommended by others that emphasize simple fine motor tasks at the initial phases of rehabilitation, progressing to more complex activities, and then self-selected activities as patients gain skill proficiency. 27 Further research is needed to determine whether a similar phenomenon is observed when training users with other prostheses.
Our study has several limitations. First, our sub-analyses of ES by level of the DEKA Arm involved small samples, resulting in wide CIs around ES estimates, which may have led us to erroneously conclude that some ESs were not significant. That said, none of the ES which had CIs containing zero were considered large, leading us to conclude that none of these measures were among the most responsive.
Second, there may be limits to generalizability of findings of measure responsiveness for training with upper limb prosthetic devices other than the DEKA Arm. The DEKA Arm has some unique features, including multiple grip patterns and the need to toggle between grips to select the appropriate one, which may affect dexterity potential. Additional studies need to be performed to corroborate our ES findings during prosthetic training with other types of devices.
Another limitation is that our subjects had a short introduction to the DEKA Arm controls using VRE and brief “live” training prior to baseline testing with the DEKA Arm. This orientation was conducted in order to ensure safety of users before testing with a novel device. Therefore, our subjects were not completely naive users of the DEKA Arm. Thus, the initial training effect may not have been captured, and ESs estimated using baseline testing data and subsequent testing data are likely to be underestimated.
Finally, we were unable to calculate the ES for the UEFS and UEFS Use scales from baseline to end of training, because we did not administer this measure at baseline testing. Our estimates of ES therefore reflect only the changes from 10-h training to final testing, and we recognize that these are likely underestimates of the full training effect.
Conclusion
We found that the BB, JTHF light and heavy can tests, UNB skill and spontaneity tests, AM-ULA, and the PSFS were responsive to change during prosthetic training, and the PSFS, the AM-ULA, and the UNB prosthetic skill test were the most responsive to change. However, the responsiveness varied by prosthetic level. Neither the JTHF tests (ES: −0.12 to 0.67), the UNB spontaneity test, nor the AM-ULA was responsive to change in transradial amputees. Dexterity measures appear to be most responsive during the early phase of prosthetic training, while measures of activity performance and skill were responsive throughout the entire period of prosthetic training.
More study is needed to examine the responsiveness of the UEFS. These findings have implications for choice of measures for research and practice and inform clinicians about the amount of training necessary to maximize outcomes with the DEKA Arm.
Footnotes
Author contribution
Linda Resnik obtained funding, was the study Principal Investigator, designed the study, directed the analyses, and participated in drafting the article. Matthew Borgia participated in the data analysis, manuscript preparation and review.
Declaration of conflicting interests
None declared.
Funding
This research was supported by VA RR&D, VA RR&D A6780 and VA RR&D A6780I. DEKA’s support of the VA optimization studies was sponsored by the Defense Advanced Research Projects Agency and the U.S. Army Research Office.
