Abstract
The International Classification of Functioning, Disability and Health (ICF) has been recommended as a framework for evaluation of aspects of health. The aim of this study was to compare the contents of outcome measures for upper limb prosthesis users by using the ICF. Measurement focus and psychometric properties of these measures were also investigated. Outcome measures that used upper limb prosthesis users as subjects in their development and psychometric evaluations were selected. The psychometric studies (n = 14) were reviewed and scored and the items in the measures were linked to the ICF. One measure for all ages (ACMC), five paediatric measures (CAPP-FSI, CAPP-PSI, PUFI, UBET and UNB) and two adult measures (OPUS and TAPES) were selected. The concepts extracted (n = 393) were linked to 54 categories in the ICF. The ACMC, CAPP-FSI, UBET, UNB and PUFI measure categories mostly under the ICF component ‘Activities and participation’. The TAPES and OPUS also measure ICF categories that describe the emotional and social status of a person. The main conclusion is that the use of a mixture of outcome measures would give a better picture of the outcomes of our clients. Measures that focus on social interaction in paediatric users are required.
Introduction
In recent years, researchers and clinicians have become increasingly interested in functional outcomes among users of upper limb prostheses. Evaluations of these outcomes have been performed from different perspectives, including prosthetic management and usage, 1–3 prosthesis satisfaction, stump pain and prosthesis use in different activities 4–7 and effectiveness of prosthetic training. 8–10 However, only two of the above studies used validated outcome measures for upper limb prostheses to evaluate the functional outcomes. 8,9 The other studies relied on study-specific surveys or tests that had not been tested psychometrically on upper limb prosthesis users. This raises an issue that was also discussed earlier by Wright 11 and by Biddiss and Chau. 12 Study-specific questionnaires that are designed for specific clinical purposes do not always have proven psychometric properties, and hence it is not certain whether the scores obtained are valid and truly reflect the client's status. In their review of prosthesis use over the last 25 years, Biddiss and Chau 12 emphasized the importance of future use of standardized outcome measures with well-defined vocabularies to study this diverse population. This was also acknowledged in two recent reviews of outcome measures. 13,14
The use of outcome measures that are psychometrically sound is important, as the conclusions drawn from quality measurements influence our service, prescribing, policy making, and the expenditure of public funds. 15 The quality of an outcome measure lies in its ability to produce consistent results (reliability) that reflect the conceptual basis that we intend to measure (validity). 16 The reliability and validity of an outcome measure are sample dependent; hence, its psychometric properties must be examined in different samples before any conclusion can be drawn about the psychometric evidence for the measure in a particular diagnostic group. 15 Furthermore, the ability to detect clinical change, i.e. the responsiveness of an outcome measure, is useful for clinicians who would like to measure the changes in the client's status after a certain type of treatment or after the prescription of a new prosthetic device. The scores obtained should help clinicians to make clinical decisions. 17 In other words, what does the score tell us about a client's status? How much change in a score is needed to indicate that there is a real change in the client's status? Another important factor that influences the clinical utility of an outcome measure is the length of time required to administer it. A time-consuming test, even with good psychometric properties, will be less useful in today's busy clinical routine. 14
The interpretations of the terminologies used in the outcome measures can also influence the selection of an outcome measure. The newly established Upper Limb Prosthetic Outcome Measures (ULPOM) group commented on the different interpretations of terminologies by different professions. Occupational therapists consider ‘function’ to be the person's ability, but the prosthetists and engineers refer to ‘function’ as the technical performance of a device. 14 The developers of two different measures may use the same term “function”, but with different interpretations. Hence, before choosing a test to measure the prosthetic ‘function’ of our clients, it is important for the clinicians to study carefully the aim of the measure and the ‘prosthetic function’ that the test is trying to measure.
One way to compare outcome measures that use the same terminologies is to link the content of the measures to the International Classification of Functioning, Disability and Health (ICF). 18 This was suggested earlier in studies on outcome measures for upper limb prosthesis users. 13,14 The ICF was developed by the World Health Organization with the aim of creating a common language for different professionals to describe health and health-related status. 18 The ICF classifies human functioning into four components: ‘Body functions and structures’, ‘Activities and participation’, ‘Environmental factors’ and ‘Personal factors’. The first three components are subclassified in detail into domains and categories. Clinicians from different fields who have linked similar outcome measures to the ICF language commented that the linking process can facilitate the selection of an appropriate outcome measure and identify the aspects of health that are measured by, or lacking in, those measures. 19–21
In the last decade, several questionnaires and assessments have been designed and validated on persons with upper limb prostheses. To encourage the use of validated outcome measures in prosthetic rehabilitation, there is a need for a review that takes the above-mentioned aspects into account and suggests relevant recommendations for the future development and use of outcome measures in upper limb prosthetics. In a recent review, comments on the clinical utility of the tests were given, but these comments were not based on systematic evaluations, 14 which makes the results more difficult to interpret. Another recent review included seventeen outcome measures that different researchers had used to evaluate the outcomes in upper limb amputees and/or upper limb prosthesis users. 13 However, some of these measures were developed using people with normal hand function, 22 whereas others were originally designed for the human hand but used to evaluate the outcomes of upper limb prostheses. 23,24 It is common knowledge that the types of grasp available in a prosthetic device are quite different from those of a human hand. Thus, measures that were originally designed for human hands might not necessarily be appropriate for measuring the functional outcomes in upper limb prosthesis users. This has been acknowledged by the developer of one of the above-mentioned tests, the Assisting Hand Assessment (AHA). This instrument was originally designed to measure human hand function, but new items that are suitable for upper limb prosthesis users are being developed and the psychometric evaluation of this version is under way. 25 Therefore, this review focuses only on instruments that were originally designed for, and psychometrically tested on, upper limb prosthesis users.
The overall aim of this literature review was therefore to identify and compare the contents of outcome measures for upper limb prosthesis users. Specific objectives of this review were:
To identify the primary focus, target client group, and clinical utility of different types of outcome measures;
To examine the psychometric properties of the identified outcome measures;
To link the items or questions of the outcome measures to the respective ICF categories.
Method
According to the Swedish law SFS: 2003:460, an ethical review is required for research involving humans or biological material from humans (§1–3). As this is a literature review, the Regional Ethics Committee review board of Uppsala, Sweden, decided that no ethical approval was needed for this study.
Search strategy and selection criteria
Free-word and MeSH searches for scientific publications on upper limb outcome measures were performed in the databases AMED, CINAHL and MEDLINE. Truncations and different combinations of the following key words were used: Prosthesis, artificial limb, upper limb, upper extremity, instrument, assessment, outcome, measure, scale, survey and questionnaire. All English language scientific publications studying upper limb prostheses dated from 1985 onwards were considered. Conference proceedings were excluded from consideration. A total of 211 studies were found in the electronic search, including duplicates. Sixty-eight studies remained after removal of the duplicates.
The next step was to identify validated outcome measures from the 68 studies found in the electronic search. The selections were based on two criteria:
The outcome measure should be designed to assess or evaluate the functional outcome among upper limb prosthesis users. Functional outcome could include factors such as function, acceptance, usage, satisfaction, adaptation, and ability; and
The psychometric properties of the outcome measure should have been evaluated on upper limb prosthesis users. Outcome measures that did not use upper limb prosthesis users as subjects in their psychometric evaluations were excluded.
Eight outcome measures were identified (see Results). For seven of them the psychometric results have been published in peer-reviewed journals. These studies were included in this review. For the eighth outcome measure the psychometric results were published in the test manual. Test manuals were downloaded or obtained from the developers. A total of 14 psychometric studies and five test manuals were used in this review. 24,26–43
Data extraction
The type of outcome measure, its primary measurement focus, target client group, and clinical utility were extracted from the studies and manuals. The clinical utility was considered to be indicated by the content, instructions, and duration of administration. The psychometric properties of the outcome measures were extracted from the 14 psychometric studies. Different types of reliability and validity were evaluated in these studies and they are defined briefly as follows:
Test-retest reliability is used to assess the consistency of a client's score from one time to another. A reliability coefficient, e.g., the intra-class correlation coefficient (ICC), is often used to assess this type of reliability. 44,45
Inter- and intra-rater reliability is used to assess the degree to which different raters/observers give consistent estimates of the same phenomenon and to determine whether consistent results are produced by one and the same rater. ICC, kappa, or Pearson coefficients are often used to assess this type of reliability. 44,46
Internal consistency is used to assess the degree to which all the items are related to each other. Cronbach's alpha (in classical test theory) or principal components analysis plus fit statistics (in modern test theory) are often used to evaluate this property. 47,48
Content validity concerns the question of whether the outcome measure contains a comprehensive sample of items or questions that completely assesses the domain. 44
Construct validity is defined as the degree to which the outcome measure adequately measures the underlying construct. Discriminant validity is a type of construct validity in which the instrument is able to discriminate between different groups. There is no single measure of construct validity; rather, it is evaluated through an accumulation of tests. 44
Responsiveness refers to the ability of an outcome measure to detect a real change. This can be evaluated using different statistical parameters, such as the minimal detectable change (MDC), the minimal clinically important difference (MCID), effect size and standardized response mean. 44
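To make the definition of internal consistency concrete, the following is a minimal Python sketch of Cronbach's alpha. The function name and the small score matrix are hypothetical and serve only as illustration; they are not data from any of the reviewed studies.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are needed")
    # Variance of each item across respondents
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the respondents' total scores
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 4 respondents x 3 items on a 0-2 scale
example = [
    [2, 2, 1],
    [1, 1, 1],
    [0, 1, 0],
    [2, 2, 2],
]
alpha = cronbach_alpha(example)
```

Values of alpha close to 1 indicate that the items vary together. The same ratio results whether population or sample variances are used, provided one choice is applied throughout.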
A critical appraisal form by MacDermid 17 was used to assess the research methodologies used in the psychometric studies. The 12 questions in the form were scored on a three-graded rating scale (2, 1 and 0). Each grade comes with a detailed description for each question to guide the user in deciding the score. Two independent reviewers (1st and 2nd authors) appraised the studies separately and any disagreement in the assessments between the two reviewers was discussed with the third author until consensus was achieved. The interpretation of the MacDermid rating scale was that 2 = acceptable, 1 = questionable, and 0 = not acceptable. In order to decide on an overall impression of the psychometric merit of each outcome measure, we decided that for an instrument to be recommended the majority of the questions should be scored a ‘2’, i.e., achieve 18–24 points, or 75–100% of the total.
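The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the handling of 'not applicable' questions (dropped from the denominator) is an assumption that the text does not specify, and the instrument-level rule is one reading of the majority criterion.

```python
def appraise_study(scores, threshold=0.75):
    """Summarise one study's appraisal.

    scores: 12 entries, each 0, 1 or 2, or None for 'not applicable'.
    Returns (points, maximum, fraction, meets_threshold).
    """
    if len(scores) != 12:
        raise ValueError("the appraisal form has 12 questions")
    rated = [s for s in scores if s is not None]
    if not rated or any(s not in (0, 1, 2) for s in rated):
        raise ValueError("each rated question must score 0, 1 or 2")
    points = sum(rated)
    # ASSUMPTION: questions marked n/a are removed from the denominator
    maximum = 2 * len(rated)
    fraction = points / maximum
    return points, maximum, fraction, fraction >= threshold


def instrument_recommended(study_fractions, threshold=0.75):
    """ASSUMPTION: recommend when a majority of the studies meet the threshold."""
    passing = sum(1 for f in study_fractions if f >= threshold)
    return passing > len(study_fractions) / 2
```

With all 12 questions rated, a study that scores '2' on nine questions and '0' on three obtains 18 of 24 points, which is exactly the 75% limit.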
Linking the content of the outcome measures to the ICF categories
There are 438 items/questions in the eight outcome measures and some items are common to the different age or parent/child versions of the same measure. To reduce the numbers, the first author began by removing the duplicate items/questions. After this, 369 items/questions remained. The linking process was then conducted according to the linking rules suggested by ICF 18 and Cieza et al. 49,50 The first author started with the extraction of meaningful concepts from every item/question. Thereafter, to ensure that the concepts were linked to the most relevant ICF categories, the extracted concepts were confirmed by the third author. According to the linking rules, more than one meaningful concept can be extracted from one item/question. An example is the item in the ACMC ‘coordinating both hands during grasping’. 28 Two meaningful concepts were extracted from this item: ‘hand coordination’ and ‘grasping’.
Since the prosthetic handgrip is a replacement of the human hand, the first and the third authors decided to link ‘grasping’ with the use of a prosthetic hand to the ICF category ‘d4401 Grasping’, which is a function of a human hand. As further pointed out by Dr Cieza, 51 the primary focus of an outcome measure should be considered first before the extracted concepts are linked to any ICF category. For example, many items in the identified outcome measures are “child play activities” such as assembly of toys, separation of Play-Dough or riding a Scooter. These items are play activities but they are not designed primarily to assess whether one person can engage in play or not but to evaluate the ability to use the prosthetic handgrip in simple tasks such as assembly of toys. Hence, these items are not suitable to link to the ICF category ‘d9200 Play’ since ‘Play’ is defined in the ICF as ‘engaging in games with rules or unstructured or unorganized games and spontaneous recreation.’ These “child play activities” are therefore linked to three ICF categories: ‘d2100 undertaking a simple task’, ‘d440 fine hand use’, and ‘d445 hand and arm use’.
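The bookkeeping behind the linking process can be illustrated with a short Python sketch. It uses only the concept-to-category links stated in this review; the data structure and function name are hypothetical, and the mapping from the first letter of a code to its component follows the general ICF coding convention (b, s, d, e).

```python
from collections import Counter

# The first letter of an ICF code identifies its component
COMPONENTS = {
    "b": "Body functions",
    "s": "Body structures",
    "d": "Activities and participation",
    "e": "Environmental factors",
}


def component_of(code):
    """Return the ICF component of a code; 'nc' marks a non-covered concept."""
    if code == "nc":
        return "not covered"
    return COMPONENTS[code[0]]


# Concept -> linked ICF categories, taken from examples in this review
linked = {
    "grasping with the prosthetic hand": ["d4401"],
    "child play activity": ["d2100", "d440", "d445"],
    "emotions": ["b152"],
    "body image": ["b1801"],
    "prosthesis": ["e1151"],
    "amputee": ["nc"],
}

# Frequency with which each component is addressed
tally = Counter(
    component_of(code) for codes in linked.values() for code in codes
)
```

A tally of this kind, computed per outcome measure, is the sort of frequency listing that allows the content of different measures to be compared.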
Results
Measures for children and adolescents
Six paediatric outcome measures were identified: Assessment of Capacity for Myoelectric Control (ACMC), 26–29 Child Amputee Prosthetics Project-Functional Status Inventory (CAPP-FSI), 30–32 Child Amputee Prosthetics Project-Prosthesis Satisfaction Inventory (CAPP-PSI), 33 Prosthetic Upper Extremity Functional Index (PUFI), 24,36–38 Unilateral Below Elbow Test (UBET), 24,41 and University of New Brunswick test of Prosthetic Function (UNB). 42,43
Type of outcome measure, primary focus, client group, and clinical utility
The type of outcome measure, primary focus, client group, and clinical utility of each outcome measure are presented in Table I. The ACMC, UNB and UBET are observational measures designed for clinicians to assess the user's ability to use the prosthetic handgrip. The UBET 24,41 and UNB 42,43 assess the use of the handgrip with age-appropriate bimanual tasks, whereas the ACMC 26–29 evaluates the use of the handgrip in any client-chosen everyday activity. The ACMC has been reduced to 24 items with a revised rating scale structure, 28 and new tasks have been added to the UNB. 43
Table I. An overview of the instruments that have been developed and validated to measure the outcomes of upper limb prostheses.
Both CAPP-FSI 30–32 and PUFI 24,36–38 are designed for clients under age 18 and they are quite similar in terms of their activities. However, the three age-versions of CAPP-FSI are only answered by the parent, whereas PUFI is answered by either the parent or the child, or both. Furthermore, the prosthesis use in the CAPP-FSI is expressed in the percentage of time, whereas in the PUFI scores it is expressed in the degree of difficulty and usefulness. The CAPP-PSI 33 measures the clinical service provided and it focuses on the delivery, repair and instructions given to clients.
Measures for adults
Three outcome measures were identified: Assessment of Capacity for Myoelectric Control (ACMC), 26–29 Orthotics and Prosthetics Users’ Survey (OPUS), 34,35 and Trinity Amputation and Prosthesis Experience Scales (TAPES). 39,40
Type of outcome measure, primary focus, client group, and clinical utility
The ACMC is the only instrument that is suitable for both paediatric and adult clients because the client is allowed to choose any bimanual activity for the assessment. The multi-dimensional OPUS and TAPES are constructed to measure different areas in prosthesis users. 34,35,39,40 The TAPES is constructed with special focus on the adjustment of adult amputees. Among other things, the OPUS measures the clinical service. The questions in OPUS focus on the prosthetist's courtesy and respect, and involvement of the client in decision-making. 35
Most of the client-rated questionnaires take less time than the observational measures. All of them provide clear instructions for the raters and the clients. The ACMC provides a course to train clinicians as ACMC raters. 29
Psychometric properties of the outcome measures
A summary of psychometric properties of the outcome measures is presented in Table II. In the reliability studies, the number of participants ranged from 20 to 101; 24,39 in the validity studies, it ranged from 20 to 210. 24,26 The rating scales of the eight outcome measures are either ordinal or nominal or a mixture of both. Both parametric and non-parametric statistical tests were used to analyse the ordinal data.
Table II. A summary of psychometric properties of the outcome measures.
Rater or observer reliability was evaluated in the observational assessments ACMC, UBET and UNB. 26,41,42 Internal consistency or test-retest reliability was evaluated in all three age versions of the CAPP-FSI, and in the CAPP-PSI, PUFI, TAPES and UBET. 30–33,36,39,42 Different types of validity were evaluated in the eight measures. The functioning of rating scale categories was analyzed in ACMC and OPUS-UEFS. 28,35
None of the outcome measures had been systematically evaluated regarding its responsiveness using the statistical parameters normally applied in the evaluation of responsiveness of an instrument. The ACMC was assessed for its ability to measure change among new and experienced prosthesis users over an 18-month period and a change in the ability to control the prosthetic hand was detected mostly among the new prosthesis users. 27
The appraisal of the research methodologies used in the 14 psychometric studies is presented in Table III. The psychometric result of UNB in the test manual was excluded from this appraisal because it is not published in a peer-reviewed journal. Most of the studies clearly presented the relevant background and the measurement procedures. Only three validity studies mentioned the importance of sample size, 28,37,39 and only two of these performed sample size calculations to show adequate statistical power for drawing statistical inferences about the validity of the results. 28,37 In three test-retest reliability studies, the participants were appropriately re-tested. 24,36,41 Seven studies used parametric tests such as independent t-tests or intra-class correlation coefficients in some part of their analyses to test the differences between items or groups, which is not the best choice for analyzing ordinal data. 24,30–33,36,41 The results of the psychometric grading showed that for three instruments, the ACMC, the OPUS, and the TAPES, all studies received more than 75% of the total scores, and for one test, the PUFI, a majority of the studies received more than 75% of the total scores (Table III). These instruments are, thus, recommended for use.
Table III. Critical appraisal of study design for psychometric studies based on suggestions by MacDermid. 17
Higher scores are positive. n/a = not applicable. Note: the UNB test is not included because it is not published in a peer-reviewed journal.
ICF categories in the measures
A total of 393 concepts were extracted from 369 items/questions, and 388 concepts (98.7%) could be linked to the respective ICF categories. The frequencies with which ICF categories were addressed in the eight outcome measures are listed in Table IV. For example, the ‘OPUS-health related quality of life’ module has 10 items that contain concepts concerning the emotions such as full of life, happy, nervous, depressed, and so on. These concepts are all linked to the ‘b152 Emotional functions’.
Table IV. Frequencies with which categories in three ICF components are addressed in the items/questions.
A total of 54 ICF categories were used to link the concepts and 45 of these categories come under the component ‘Activities and Participation’. The component ‘Body Functions’ is covered by ACMC, OPUS and TAPES. The ACMC covers the coordination of voluntary movements such as left and right hand coordination. The OPUS-quality of life module and the TAPES-psychosocial adjustment cover ‘b152 Emotional functions’, which include both positive and negative emotions. The TAPES is the only instrument that covers ‘b1801 Body image’.
The concept “prosthesis” is linked to the ICF category “e1151 Assistive products and technology for personal use in daily living”, which is classified under the component “Environmental Factors”. The CAPP-PSI, OPUS and TAPES together have 30 items on prosthesis satisfaction and service and these are all linked to the component “Environmental Factors”.
Fifteen concepts are not covered by the ICF (2.3%) and they are listed in Table V. These concepts are categorized as non-covered (nc) in the ICF. For example, in the ACMC, concepts such as different positions, timing, the need of arm support, and adjustment of grip force are designed specifically to assess how the prosthesis user operates the prosthetic hand in any client-chosen activity. These concepts are too specific to be covered by the ICF. The concept ‘amputee’ describes a condition of an individual rather than the functioning of a person; hence, it is not covered by the ICF. The condition or disease of a person, e.g., amputation, is classified under the International Classification of Diseases (ICD).
Table V. Concepts in the outcome measures that are not covered by the ICF.
ACMC = Assessment of Capacity for Myoelectric Control; OPUS = Orthotics and Prosthetics Users’ Survey; TAPES = Trinity Amputation and Prosthesis Experience Scales.
Discussion
This review consists of an analysis of outcome measures that have been developed to evaluate the functional outcomes among upper limb prosthesis users. In short, each of the identified outcome measures is designed with a special focus, and no single outcome measure covers all aspects of human functioning, as demonstrated by the linking to the ICF categories. Since we have selected only the outcome measures that used upper limb prosthesis users as subjects in their psychometric evaluations, there are far fewer instruments in this review than in the two reviews by Wright 13 and Hill. 52 However, the result is in concordance with their reviews.
Although we selected only eight outcome measures for this review, we agree with Wright 13 that many hand function measures, such as the Southampton Hand Assessment Procedure (SHAP), 22 Box and Block test, 53 Jebsen Taylor hand function test 54 and Assisting Hand Assessment (AHA), 55 which are designed primarily for measuring hand function, are potentially useful measures for upper limb prosthetics. Validations of some of these measures on upper limb prosthesis users are underway. For example, a recent study 56 demonstrates the potential of SHAP as being capable of identifying functional abilities in prosthetic hands, which is an indication of future application of this measure. We look forward to future studies on the application of other hand function tests to prosthetic hands.
The MacDermid appraisal form 17 was chosen in this review to evaluate the studies because the questions are tailor-made for psychometric studies of outcome measures applied in rehabilitation. The questions were very useful in guiding us to investigate how different researchers evaluated the instruments. Most of these studies had small sample sizes. The main reason for this could be that upper limb prosthesis users form a very small patient group. Multi-centre studies with data from different clinics or countries would increase the sample sizes and are, thus, recommended for future studies. Parametric statistical tests were used in seven studies to analyse ordinal ranked data. Non-parametric statistical methods are usually used to analyze data that are in rank order (such as no difficulty, some difficulty and so on) because it is uncertain whether this type of data is normally distributed. The authors of these papers did not state an assumption of normally distributed data, and hence we do not think that the use of ICC or independent t-tests was appropriate. For instance, weighted kappa is a measure of the agreement (reliability) of ordinal data, and the intraclass correlation is often used as a measure of reliability in quantitative scales. Each of these two measures is intended for its respective scale type and they should not be used interchangeably. 57
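As an illustration of an agreement statistic suited to ordinal ratings, the sketch below computes a linearly weighted kappa for two raters in Python. The function name and the example ratings are hypothetical; linear weights are one common choice (quadratic weights are another) and are an assumption here, not a method taken from the reviewed studies.

```python
def weighted_kappa(rater1, rater2, n_categories):
    """Linearly weighted kappa; ratings are integers 0 .. n_categories - 1."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must rate the same non-empty set")
    n = len(rater1)
    k = n_categories
    # Cross-tabulate the two raters' scores
    table = [[0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        table[a][b] += 1
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Disagreement weight |i - j|: larger gaps between categories count more
    observed = sum(
        abs(i - j) * table[i][j] for i in range(k) for j in range(k)
    ) / n
    expected = sum(
        abs(i - j) * row_totals[i] * col_totals[j]
        for i in range(k) for j in range(k)
    ) / n ** 2
    if expected == 0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return 1 - observed / expected


# Hypothetical ratings of four clients on a three-step ordinal scale
kappa = weighted_kappa([0, 1, 2, 2], [0, 1, 2, 1], n_categories=3)
```

Unlike a t-test or an ICC, this statistic uses only the ordering of the categories and the distance between them in rank steps, so it does not rely on normally distributed scores.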
Development and further testing of the quality of outcome measures for upper limb prosthesis users is ongoing. For example, in a recent publication 58 an interesting application of the PUFI is suggested. This may have an impact on future use of this measure. Clinicians have used the measures identified in this review to evaluate different clinical aspects in their clinics, such as comparing different types of devices, 59 the effect of prosthetic training, 9 the impact of prosthetic functioning, 60 and the meaning of amputation. 61 For an outcome measure to be clinically usable, the magnitude of score change between re-assessments should be able to reflect a real change in the client's clinical status. None of the identified outcome measures clearly states this magnitude, i.e., how large the difference should be to demonstrate a true change. Further research on this issue is thus recommended, since this will help the clinician to evaluate any improvement after intervention and to plan further treatment goals.
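One commonly used way to express how large a score difference must be before it exceeds measurement error is the minimal detectable change at the 95% confidence level, derived from the standard error of measurement. The Python sketch below shows this general formulation; the function names and the numerical values are hypothetical and are not results for any of the reviewed measures.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability e.g. a test-retest ICC."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def minimal_detectable_change(sd, reliability, z=1.96):
    """MDC = z * sqrt(2) * SEM; z = 1.96 gives the 95% level."""
    return z * math.sqrt(2) * standard_error_of_measurement(sd, reliability)


# Hypothetical values: baseline SD of 10 points, test-retest reliability 0.91
mdc95 = minimal_detectable_change(sd=10, reliability=0.91)
```

A change smaller than this value cannot be distinguished from measurement error, which is why a reliability estimate obtained on upper limb prosthesis users is a prerequisite for this kind of calculation.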
Primary focus and linking to the ICF
It cannot be assumed that outcome measures that have the same primary focus will provide similar kinds of clinical data. For example, both the OPUS and CAPP-PSI have their primary focus on clinical service, but their questions are quite different. Moreover, although it has been validated for upper limb amputees, the TAPES-activity restriction scale contains activities such as walking and climbing. 39 Also, one cannot assume that outcome measures that contain a selection of activities or tasks actually measure the activity or task performance. For example, the UNB test provides a large selection of tasks, but the test actually measures the person's skill and spontaneity in prosthesis use. Thus, before selecting an outcome measure, we recommend that clinicians carefully read the items/questions and the scoring structure of the outcome measures in order to understand what the measures are actually measuring.
The list of ICF categories in Table IV shows the areas intended to be measured by the outcome measures. This list helps us to understand the similarities and differences between the measures. For example, the CAPP-FSI, PUFI, UBET and UNB cover mostly the same categories and all are under the ICF component ‘Activities and participation’. Moreover, ACMC, OPUS and TAPES cover more than one ICF component, indicating that these outcome measures cover a wider dimension than the other outcome measures identified in this review. The OPUS and TAPES cover stump pain, employment, and social relationships, which are all relevant issues among upper limb amputees. The TAPES is the only outcome measure that covers ‘body image’, which is an important aspect in the acceptance and continued use of the device. The list in Table IV also suggests the aspects of health that are lacking in the outcome measures. Aspects such as emotional functions, psychosocial adjustment, body image and social interaction are important for both paediatric and adult prosthesis users, but only TAPES and OPUS cover these aspects. The TAPES is designed for adults and the validity of the OPUS health-related quality of life module has only been evaluated in adult lower limb prosthesis users. 34 Since there is a need for such an outcome measure for paediatric users, further validation of the OPUS for paediatric upper limb prosthesis users is encouraged.
It is interesting to point out that the term prosthesis is classified under ‘Environmental factors’ in the ICF, according to which an environmental factor can be a facilitator or a barrier to different persons in different situations. 18 An upper limb prosthesis may be a facilitator or a barrier, depending on the activities. The interaction between the upper limb prosthesis and its user is a complex issue and no one outcome measure alone can fully grasp the whole picture of this relationship. Hence, in line with other researchers, 13,52,62 we recommend the use of two or even three validated outcome measures to capture the different aspects of this interaction. Hopefully, this will help us to improve our services for this client group.
In conclusion, there are a few outcome measures with proven psychometric quality for use in evaluation of upper limb prosthesis users. Different measures cover different aspects of health and the use of a mixture of outcome measures would give a better picture of the outcomes of our clients. For children, the ACMC and PUFI, and for adults, the ACMC and selected parts of OPUS and TAPES are recommended. Measures that focus on social interaction in paediatric users are required.
Acknowledgements
We would like to express our gratitude to Dr Alarcos Cieza, the team leader at the ICF Research Branch at the Institute for Health and Rehabilitation Sciences (IHRS) in Germany, for her advice in the ICF linking process. Financial support was granted from the Research Committee of Örebro County Council and the Department of Rehabilitation, Episteme-foundation, Örebro County Council.
