
Abstract

Background:

The Orthotics and Prosthetics Users’ Survey consists of five modules to assess outcomes of orthotic and prosthetic interventions: lower extremity functional status, upper extremity functional status, client satisfaction with device, client satisfaction with services and health-related quality of life.

Objectives:

To investigate the test–retest reliability and calculate the smallest detectable difference for all modules of the Swedish Orthotics and Prosthetics Users’ Survey.

Study design:

Test–retest reliability study design.

Methods:

A total of 69 patients at a Department of Prosthetics and Orthotics completed the Orthotics and Prosthetics Users’ Survey on two occasions separated by a 2-week interval, giving 18 answers on lower extremity functional status, 41 on upper extremity functional status, 53 on client satisfaction with device, 12 on client satisfaction with services and 67 on health-related quality of life. Raw scores were converted into Orthotics and Prosthetics Users’ Survey units on a 0–100 scale. Intra-class correlation coefficients, Bland–Altman plots, common person linking plots and t-tests of person mean measures were used to investigate the reliability. The smallest detectable differences were calculated at the 95% confidence level.

Results:

The intra-class correlation coefficients ranged from 0.77 to 0.96 for the modules, and no systematic differences were detected between the response occasions. The smallest detectable differences ranged from 7.4 to 16.6 units.

Conclusions:

The test–retest reliability was satisfactory for all Orthotics and Prosthetics Users’ Survey modules. The smallest detectable difference was large on all modules except the health-related quality of life module.

Clinical relevance

The Orthotics and Prosthetics Users’ Survey modules are reliable and, thus, can be recommended for repeated measurements of patients over time. Relatively large changes are needed to achieve statistical significance when assessing individual patients.

Keywords

Outcome assessment, reproducibility of results, activities of daily living, quality of life

Background

The rehabilitation of individuals with impairments of the musculoskeletal system often involves an interdisciplinary team. The team’s certified prosthetist and orthotist is responsible for the prescription, manufacturing and application of external prostheses and orthoses. The outcomes of these interventions have often been studied by measuring the device’s effects on the musculoskeletal system. Although such laboratory studies are highly informative, they do not show how the user of the device experiences the outcomes of the intervention in his or her everyday life. To capture these aspects of rehabilitation, self-report instruments are often used. One of the few instruments that have been developed to assess the outcomes in both prosthetic and orthotic users is the Orthotics and Prosthetics Users’ Survey (OPUS).¹ The OPUS comprises five modules for self-report: lower extremity functional status (LEFS) and upper extremity functional status (UEFS), which measure the person’s perceived ability to perform activities involving the extremities; client satisfaction with device (CSD); client satisfaction with services (CSS) and health-related quality of life (HRQoL).¹ The content of the modules taps into the following components of the International Classification of Functioning, Disability and Health (ICF): body functions, activity and participation, and environmental factors.² Thus, OPUS covers different aspects of relevance in rehabilitation and has potential for both clinical and scientific use. For example, OPUS could be used in combination with the rehabilitation cycle, as a means to identify the person’s problems and to evaluate the outcome of the intervention.³ The instrument was developed in the United States and has been translated from English into Swedish.¹,⁴ The Swedish version demonstrated good linguistic and internal validity for users of prostheses, orthoses, shoe insoles and orthopaedic shoes.⁴,⁵ However, the psychometric properties of this Swedish version related to repeated measurements over time have not been investigated.

In general, health-care interventions aim either to improve a state or to prevent a state from deteriorating.⁶ Thus, instruments used for repeated measurements must demonstrate two qualities related to time: (a) stability of test scores when the measured dimension is stable (test–retest reliability) and (b) change of test scores when the dimension changes (sensitivity to change).⁷ Better reliability implies higher precision of individual measurements, and thus, a higher potential to detect changes in test scores.⁸ This potential can be estimated by calculating the smallest detectable difference (SDD), which is the smallest change in a person’s test scores that can be interpreted as a real difference, that is, a difference exceeding measurement error.⁷ The aim of this study was to investigate the test–retest reliability and calculate the SDD for the Swedish version of OPUS.

Methods

Subjects and procedure

The Regional Ethics Committee Review Board approved this study before initiation. Data were collected as part of the data collection for a validation study on OPUS at the outpatient clinic at the Department of Prosthetics and Orthotics, Örebro County Council, Sweden.⁵ Patients visiting the department between November 2007 and March 2008 received oral and written information about the study and were asked to participate. Those who agreed received the applicable OPUS modules and a pre-addressed envelope to leave in a box in the waiting room or to return by mail. In addition, because of the low frequency of clinical visits, information about the study and the UEFS, CSD, CSS and HRQoL modules were mailed to 100 users of upper limb prostheses in January 2008. They were asked to answer CSS only if they had visited the department during the last 12 months. In all cases, if no response was received within 3 weeks, a reminder was mailed. The individuals who answered the first survey were asked to answer the modules a second time to assess the test–retest reliability of OPUS. Upon agreement, they were sent the same OPUS modules 2 weeks later by post. A reminder was sent after another 2 weeks if they failed to return the retest surveys. Individuals below 18 years of age and those who could not be expected to understand the questions correctly (due to insufficient language skills, dementia, etc.) were not included.

A total of 177 individuals agreed to participate in the retest study and 152 returned the retest survey. Test–retest reliability and SDD should be assessed on individuals whose state can be expected to be stable during the test period.⁹ Thus, on LEFS, UEFS, CSD and HRQoL, individuals who had some intervention (a new device or an adjustment or return of an old device) between the response occasions were excluded, on the grounds that functioning, device satisfaction and health-related quality of life may be influenced by the intervention. On CSS, individuals who had a visit between the response occasions were excluded, because the responses on both occasions should refer to the same visit. After excluding these individuals and modules with ≥ 50% invalid answers (i.e. missing answers or answers in the category ‘not applicable’), data from 69 individuals were included in the analysis. As not all modules were administered to all participants, the number of answers per module varied (18 on LEFS, 41 on UEFS, 53 on CSD, 12 on CSS and 67 on HRQoL; Table 1).

Table 1.

Participants in the study.

	LEFS (n = 18)	UEFS (n = 41)	CSD (n = 53)	CSS (n = 12)	HRQoL (n = 67)
Male	6 (33.3%)	22 (53.7%)	29 (54.7%)	7 (58.3%)	33 (49.3%)
Median age (min–max), years	57 (28–78)	34 (18–81)	39 (18–80)	38 (21–80)	45 (18–81)
Upper limb prosthesis	0	41	41	12	47
Lower limb prosthesis	2	0	2	0	2
Lower limb orthosis	3	0	4	0	4
Shoe insoles	9	0	3	0	8
Orthopaedic shoes	4	0	3	0	6

LEFS: lower extremity functional status module; UEFS: upper extremity functional status module; CSD: client satisfaction with device module; CSS: client satisfaction with services module; HRQoL: health-related quality of life module.

Instrumentation

We used the Swedish version of OPUS from a previous validation study.⁵ In this version, eight new items were added to LEFS, and the response categories on some modules were modified.⁵ The LEFS (28 items) and UEFS (23 items) are rated on a 5-point Likert scale ranging from ‘very easy’ to ‘cannot perform activity’. On these modules, the response category ‘easy’ was changed to ‘slightly easy’ in the Swedish version. The CSD (11 items) and CSS (10 items) are rated on a 4-point Likert scale ranging from ‘strongly agree’ to ‘strongly disagree’. The HRQoL is rated on 5-point Likert scales ranging from ‘not at all’ to ‘extremely’ (first 12 items) and from ‘all of the time’ to ‘none of the time’ (last 11 items).¹,⁵ ‘Not applicable’ was added as a response option on CSD, CSS and HRQoL (first 12 items) in the Swedish version. Responses in the category ‘not applicable’ were treated as missing in the analysis, since this category does not justify any conclusions about the person’s position on the latent variable.

Analysis

The raw scores on each module were converted to OPUS units on a 0–100 scale by means of Rasch analysis. Estimates of item measures and rating scale structures from a previous study on OPUS⁵ were used as anchors for this conversion.

Several methods were used to investigate the reliability. The intra-class correlation coefficient (ICC) version 1,1 (one-way random effects model for single measures) was calculated, and an ICC ≥ 0.75 was considered acceptable.¹⁰,¹¹ We calculated the difference between the person mean measures on the two response occasions (second minus first) to investigate the presence of a systematic difference.¹² A two-tailed paired t-test was used to calculate the 95% confidence interval (CI) for the difference. An absolute difference of ≥ 0.5 logits, converted on each module to its equivalent on the 0–100 unit scale, was used as the criterion for a noticeable difference.¹³ Common person linking plots were constructed, and if the 95% confidence bands included at least 95% of the respondents, the module was considered invariant to test occasion.¹⁴,¹⁵ To further investigate the presence of systematic differences between the response occasions, we constructed Bland–Altman plots, which plot the difference between the response occasions against their average, with 95% limits of agreement.¹⁶ The linear regression coefficient was calculated for the relation between the differences and the averages. If the 95% CI for the coefficient did not include zero, a systematic difference was concluded.
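As an illustration of the ICC version 1,1 named above, the following is a minimal Python sketch based on the one-way random effects formula for single measures. It is not the SPSS procedure used in the study; the function name `icc_1_1` and the toy scores are hypothetical and chosen only for demonstration.

```python
import numpy as np


def icc_1_1(scores):
    """ICC(1,1): one-way random effects model, single measures.

    `scores` has one row per person and one column per response occasion.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    person_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Between-persons mean square from the one-way ANOVA table
    ms_between = k * np.sum((person_means - grand_mean) ** 2) / (n - 1)
    # Within-persons mean square (all variation inside each person)
    ms_within = np.sum((x - person_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical test and retest measures (0-100 units) for five persons
toy_scores = [[50, 52], [60, 58], [70, 71], [40, 43], [55, 55]]
print(round(icc_1_1(toy_scores), 3))
```

With identical answers on both occasions the within-persons mean square is zero and the coefficient equals 1; the more the two occasions disagree relative to the spread between persons, the lower the coefficient.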

To investigate the SDD, the standard error of measurement (SEM) was calculated from a one-way analysis of variance table as the square root of the pooled mean square (time) and mean square (person × time). The SDD was then calculated from the SEM as SDD = SEM × 1.96 × √2.⁷,¹⁷ When measuring individuals, a difference larger than the SDD is statistically significant at the 95% confidence level.⁷,¹⁷ WINSTEPS version 3.70.0.3 (Winsteps, Beaverton, Oregon, USA) and SPSS version 15.0 (SPSS Inc., Chicago, Illinois, USA) were used for the analyses.
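The SEM and SDD calculation can be sketched in Python as follows. The sketch assumes that pooling the time and person × time terms corresponds to the within-person mean square of a one-way analysis of variance with person as the factor; this is one reading of the description, not a confirmed reproduction of the authors' computation. The function names and the toy data are hypothetical.

```python
import math

import numpy as np


def sem_one_way(scores):
    """SEM as the square root of the within-person mean square.

    `scores` has one row per person and one column per response occasion.
    Assumption: the pooled (time) and (person x time) terms equal the
    within-person term of a one-way ANOVA with person as the factor.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    person_means = x.mean(axis=1, keepdims=True)
    ms_within = np.sum((x - person_means) ** 2) / (n * (k - 1))
    return math.sqrt(ms_within)


def sdd_95(sem):
    """Smallest detectable difference at the 95% level: SEM x 1.96 x sqrt(2)."""
    return sem * 1.96 * math.sqrt(2)


# Hypothetical test and retest measures for two persons
toy_scores = [[10, 12], [20, 18]]
sem = sem_one_way(toy_scores)
print(sem, sdd_95(sem))
```

The factor √2 reflects that a change score combines the measurement error of two occasions, and 1.96 is the two-sided 95% normal quantile. Read in reverse, an SDD of 7.4 units corresponds to an SEM of roughly 2.7 units.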

Results

The ICC ranged from 0.77 to 0.96 on the modules (Table 2). The LEFS, UEFS, CSD and CSS had stable mean person measures between the response occasions (Table 2). Only on HRQoL did the CI not overlap zero, but the difference between the mean measures was too small to be noticeable. Between 91% and 100% of the respondents were located within the confidence bands in the common person linking plots (Table 2, plots not presented). The limits of agreement in the Bland–Altman plots ranged from (mean difference) ±7.2 to ±15.7 OPUS units for the modules (Figure 1). The linear regression coefficients in the Bland–Altman plots ranged from −0.08 to 0.15, and all CIs overlapped zero, indicating that no systematic difference of mean measures was present (Figure 1). The SDD ranged from 7.4 to 16.6 units on the modules (Table 2).

Table 2.

Summary of indicators for test–retest reliability and SDD.

	LEFS (n = 18)	UEFS (n = 41)	CSD (n = 53)	CSS (n = 12)	HRQoL (n = 67)
Mean difference^a (95% CI), OPUS units	0.52 (−2.64, 3.68)	0.26 (−2.15, 2.68)	0.53 (−1.59, 2.65)	−3.50 (−8.61, 1.60)	−0.91 (−1.81, −0.01)
Acceptable interval for mean difference, OPUS units^b	−3.03, 3.03	−4.02, 4.02	−4.31, 4.31	−3.43, 3.43	−3.88, 3.88
Respondents inside 95% confidence bands in common person linking plots	17 (94.4%)	38 (92.7%)	52 (98.1%)	12 (100%)	61 (91.0%)
ICC (version 1,1) (95% CI)	0.96 (0.89, 0.98)	0.89 (0.80, 0.94)	0.82 (0.71, 0.89)	0.77 (0.39, 0.93)	0.91 (0.86, 0.94)
SDD, OPUS units	12.1	14.8	15.0	16.6	7.4

OPUS: Orthotics and Prosthetics Users’ Survey; LEFS: lower extremity functional status module; UEFS: upper extremity functional status module; CSD: client satisfaction with device module; CSS: client satisfaction with services module; HRQoL: health-related quality of life module; ICC: intra-class correlation coefficient, version 1,1 (one-way random model for single measures); CI: confidence interval; SDD: smallest detectable difference.

ᵃ Mean difference between response occasions. A positive value indicates a higher mean measure on the second occasion and a negative value a higher mean measure on the first occasion.

ᵇ Values equivalent to −0.5 and 0.5 logits.
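The reliability criteria described in the Analysis section can be applied to the values in Table 2 with a short Python sketch. The data are copied from the table; the class, field and function names are hypothetical, and the strict inequality for the mean difference follows from a noticeable difference being defined as 0.5 logits or more.

```python
from dataclasses import dataclass


@dataclass
class ModuleResult:
    name: str
    n: int
    mean_diff: float     # OPUS units, second minus first occasion
    ci_low: float
    ci_high: float
    acceptable: float    # half-width equivalent to 0.5 logits, OPUS units
    inside_bands: int    # respondents inside the 95% confidence bands
    icc: float


# Values copied from Table 2
TABLE_2 = [
    ModuleResult("LEFS", 18, 0.52, -2.64, 3.68, 3.03, 17, 0.96),
    ModuleResult("UEFS", 41, 0.26, -2.15, 2.68, 4.02, 38, 0.89),
    ModuleResult("CSD", 53, 0.53, -1.59, 2.65, 4.31, 52, 0.82),
    ModuleResult("CSS", 12, -3.50, -8.61, 1.60, 3.43, 12, 0.77),
    ModuleResult("HRQoL", 67, -0.91, -1.81, -0.01, 3.88, 61, 0.91),
]


def check(m):
    """Evaluate each criterion for one module."""
    return {
        "icc_ok": m.icc >= 0.75,
        "ci_includes_zero": m.ci_low <= 0.0 <= m.ci_high,
        "mean_diff_ok": abs(m.mean_diff) < m.acceptable,
        "bands_ok": m.inside_bands / m.n >= 0.95,
    }


results = {m.name: check(m) for m in TABLE_2}
for name, flags in results.items():
    print(name, flags)
```

Run on these values, every module meets the ICC criterion. The sketch flags the CI on HRQoL (it excludes zero), the mean difference on CSS (−3.50 against an interval of ±3.43) and the share of respondents inside the confidence bands on LEFS, UEFS and HRQoL (below 95%).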

Figure 1.

Bland–Altman plots for the modules of the Orthotics and Prosthetics Users’ Survey. a) lower extremity functional status module; b) upper extremity functional status module; c) client satisfaction with device module; d) client satisfaction with services module; e) health-related quality of life module. x-axis: mean of first and second response occasion; y-axis: difference between occasions (second minus first); solid lines: group mean differences; dotted lines: 95% limits of agreement. B: linear regression coefficient (95% confidence interval).
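The quantities shown in each panel can be computed with a minimal Python sketch such as the one below. It returns the mean difference, the 95% limits of agreement and the regression slope of the differences on the averages; it does not compute the confidence interval for the slope and does not draw the plot. The function name and the toy data are hypothetical.

```python
import numpy as np


def bland_altman(first, second):
    """Summary statistics behind a Bland-Altman plot for two occasions."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    diff = second - first             # y-axis: second minus first
    avg = (first + second) / 2.0      # x-axis: mean of the two occasions
    mean_diff = diff.mean()
    sd_diff = diff.std(ddof=1)        # sample standard deviation
    # Least-squares line diff = slope * avg + intercept
    slope, intercept = np.polyfit(avg, diff, 1)
    return {
        "mean_diff": mean_diff,
        "loa_low": mean_diff - 1.96 * sd_diff,
        "loa_high": mean_diff + 1.96 * sd_diff,
        "slope": slope,
    }


# Hypothetical measures for three persons
stats = bland_altman([0, 10, 20], [1, 12, 23])
print(stats)
```

A slope clearly different from zero would suggest that the difference between occasions depends on the level of the measure, which is the kind of systematic difference the regression coefficient B is meant to reveal.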

Discussion

This study investigated the test–retest reliability and SDD of the five modules of the Swedish OPUS. The results generally support the reliability of the modules. All ICCs were above 0.75, and there were only minor deviations from the other reliability criteria. Some authors¹⁸ recommend that higher cut-off values than 0.75 should be used for clinical measurements, but since OPUS is not intended for high-stakes decisions, we believe that the cut-off used here is reasonable. In a previous study, Resnik and Borgia¹⁹ investigated the test–retest reliability for LEFS, CSD and HRQoL, and reported ICCs of 0.67, 0.50 and 0.85, respectively. The ICCs calculated in the present study were higher for LEFS and CSD, which may reflect the fact that different versions of LEFS and different samples were used. The study by Resnik and Borgia included only lower limb prosthesis users, who may experience more day-to-day variability as a result of skin abrasions, phantom limb pain and so on. In addition, the more homogeneous sample in their study may have resulted in lower between-subject variance. This can result in lower ICCs, since ICCs are based on the ratio of between-subject variance to total variance (sum of between- and within-subject variances).²⁰
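The effect of sample homogeneity on the ICC can be illustrated with the variance ratio described above. The Python sketch below uses made-up variances, not values from either study: the within-subject variance is held constant while the between-subject variance shrinks.

```python
def icc_from_variances(between_var, within_var):
    """ICC as between-subject variance divided by total variance."""
    return between_var / (between_var + within_var)


# Hypothetical variances in squared units; within-subject error is identical
heterogeneous = icc_from_variances(between_var=225.0, within_var=25.0)
homogeneous = icc_from_variances(between_var=50.0, within_var=25.0)
print(heterogeneous, round(homogeneous, 3))  # 0.9 0.667
```

The measurement error is the same in both cases, yet the coefficient drops from 0.90 to about 0.67 because the persons in the second sample differ less from each other.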

This study is the first to investigate the SDD of OPUS. Ideally, the SDD should not exceed the minimal important difference (MID), that is, the instrument should be able to detect all changes of clinically important magnitude. The MID should preferably be calculated from empirical data, but this has yet to be performed for OPUS. When assuming MID to be 10% of the module’s measurement range,²¹ equivalent to 10 units on a 0–100 unit scale, the SDD was too large on all modules except HRQoL. Thus, when assessing individuals, LEFS, UEFS, CSD and CSS could miss clinically important changes, and their use should therefore be confined to situations where relatively large changes are expected. The HRQoL demonstrated a better, that is, a lower, SDD. This is important because the impact of prosthetic and orthotic devices on health-related quality of life may be small in clinical practice. However, SDD indicates the potential to detect changes within individuals, and smaller changes can be detected on a group level, depending on the size of the sample. Furthermore, SDD is not equal to sensitivity to change, which reflects an observed change of test scores in a situation where a change of the dimension is expected. This is an important avenue for future evaluations of OPUS.
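The comparison between SDD and the assumed MID amounts to simple arithmetic, sketched here in Python with the SDD values from Table 2. The MID of 10 units is the assumption stated in the text (10% of the 0–100 range), not an empirically derived value; the variable names are hypothetical.

```python
# SDD per module in OPUS units, copied from Table 2
SDD = {"LEFS": 12.1, "UEFS": 14.8, "CSD": 15.0, "CSS": 16.6, "HRQoL": 7.4}

# Assumed MID: 10% of the measurement range of the 0-100 scale
SCALE_MIN, SCALE_MAX = 0.0, 100.0
MID = 0.10 * (SCALE_MAX - SCALE_MIN)

# A module can detect every clinically important change in an individual
# only if its SDD does not exceed the MID
detects_mid = [name for name, sdd in SDD.items() if sdd <= MID]
print(detects_mid)  # ['HRQoL']
```

A different MID assumption would change the outcome; with an MID of 15 units, for example, four modules would meet the criterion, which is why an empirically derived MID for OPUS matters.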

Our study has some limitations, mainly related to the data collection. The number of respondents using certain types of devices was small in this sample, which could limit the generalisability of the results. Although different types of devices were represented in the sample, all respondents on UEFS and CSS were users of upper limb prostheses. Furthermore, the small sample resulted in a wide CI for the ICC on CSS. Therefore, the results should be considered preliminary and need to be confirmed in larger studies. The time interval between the two response occasions was 2 weeks; this interval has been used in other studies,²²⁻²⁴ and we believe it is a reasonable trade-off between an interval long enough for the respondents to forget the answers they gave on the first occasion and short enough to ensure stability of the measured dimensions.⁹ However, 20 respondents (29%) completed the retest modules only after the reminder had been sent out another 2 weeks later. Thus, changes in the investigated dimensions may have occurred between the response occasions, resulting in an underestimation of the test–retest reliability and an overestimation of the SDD. We believe that these limitations are not substantial, but that the study results need to be confirmed in larger studies conducted at other sites.

Conclusion

This study provides evidence in support of the test–retest reliability of all modules of the Swedish OPUS. Thus, OPUS can be used reliably for repeated measurements over time. On all modules except HRQoL, relatively large changes are needed to achieve statistical significance when assessing individuals. The instrument, with its corresponding tables and ‘key forms’ to convert the raw scores to Rasch measures, is available from the corresponding author.

Footnotes

Acknowledgements

The authors acknowledge Britt-Marie Engwall, Inga-Lill Freijd, Emma Rosander and Kenneth Karlsson, at the Department of Prosthetics and Orthotics, Örebro County Council, Sweden, for help with the data collection. The authors also thank all respondents participating in this study.

Conflict of interests

The authors declare that there is no conflict of interest.

Funding

This study was supported by grants from the Centre for Rehabilitation Research and the Research Committee at Örebro County Council, Sweden, and the Norrbacka-Eugenia Foundation, Sweden.

References

1. Heinemann, Bode, O’Reilly. Development and measurement properties of the Orthotics and Prosthetics Users’ Survey (OPUS): a comprehensive set of clinical outcome instruments. Prosthet Orthot Int 2003; 27(3): 191–206.

2. Lindner, Nätterlund, Hermansson. Upper limb prosthetic outcome measures: review and content comparison based on International Classification of Functioning, Disability and Health. Prosthet Orthot Int 2010; 34(2): 109–128.

3. Stucki, Ewert, Cieza. Value and application of the ICF in rehabilitation medicine. Disabil Rehabil 2002; 24(17): 932–938.

4. Jarl, Norling Hermansson. Translation and linguistic validation of the Swedish version of Orthotics and Prosthetics Users’ Survey. Prosthet Orthot Int 2009; 33(4): 329–338.

5. Jarl, Heinemann, Norling Hermansson. Validity evidence for a modified version of the Orthotics and Prosthetics Users’ Survey. Disabil Rehabil Assist Technol 2012; 7(6): 469–478.

6. Rothstein, Echternach, Riddle. The Hypothesis-Oriented Algorithm for Clinicians II (HOAC II): a guide for patient management. Phys Ther 2003; 83(5): 455–470.

7. Beckerman, Roebroeck, Lankhorst. Smallest real difference, a link between reproducibility and responsiveness. Qual Life Res 2001; 10(7): 571–578.

8. Hopkins. Measures of reliability in sports medicine and science. Sports Med 2000; 30(1): 1–15.

9. Streiner, Norman. Health measurement scales: a practical guide to their development and use. 4th ed. Oxford: Oxford University Press, 2008.

10. Cicchetti. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assessment 1994; 6(4): 284–290.

11. Shrout, Fleiss. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979; 86(2): 420–428.

12. Lexell, Downham. How to assess the reliability of measurements in rehabilitation. Am J Phys Med Rehabil 2005; 84(9): 719–723.

13. Linacre. Sample size and item calibration stability. Rasch Meas Trans 1994; 7(4): 328.

14. Bond, Fox. Applying the Rasch model: fundamental measurement in the human sciences. 2nd ed. Mahwah, NJ: Lawrence Erlbaum Associates, 2007.

15. Wright, Masters. Rating scale analysis: Rasch measurement. Chicago, IL: MESA Press, 1982.

16. Bland, Altman. Measuring agreement in method comparison studies. Stat Methods Med Res 1999; 8: 135–160.

17. Schreuders, Roebroeck, Goumans. Measurement error in grip and pinch force measurements in patients with hand injuries. Phys Ther 2003; 83(9): 806–815.

18. Portney, Watkins. Foundations of clinical research: applications to practice. Upper Saddle River, NJ: Pearson/Prentice Hall, 2009.

19. Resnik, Borgia. Reliability of outcome measures for people with lower-limb amputations: distinguishing true change from statistical error. Phys Ther 2011; 91(4): 555–565.

20. Tesio. Outcome measurement in behavioural sciences: a view on how to shift attention from means to individuals and why. Int J Rehabil Res 2012; 35(1): 1–12.

21. Osoba, Bezjak, Brundage. Analysis and interpretation of health-related quality-of-life data from clinical trials: basic approach of The National Cancer Institute of Canada Clinical Trials Group. Eur J Cancer 2005; 41(2): 280–287.

22. Hagberg, Brånemark, Hägg. Questionnaire for Persons with a Transfemoral Amputation (Q-TFA): initial validity and reliability of a new outcome measure. J Rehabil Res Dev 2004; 41(5): 695–706.

23. Keenan, McKenna, Doward. Development and validation of a needs-based quality of life instrument for osteoarthritis. Arthritis Rheum 2008; 59(6): 841–848.

24. Holmefur, Aarts, Hoare. Test-retest and alternate forms reliability of the assisting hand assessment. J Rehabil Med 2009; 41(11): 886–891.

Test–retest reliability of the Swedish version of the Orthotics and Prosthetics Users’ Survey