Abstract
Introduction:
The Lower Extremity Measure (LEM) was developed to provide a specific instrument to detect changes in physical function in patients with hip fracture. Of 29 questions, 3 have a valid “not applicable” answer option. The goal of this study was to validate the LEM in German and to determine the added value to the physical functioning (pf) subscale of the Short Form 36 (SF-36).
Materials and Methods:
The LEM was translated according to published guidelines and administered to patients with hip fracture (31 A1-A3 and 31 B1-B3) shortly after surgery (baseline), at 3 months (3M), and for reliability testing at 3 months plus 1 week (3M+). The reproducibility, internal consistency, floor and ceiling effects, construct validity, and responsiveness of the German LEM were assessed.
Results:
A total of 106 patients completed the LEM and SF-36 (mean age 75.5; 67% women) at baseline (mean of 4.9 days after operation), and 88 completed both questionnaires at both the 3M and 3M+ assessments. At each assessment time point, between 6% and 23% of the patients answered 7 questions as “not applicable.” Reproducibility and internal consistency were high (intraclass correlation coefficient = 0.93; Cronbach's α = .96). No floor effect (0%) and a minor ceiling effect (7.87%) were found for the total LEM score. The strongest correlation was found between the LEM and the SF-36 subscale pf (Spearman ρ = .93). Responsiveness was similar for the SF-36 pf subscale and the LEM when using effect size (SF-36 pf 0.71 vs LEM 0.72) and better for the LEM when using standardized response mean (SF-36 pf 0.65 vs LEM 0.76).
Discussion:
The German LEM is a reliable, valid, and responsive measure for the self-assessment of patients after hip fracture surgery. As a number of questions are not applicable to elderly patients, the added value of this lengthy questionnaire in these often frail, sometimes cognitively impaired patients is still open for debate.
Introduction
Evaluation of hip fracture surgery has traditionally focused on clinical or surgeon-defined measures, such as the Harris Hip Score or the Charnley score. 1,2 Although there are a number of patient-reported outcome measures (PROMs) and generic quality-of-life instruments validated for hip osteoarthritis and other hip-related disorders, 3–6 there are no validated functional PROMs specific to hip fractures. 7 The Lower Extremity Functional Scale, a self-reported questionnaire that has been validated in the context of traumatic injuries of the lower extremities in general, has not been specifically designed for hip fractures and includes questions regarding vigorous activities, such as running and hopping, which are typically not applicable to elderly patients. 8,9 The Musculoskeletal Function Assessment (MFA) or its short version, the SMFA, has been used in patients with a wide spectrum of musculoskeletal problems including fractures but is also not hip fracture specific. 10 The Lower Extremity Measure (LEM) is a PROM that was developed based on the Toronto Extremity Salvage Score. 11 Emphasis was put on designing a short, simple questionnaire considering the advanced age of most patients with hip fracture. The LEM, which is available in French and in English, was shown to be a reliable, valid, and responsive tool to evaluate function in patients with a hip fracture. 12 As no such validated tool is available in German, the goal of the present study was to validate the LEM in German to quantify its psychometric properties and thereby to determine the added value to the physical functioning subscale of the Short Form 36 (SF-36) when assessed in patients with hip fractures.
Materials and Methods
Lower Extremity Measure and Cross-Cultural Adaptation
The English LEM consists of 29 items on activities of daily living rated from 1 (impossible) to 5 (not at all difficult), including a “not applicable” option. A total score from 0 to 100 is calculated, with higher scores indicating higher levels of function. 12 Three questions (Qs) have a valid “not applicable” option: Q4 (Showering), Q20 (Using public transportation), and Q24 (Gardening/yard work). If the task is not applicable, it does not contribute to the total score. We performed the cross-cultural adaptation of the LEM into German with forward and backward translations, pretesting, and agreement meetings according to the established guidelines. 13–15 The German LEM (Untere Extremitäten-Fragebogen LEM-D) is shown in Appendix A.
Validation study
The validation study was performed in 2 clinics after the approval of the applicable ethics committee. A total of 106 patients with a mean age of 75.5 ± 13.3 years (67% women) having a fresh hip fracture (AO classification type 31 A1-A3 and 31 B1-B3) gave their written informed consent. Only patients who were able to read and understand the German language, to understand the patient information, and to sign the consent form themselves were included. Proxy answers by relatives were not permitted. The patients were surgically treated with an intramedullary nail (58%), a partial or total hip prosthesis (30% and 1%, respectively), or hip screws (11%). The German versions of the LEM and the SF-36 were completed by the patients at a mean of 4.9 ± 3.3 days after operation (baseline), at 3 months (3M), and at 3 months plus 1 week (3M+) for reliability testing. The time point for the 3M follow-up visit was chosen based on the standard follow-up schedule for patients with hip fractures at the selected clinics. Additional follow-up visits were not scheduled, to avoid placing a further burden on these elderly patients. The mean test–retest interval was 8.0 ± 4.0 days. The questionnaires were self-administered after explanation by a trained nurse or doctor. Because 2 patients died and 16 withdrew their informed consent, 88 (83%) patients were available for both the 3M and 3M+ follow-up assessments.
Reproducibility and Internal Consistency
Reproducibility (test–retest reliability) was assessed using the intraclass correlation coefficient (ICC) between the German LEMs completed at the 3M and 3M+ follow-up visits. At these time points, patients have already achieved a stable clinical condition that is necessary to assess reliability between repeated measurements without the influence of factors that might be related to symptom changes in the early rehabilitation or acute postoperative phase. An ICC ≥.75 was deemed to indicate reliability. 16 We expected values of >.85. 12 Cronbach's α (CA) was used to determine internal consistency. In addition, item–total correlations were calculated. We expected CA values of >.8. 17
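To illustrate the internal consistency analysis, Cronbach's α can be computed from complete item responses with the standard formula α = k/(k − 1) × (1 − sum of item variances / variance of the sum score). The following Python sketch is not the Stata code used in this study; the function names and the toy data are hypothetical, and it assumes complete cases only.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases.

    rows: list of per-patient lists, one numeric score per item.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across patients
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the per-patient sum score
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_removed(rows, j):
    """Alpha recomputed after dropping item index j (0-based)."""
    reduced = [list(r[:j]) + list(r[j + 1:]) for r in rows]
    return cronbach_alpha(reduced)


# Hypothetical toy data: 3 patients, 2 items
toy = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(toy), 3))
```

Values close to 1 indicate that the items vary together almost perfectly, which is why a very high α can also point to redundant items.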
Floor and Ceiling Effects
Floor and ceiling effects of the LEM were assessed by calculating the percentage of answers scored 1 (for floor effect, worst clinical result) or 5 (for ceiling effect, best clinical result) at the 3M assessment. A proportion of more than 15% was regarded as indicating a floor or ceiling effect. 18
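The floor and ceiling rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the returned dictionary, and the example scores are hypothetical, and the same function is assumed to apply to single items (worst = 1, best = 5) and to the total score (worst = 0, best = 100).

```python
def floor_ceiling(scores, worst, best, threshold=15.0):
    """Percentage of scores at the worst and best possible value.

    An effect is flagged when the percentage exceeds `threshold`
    (15% by default, the cutoff used in the text).
    """
    n = len(scores)
    if n == 0:
        raise ValueError("no scores given")
    floor_pct = 100.0 * sum(1 for s in scores if s == worst) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == best) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold,
        "ceiling_effect": ceiling_pct > threshold,
    }


# Hypothetical answers of 10 patients to one item (1-5 scale)
item_answers = [1, 5, 5, 3, 4, 5, 2, 5, 4, 3]
print(floor_ceiling(item_answers, worst=1, best=5))
```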
Construct Validity
Spearman rank correlation coefficients were used to examine construct validity of the LEM relative to the SF-36 v2 at the 3M assessment. The SF-36 is a widely used instrument to assess quality of life that has previously been validated in German and has demonstrated good psychometric properties. 19–21 As scales of similar content should demonstrate convergent validity, we hypothesized a strong (|r| ≥ .60) correlation between the LEM and the physically dominated subscales (physical functioning [pf] and role-physical [rp]) and the Physical Component Summary (PCS) of SF-36. 12 In contrast, we hypothesized divergent validity with low or only moderate correlations (|r| < .60) between the LEM and the SF-36 role-emotional (re) subscale, mental health (mh) subscale, and Mental Component Summary (MCS).
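For readers who want to reproduce this type of analysis, the Spearman coefficient is the Pearson correlation of the ranked scores, with tied values receiving their average rank. The sketch below is a minimal stdlib-only illustration, not the Stata routine used in this study; all function names and the example values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of ties
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def is_strong(rho, cutoff=0.60):
    """Cutoff from the hypotheses in the text: |r| >= .60 is strong."""
    return abs(rho) >= cutoff


# Hypothetical paired scores (LEM total, SF-36 pf) for 5 patients
lem = [55.0, 70.0, 62.0, 90.0, 81.0]
pf = [30.0, 55.0, 50.0, 85.0, 60.0]
rho = spearman_rho(lem, pf)
print(round(rho, 2), is_strong(rho))
```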
Responsiveness
Effect size (ES) and standardized response mean (SRM) were used to assess responsiveness between the baseline and 3M assessments when compared to the SF-36 pf subscale. We expected ES and SRM values of >0.9. 12
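The two responsiveness indices follow the definitions given in the notes to Table 3: ES is the difference between the mean baseline and mean 3M scores divided by the standard deviation of the baseline scores, and SRM is the mean patient-level change divided by the standard deviation of that change. The Python sketch below illustrates both. The function names and data are hypothetical, and the sign convention (follow-up minus baseline, so that improvement is positive) is an assumption.

```python
from statistics import mean, stdev


def effect_size(baseline, followup):
    """ES: difference in means divided by the SD of the baseline scores."""
    return (mean(followup) - mean(baseline)) / stdev(baseline)


def standardized_response_mean(baseline, followup):
    """SRM: mean paired change divided by the SD of the paired changes.

    Both lists must hold the same patients in the same order.
    """
    if len(baseline) != len(followup):
        raise ValueError("paired scores required")
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(changes)


# Hypothetical scores for 3 patients at baseline and at 3 months
base = [40.0, 50.0, 60.0]
m3 = [50.0, 65.0, 80.0]
print(effect_size(base, m3), standardized_response_mean(base, m3))
```

Because the two indices use different denominators, an instrument can show a similar ES but a different SRM when the spread of individual changes differs.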
Statistical Analysis
The LEM and SF-36 scores were reported as mean ± standard deviation values. A P value of .05 was defined as level of significance. The “not applicable” response was treated as missing for all questions except for the 3 valid options of the original English LEM. For patients with more than 3 missing values, the total score was not calculated. For patients with 3 or fewer missing values, each missing value was replaced by the mean of the respective item in the overall study population. The statistical analyses were performed using STATA 13 (StataCorp LP, College Station, Texas).
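The handling of missing and “not applicable” answers described above can be sketched as follows. This is an illustration only. The data layout (a dictionary from item number to answer, with "NA" for “not applicable” and None for no answer) and all names are hypothetical. In particular, the text does not give the formula behind the 0 to 100 total score; the linear rescaling of the mean item score used here is an assumption and may differ from the published LEM scoring rule.

```python
VALID_NA_ITEMS = {4, 20, 24}  # Q4, Q20, Q24 have a valid "not applicable"
MAX_MISSING = 3               # more than 3 missing: no total score


def prepare_items(answers, item_means):
    """Apply the missing-value rules to one patient's answers.

    answers: dict item number -> 1..5, "NA", or None
    item_means: dict item number -> study-population mean of that item
    Returns a dict of usable item scores, or None if too many are missing.
    """
    kept, missing = {}, []
    for q, a in answers.items():
        if a == "NA" and q in VALID_NA_ITEMS:
            continue  # valid "not applicable": item does not contribute
        if a is None or a == "NA":
            missing.append(q)  # treated as missing
        else:
            kept[q] = float(a)
    if len(missing) > MAX_MISSING:
        return None
    for q in missing:
        kept[q] = item_means[q]  # impute with the population item mean
    return kept


def total_score(items):
    # ASSUMPTION: mean item score (1..5) rescaled linearly to 0..100
    m = sum(items.values()) / len(items)
    return (m - 1) / 4 * 100


# Hypothetical patient: all items answered 5, Q24 and Q2 "not applicable"
answers = {q: 5 for q in range(1, 30)}
answers[24] = "NA"  # valid option, excluded from the score
answers[2] = "NA"   # not a valid option, imputed
means = {q: 3.0 for q in range(1, 30)}
items = prepare_items(answers, means)
print(round(total_score(items), 1))
```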
Results
Cross-Cultural Adaptation and Basic LEM Data
During the translation procedure, Q21 (Preparing light meals) required particular discussion because the German expression of “light meal” means more a healthy than a simple meal. Therefore, the German expression for “simple” (=“einfach”) was used.
More than 3 missing items were found for 17% (18 of 106) of the patients at baseline, 10% (9 of 89) of the patients at the 3M assessment, and 8% (7 of 88) of the patients at the 3M+ assessment. The Q24 (Gardening/yard work) was answered as “not applicable” by 35% of the surveyed patients. At each assessment time point, between 6% and 23% of the patients chose “not applicable” for the following questions: Q2 (Getting in/out bathtub), Q21 (Preparing light meals), Q22 (Tidying, dusting, washing dishes), Q23 (Doing laundry, vacuuming), Q25 (Food shopping), and/or Q29 (Participating in usual leisure activities).
Reproducibility and Internal Consistency
Test–retest reliability was excellent (ICC = 0.93 [confidence interval = 0.89-0.95]). The CA was .96 for the total score and remained .96 when any single item was removed (Table 1).
Table 1. Floor and Ceiling Effects, Internal Consistency, and Reliability of Individual Lower Extremity Measure Items.
Abbreviations: ICC, intraclass correlation coefficient; 95% CI, 95% confidence interval for ICC; SD, standard deviation.
a Each item is scored based on a Likert-type 1-5 point scale: 1 = impossible to do; 2 = extremely difficult; 3 = moderately difficult; 4 = a little bit difficult; 5 = not at all difficult.
b Item-total correlation: Spearman rank correlation between each item and the total score.
c Cronbach's α if item removed.
Floor and ceiling effects
No floor effect (0%) and only a minor ceiling effect (7.87%) were found for the total score of the LEM. Large ceiling effects were observed for all single items except Q11 (Getting up from kneeling). A floor effect was found for 5 single items (Table 1).
Construct Validity
The LEM showed strong correlations with the SF-36 subscale pf, subscale rp and PCS, and only moderate correlations with the SF-36 re subscale, mh subscale, and MCS (Table 2). All hypotheses were confirmed.
Table 2. Construct Validity Between the Lower Extremity Measure and the SF-36.
Abbreviations: SD, standard deviation; SF-36, 36-Item Short Form Health Survey.
a Number of patients contributing to the calculation of the Spearman rank correlation.
b Spearman rank correlation between SF-36 subscale/summary score and Lower Extremity Measure (LEM) total score.
c Two-sided tests for each SF-36 subscale/summary score and LEM total score are independent.
Responsiveness
Responsiveness (from baseline to 3M) was similar for the SF-36 pf subscale and the LEM when using ES (SF-36 pf 0.71 vs LEM 0.72) and better for the LEM when using SRM (SF-36 pf 0.65 vs LEM 0.76; Table 3).
Table 3. Responsiveness of the SF-36 Physical Functioning Subscale and the Lower Extremity Measure (LEM) at 3 Months Following Surgery for Hip Fracture.
Abbreviations: SD, standard deviation; ES, Effect size; post-op, postoperative; SRM, Standardized response mean; SF-36, 36-Item Short Form Health Survey.
a Number of patients contributing to the analysis.
b Effect size: Calculated as the difference between the mean postoperative score and the mean 3-month score, divided by the standard deviation of the postoperative score.
c Standardized response mean: Calculated as the mean of the patient-level change from post-op to 3 months divided by the standard deviation of this change.
Discussion
The LEM has been validated in German, showing good reliability, validity, and responsiveness in patients with a hip fracture. The mean score of the German LEM at 3 months lies between the scores for 6 weeks and 6 months of the English version and is therefore comparable. The test–retest reliability of the German version is excellent as is that of the original English version. 12 As in the English version, no floor effect was found. While none of the patients showed a ceiling effect in the English version, we observed a minor effect for the total score at 3 months. The slightly younger age of our patient population may have contributed to this finding.
The CA values of .96 for the German LEM total score and with each single item removed show very high internal consistency. However, this indicates a possible redundancy of items, that is, an overly narrow questionnaire in which too many questions ask for the same content in different ways. 22,23 Although CA values have not been reported for the original English version and therefore cannot be compared, such high CA values suggest that the questionnaire should be shortened.
Construct validity of the German LEM in relation to the SF-36 was similar to that of the English version, with the strongest correlation observed between the LEM and the pf subscale of the SF-36. 12 Only moderate correlations were found for the SF-36 subscales and summary score that measure a different construct, that is, re, mh, and the MCS.
Although the LEM contains a number of hip-specific questions, greater sensitivity to change compared to the SF-36 pf subscale was only observed when using SRM. The ES values between the baseline (postoperative) and 3M assessments were similar for the LEM and the SF-36 pf subscale. In comparison with the English version, the German LEM demonstrates a smaller effect. 12 The reason might be that the assessment time points are different between the 2 versions. This would also explain the lower responsiveness of the SF-36 in our study compared to data reported for the English version. As in the original validation, our results support the usability of the SF-36 pf subscale in patients with hip fractures. 12
The assessment of each patient's ability to understand and correctly answer the questions was left to the investigator during the informed consent procedure, and no specific test to assess cognitive impairment was used. Although this is a limitation of our study, such testing was omitted to limit the patients' burden in the postoperative phase. Other limitations of the study are its small sample size and the fact that patients answered “not applicable” to more questions than the 3 for which this option is valid. Although this is not allowed in the original questionnaire, it shows the needs and the varying characteristics of an elderly patient population. Patients of older age are often dependent on their relatives, live in nursing homes, or receive community-based support. Consequently, they need a suitable questionnaire that allows an adapted assessment of lower extremity function with clear rules to calculate a total score with missing or “not applicable” answers. The answers to the single items can show the clinician which individual tasks the patient is able to perform and to what degree. But in its current version, the total score of the German LEM shows a high redundancy of items and does not result in better responsiveness than the pf subscale of the SF-36.
Based on its psychometric properties, the German LEM can be used as a self-assessment outcome measure in German-speaking patients with hip fractures and to provide answers regarding specific activities in the rehabilitation phase. However, the added value of this 29-item questionnaire relative to the SF-36 in these often frail, sometimes physically and cognitively impaired patients should be scrutinized more closely. Further research is needed to develop a comprehensive outcome instrument for patients with hip fractures that acknowledges their individual limitations by allowing a “not applicable” option for all task questions and that provides clear calculation rules considering the influence of its use on the total score.
Footnotes
Appendix
Acknowledgments
We are grateful to Philip Perry (†) for the primary statistical analyses and Irene Santi for pursuing his work. We also want to thank Jan Ljungqvist, Elise Kaegi, and Susann Drerup for the support in the data collection and data checks. The presented study was funded via the AO TK Trauma Network.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
