Abstract
Background. Tools used to measure poststroke functional status must include basic and instrumental activities of daily living and reflect the patient’s and the clinician’s perspective of the disease and its effect on daily living performance. Objective. The authors combined the Functional Independence Measure (FIM) and the Nottingham Extended Activities of Daily Living (NEADL) to create a scale providing a comprehensive evaluation of ADLs functional status in patients with stroke. Methods. The study participants were 188 patients completing the FIM and the NEADL. The psychometric properties of the combined measure were examined with Rasch analysis. Results. A 3-point scale and a dichotomous scale were suggested for use in the FIM and the NEADL, respectively. The combined 40 items worked consistently to reflect a single construct, and “bladder management” and “bowel management” were highly related. After “bowel management” was removed from the combined scale, all but 3 items fit the model’s expectations, and the 39-item scale showed reasonable item difficulty hierarchy, with high reliability. The 3 misfit items were removed, and no differences in unidimensionality, differential item functioning, and reliability were found between the 36-item and 39-item scales. Conclusions. The combined measure of the FIM and the NEADL provides a comprehensive picture of ADLs. It extends the utility of the FIM and the NEADL and is recommended for use to measure the independence of patients after discharge home.
Stroke survivors often experience residual limitations in daily life, which are a burden to individuals and their families. 1 The main objective of stroke rehabilitation and the major focus of research in stroke recovery are to increase independence of stroke survivors and to assist them in returning to normal life. Assessments of independence in everyday activities are critical for clinical management and evaluation for quality of rehabilitation services and research.
The Functional Independence Measure (FIM) 2 is the most widely used functional measure in stroke rehabilitation3,4 and has been recommended by the Agency for Health Care Policy and Research Post-Stroke Rehabilitation panel for assessing basic activities of daily living (ADLs).5,6 The FIM has well-established reliability 7 and validity 8 and is sensitive to change in patients with stroke. 9 However, a return to normal living requires the ability to perform not only basic ADLs (eg, bathing, toileting, or eating) but also complex activities (eg, shopping, meal preparation, and driving). At early stages after stroke, the FIM is valuable for indicating a person’s disability when assessment of basic self-care activities is paramount. 10 After patients return home, ADLs become more complex and require increasing interactions with the environment. 11 Instrumental ADLs (IADLs) scales become more capable than basic ADLs of capturing independent living in the community and at home after patients have been discharged home. 12 However, the FIM items assess independence of basic ADLs and lack information to reflect higher levels of physical function, resulting in ceiling effects as recovery progresses after stroke. 13
The FIM also lacks the patient’s perspective regarding difficulty or improvement after treatment. In clinical research, the patient’s perspective of the disease and its impact on daily living performance has drawn increasing attention and has become an important tool for determining the effectiveness of treatment. 14 Judgments of severity and its impact as well as the evaluation of improvement after treatment are often based on professionals’ experience, 15 but patients often evaluate differently from professionals.16,17 How the patient perceives his or her health status and the effect of the treatment can provide insight to clinicians. 18 It is important to include subjective measures in clinical severity evaluation to gain a comprehensive understanding from both the clinicians’ and patients’ perspectives.
The Nottingham Extended Activities of Daily Living Scale (NEADL) is a popular IADL scale in stroke research. 19 It was developed to assess performance that extends beyond basic self-care skills for follow-up in patients with stroke after discharge from the hospital. The NEADL takes the patient’s perspective into account to determine performance in IADL. 10 However, an IADL assessment is not sufficient by itself to delineate a comprehensive picture of ADLs functional status in patients with stroke. A scale combining the NEADL with a basic ADL measure might provide a wider range of information and capture more significant losses and improvement in independence after stroke than either of the individual measures. 20
The present study aimed at creating a new scale by combining the FIM and the NEADL to reflect performance in basic and complex activities. Because the FIM and the NEADL scores are ordinal and the items of individual instruments are scored on different rating systems, the study used a Rasch measurement model to analyze the data. The Rasch model uses a nonlinear transformation to convert ordinal data into interval scores in the logit scale. Then, subsequent conventional statistics based on logit scores are used to yield precise estimates. In addition, Rasch analysis provides information about appropriateness of scoring points to determine whether patients can be differentiated by their responses as clearly as the rating systems intend. Accordingly, this study conducted a Rasch analysis to investigate the appropriateness of levels of scaling of the individual measures and determine the internal construct validity and reliability of the combined scale.
Methods
Participants
This study was a secondary analysis of data from previous21-23 and ongoing randomized controlled trials of stroke rehabilitation therapies. Participants were recruited from physical rehabilitation departments at 6 hospitals from October 2006 to November 2011. The present study included 188 patients with no missing values for the 2 outcome measures. The inclusion criteria were (1) first-ever stroke, (2) Brunnstrom stage II or above for the proximal and distal upper extremity, 24 and (3) no severe physical conditions and medical problems. The exclusion criteria were (1) cognitive impairment (Mini-Mental State Examination [MMSE] < 21), 25 (2) excessive spasticity at any joint of the arm (Modified Ashworth Scale score > 2.5),23,26 and (3) participation in experimental rehabilitation or drug studies within the past 6 months. The institutional review board at each participating site approved the study, and all participants signed a consent form before entry into the study.
Procedure
Randomization procedures were similar in each trial. For each trial, participants were randomly assigned to the treatment group (ie, constraint-induced therapy, bilateral arm training, robot-assisted therapy, or mirror therapy group) or the control group. Therapy in both groups was for 90 to 120 minutes every weekday for 3 to 4 weeks. Three evaluators with intensive training in clinical assessment and blinded to the group assignments administered the clinical evaluations before and after patients received rehabilitation treatments. Data obtained before treatment were used in the present study.
Instrument
The FIM and NEADL were the major outcomes in the present study. The FIM consists of 18 items, of which 13 reflect disability in motor function and 5 assess cognitive function. 27 Patients are evaluated on a 7-point scale by professionals, such as physical therapists, nurses, or occupational therapists. Ratings reflect the extent of assistance, supervision, and use of adaptive equipment required: a score of 7 indicates complete independence, and a score of 1 indicates total assistance. An alternative scoring method (1-1-2-2-2-3-3) can be used to indicate different levels of dependence 27 : complete dependence on a helper (initial scores 1 and 2), modified dependence on a helper (initial scores 3 to 5), and no helper (initial scores 6 and 7). Higher scores are associated with a greater degree of independence.
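As an illustration of the alternative 1-1-2-2-2-3-3 scoring, the following minimal sketch collapses original 7-point FIM ratings into the 3 levels of dependence. The names `FIM_RECODE` and `recode_fim` are hypothetical and are not taken from the study's analysis software.

```python
# Hypothetical lookup table for the 1-1-2-2-2-3-3 scheme described in the text:
# original ratings 1-2 -> 1, ratings 3-5 -> 2, ratings 6-7 -> 3.
FIM_RECODE = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3}


def recode_fim(score: int) -> int:
    """Map an original FIM rating (1-7) to the collapsed 3-point rating."""
    if score not in FIM_RECODE:
        raise ValueError(f"FIM rating must be an integer from 1 to 7, got {score!r}")
    return FIM_RECODE[score]


if __name__ == "__main__":
    print([recode_fim(s) for s in range(1, 8)])  # [1, 1, 2, 2, 2, 3, 3]
```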
The NEADL items ask patients to rate levels of difficulty in conducting 22 tasks relevant to motor function and kitchen, domestic, and leisure activities. Items of the NEADL were initially scored dichotomously, where 0 is not independent and 1 is independent in IADLs. 10 To improve the sensitivity to change, all NEADL items have been scored on a 4-point scale (range, 0-3 points), with higher scores indicative of higher independence: 0 (unable), 1 (with help), 2 (on my own with difficulty), and 3 (on my own). The NEADL has been recognized as an easily administered IADL scale 28 and has excellent face validity, excellent concurrent (Spearman’s ρ = −0.41 for gender and ρ = 0.69 for the Barthel index) 29 and construct validity (reproducibility >0.9 and scalability >0.6), 29 and good reliability (Cronbach α = .72 to .94 and test-retest reliability = 0.81-0.9). 19 It was also more sensitive to changes after patients received stroke interventions than the Frenchay Activities Index, another frequently used IADLs instrument. 12
Data Analysis
The Rasch measurement model was used to examine psychometric properties of the combined scale. It calculates mean square (MnSq) and the corresponding T fit statistics (Zstd) as indicators of how much the residuals vary relative to the expected variance. The weighted (in-fit) and unweighted (out-fit) MnSq are used to summarize unexpected responses, and the ideal in-fit or out-fit MnSqs are approximately 1. This study used the partial credit model to examine the combined scale because the FIM and the NEADL were scored on a 7-point and a 4-point scale, respectively.
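For reference, the partial credit model and the fit mean squares mentioned above are commonly written as follows. This is the standard textbook formulation rather than a formula given in this article, and the symbols are assumptions of this sketch: $\theta_n$ is the ability of person $n$, $\delta_{ik}$ is the $k$th threshold of item $i$, categories are indexed $0, \dots, m_i$ regardless of how they are labeled, $E_{ni}$ and $W_{ni}$ are the model-expected score and its variance, and $N$ is the number of persons.

```latex
% Partial credit model: probability that person n scores in category x of item i
\[
P(X_{ni} = x) =
  \frac{\exp\left( \sum_{k=0}^{x} (\theta_n - \delta_{ik}) \right)}
       {\sum_{h=0}^{m_i} \exp\left( \sum_{k=0}^{h} (\theta_n - \delta_{ik}) \right)},
\qquad \sum_{k=0}^{0} (\theta_n - \delta_{ik}) \equiv 0
\]

% Standardized residual, then unweighted (out-fit) and weighted (in-fit) mean squares
\[
z_{ni} = \frac{x_{ni} - E_{ni}}{\sqrt{W_{ni}}}, \qquad
\text{out-fit MnSq}_i = \frac{1}{N} \sum_{n=1}^{N} z_{ni}^{2}, \qquad
\text{in-fit MnSq}_i = \frac{\sum_{n=1}^{N} W_{ni}\, z_{ni}^{2}}{\sum_{n=1}^{N} W_{ni}}
\]
```

Both mean squares have an expected value of about 1 when the data fit the model, which is why values near 1 are described as ideal.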
Researchers have suggested various scoring scales for FIM administration27,30 and left the discussion open regarding the use of rating systems. For the use of the NEADL scaling systems, no studies have provided empirical evidence to support which scale performs better. Accordingly, this study first examined appropriateness of the FIM Scale and NEADL. When the categories are assigned in the intended way, threshold estimates are ordered, and the out-fit MnSqs of the individual scoring category are less than 2. 31 Also, the average person measure of each rating category should increase as the rating score increases, which means that the score 1 represents the lowest ability, whereas the score 3 indicates the highest ability. 31 If the scale did not function as clearly as the points allow, recalibrating would be considered.
After the rating category diagnostics, we examined if the combined FIM and NEADL reflected a unidimensional structure. Rasch analysis assumes an underlying unidimensional structure, where the easier the item, the more likely that it will be negotiated successfully, and the more able the person, the more likely that he or she will complete an item successfully compared with a less able person. 32 Principal components analysis of residuals is used to examine if all items of a scale consistently assess a single construct and to identify possible multidimensionality. When 3 criteria were met, the combined FIM and NEADL items were considered to work consistently to measure a unidimensional construct, ADLs function: (1) the first dimension explained more than 50% of the variance in the data, (2) the eigenvalue of the first residual factor was less than 3, and (3) the first residual factor explained less than 7% of the variance. 32
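The 3 criteria can be expressed as a simple check. The sketch below is illustrative only; the function name and argument names are hypothetical, and the example values are the ones reported in the Results for the combined 40 items.

```python
def is_unidimensional(rasch_variance_pct: float,
                      first_residual_eigenvalue: float,
                      first_residual_variance_pct: float) -> bool:
    """Apply the 3 unidimensionality criteria stated in the text."""
    return (
        rasch_variance_pct > 50.0              # (1) Rasch dimension explains > 50%
        and first_residual_eigenvalue < 3.0    # (2) first residual eigenvalue < 3
        and first_residual_variance_pct < 7.0  # (3) first residual explains < 7%
    )


if __name__ == "__main__":
    # Values reported for the combined 40 items: 63.4%, 2.9, and 2.6%.
    print(is_unidimensional(63.4, 2.9, 2.6))  # True
```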
The Rasch model is based on the specification of “local independence.” It assumes that after removing the contribution of the Rasch model, residuals of items are not related to one another. A correlation above 0.30 between the residuals of 2 items indicates minimal local dependence, and a large positive correlation suggests that the 2 items share more than 50% of their random variance and that only 1 of the 2 items is needed for measurement. 33
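A minimal sketch of this screening step follows, assuming standardized residuals have already been exported from the Rasch software as one list per item. The function names are hypothetical, and the residual values in the example are invented toy numbers, not study data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_dependent_pairs(residuals, threshold=0.30):
    """Return (item_a, item_b, r) for item pairs whose residual correlation
    exceeds the threshold. `residuals` maps item name -> residuals per person."""
    flagged = []
    for item_a, item_b in combinations(residuals, 2):
        r = pearson(residuals[item_a], residuals[item_b])
        if r > threshold:
            flagged.append((item_a, item_b, r))
    return flagged


if __name__ == "__main__":
    # Toy residuals for 5 hypothetical persons (invented for illustration).
    toy = {
        "bladder management": [0.5, -1.0, 1.2, -0.3, 0.8],
        "bowel management": [0.6, -0.9, 1.0, -0.2, 0.7],
        "stairs": [-0.4, 0.9, 0.1, -1.1, 0.3],
    }
    for item_a, item_b, r in flag_dependent_pairs(toy):
        print(f"{item_a} / {item_b}: r = {r:.2f}")
```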
Goodness of fit of an item is determined by the item-fit statistics using MnSq and Zstd values. High fit statistics (eg, >1.5) indicate noise in responses, and low fit statistics (eg, <0.5) suggest too little variation compared with what the Rasch model predicts. Because there is no consensus regarding the acceptable MnSq fit ranges34-36 and the criteria related to sample size, 37 this study used a conservative criterion to detect item misfit. If the item had in-fit and out-fit MnSqs outside the range of 0.7 to 1.3, with Zstd beyond ±2, it was considered a misfit. Misfit items would need further investigation, such as removing the item and re-examining the psychometric properties of the measure.
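The misfit rule can be sketched as below. This is one reading of the criterion, assuming that both the in-fit and the out-fit statistics must fall outside the range with a Zstd beyond ±2; the function name and the example values are hypothetical.

```python
def is_misfit(infit_mnsq: float, infit_zstd: float,
              outfit_mnsq: float, outfit_zstd: float,
              low: float = 0.7, high: float = 1.3, z_cut: float = 2.0) -> bool:
    """Flag an item when BOTH in-fit and out-fit violate the criterion.

    Assumption: a statistic violates the criterion when its MnSq lies outside
    [low, high] and the absolute value of its Zstd exceeds z_cut.
    """
    def violates(mnsq: float, zstd: float) -> bool:
        return (mnsq < low or mnsq > high) and abs(zstd) > z_cut

    return violates(infit_mnsq, infit_zstd) and violates(outfit_mnsq, outfit_zstd)


if __name__ == "__main__":
    # Invented example values, not taken from Table 1.
    print(is_misfit(1.6, 3.0, 1.8, 3.2))  # True: both statistics out of range
    print(is_misfit(1.1, 0.5, 1.0, 0.1))  # False: both within range
```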
Differential Item Functioning (DIF) analysis is a test for whether patients with the same level of functional status respond differently to an item. Potential factors such as sex, the time since stroke onset, or age might influence patients’ responses to an item, and therefore, the combined scale might not be accurate for assessing ADLs functional status. This study conducted the DIF analysis to examine if responses to the 40 items were influenced by sex, age (≥65 years and <65 years), hemispheric laterality, and the time since stroke onset after controlling for patients’ ADLs functional status. The DIF contrast is the difference in the item difficulty between the 2 subgroups and should be at least 0.5 logits to be noticeable at the .0013 (P = .05/40) level using Bonferroni correction. 33 The item-person map was plotted to depict the relation between item difficulty and person ability. For a well-targeted measure (neither too difficult nor too easy), the mean person ability and the mean item difficulty should be relatively close (termed as targeting). 38 Also, items need to be well spread over the entire range of patients’ ADLs. Test reliability was estimated using person (separation) reliability and person separation. Person reliability is equivalent to traditional test reliability (eg, Cronbach α). A value of .90 represented an excellent level, .80 was moderate, and .70 was acceptable. 39 Person separation (G) estimates how many strata (groups) participants can be divided into using the formula (4G + 1)/3. 40
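The arithmetic behind these thresholds can be sketched as follows. The Bonferroni level, the DIF rule, and the strata formula come from the text; the conversion from person separation to reliability, R = G²/(1 + G²), is the standard Rasch relation and is an assumption here because the article does not state it. All function names are hypothetical.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def is_noticeable_dif(dif_contrast: float, p_value: float, alpha: float) -> bool:
    """DIF is flagged when the contrast is at least 0.5 logits and significant."""
    return abs(dif_contrast) >= 0.5 and p_value < alpha


def strata(separation: float) -> float:
    """Number of statistically distinct ability levels: (4G + 1) / 3."""
    return (4 * separation + 1) / 3


def reliability_from_separation(separation: float) -> float:
    """Standard Rasch relation (assumed, not stated in the article)."""
    return separation ** 2 / (1 + separation ** 2)


if __name__ == "__main__":
    alpha = bonferroni_alpha(0.05, 40)  # .05 / 40, reported as .0013
    print(f"corrected alpha = {alpha:.5f}")
    # Person separation of 3.24 is the value reported for the 39-item scale.
    print(f"strata = {strata(3.24):.2f}")
    print(f"reliability = {reliability_from_separation(3.24):.2f}")
```

With the person separation of 3.24 reported for the 39-item scale, these functions give about 4.65 strata and a reliability of about .91, in line with the reported values.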
Results
A total of 188 participants completed the FIM and NEADL items, and their data were used in the present study; of these, 126 (67%) were men, 89 (47.3%) had right hemiparesis, and 99 (52.7%) had left hemiparesis. The average age of the 188 participants was 55.96 years (standard deviation [SD], 11.7 years), and the mean time since stroke onset was 19.45 months (SD, 15.96 months). The mean MMSE score was 27.33, indicating that our participants had the cognitive function to perform the assessments. Their average National Institutes of Health Stroke Scale (NIHSS) score of 2.39 indicated that the participants were suffering from a mild stroke.
Rating scale diagnostics of the FIM showed that rating categories of 1 and 2 were infrequently used, and disordered threshold measures and average person measures were detected. The initial FIM scores were therefore recoded as 1-1-2-2-2-3-3 to create a uniform 3-point scale to indicate degrees of independence: 1 (complete dependence on a helper), 2 (modified dependence on a helper), and 3 (no helper). 27 The NEADL showed the same problematic threshold measures and infrequently used rating categories of 0 and 1. Therefore, the response categories were collapsed into a dichotomy. The highest score category of the NEADL items was rescored as 2, indicating full independence. The remaining score categories were rescored as 1 to indicate dependence. Then, the combined FIM and NEADL items rated on the new rating scales were analyzed in the subsequent data analysis.
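The NEADL collapse can be sketched in the same spirit. The mapping below is an assumption about the intended rescoring: the top category of the original 0 to 3 scale (3, "on my own") becomes 2, and categories 0 to 2 become 1. The function name is hypothetical.

```python
def dichotomize_neadl(score: int) -> int:
    """Collapse an original NEADL rating (0-3) into a dichotomy.

    Assumed mapping: 3 -> 2 (full independence); 0, 1, 2 -> 1 (dependence).
    """
    if score not in (0, 1, 2, 3):
        raise ValueError(f"NEADL rating must be an integer from 0 to 3, got {score!r}")
    return 2 if score == 3 else 1


if __name__ == "__main__":
    print([dichotomize_neadl(s) for s in range(4)])  # [1, 1, 1, 2]
```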
When the Rasch partial credit model was applied to the 40 items together, the principal components analysis of residuals showed that 63.4% of raw variance was explained by the Rasch dimension, and the eigenvalue of the first residual factor was 2.9, which explained 2.6% of the variance. We concluded that the 40 items constituted a unidimensional construct. The residual correlation between bowel management and bladder management was as high as +0.81, indicating that only 1 of the 2 items was needed in the combined scale. We then examined the rating category diagnostics and found that the item bowel management showed disordered threshold measures, and therefore, it was removed from the combined scale. Reanalysis indicated that no remaining pairs of items had high residual correlations.
Item-fit statistics of the 39 items showed that 3 items (washing dishes, reading books or newspaper, and writing a letter) had both in-fit and out-fit MnSqs outside the acceptable range at the statistically significant .05 level (Zstds > 2.8; Table 1). However, no significant DIF related to sex, age, hemispheric laterality, and the time since stroke onset was found. Responses to the 39 items were not influenced by participants’ demographic or clinical characteristics.
Table 1. Fit Statistics and Revised Response Categories of the Combined 39-Item Scale
Abbreviations: MnSq, mean square; Zstd, z-standardized.
Figure 1 shows that the most difficult item was driving a car or riding a scooter, whereas the easiest item was bowel management. The item difficulty hierarchy of the 39 items indicated that items of IADLs (the original NEADL tasks) were overall more difficult than those of ADLs (the original FIM tasks). The mean person measure of ADLs function at pretreatment was 1.31 logits (SD = 0.46); compared with the mean item difficulty (0.00 logits), the overall scale was considered slightly easy for our participants. The item difficulties were spread out along the continuum (from −3.65 to 4.5 logits), and the person abilities ranged from −3.7 to 4.9 logits.

Figure 1. Item-person map of the combined 39-item scale: the column of numbers to the left is logit; M is the mean, S is the standard deviation, and T is 2 standard deviations. The symbol “#” to the left of the centerline represents 2 participants’ activities of daily living (ADLs). The symbol “·” to the left of the centerline represents a participant’s ADLs. The most able people and the most difficult items are at the top and vice versa. Items plotted along the centerline are based on the average difficulty of the items. Item contents are listed in Table 1.
A person separation index of 3.24 indicated that the 39 items separated the 188 participants into 4.65 statistically distinct ability levels (strata) by ADLs function. Person reliability (.91) confirmed high reliability of the new scale. We further excluded the 3 misfit items and examined the psychometric properties of the remaining 36 items. The 39-item and 36-item scales did not differ in unidimensionality, DIF, and reliability.
Discussion
This study aimed at combining objective and subjective measures to provide comprehensive information to understand patients’ functional status, which is particularly meaningful for patients who have been discharged from the hospital after a stroke. Using Rasch analysis, we were able to create such an assessment tool by combining items of the FIM and the NEADL with 2 scoring systems. After bowel management was removed, all items worked consistently to reflect a single and unidimensional ADLs construct, and the item difficulty positions lay along a continuum encompassing the entire range of ADLs functional status in patients with stroke. Removal of 3 misfit NEADL items that did not meet the model’s expectations did not jeopardize the psychometric properties of the combined measure. The NEADL items requiring a great level of cognitive and motor function were rated more difficult compared with the FIM items. With the evidence of high reliability and no item biases, the combined scale is recommended for use to measure independence of patients after being discharged home.
There have been few empirical investigations into the appropriateness of rating systems for the FIM and the NEADL, and researchers have not come to a definite agreement.27,30,41 We found that the 3-point scale 27 and a dichotomous scale worked effectively to reflect responses to the FIM and the NEADL items, respectively. A 3-point scale or a scale with only dichotomous items is much more convenient and efficient to administer and is thus recommended for use in the combined scale of FIM and NEADL items for patients with stroke.
Previous Rasch reports on the FIM used a 7- or 5-point scale to examine the psychometric properties of the FIM and found violations of unidimensionality and some misfit items.15,30,31 One study on the NEADL that used the 4-point system reported that items of the NEADL did not lie along a continuum and suggested that it might be a result of confusing interpretations of the rating categories 1 and 2. 42 In this study, we used a 3-point scale 27 for the FIM items and the original dichotomous scaling for the NEADL, and the results showed that the unidimensionality of the combined measure was maintained. Future studies could replicate our results to validate the appropriateness of the 2 scoring indices for the FIM and the NEADL separately and reexamine the unidimensionality of the combined FIM and NEADL items in patients with stroke.
The present study extended the utility of the FIM by adding items assessing complex ADLs. Items of the FIM are designed to assess performance in basic and easy activities, and the study findings support the concept and content of the FIM9,43 with the evidence that the FIM items were located in the lower part of the item difficulty hierarchy. The NEADL items, however, were in the upper part of the hierarchy, indicating that the NEADL items measure a high level of independence. The 2 measures thus supplemented one another. We further examined the person reliability and person separation index of the FIM and the NEADL separately and found that reliability improved markedly, from 0.49 for the FIM and 0.85 for the NEADL to 0.91 for the combined scale. The person separation index increased from 0.99 for the FIM and 2.42 for the NEADL to 3.24 for the combined measure, which corresponds to 4.65 strata. The combined measure covered a substantial range of ADLs functional status in stroke patients and provided more useful and inclusive information than the individual measures.
Lundgren Nilsson and Tennant 41 found that 2 items in the FIM, bladder management and bowel management, violated the assumption of local independence in Rasch analysis. The authors combined the 2 items into a testlet, and reanalysis showed significantly improved fit and reduced reliability. Because a high, positive residual correlation may imply that the 2 items assess very similar motor functions, 33 this study removed bowel management from the combined measure. Reanalysis indicated that removing bowel management did not influence item fit, targeting, and reliability but did solve the problem of local dependence. The results therefore support removing bowel management, and we recommend this approach for future research.
The 3 NEADL items of washing dishes, reading books or newspapers, and writing letters did not seem to be appropriate activities for our participants. Most (81%) reported that they did not wash dishes at home, possibly because most of them stayed with their families, and caregivers or family members performed chores (eg, washing dishes) for the patients. 44 In addition, 75% were elderly persons, so writing letters and reading books or newspapers could have been difficult as a result of age-related vision declines or visual impairments (eg, blur and diplopia) after stroke. The participants also had a low level of education, and reading and writing were not their usual activities. Therefore, those tasks might not be appropriate items for assessing their ADLs functional status and were found to be misfits in this study. Future studies could recruit patients from different cultures or with different lifestyles, or use a design with equal sample sizes of young and old participants, to validate our findings. For example, people from a Western culture might place more emphasis on individual independence in domestic activities44,45 than those from an Eastern culture. Future studies could also recruit people with a higher level of education who have the habit of reading and writing to validate the combined scale.
One limitation is related to the generalizability of our findings. Although this study recruited patients with a wide range of times since stroke onset and times since discharge from the hospital, our findings may not be broadly applicable to patients with different characteristics, such as patients who were just discharged from the hospital. Cultural or environmental factors may also influence participants’ responses to ADL scales. Future research using different stroke populations or those from different countries would be useful to verify the current results.
Conclusions
The current study provides empirical evidence to support the use of the combined FIM and NEADL measures, with the exclusion of 1 FIM item and 3 NEADL items. A 3-point scale is recommended for the FIM and a dichotomous scale for the NEADL. The combined scale was found to measure a unidimensional construct, and responses to those items were not biased by patients’ demographic or clinical characteristics. Also, the combined scale had high reliability and covered a substantial range of basic ADLs and IADLs functions. These 2 measures can be combined to assess comprehensive ADLs functional status in patients after stroke.
Footnotes
Authors’ Note
Hui-fang Chen and Ching-yi Wu contributed equally to this work.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project was supported in part by the National Health Research Institute, the National Science Council, and the Healthy Aging Research Center at Chang Gung University in Taiwan.
