Abstract
The World Health Organization's (WHO) Disability Assessment Schedule II (WHO-DAS II) is a generic health-status instrument firmly grounded in the WHO's International Classification of Functioning, Disability and Health (WHO-ICF). As such, it assesses functioning in six domains: communication, mobility, self-care, interpersonal, life activities, and participation. Domain scores aggregate to a total score. Because the WHO-DAS II contains questions relevant to hearing and communication, it has good face validity for use as an outcome measure for audiologic intervention. The purpose of the present study was to determine the psychometric properties of the WHO-DAS II in a sample of individuals with adult-onset hearing loss, including convergent validity, internal consistency, and test-retest stability. Convergent validity was established by examining correlations between the WHO-DAS II (domain and total scores) and the Abbreviated Profile of Hearing Aid Benefit (APHAB) and the Hearing Handicap Inventory for the Elderly (HHIE), two disease-specific measures, as well as with the Short Form-36 for veterans (SF-36V), a second generic measure. Data on all four measures were collected from 380 older individuals with adult-onset hearing loss who were not hearing aid users. The results of the convergent validity analysis revealed that the WHO-DAS II communication domain score was moderately and significantly correlated with scores on the APHAB and the HHIE. WHO-DAS II interpersonal and participation domain scores and the total scores were also moderately and significantly correlated with HHIE scores. These findings support the validity of using the WHO-DAS II for assessing activity limitations and participation restrictions of adult-onset hearing loss. Several WHO-DAS II domain scores and the total score were also significantly and moderately to markedly correlated with scores from the SF-36V. These findings support the validity of the WHO-DAS II as a generic health-status instrument.
Internal-consistency reliability was adequate for all domain scores except the interpersonal domain. Test-retest stability for all the domain scores was adequate. Critical difference values were calculated for use in clinical application of the WHO-DAS II. From these findings, we concluded that the WHO-DAS II communication, participation, and total scores can be used to examine the effects of adult-onset hearing loss on functional health status. Further work examining the utility of the WHO-DAS II as an outcome measure for hearing aid intervention is warranted.
Introduction
For audiologists, the term outcome measure refers to those methods and tools that can be used to evaluate the results of audiologic intervention (Abrams and Hnath Chisolm, 2000). The general goal of outcome measurement is to provide objective information about the benefit(s) of audiologic intervention to patients and to promote data-driven decision making by health-care administrators and third-party payers. The importance of determining the effectiveness of audiologic intervention has received considerable attention in recent years (e.g., Abrams and Hnath Chisolm, 2000; Johnson and Danhauer, 2002). Perhaps the most important reason for obtaining outcome measures is to develop the evidence upon which to make future clinical decisions and to allow the development of a core set of practice guidelines to be used by audiologists.
The practicing audiologist currently has available a variety of widely used self-report outcome measures to determine the effectiveness of hearing aid intervention (for reviews, see Abrams and Hnath Chisolm, 2000; Johnson and Danhauer, 2000). Most, however, share one limiting characteristic: they are disease specific. That is, they measure the effectiveness of hearing aid intervention in terms of the impact on consequences of the hearing loss alone. One might wonder why this is a limiting feature. After all, disease-specific instruments are clinically sensible; that is, the questions are similar to those used by a hearing-health-care practitioner when talking to a patient. As a result, disease-specific instruments tend to be sensitive to the effects of treatments that are directed toward alleviating specific problems associated with hearing loss.
The use of disease-specific instruments exclusively creates problems, however, when attempting to compare treatment benefits across populations or conditions as must be done in an environment of competition for limited health-care resources. To make such comparisons, generic instruments are needed. Generic instruments do not focus on any particular disorder or treatment but rather assess the self-perceived overall health status of an individual.
Indeed, one of the emerging trends in outcome assessment for clinicians, researchers, and health-care policy makers is measuring the effectiveness of treatments using instruments that allow for comparisons across diseases and disorders. This approach, however, has a potential drawback: generic instruments allow for comparisons across populations or conditions, but they are not necessarily sensitive to a particular disorder or treatment.
Several previous investigations used generic instruments in the assessment of hearing aid intervention. These include the Sickness Impact Profile (SIP) (Bergner et al., 1981), the Self Evaluation of Life Function (SELF) (Linn and Linn, 1984), the Medical Outcomes Study-Short Form 36 (SF-36) (Ware and Sherbourne, 1992), the Dartmouth COOP Functional Health Assessment Charts (Nelson et al., 1987) and the EuroQOL (The EuroQOL Group, 1990).
In reviewing the use of these instruments to document hearing aid effectiveness, Bess (2000) found that the generic instruments were insensitive to clinically meaningful improvements in hearing performance. He also noted that none of the generic instruments probed communication function. Bess concluded that there was a need for the development of a functional health-status instrument that included items examining the consequences of hearing loss.
The conclusion reached by Bess (2000) regarding the relative insensitivity of available generic health-status instruments to hearing aid intervention has been supported by more recent studies. For example, no effects of hearing aid fitting on generic quality of life as measured by the EuroQoL were found in 80 adults, ages 18 years and older, fitted with hearing aids for the first time (Joore et al., 2003). Similarly, Stark and Hickson (2004) found no significant changes as a result of hearing aid fitting for 93 older adults with hearing loss on any of the eight subscales of the SF-36 (i.e., physical function, role-physical, bodily pain, general health, vitality, social function, role-emotional, mental health).
In contrast, Abrams et al. (2002) found significant changes on the mental component summary (MCS) scale score (Ware et al., 1994) of the Short Form-36 for veterans (SF-36V) (Kazis et al., 1999) in a group of 105 older veterans fitted with hearing aids for the first time. The Abrams et al. study was designed to compare outcomes in patients who received hearing aids only (HA) with those who received hearing aids and participated in a group aural rehabilitation program (HA+AR). Collapsed across treatment groups, the mean change in MCS scores from pre- to postintervention (mean change, 2.0) was statistically significant. Although the pre- to postintervention mean change in MCS scale scores for the HA+AR group (mean change, 3.0) was more than twice that of the HA group (mean change, 1.4), the interaction between treatment group and test interval (i.e., pre- and postintervention) was not statistically significant.
Thus, it was not possible to separate the effect of hearing aid use from the combined effects of hearing aid use and aural rehabilitation on changes in SF-36V MCS scale scores, leading Abrams et al. (2002) to conclude that there was still a need to examine the utility of other generic instruments that might be used as outcome measures to assess hearing aid intervention. Such an instrument should include questions specific to hearing and communication.
Recently, the WHO developed the Disability Assessment Schedule II (WHO-DAS II) (WHO, 1999), a generic functional health-status instrument grounded in the WHO's framework for the Classification of Functioning, Disability, and Health (WHO-ICF) (WHO, 2001). The WHO-ICF is a model of functioning and disability that allows for examination of the consequences of a disease or disorder in three dimensions: body function and structure (symptoms and impairments), activities (related to tasks and interactions by an individual), and participation (involvement in life situations). Several researchers suggested that the WHO-ICF is a useful conceptual framework for assessing the impact of adult-onset hearing loss and the effectiveness of intervention (e.g., Abrams and Chisolm, 2000; Kiessling et al., 2003; Cox, 2003; Worrall and Hickson, 2003; Stephans, 2003). If the WHO-ICF is a useful conceptual framework for adult-onset hearing loss, then the WHO-DAS II should be a useful general health-state-assessment measure for use in examining the effects of hearing loss and the effectiveness of relevant rehabilitation.
The WHO-DAS II questionnaire consists of 36 items organized into six domains designed to assess health status related to communication (i.e., understanding and communicating with the world), mobility (i.e., moving and getting around), self-care (i.e., attending to one's hygiene, dressing, eating, and staying alone), interpersonal (i.e., getting along with people), life activities (i.e., domestic responsibilities, leisure, and work), and participation in society (i.e., joining in community activities). The WHO-DAS II assesses difficulties with functioning and disability in each of the six domains over the past 30 days. The domains of communication, mobility, and self-care reflect the WHO-ICF dimension of activity, and the interpersonal, life activities, and participation domains reflect the WHO-ICF dimension of participation.
Because the WHO-DAS II is a generic measure, it can be used within and across disorders to determine the relative impact of the disorder, the relative effectiveness of the interventions, and the relative costs associated with managing those disorders. In these respects, the WHO-DAS II holds promise for measuring the generic health impact of hearing loss and comparing those findings with those obtained for other disorders.
The utility of the WHO-DAS II as a generic outcome measure has been explored in at least two previous studies. van Turbergen et al. (2003), in a study among patients with ankylosing spondylitis (AS), demonstrated that the WHO-DAS II total score was correlated with scores on disease-specific instruments (i.e., the Bath AS Disease Activity Index and the AS Quality of Life Questionnaire) and select scales of the SF-36. In addition, the WHO-DAS II total score was found to be sensitive to the changes in physical functioning among patients being treated for ankylosing spondylitis.
Similarly, Chwastiak and Von Korff (2003) concluded that the WHO-DAS II was a useful health-status questionnaire for measuring disability associated with physical (back pain) and mental (depression) disorders among individuals being followed in a primary care setting. This conclusion was based on three findings:
(1) a high level of internal consistency for domain and total scores in both populations;
(2) evidence of convergent validity between domain and total WHO-DAS II scores and subscale and summary scale scores on the SF-36, as well as with two disease-specific measures (Work Limitations Questionnaire for assessing the impact of health problems on work functioning, and Patient Health Questionnaire used to predict major depressive episodes); and
(3) responsiveness to change, defined as the “ability of an instrument to detect important clinical changes over time,” for both populations.
Although the WHO-DAS II might be useful in audiology, no study to date has examined its psychometric properties in individuals with adult-onset hearing loss. Review of the WHO-DAS II reveals good face validity for use with adults receiving audiologic intervention because it includes two specific questions: one question asks how well an individual can generally understand what people say; the other asks how well an individual can start and maintain a conversation.
Of course, before the WHO-DAS II is used to assess the effects of hearing aid intervention or other types of audiologic intervention for hearing loss, other psychometric properties related to validity and reliability need to be examined. For example, an important aspect of validity is construct validity, or the consistency between the assessment instrument and the theoretic constructs the instrument purports to assess. One way to examine construct validity is to assess convergent validity, or the degree to which scores on instruments designed to assess the same construct are correlated with each other. In the present context, convergent validity can be examined by determining if WHO-DAS II scores reflect scores on commonly used disease-specific instruments such as the Abbreviated Profile of Hearing Aid Benefit (APHAB) (Cox and Alexander, 1995) and the Hearing Handicap Inventory for the Elderly (HHIE) (Ventry and Weinstein, 1982).
Cox (2003) pointed out that the APHAB provides a measure of the residual hearing-related activity limitations experienced by adults using hearing aids, as it allows for the assessment of continued difficulties hearing aid users may experience in tasks such as understanding speech and localizing sound. Recall that the WHO-DAS II subscales of communication, mobility, and self-care reflect the WHO-ICF dimension of activity. Although there is no logical reason to expect an acquired hearing loss to have a strong influence on mobility and self-care activities, a hearing loss would be expected to have a negative impact on activities of communication. Thus scores on at least the WHO-DAS II communication domain should be significantly correlated with APHAB scores.
In the area of participation, audiologists often use the HHIE to assess residual participation restrictions (i.e., the problems or barriers encountered during situations of daily life in which hearing plays a role) experienced by the hearing aid wearer (Cox, 2003). Significant correlation of HHIE scores would thus be expected with at least some of the WHO-DAS II domains that reflect the WHO-ICF dimension of participation. These are the interpersonal, life activities, and participation domains.
Finally, because the WHO-DAS II is a generic health-status instrument, significant correlation of the domain and total scores with other generic health-status instruments such as the SF-36V would be expected. One goal of the present study was to examine the relationships between WHO-DAS II scores and scores from the APHAB, the HHIE, and the SF-36V.
In addition to examining issues related to validity, the reliability of an instrument needs to be assessed within a specific population. Intrinsic reliability or internal consistency is one aspect of reliability that is important to examine for the WHO-DAS II. As Demorest and Walden (1984) point out, internal consistency is important because it can provide justification for generalization from an observed score on one set of items to a predicted score for an equivalent set of additional items.
In addition, if a measurement is to be useful in assessing treatment outcomes, it must not only be valid and have good internal consistency but also must demonstrate adequate test-retest stability. Knowledge of test-retest stability, particularly for the time interval over which intervention effects are to be assessed, is critical for determining if a change in score by an individual is due to intervention rather than measurement error (Demorest and Erdman, 1988).
To examine issues related to the validity, internal-consistency reliability, and stability of the WHO-DAS II for use with individuals with adult-onset hearing loss, the following research questions were addressed in the present study:
Are WHO-DAS II domain and total scores correlated with APHAB scores, HHIE scores, and/or SF-36V scores?
Is the internal-consistency reliability of the WHO-DAS II domain and total scales sufficient for the use of the WHO-DAS II in the population of individuals with adult-onset hearing loss?
Is test-retest stability sufficient for the use of the WHO-DAS II domain and/or total scales as an outcome measure for hearing aid intervention?
Methods
Participants
The data used to examine the psychometric properties of the WHO-DAS II in individuals with adult-onset hearing loss were obtained during a larger multisite project designed to examine the effects of hearing aid intervention on quality of life. Four Veterans Affairs medical centers (VAMC) were involved in data collection: James H. Quillen VAMC, Mountain Home, TN; Tennessee Valley Healthcare System, Nashville, TN; VA Pittsburgh Healthcare System, Pittsburgh, PA; and James Haley VAMC, Tampa, FL. The study recruited 384 veterans with adult-onset sensorineural hearing loss and no prior hearing aid experience from the regular audiology clinics at all four VAMCs. Four subjects were withdrawn before completion of baseline measures due to inability to pass screening criteria. A total of 380 participants who were eligible for hearing aids through the national VA hearing aid program were enrolled.
Participants exhibited at least a mild, high-frequency sensorineural hearing loss as evidenced by a pure-tone average of 30 dB HL or more at 1000, 2000, 3000, and 4000 Hz in the better ear. The mean audiogram, collapsed across ears, is shown in Figure 1. Participation also required a passing score on the Mini Mental State Exam (MMSE) (Folstein et al., 1979), a widely used screening tool for cognitive status. All participants lived in the community and had access to a telephone, no known neurologic or psychiatric disorders as determined by chart review, and no known comorbid diseases that would preclude completion of the study. Participants were excluded for conductive or retrocochlear pathology, as well as asymmetry of either pure-tone thresholds or speech-recognition scores in quiet.

Figure 1. Mean audiogram for the right and left ears of the participants. Hearing thresholds were collapsed across ears since t-tests revealed no statistically significant differences between ears for any frequency. The standard deviations are indicated by the vertical bars.
After participants had been recruited and had passed the screening criteria, the outcome measures described in the next section were administered to all of them so that correlations between the WHO-DAS II and the other measures could be examined. In addition, the data obtained on the WHO-DAS II were used to examine internal-consistency reliability. To examine test-retest stability, only data from a subgroup of participants were used. This subgroup consisted of the participants who were randomized into an immediate treatment (IT) group for the larger project examining the effects of hearing aid intervention on quality of life. Although half of the participants were randomized upon recruitment into the project to the IT group and the other half to a delayed treatment (DT) group, the failure of four participants to pass screening criteria resulted in 189 in the IT group and 191 in the DT group. At 2 weeks postrecruitment, baseline preintervention measures were obtained from the IT group to assess short-term test-retest stability. At 10 weeks postrecruitment, baseline preintervention measures were obtained from the DT group to assess long-term test-retest stability. Demographic information for all participants and as a function of treatment group is summarized in Table 1.
Table 1. Demographics for All Participants and for Participants as a Function of Treatment Group
IT = immediate treatment group; DT = delayed treatment group.
Outcome Measures
WHO-DAS II. The WHO-DAS II, a 36-item instrument, was administered. It provides six domain scores: communication, mobility, self-care, interpersonal, life activities at home and work, and participation, as well as a total score (WHO, 1999). If participants do not work, only 32 items are administered and the life activities score is based only on participation in home-related activities. Because most of the study participants were retired, the 32-item version was used in analyses. For each question, an individual is asked “In the last 30 days how much difficulty did you have in …?” Responses are given on a 5-point Likert-type scale from 1 (none) to 5 (extreme/cannot do). Raw scores are transformed into standardized scores, with 0 indicating the highest level of functioning and 100 indicating the lowest level of functioning.
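The raw-to-standardized transformation can be illustrated with a minimal sketch. It assumes a simple linear rescaling of the summed 1-to-5 item responses onto the 0-to-100 range; the text above does not specify the transformation, and the official WHO scoring procedure may weight items differently. The function name standardize_domain is hypothetical.

```python
def standardize_domain(responses):
    """Rescale 1-5 item responses to a 0-100 domain score.

    Assumption: simple-sum scoring with a linear rescale, so that
    0 is the highest level of functioning and 100 the lowest.
    """
    if not responses:
        raise ValueError("at least one item response is required")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("responses must lie between 1 and 5")
    raw = sum(responses)
    lowest = 1 * len(responses)    # every item answered 1 (none)
    highest = 5 * len(responses)   # every item answered 5 (extreme/cannot do)
    return 100.0 * (raw - lowest) / (highest - lowest)
```

Under this assumption, a participant who answers 3 on every item of a domain receives a standardized score of 50, regardless of how many items the domain contains.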
APHAB. The APHAB is a 24-item questionnaire in which individuals report the amount of difficulty they have with communication or noises in various everyday situations (Cox and Alexander, 1995). The APHAB produces four subscale scores: ease of communication (EC), listening in background noise (BN), listening in reverberant conditions (RV), and aversiveness of sounds (AV). In addition, a global score, consisting of responses on the EC, BN, and RV subscales, was calculated. Scores for all subscales and the global score range from 0 to 100. For the EC, BN, and RV subscales and the global score, lower scores indicate better performance and higher scores indicate poorer performance. The AV scale quantifies the negative reactions individuals have to aversive environmental sounds, with higher scores indicative of greater negative reactions and lower scores indicating fewer negative reactions.
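As a rough illustration of how the global score relates to the subscales, the sketch below assumes that the global score is the unweighted mean of the EC, BN, and RV subscale scores. The text above says only that the global score consists of responses on those three subscales, so the averaging rule and the function name aphab_global are assumptions.

```python
def aphab_global(ec, bn, rv):
    """Return an APHAB-style global score from three subscale scores.

    Assumption: the global score is the unweighted mean of EC, BN and RV.
    The AV subscale is deliberately left out, as described in the text.
    """
    for score in (ec, bn, rv):
        if score < 0 or score > 100:
            raise ValueError("subscale scores must lie between 0 and 100")
    return (ec + bn + rv) / 3.0
```

Because all three inputs share the 0-to-100 range, the result stays on the same scale, with lower values indicating better performance.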
HHIE. The HHIE is a 25-item questionnaire consisting of 13 emotional and 12 social/situational questions that produce three scores: an emotional score, a social score, and a total score (Ventry and Weinstein, 1982). Higher scores indicate greater perceived difficulties, and lower scores indicate less difficulty. Scores range from 0 to 52 for emotional, 0 to 48 for social, and 0 to 100 for the total score.
SF-36V. The SF-36V (Kazis et al., 1999) is the veteran's version of the SF-36 (Ware and Sherbourne, 1992). The instrument consists of 36 items that provide eight subscale scores, with 2 to 10 items in each. The subscales are physical function, role-physical, bodily pain, general health, vitality, social function, role-emotional, and mental health. In addition, two summary measures can be calculated with aggregate scores on the subscales. These are the physical component summary (PCS) scale score and the mental component summary (MCS) scale score. Subscale scores may be reported as raw scores or standardized scores based on a linear T-score transformation with a mean of 50 and a standard deviation (SD) of 10 (Ware et al., 2000). The standardized scores make it possible to meaningfully compare scores across subscales and to the PCS and MCS scale scores. The PCS and MCS summary scales are reported only as standardized scores that have a mean of 50 and a SD of 10 in the general, healthy, United States population (Ware et al., 1994).
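The linear T-score transformation described above can be sketched as follows. The reference mean and SD passed to the function are placeholders supplied by the caller; the published normative values are not given in the text and are not reproduced here. The function names t_score and z_from_t are hypothetical.

```python
def t_score(raw, ref_mean, ref_sd):
    """Linear T-score: mean 50 and SD 10 in the reference population."""
    if ref_sd <= 0:
        raise ValueError("reference SD must be positive")
    return 50.0 + 10.0 * (raw - ref_mean) / ref_sd


def z_from_t(t):
    """Distance of a T-score from the reference mean, in SD units."""
    return (t - 50.0) / 10.0
```

On this scale, a standardized score of 40 lies exactly 1 SD below the reference mean, which is what makes scores comparable across subscales and with the PCS and MCS summary scales.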
Procedures
Participants were recruited over an 18-month period at each of the four sites. All participants were randomly assigned to the IT group or the DT group. The data used to examine convergent validity and internal-consistency reliability were obtained during one session with two parts. In the first part of the session, each participant gave informed consent, completed the MMSE, and then underwent a standard clinical audiologic assessment. Each participant was counseled regarding the degree and type of hearing loss, hearing aid options were discussed, and earmold impressions were made.
Participants were then required to take a break of at least 30 minutes before the second part of the session to minimize any potential effects of fatigue on performance. The second part of the session involved baseline administration of the four outcome measures. Face-to-face administration was used for outcome assessment. The examiner read aloud each question to the participant who looked at an easel displaying all possible response alternatives for a specific item. The participant verbally responded to each item, and the examiner keyed the answer into a customized study database. Participants used a pocket talker during the initial questionnaire administration if the examiner observed that they had difficulty hearing. The order of outcome measure administration was randomized across participants to control for order effects.
IT group participants were seen 2 weeks after the initial session to obtain short-term retest data for the WHO-DAS II, and participants in the DT group were seen 10 weeks after the baseline session to obtain long-term retest data. Within the larger project examining the effect of hearing aid intervention on quality of life, the 2-week and 10-week retest visits also involved the fitting of hearing aids after outcome measure administration. Details of the hearing aid fitting can be found in McArdle et al. (2005).
Analyses
All statistical analyses were performed using SPSS for Windows Version 13.0. First, descriptive data for all outcome measures administered at baseline were calculated. In addition, descriptive data for the WHO-DAS II at baseline and at the 2-week and 10-week retests were calculated for the IT and DT groups, respectively. Both raw and standardized SF-36V scores were calculated. Raw scores were calculated for descriptive purposes only; standardized scores were used in all analyses.
Convergent validity was assessed by examining Pearson product-moment correlations between each of the domain and total scores from the WHO-DAS II and the subscale and global scores from the APHAB, the subscale and total scores from the HHIE, and the subscale and summary scale scores for the SF-36V. Internal-consistency reliability was assessed by calculating Cronbach's α, which provides an estimate of the average reliability coefficient that would be obtained from all possible splits (Nunnally and Bernstein, 1994), for each of the WHO-DAS II domain scores and for the total score.
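For readers unfamiliar with Cronbach's α, a minimal sketch of the standard formula follows: α = k/(k − 1) × (1 − Σ item variances / variance of the summed score), where k is the number of items. This is not the SPSS procedure used in the study, only an illustration of the same statistic; the function name and the toy data in the usage note are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of k item columns, each a list of n respondent scores.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("every item needs the same number (>= 2) of scores")
    # Summed scale score for each respondent
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("summed scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)
```

For two hypothetical items answered [1, 2, 3, 4] and [2, 1, 4, 3] by four respondents, the function returns 0.75; items that rank respondents identically push the value toward 1.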
Short-term test-retest stability for each WHO-DAS II domain score and for the total score was determined by computing intra-class correlation coefficients, using a two-way random-effects model, between scores obtained at baseline and at the 2-week retest for the IT group. For assessing long-term test-retest stability, the data obtained at baseline and at the 10-week retest for the DT group were used. Intra-class correlation coefficients are often recommended in standard texts on psychometrics to estimate stability of measures over time (Nunnally and Bernstein, 1994).
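One common form of the two-way random-effects intra-class correlation can be sketched as below. The text does not state whether the absolute-agreement or consistency definition was used, or whether single or average measures were reported; the sketch assumes the single-measure, absolute-agreement form, often written ICC(2,1). The function name is hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: one row per participant, each row holding k repeated scores
    (for test-retest data, k = 2: baseline and retest).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("at least two participants are required")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every participant needs the same k >= 2 scores")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: participants, occasions, residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

With this definition, a uniform shift between test and retest lowers the coefficient even when the rank order of participants is perfectly preserved, which is one reason the choice of ICC form matters when judging stability.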
Results and Preliminary Discussion
Descriptive Analysis
Table 2 shows the baseline scores (mean and SD) from the four outcome measures. In terms of the WHO-DAS II data, it can be seen that the greatest perceived problems occurred in the communication domain. This finding was not surprising. Interestingly, the next highest score was for problems in the mobility domain. These findings may reflect the demographics of the current sample whose average age was 68.7 years for the IT group and 70.2 years for the DT group. Comorbid conditions for participants in both groups included a high prevalence of arthritis (31%) and cardiovascular disease (32%) that would likely affect responses to questions in the mobility domain. The third highest domain score was for participation. This domain included items such as, “How much of a problem did you have joining in community activities?” and “How much of a problem did you have because of barriers or hindrances in the world around you?” The domain exhibiting the lowest average score (lowest perceived difficulties) was self-care. This is not surprising, as the items in this domain would be minimally affected by hearing loss.
Table 2. Means and Standard Deviations Obtained at Baseline for the World Health Organization's Disability Assessment Schedule II (WHO-DAS II), the Abbreviated Profile of Hearing Aid Benefit (APHAB), the Hearing Handicap Inventory for the Elderly (HHIE), and the Medical Outcomes Study-Short Form 36 Veteran's Version (SF-36V)
EC = ease of communication; BN = background noise; RV = reverberant condition; AV = aversiveness of sound; PCS = physical component summary; MCS = mental component summary.
Standard scores are shown with raw scores in parentheses.
To determine if there were significant differences in the level of perceived difficulty as a function of domain, the data for each domain were analyzed with a repeated-measures general linear model. Results confirmed a significant main effect of domain (F [5, 1895] = 117.19, p < .001) with a partial eta-squared value of 0.24. Post-hoc testing using t tests with Bonferroni corrections for multiple comparisons revealed that the scores for the communication domain were significantly higher than those for any other domain except mobility. Similarly, mobility scores differed significantly from all other domain scores except communication. All other comparisons between domain scores reached statistical significance, except that between the life activities (household) and participation domains.
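The reported effect size can be checked against the F statistic and its degrees of freedom using the identity partial η² = (F × df_effect) / (F × df_effect + df_error). The sketch below also counts the pairwise comparisons among six domains and shows a Bonferroni-adjusted per-comparison α, assuming a familywise α of .05, which the text does not state. Function names are hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its df."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def n_pairwise(n_levels):
    """Number of distinct pairwise comparisons among n_levels conditions."""
    return n_levels * (n_levels - 1) // 2


def bonferroni_alpha(familywise_alpha, n_comparisons):
    """Per-comparison alpha under a Bonferroni correction."""
    return familywise_alpha / n_comparisons


# Values reported in the text: F(5, 1895) = 117.19
effect = partial_eta_squared(117.19, 5, 1895)      # about 0.236
comparisons = n_pairwise(6)                        # six WHO-DAS II domains
per_test_alpha = bonferroni_alpha(0.05, comparisons)  # assumed familywise .05
```

The recovered effect size rounds to the 0.24 reported above, and six domains yield 15 pairwise comparisons.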
To better understand the APHAB data shown in Table 2, comparisons were made with the normative equal-percentile data reported by Cox (1995) and Cox and Alexander (1995). The percentile levels indicate the percentage of individuals whose scores on a particular subscale were equal to or lower than the score associated with that percentile. For example, the 65th percentile for the EC subscale in the unaided mode indicates that 65% of successful hearing aid wearers yielded scores of 74 or less on the subscale before the hearing aid fitting. Equal-percentile profiles are available for older individuals with “none or mild subjective hearing problems” who do not use hearing aids (Cox, 1995) and for successful “users of linear hearing aids” in unaided and aided modes and as a function of the benefit from hearing aid use (Cox and Alexander, 1995).
In the present study, the mean EC, BN, and RV scores collected at baseline for the participants were found to be higher than those obtained for 95% of older, unaided individuals with “none to mild subjective hearing problems,” suggesting that, comparatively, the present participants were experiencing considerable difficulty listening in these environments. When the mean baseline scores for participants in the current study were compared with the unaided scores of “users of linear hearing aids,” the scores for the participants on the EC, BN, and RV subscales were just slightly lower than the scores obtained for 20% of the hearing aid users. This suggests that the degree of difficulty experienced by the participants in the present study was on the lower end of that reported by successful hearing aid users.
Lower AV scores indicate less difficulty. The mean AV score of 24.06 found for the participants in the present study was slightly higher than the mean scores obtained for 65% of the normative sample with “none or mild subjective hearing problems” (mean, 21) and equivalent to that obtained for 65% of the normative sample of the successful “users of linear hearing aids—unaided” (mean, 24).
The mean data for the HHIE scores obtained in the present study were very similar to those obtained for individuals with similar, moderate degrees of hearing loss (i.e., 41 to 55 dB HL) by Ventry and Weinstein (1982). The means and SD for the data collected by Ventry and Weinstein were 42.7 (22.1), 24.2 (14.7), and 18.5 (8.5), for total, emotional, and social/situational scales, respectively, whereas those for the present participants were 41.4 (23.6), 20.4 (13.1) and 21.0 (10.5), respectively.
Table 2 shows both the raw SF-36V subscale scores and the standardized scores. The raw scores were included so that the SF-36V subscale data could be compared with the data of Stark and Hickson (2004) for 131 participants between the ages of 47 and 90 years old with mild-to-moderate hearing loss (88 men and 43 women). Stark and Hickson reported raw SF-36 scores (means and SD) of 22.67 (5.37) for physical function, 6.23 (1.79) for role-physical, 8.53 (2.80) for bodily pain, 17.74 (4.50) for general health, 15.56 (4.11) for vitality, 8.00 (2.11) for social function, 4.92 (2.11) for role-emotional, and 23.98 (4.53) for mental health. Comparison of these scores to those shown in Table 2 reveals similar mean responses and SD for the present participants.
The mean SF-36V MCS score of 51.17 in the present study indicates that in terms of the mental components of general health, the participants were similar to the general American population. The PCS mean score of 40.77, however, was just within 1 SD of the mean for the general American population (Ware et al., 1994). This finding is perhaps not surprising, because it is very similar to the mean PCS score of 40.9 reported by Abrams et al. (2002) for 105 veterans (67 men, 38 women) with mild-to-moderate hearing losses. For the Abrams et al. sample, the mean MCS scale score was 49.7.
WHO-DAS II Relationship to APHAB, HHIE, and MOS-SF36V
Table 3 shows the Pearson product-moment correlation coefficients obtained for the relationships between each of the WHO-DAS II domain scores and total score and the HHIE and APHAB subscale and summary scores (i.e., HHIE total and APHAB global scores). All correlation coefficients except those shown in italics were statistically significant at p < .01. Finding that most correlation coefficients were statistically significant was not surprising given the large number of paired data points examined.
Pearson-Product Moment Correlation Coefficients Calculated Between WHO-DAS II Scores and the Hearing Handicap Inventory for the Elderly (HHIE), the Abbreviated Profile of Hearing Aid Benefit (APHAB), and the Medical Outcomes Survey—Short Form 36-Veteran's Version (SF-36V)*
EC = ease of communication; BN = background noise; RV = reverberant condition; AV = aversiveness of sound;
PCS = physical component summary; MCS = mental component summary.
All correlation coefficients except those italicized were statistically significant at p < .01. Correlations which are at least moderate are shown in bold.
When convergent validity is examined, it is important to consider the magnitude of the relationships rather than statistical significance alone. As is common in interpretation of correlation coefficients, r values of less than .20 were considered as indicating that little if any relationship existed between the variables, r values of .20 to .40 as indicative of fair relationships, .40 to .60 as indicating moderate relationships, .60 to .80 as marked, and higher than .80 as highly related (Franzblau, 1958). In assessing convergent validity, it is expected that correlations that are at least moderate will be obtained between items measuring similar constructs. As can be seen in Table 3, none of the correlations were higher than marked, but many were in the moderate range (the moderate and marked correlations are bolded in Table 3). It is also relevant to note that according to McDowell and Newell (1987), correlation coefficients for the assessment of convergent validity of health-status measurements are typically .20 to .60 and will almost always be less than .70. Thus, most of the correlation coefficients obtained here are in keeping with the observation by McDowell and Newell for the expected range of correlation coefficients for health-status measures.
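The interpretation bands described above can be written as a small lookup. This is only an illustrative sketch: the function name `describe_r` is hypothetical, and because the quoted ranges share endpoints (.40 and .60), assigning a boundary value to the higher band is an assumption made here rather than something stated in the text.

```python
def describe_r(r: float) -> str:
    """Label the magnitude of a correlation coefficient using the bands quoted in the text."""
    magnitude = abs(r)  # the sign only gives direction; the bands refer to magnitude
    if magnitude > 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude < 0.20:
        return "little if any"
    if magnitude < 0.40:   # assumption: exactly .40 is counted as moderate
        return "fair"
    if magnitude < 0.60:   # assumption: exactly .60 is counted as marked
        return "moderate"
    if magnitude <= 0.80:  # "higher than .80" is the top band, so .80 itself is marked
        return "marked"
    return "high"


# Values reported in this section for the WHO-DAS II communication domain
print(describe_r(-0.27))  # SF-36V physical function
print(describe_r(-0.50))  # SF-36V mental health
```

Applied to the two coefficients above, the sketch returns "fair" and "moderate", matching the labels used in the text.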
For the two hearing-specific instruments, the APHAB and the HHIE, most of the moderate correlations were with the WHO-DAS II communication domain score. Because hearing loss most directly affects communication, this finding was not surprising. It was somewhat surprising, however, that a moderate correlation was found with only one of the APHAB subscales: ease of communication. Although the correlations with the other APHAB subscale scores were only fair, the correlation between the WHO-DAS II communication domain and the APHAB global score was moderate, and the correlations between all APHAB scores and the WHO-DAS II communication domain score were statistically significant. These findings suggest that the WHO-DAS II communication domain score would be a valid tool for assessing activity limitations in individuals with adult-onset hearing loss.
Moderate correlations were also found between the WHO-DAS II communication and participation domains and the HHIE emotional, social, and total scores. For the WHO-DAS II interpersonal domain, moderate correlations were found with the emotional and total HHIE scores. Although the HHIE may map most closely to the WHO-ICF participation dimension, the three dimensions of body structure and function, activity, and participation are conceived of as interacting and being influenced by each other, as well as by personal and environmental factors (WHO-ICF, 2000). In fact, Stephens (2003) pointed out that although the WHO-ICF classification of “communication” is considered an activity, it gives rise to the same quantitative results in terms of complaints elicited from elderly people with hearing impairment as does the WHO-ICF classification of “interpersonal interactions,” which is considered part of the participation dimension. Indeed, the moderate correlations between the HHIE subscale and total scores and the WHO-DAS II participation and interpersonal domain and total scores provide evidence supporting the use of the WHO-DAS II questionnaire in adults with acquired hearing loss.
The magnitudes of the correlation coefficients between the WHO-DAS II scores and the SF-36V scores are similar to those reported for two other populations: individuals diagnosed with depression and individuals diagnosed with back pain (Chwastiak and Von Korff, 2003). For example, Chwastiak and Von Korff reported correlation coefficients between the SF-36 subscale scores and the WHO-DAS II communication domain score that ranged from negligible and fair (r = −.17 and r = −.21 for the correlations with SF-36 physical function for the back pain and depression samples, respectively) to moderate (r = −.51 and r = −.56 for the correlations with mental health for the back pain and depression samples, respectively). In the present sample of adults with hearing loss, the correlation coefficient between the WHO-DAS II communication subscale scores and the SF-36V physical function scores was fair (r = −.27), and the correlation with the SF-36V mental health score was moderate (r = −.50).
In the present study, correlation coefficients between the WHO-DAS II total score and the SF-36V subscale scores ranged from r = −.53 to r = −.62. Although the range was a bit narrower, it was relatively similar to that reported by Chwastiak and Von Korff (2003) for the depression sample (r = −.46 to r = −.74) and also for the back pain sample (r = −.55 to r = −.71). A similar range was also reported by van Tubergen et al. (2003) for their participants with ankylosing spondylitis (r = −.46 to r = −.70). 1
Finally, the correlation coefficients between the WHO-DAS II total score and the SF-36V PCS and MCS scale scores were r = −.55 and r = −.64, respectively. These values are similar to those reported by Chwastiak and Von Korff (2003) for the correlation of WHO-DAS II total scores with SF-36 PCS (r = −.69 for the back pain sample; r = −.38 for the depression sample) and with MCS (r = −.58 for both samples) scale scores. Given that the SF-36V and the WHO-DAS II are both considered generic health-status instruments, albeit with different emphases in terms of health domains assessed, it is perhaps not surprising that the WHO-DAS II total score was moderately-to-markedly correlated with all SF-36V subscale scores as well as with the SF-36V PCS and MCS scale scores. The correlations between the WHO-DAS II and the SF-36V support the validity of the WHO-DAS II as a generic health-status instrument.
Internal-Consistency Reliability
Table 4 provides the estimates of Cronbach's α and the standard error of measurement for each of the WHO-DAS II domain scores and the WHO-DAS II total score. Cronbach's α values for the domain scores ranged from .68 to .91. This range is similar to those reported by Chwastiak and Von Korff (2003) for the back pain and depression samples (.65 to .91 and .68 to .91, respectively). Cronbach's α of .94 obtained in the present study for the WHO-DAS II total score was also similar to the value of .95 obtained in both patient groups by Chwastiak and Von Korff (2003). In interpreting Cronbach's α, Carmines and Zeller (1979) emphasize that if a scale is to be widely used, α should not fall below .80. This criterion is met by the three scores most relevant to the population of individuals with adult-onset hearing loss: communication, participation, and total scores. In fact, the criterion was met by all WHO-DAS II domain scores except those for the interpersonal domain. For this reason, caution should be taken in using the WHO-DAS II interpersonal domain in adults with acquired hearing loss.
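For readers unfamiliar with the statistic, the following is a minimal sketch of the standard formula for Cronbach's α, α = k/(k − 1) × (1 − Σ item variances / variance of the summed score). The function name `cronbach_alpha` and the tiny data set are hypothetical and serve only as illustration; the text does not state what software was used to obtain the values in Table 4.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per respondent, one column per item)."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of the summed scale score across respondents
    total_var = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Invented responses of three people to a two-item scale (not study data)
toy = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(toy)
print(round(alpha, 2), alpha >= 0.80)  # compare with the .80 criterion quoted above
```

The sketch does not guard against a summed score with zero variance, which would make the ratio undefined.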
Cronbach's α and the Standard Error of Measurement (SEM) for the World Health Organization's Disability Assessment Scale II (WHO-DAS II) Domain and Total Scale Scores
Standard errors of measurement are important to note because they can be used to estimate 95% confidence intervals for the true score of an individual (Demorest and Walden, 1984). For the communication domain, the true score could lie within ±14.40 of the observed score. The 95% confidence interval for the participation domain is ±11.48 and for the total score it is ±6.54. These confidence intervals can be compared to the 95% confidence interval for the HHIE total score, which was reported to have a standard error of measurement of 6 (Ventry and Weinstein, 1982), resulting in a 95% confidence interval for a single score of ±12.
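The arithmetic behind such intervals can be sketched as follows. Two assumptions are made here that are not spelled out in the text: the half-width is taken as z × SEM with the usual normal-theory multiplier z = 1.96, and the SEM is related to reliability by the classical formula SEM = SD × √(1 − reliability). The function names are hypothetical.

```python
import math


def sem_from_reliability(sd: float, reliability: float) -> float:
    """Classical standard error of measurement from a score SD and a reliability estimate."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def ci_half_width(sem: float, z: float = 1.96) -> float:
    """Half-width of the confidence interval around an observed score (z = 1.96 assumed for 95%)."""
    return z * sem


# HHIE total score: SEM of 6 as reported by Ventry and Weinstein (1982)
print(ci_half_width(6.0))
```

With an SEM of 6 this gives approximately 11.76, which rounds to the ±12 quoted for the HHIE total score.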
Test-Retest Reliability
Tables 5 and 6 show the means and SD obtained at baseline and at retest for the participants in the IT and DT groups, respectively. Retest for both groups occurred immediately before the participants were fitted with hearing aids. The IT group was retested 2 weeks after baseline, and the DT group was retested 10 weeks after baseline. The difference in retest intervals allowed for the examination of both short-term (i.e., 2-week retest) and long-term (i.e., 10-week retest) stability. Inspection of the tables reveals that the mean scores were slightly higher than at baseline for both retest periods. These results suggest that, in the absence of hearing aid intervention, both short- and long-term retest scores are likely to increase, showing poorer self-perceived health status.
World Health Organization's Disability Assessment Scale II (WHO-DAS II) Domain and Total Scores for the Immediate Treatment (IT) Group Participants Obtained at Baseline and at the 2-Week Retest
World Health Organization's Disability Assessment Scale II (WHO-DAS II) Domain and Total Scores for the Delayed Treatment (DT) Group Participants Obtained at Baseline and at the 10-Week Retest
The intra-class correlation coefficients (ICC), calculated to assess short-term and long-term stability, are shown in Table 7. First, it is important to note that the ICC values calculated for the 2-week retest period for all domain scores and the total scores are at .80 or higher, indicating that the WHO-DAS II has acceptable short-term test stability (Nunnally and Bernstein, 1994). The ICC values for the 10-week retest period indicated acceptable long-term test stability for all scores except interpersonal and household scales. For this reason, caution should be taken in using the WHO-DAS II interpersonal and household domains to assess long-term outcomes of hearing aid intervention.
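As an illustration of how a test-retest ICC can be obtained from paired scores, the sketch below computes the one-way random-effects form, ICC = (MSB − MSW) / (MSB + (k − 1) × MSW) with k = 2 administrations. The choice of this particular ICC form is an assumption; the text does not say which variant was used for Table 7. The function name and the scores are hypothetical.

```python
def icc_oneway(pairs):
    """One-way random-effects ICC for (baseline, retest) score pairs, one pair per participant."""
    n = len(pairs)
    k = 2  # two administrations per participant
    if n < 2:
        raise ValueError("need at least two participants")
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    subject_means = [(a + b) / k for a, b in pairs]
    # Between-participant mean square
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-participant mean square (disagreement between the two administrations)
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, subject_means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented baseline and retest scores for three participants (not study data)
icc = icc_oneway([(10, 12), (20, 19), (30, 33)])
print(icc >= 0.80)  # compare with the .80 benchmark cited above
```

Other ICC forms (for example, two-way models that separate a systematic shift between sessions) can give different values for the same data, so the form should be reported alongside the coefficient.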
Intra-class Correlation Coefficients for Short- and Long-Term Test-Retest Stability Estimates for the World Health Organization's Disability Assessment Scale II (WHO-DAS II) Domain and Total Scores
Guidelines provided by Demorest and Erdman (1988) were used to calculate the 90% and 95% critical differences obtained by examination of the distribution of short-term and long-term retest difference scores for the WHO-DAS II domain and total scores. These data are shown in Table 8. For comparison purposes, the 90% and 95% critical differences obtained for the short-term and long-term retest difference score distributions from the HHIE total and APHAB global data from the participants in the present study were calculated. The values shown in Table 8 provide a normative reference for drawing inferences about the benefits of hearing aid intervention. For example, if the change score of an individual exceeds the values for the 90th percentile, then we can conclude with 90% confidence that true benefit was obtained. Similarly if the change score exceeds the 95th percentile, then we can conclude with 95% confidence that true benefit was obtained.
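The general idea of reading critical differences from a retest difference distribution can be sketched as below. This is not a reproduction of the Demorest and Erdman (1988) procedure: the use of absolute difference scores, the nearest-rank percentile rule, and all function names are assumptions made for illustration, and the scores are invented.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def critical_difference(baseline, retest, pct):
    """Percentile point of the absolute retest difference scores (absolute values are an assumption)."""
    diffs = [abs(r - b) for b, r in zip(baseline, retest)]
    return percentile_nearest_rank(diffs, pct)


def exceeds_critical_difference(change, cd):
    """True when an individual's change score is larger in magnitude than the critical difference."""
    return abs(change) > cd


# Invented scores for ten untreated participants (not study data)
baseline = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
retest = [11, 22, 33, 44, 55, 66, 77, 88, 99, 110]
cd90 = critical_difference(baseline, retest, 90)
print(cd90, exceeds_critical_difference(-12, cd90))
```

A change score would then be compared against the percentile point for the relevant retest interval (short-term or long-term) before inferring benefit.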
Percentile Points for Short- and Long-Term Retest Difference Distributions for the World Health Organization's Disability Assessment Scale II (WHO-DAS II) Domain and Total Scores, the Abbreviated Profile of Hearing Aid Benefit (APHAB) Global Score, and the Hearing Handicap Inventory for the Elderly (HHIE) Total Score
Inspection of Table 8 shows the short-term and long-term 90% and 95% critical differences for the WHO-DAS II domain scores are very similar to, but generally smaller than, the critical differences obtained for APHAB global and HHIE total scores. The smallest critical differences were obtained for the WHO-DAS II total scores. The larger the critical difference, the more robust an effect of treatment would need to be to conclude that a true change in status has occurred. Disease-specific instruments, such as the APHAB and HHIE, are expected to be more sensitive to the effects of hearing aid treatment than would any generic instrument. Thus, the finding of relatively smaller critical differences for WHO-DAS II domain and total scores was expected, as it is likely that hearing aid intervention will result in smaller change scores on WHO-DAS II than on disease-specific instruments.
Summary and Conclusions
In the current climate of competition for limited health care resources, evidence in the form of quantitative outcome measures that support an improvement in generic health status as a result of hearing aid intervention is needed to compare the outcomes of audiologic intervention with the outcomes obtained for the treatment of other chronic diseases or disorders (Beck, 2000). The present study was designed to examine the psychometric properties of a new generic health-status instrument, the WHO-DAS II, to determine its potential utility in the population of individuals with adult-onset hearing loss.
An important psychometric attribute for any measurement instrument is validity. One aspect of construct validity—convergent validity—was assessed by examining the correlations between WHO-DAS II scores and scores from two widely used disease-specific instruments, the APHAB and the HHIE, and one popular generic measure, the SF-36V. Results support the conclusion that the WHO-DAS II communication and participation domain scores, as well as the total score, are not only valid generic, health-status measures but are also valid measures of the functional impact of adult-onset hearing loss.
Further support for considering the use of the WHO-DAS II communication and participation domain scores as well as the total score as a generic outcome measure for audiologic intervention in individuals with adult-onset hearing loss comes from the demonstration of good internal-consistency reliability and good test-retest stability with reasonable short-term and long-term 90% and 95% critical difference values for a true change in individual scores. In drawing this conclusion, it is important to recall that whereas the communication and participation domain scores provide information about specific areas that contribute to the overall health status of an individual, the questions are relevant not only to individuals with hearing loss but to all populations, whether or not a hearing disorder is present. Thus, due to their applicability across populations and conditions, the WHO-DAS II domain scores and the total score to which they aggregate are generic measures of health status.
Although some caution needs to be taken in the generalization of results found in this study because most participants were men drawn from the VA population, the similarity of the mean scores between the present sample and those reported for more general samples by Cox and Alexander (1995) for the APHAB, by Ventry and Weinstein (1982) for the HHIE, and by Stark and Hickson (2004) for the SF-36 suggests that the WHO-DAS II may be useful with many groups of adults with acquired hearing loss.
Finally, the present work did not address the question of whether the WHO-DAS II communication domain, participation domain, and/or total scores are responsive to audiologic intervention. This issue is addressed in a companion report by McArdle et al. (2005).
Footnotes
1. van Tubergen et al. did not report any correlations other than those for the WHO-DAS II total score and the SF-36 subscale scores.
Acknowledgment
We acknowledge the following individuals for their assistance in data collection and management: Gene Bratt, Paige Harden, Joseph Mikolic, Amanda Pillion, Judith Reese, and Maureen Wargo.
