Abstract
Purpose:
The Disabilities of the Arm, Shoulder, and Hand (DASH) is the most widely used patient-oriented outcome measure for the upper extremities in the world, and its high reliability and validity have already been confirmed. However, there are several problems with using the DASH, some of which are culturally related. We aimed to (1) develop a patient-oriented disease-specific outcome measure for patients with disorders of the hand and elbow, which we call the HandQ, and (2) examine the practical applicability, reliability, and validity of the HandQ for any patient with disorders of the hand and elbow.
Methods:
A total of 216 patients were surveyed with the HandQ, as well as the Hand20 and the DASH to assess psychometric characteristics.
Results:
There were no considerable floor or ceiling effects in the total HandQ score. Test–retest reliability and internal consistency, determined using the intraclass correlation coefficient (0.942) and Cronbach’s α (0.961), respectively, were excellent. The HandQ correlated well with the Hand20 and DASH scores. The scree plot showed unidimensionality of the HandQ, and the graphical model showed that the HandQ items were reasonably correlated with one another.
Conclusions:
The HandQ has sufficient reliability and internal consistency and excellent validity, and was shown to be practically applicable in all patients with hand and elbow disorders.
Introduction
Among work-associated musculoskeletal disorders, disorders of the upper extremity have recently become a major social concern, second only to low back pain. 1 In the last few decades, accurate measurement of quality-of-life outcomes before and after treatment, in addition to functional outcome, has drawn great attention in patients with upper extremity disorders. The Disabilities of the Arm, Shoulder, and Hand (DASH) questionnaire was developed in 1996 1–3 and is now used as the gold standard outcome measure for patients with disorders of the upper extremity. The DASH has been translated into more than 20 languages and is used worldwide in clinical settings along with clinical measures such as grip strength or range of motion. 2,4–6
However, the DASH has several problems. (1) It includes items that can be affected by whether the affected side is the dominant hand, which might cause floor effects on some items when the disorder is on the nondominant side. (2) The items regarding meals, sports, or sexual activities do not fit Asian culture, which regularly results in many missing values; among elderly people, about 50% of patients do not answer the item on difficulty with sexual activity. (3) Some items can be affected by trunk or lower extremity problems. (4) The DASH cannot be used for patients under 18 or over 65 years old, because it has not been validated in those populations and it includes a question on sexual activity. 2 There is also the Hand20, a self-administered questionnaire for upper extremity disorders that is accompanied by illustrations to make the items easy to understand. 7 However, the Hand20 also has problems with its answer distribution and with confounding among some items.
Therefore, we aimed (1) to develop a patient-oriented disease-specific outcome measure of health-related quality of life (HRQOL) for patients with disorders of the upper extremities (the HandQ), (2) to examine the practical applicability, reliability, and validity of the HandQ for any patient with disorders of the hand and elbow, (3) to compare the DASH, Hand20, and HandQ, and (4) to assess the effect of the dominant hand on patient-oriented outcome measures focusing on the upper extremities.
Materials and methods
This study was conducted with the approval of the Ethics Committee of our institute (reference number: 3514) and in accordance with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all individual participants included in the study.
Establishment of expert panel/committee
To decide the number and content of the domains, 34 hand surgeons and 5 hand therapists participated in this project.
Dimensionality of construct
First, four major domains were decided: (1) pain and numbness, (2) difficulty of activities of daily living affected by pain, dexterity, strength, range of motion, and tactile sense, (3) mental state, and (4) social activities. Although we believe that pain is a major concern for patients with hand problems, the hand and elbow questionnaires currently available include only a few pain questions (only 1 item out of 20 in the Hand20, and only 3 items out of 30 in the DASH). To investigate the significance of each domain in advance, we began by asking, “Which domain is of the most concern to you with your hand condition?” One hundred and forty-six patients answered this question, and about one-third of them chose pain as their greatest concern. In response to this result, we concluded that the HandQ should include more pain-related items than previous questionnaires.
Development of HandQ
Among patients who answered the Hand20 or the DASH, some complained that certain questions made them hesitate when they could not perform the activity because of lower extremity problems rather than hand or elbow problems, for example, “difficulty of carrying a heavy object” (DASH question #11). Additionally, some patients with a nondominant hand disorder hesitated over the question regarding “difficulty of using a knife to cut food” (DASH question #16), even though the cover page explains how to answer in such a situation. Therefore, we tried to state clearly in every question which hand is being asked about and that the question concerns only disability arising from the hand or elbow problem. We also tried to make the questions concise, avoided asking about rarely performed actions or activities, and removed questions about actions that depend on hand dominance.
“Subject matter experts” independently reviewed the scale; a four-member expert team ascertained content validation. The team checked whether items reflected what they were intended to measure, assessed the response options, scoring of the scale as well as its feasibility. Each expert provided an opinion on each question as “appropriate,” “inappropriate,” or “needs modification.” Any remarks or recommendations for each item were also recorded.
We initially drafted 185 items and selected 29 of them based on the criteria described above.
Preliminary pilot testing
A pilot study was performed with 60 patients. Finally, 25 items were chosen by weighting the number of items in each domain and from the histograms and the Akaike information criterion (AIC) network obtained in the pilot study.
Each item of the HandQ is presented in Table 1. The HandQ contains 25 items: pain, 4 items; difficulty of daily living activity, which reflects pain, strength, range of motion, dexterity, and tactile sensation to some extent, 14 items; mental state, 5 items; and social activity, 2 items.
The content of the HandQ.a
a Scoring: Each item is scored from 0 (no difficulty) to 4 (unable). HandQ = 25 × (sum of the scores of the answered items among items 1 to 25)/(number of answered questions).
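The scoring rule in the footnote above can be sketched in a few lines of Python. This is an illustration only, not the authors' scoring software: the function name `handq_score` and the use of `None` for an unanswered item are assumptions, and no minimum number of answered items is enforced because the text does not state one.

```python
from typing import Optional, Sequence


def handq_score(items: Sequence[Optional[int]]) -> Optional[float]:
    """Return the HandQ total (0 = best, 100 = worst), or None if nothing was answered.

    `items` holds the 25 item scores; None marks an unanswered item (assumed encoding).
    """
    if len(items) != 25:
        raise ValueError("the HandQ has 25 items")
    answered = [score for score in items if score is not None]
    if any(score not in (0, 1, 2, 3, 4) for score in answered):
        raise ValueError("each item is scored from 0 to 4")
    if not answered:
        return None
    # Footnote formula: 25 x (sum of item scores) / (number of answered items).
    return 25 * sum(answered) / len(answered)
```

Dividing by the number of answered items rescales the total, so a patient who skips a few questions still receives a score on the same 0 to 100 range.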
Patient and data collection
Data were collected from November 2011 to July 2014. A total of 216 patients were included (102 male, 114 female; mean age 58.6 years, SD 17.2, range 10–88). After informed consent was obtained, the patients answered the HandQ, Hand20, and DASH. A total of 203 patients were right-hand dominant; 95 had right hand disorders, 85 left hand, and 36 both hands. Various hand and elbow disorders were included: hand and wrist fracture, carpal tunnel syndrome, cubital tunnel syndrome, trigger finger, osteoarthritis of the first carpometacarpal (CMC) joint or other finger joint, rheumatoid arthritis, tumor, congenital hand, and so on.
Psychometric characteristics of the HandQ
Floor and ceiling effects
To assess floor and ceiling effects, the proportion of answers with the best (0) and worst possible (4) values on the five-point Likert-type scale was calculated for each of the 25 survey items. Scores with floor or ceiling effects may not detect improvement or deterioration in patients who are already at the lower or upper end of the scale. A floor or ceiling effect was noted when the mean minus SD was at or below the lowest possible score, or the mean plus SD was at or above the highest possible score.
To assess the effect of the dominant hand on patient-oriented outcome measures focusing on the upper extremities, we counted the items of the HandQ, Hand20, and DASH that showed a floor effect among patients with a disorder of the left side only, excluding left-handed and ambidextrous patients.
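The floor and ceiling check described in this subsection can be sketched as follows. The function name `floor_ceiling`, the returned dictionary keys, and the use of the sample SD (n − 1 denominator) are assumptions made for illustration; the analysis in the study itself was run in SPSS.

```python
from statistics import mean, stdev
from typing import Dict, Sequence, Union


def floor_ceiling(scores: Sequence[float],
                  lowest: float = 0.0,
                  highest: float = 4.0) -> Dict[str, Union[float, bool]]:
    """Summarise one item (or total score) for floor and ceiling effects."""
    m = mean(scores)
    sd = stdev(scores)  # sample SD; an assumption, the text does not specify
    n = len(scores)
    return {
        # Share of answers sitting exactly at either end of the scale.
        "prop_lowest": sum(1 for s in scores if s == lowest) / n,
        "prop_highest": sum(1 for s in scores if s == highest) / n,
        # Mean -/+ SD rule: an effect is flagged when the interval
        # reaches or passes the end of the scale.
        "floor": m - sd <= lowest,
        "ceiling": m + sd >= highest,
    }
```

For a total score, the same function could be called with `lowest=0` and `highest=100`.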
Reliability
Internal consistency
Internal consistency of the HandQ was assessed using Cronbach’s α because it provides a measurement of the strength of the relationship among the items in the questionnaire.
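As an illustration of the statistic, Cronbach’s α can be computed from a patients-by-items table using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The sketch below assumes complete data (no missing answers) and sample variances; the name `cronbach_alpha` is hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data: one row per patient, one column per item."""
    k = len(rows[0])
    # Variance of each item across patients.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of the summed score across patients.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

When all items rise and fall together, the variance of the total is large relative to the item variances and α approaches 1.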
Test–retest reliability
In patients who gave informed consent, the HandQ was administered a second time 3–5 weeks after the first completion for test–retest reliability analysis. The second questionnaire included a question asking whether the difficulty had changed since completion of the first questionnaire, and only subjects reporting no change were included in the reliability analysis. The reproducibility (test–retest reliability) of the HandQ was assessed by calculating the intraclass correlation coefficient (ICC). The ICC was calculated between the responses of the first (test) and the second questionnaire (retest) for the total score.
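A minimal sketch of a test–retest ICC on total scores is given below. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is an assumption here, as is the function name `icc_2_1`.

```python
from typing import Sequence, Tuple


def icc_2_1(pairs: Sequence[Tuple[float, float]]) -> float:
    """ICC(2,1) for (test, retest) total scores, one pair per patient."""
    n = len(pairs)  # number of patients
    k = 2           # number of occasions (test and retest)
    grand = sum(a + b for a, b in pairs) / (n * k)
    row_means = [(a + b) / k for a, b in pairs]
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]
    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for p in pairs for x in p)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because this form measures absolute agreement, a systematic shift between test and retest lowers the coefficient even when the ranking of patients is unchanged.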
Validity
Criterion validity was assessed by comparison of the HandQ with the Hand20 and DASH rating scales. Correlations were evaluated using Spearman’s correlation coefficient.
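Spearman’s correlation coefficient is the Pearson correlation of the ranks, with tied values given their average rank. The sketch below illustrates this on two lists of total scores; the helper names (`average_ranks`, `pearson`, `spearman_rho`) are hypothetical and the code does not compute a p value.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))
```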
Construct validity refers to the extent to which a score measures what it is supposed to measure. We used the scree plot to identify the point where the curve starts to level off. The degree of correlation among the items in the HandQ was evaluated using factor analysis and the AIC network to examine the latent structure of the HandQ; the AIC network is a graphical model that assesses the relationships among items using the AIC. 8–11 Graphviz (version 2.38.0, AT&T, Dallas, Texas, USA) was used to draw the graphs.
Subsequent validation
Responsiveness
Standardized effect sizes were calculated from the longitudinal data as a measure of responsiveness following treatment, which included surgery, external fixation, steroid injection, medication, and so forth. An effect size of <0.20 was defined as trivial, 0.20–0.50 as small, 0.50–0.80 as moderate, and >0.80 as large.
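The responsiveness statistics can be illustrated with the commonly used definitions, which are assumed here because the text does not spell them out: effect size = mean change / SD of the baseline scores, and standardized response mean = mean change / SD of the change scores. The function names and the handling of the exact boundary values (0.20, 0.50, 0.80) are also assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def effect_size(before: Sequence[float], after: Sequence[float]) -> float:
    """Mean change divided by the SD of the baseline scores (assumed definition)."""
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes) / stdev(before)


def standardized_response_mean(before: Sequence[float],
                               after: Sequence[float]) -> float:
    """Mean change divided by the SD of the change scores (assumed definition)."""
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes) / stdev(changes)


def classify(effect: float) -> str:
    """Label the magnitude using the thresholds given in the text."""
    magnitude = abs(effect)  # the sign only shows the direction of change
    if magnitude < 0.20:
        return "trivial"
    if magnitude < 0.50:
        return "small"
    if magnitude <= 0.80:  # boundary handling is an assumption
        return "moderate"
    return "large"
```

With a scale on which lower scores are better, improvement after treatment gives negative values, so the label is based on the absolute value.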
Statistical analysis
Data analysis was done using IBM SPSS Statistics version 24.0 (IBM Corp., Armonk, New York, USA).
Results
Patient characteristics
Characteristics of the study population are presented in Table 2. A total of 216 patients completed the HandQ, of whom 209 and 201 also completed the Hand20 and DASH, respectively. Mean age was 58.6 (SD 17.2) years, and the population included 102 males and 114 females. Dominant hand, affected side, and diagnosis are also listed. Percentages of missing values for each item are presented in Table 3. The DASH had several items with relatively high missing value rates, especially question 8, which asks about difficulty of gardening (11% missing), and question 21, which asks about difficulty of sexual activity (22% missing). The HandQ and Hand20 had no items with a missing value rate much above 2%.
Descriptive characteristics of the study population.
Missing value.
DASH: Disabilities of the Arm, Shoulder, and Hand.
The cumulative numbers of each answer for each questionnaire are shown in Figure 1. The HandQ showed a good distribution. The Hand20, however, showed an odd distribution: the end points 0 and 10 were chosen more often, and the midpoint 5 as well as 3 and 7 were also chosen more often than other numbers. As the Hand20 answer form includes four face illustrations below 0, 3, 7, and 10, these face images might have contributed to this odd distribution.

The cumulative number of each answer for each questionnaire. The HandQ showed a good distribution, whereas the Hand20 showed an odd distribution; its answer form includes four face illustrations below 0, 3, 7, and 10, and these faces might have affected the distribution.
Floor and ceiling effects
The mean scores of the HandQ, Hand20, and DASH were 42.0 (SD 23.4), 48.6 (SD 25.5), and 40.5 (SD 23.1), respectively. There were no considerable floor and ceiling effects regarding the total score of the HandQ, Hand20, and DASH (Table 4).
Mean score, SD, floor and ceiling effects in the HandQ, Hand20, and DASH.
DASH: Disabilities of the Arm, Shoulder, and Hand.
a An effect was noted when mean + SD was the highest possible score (100) or more, or mean − SD was the lowest possible score (0) or less.
To assess the effect of the dominant hand on patient-oriented outcome measures focusing on the upper extremities, we counted the items of the HandQ, Hand20, and DASH that showed a floor effect among patients with a disorder of the left side only, excluding left-handed and ambidextrous patients (Table 5). The HandQ and the Hand20 each had two items with a floor effect, whereas the DASH had seven.
Floor effect regarding right-handed patients who suffered left upper extremity disorders.
DASH: Disabilities of the Arm, Shoulder, and Hand.
Internal consistency and reliability
A total of 52 patients were included in the test–retest analysis. Mean duration between test and retest was 25.2 days (SD 19.2). Cronbach’s α values for the HandQ, Hand20, and DASH were 0.961, 0.954, and 0.970, respectively, suggesting a high level of internal consistency for each questionnaire. A summary of the test–retest data is presented in Table 6. The overall ICC values of the HandQ, Hand20, and DASH were 0.942, 0.954, and 0.940, respectively, indicating adequate reproducibility for these questionnaires.
Summary of test–retest data for each questionnaire.a
ICC: intraclass correlation coefficient; DASH: Disabilities of the Arm, Shoulder, and Hand; CI: confidence interval.
a A total score of 100 points is the worst score; 0 is the best score.
b Mean duration between test and retest was 25.2 days (SD 19.2).
Criterion validity
The criterion validity of the HandQ was assessed against the Hand20 and DASH (Table 7). Overall, the total HandQ score correlated strongly and significantly with the total Hand20 and DASH scores (Spearman’s ρ = 0.909 and 0.819, respectively; both p < 0.001).
Criterion validity.
DASH: Disabilities of the Arm, Shoulder, and Hand.
Construct validity
Construct validity was evaluated to prove whether the HandQ measures the construct it was claimed to be measuring.
The appropriate number of dimensions was considered to be one from the scree plot (Figure 2). This result means that we can simply sum up all the items of the HandQ to evaluate the hand overall. Factor analysis demonstrated that items could be divided into four domains (Figure 3).

Scree plot with eigenvalue of the HandQ. The eigenvalue of the first component indicated 53% of total variance.

Factor plot of the HandQ in 3D space. Factor analysis showed that items were clustered in four domains: pain, difficulty of daily living activity, mental state, and social activity.
In addition to this conventional method, the degree of correlation among the items in the HandQ was evaluated using the AIC network to examine the latent structure of the HandQ; this is a graphical model used to assess the relationships among the items. 10,11 Based on the spatial associations calculated for each item (AIC network), the questionnaire items in the HandQ had ideal relationships and distances. Moreover, no item had confounding relationships with many other items, suggesting excellent construct validity of the HandQ (Figure 4). The AIC network showed the four domains of the HandQ: (1) items related to pain (the upper middle region, brown items); (2) items related to the difficulty of daily living activity, each of which seemed to be affected to some extent by pain, dexterity, strength, range of motion, and tactile sense (the left region, light blue items); (3) items related to mental state (the upper middle region, blue items); and (4) items related to social activities (the right region, pink items). In the AIC network of the Hand20 (Figure 5), three items (Q05, Q09, and Q13) had heavily confounding relationships with many other items (7, 7, and 10 confounding relationships, respectively), suggesting less acceptable construct validity of the Hand20. In the AIC network of the DASH (Figure 6), the eight items in the middle right region of the figure and Q03 were disconnected from the other 22 items. Nine items had a degree of one, which decreases the efficiency of the network.

The items of the HandQ on the AIC network showed ideal relationships, suggesting excellent construct validity of the HandQ. The AIC network shows the four domains of the HandQ: pain, difficulty of daily living activity, mental state, and social activities. AIC: Akaike information criterion.

The questionnaire items of the Hand20 on the AIC network showed that three items had heavily confounding relationships with many other items (Q05, Q09, and Q13; degree 7, 7, and 10, respectively), suggesting less acceptable construct validity. AIC: Akaike information criterion.

The questionnaire items of the DASH on the AIC network showed that the eight items in the middle right region and Q03 were disconnected from the other items. Nine items had a degree of one, which decreases the efficiency of the network. DASH: Disabilities of the Arm, Shoulder, and Hand; AIC: Akaike information criterion.
Responsiveness
A total of 120 patients took a second survey after treatment (Table 8). Mean duration between the first survey and the second survey was 68.2 days (SD 39.7). The effect sizes of the HandQ, Hand20, and DASH were −1.47, −1.49, and −1.12, respectively. The standardized response means of the HandQ, Hand20, and DASH were −1.71, −1.63, and −1.43, respectively.
Responsiveness shown as effect size and standardized response mean.a
DASH: Disabilities of the Arm, Shoulder, and Hand.
a Mean duration between test and retest was 68.2 days (SD 39.7).
Discussion
Recently, there has been growing interest in not only functional outcome but also HRQOL after surgery. 12 HRQOL assessment consists of a whole-body QOL questionnaire and a body part-specific questionnaire. 13 Other important domains, including body image, mental status, and social activities, seem mandatory in evaluating the HRQOL of patients with disorders of the upper extremity.
The Hand20 is accompanied by illustrations to help patients imagine the activity in each item. 7 However, the Hand20 includes only one question regarding pain, and the AIC network graphical model showed that three items were heavily confounded with other items. We assume these items were confounded because they ask about relatively complicated actions. For example, item 9, “Peel an apple using a knife,” involves holding the apple, rolling the apple, and pushing the knife, and requires dexterity as well. Moreover, the answers of the Hand20 use a 0–10 numerical rating scale with four face scales drawn below the numbers 0, 3, 7, and 10. In the cumulative answer counts, the numbers 0, 3, 5, 7, and 10, especially 0 and 10, were chosen more often than other numbers. Usually, the end points and midpoint (e.g. 0, 5, and 10) tend to be chosen; however, as the Hand20 has two additional faces below the numbers 3 and 7, 7 this might be a reason for the odd distribution of the cumulative answer counts.
The DASH is the most widespread questionnaire regarding the upper extremities 1,2,4–6,14 and has already been shown to have reasonable validity and reliability. However, the DASH had many missing values in our study. This is probably due to redundant sentences that cannot be understood intuitively; moreover, gardening- and sexual activity-associated items were frequently avoided by Asian patients because of cultural differences. Additionally, when the analysis was limited to right-handed patients with a left side disorder only, seven DASH items showed a floor effect, in contrast to the HandQ and Hand20, each of which had just two such items. Thus, the DASH may include more questions that depend on the dominant hand. The graphical model showed that eight items on the lower right region of the figure and Q03 were disconnected from the other 22 items. Nine items had a degree of one, which decreases the efficiency of the network.
The measurement properties of the HandQ appeared to be excellent, as the total score varied considerably and no floor or ceiling effects were seen. Moreover, the internal consistency appeared to be sufficient. We evaluated the criterion validity of the HandQ by comparing it with the gold standard measure, the DASH. The HandQ had significant correlations with associated scores as expected. These results suggested the HandQ can evaluate a different aspect of HRQOL.
There are several questionnaires regarding the upper extremities in Asian countries. 7,15,16 The Patient Reported Wrist Evaluation (PRWE) includes a relatively large number of pain items, which reflects patients’ motivation to visit hand clinics. 17,18 However, the PRWE was developed to focus on the wrist only, which is why we did not include it in our study.
Our study has a couple of limitations. First, although we selected questionnaire items that were considered not to be affected by differences in culture or lifestyle between countries, we tested the questionnaire only among patients in our country, using the Japanese version. The English version of the HandQ was carefully translated from the Japanese version to create an equivalent version. A translation was first made by a bilingual translator whose mother tongue was Japanese. This was corrected by an independent bilingual translator whose mother tongue was English. Then, the English version was translated back into Japanese by another independent translator to check for inconsistency and ambiguity. Second, we did not compare the HandQ with the Patient-Reported Outcome Measurement Information System, which was established with funding from the National Institutes of Health in 2004, 19 the items of which were pooled using item response theory and which is administered as a computer adaptive test.
In conclusion, we have successfully developed and validated a disease-specific measure, the HandQ, to evaluate not only physical function but also various aspects of HRQOL in patients with upper extremity disorders. The HandQ had sufficient reliability and internal consistency, and excellent validity. The HandQ was shown to be practically applicable in all patients with disorders of the upper extremities.
Footnotes
Acknowledgment
We would also like to thank Mr Larry Frumson for helping to edit this manuscript and for suggesting the name “HandQ” for the instrument, which was originally named JHand in the Japanese version.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
