Abstract
Introduction:
The Endometriosis Health Profile-30 is a disease-specific patient-reported outcome measure of health-related quality of life. Cross-cultural validation of the Endometriosis Health Profile-30 has been performed for several translated versions. The aim of this study was to evaluate the measurement properties of a Norwegian version of the Endometriosis Health Profile-30.
Methods:
This study was designed as a cross-sectional anonymous postal questionnaire study. A total of 157 women with endometriosis were included during a period from 2012 to 2013. Women aged 18–45 years were recruited from the Norwegian Endometriosis Association. Principal components analysis with varimax rotation was used to assess construct validity. Short Form-36 was used to determine convergent validity. Cronbach’s alpha was used to measure internal consistency. Intraclass correlation coefficients and paired t-tests were used to evaluate test–retest reliability. Floor and ceiling effects were estimated.
Results:
Factor analysis resulted in a three-factor model for the core questionnaire and a five-factor model for the modular questionnaire. Factor analysis could not support construct validity of the scales self-image and treatment. The Norwegian version of the Endometriosis Health Profile-30 demonstrated acceptable internal consistency and test–retest reliability, except for the scale relationship with children. Floor effects were observed for the scales self-image (20.1%), work life (33.9%), relationship with children (34.2%), and medical profession (20.5%).
Conclusion:
The construct self-image does not seem to be measured appropriately by the Norwegian version of the Endometriosis Health Profile-30, suggesting a lack of cross-cultural validity of the Endometriosis Health Profile-30. With multinational studies increasing, adequate translation, cross-cultural adaptation, and cross-cultural validation of instruments are essential to ensure equivalence in languages and cultures other than the original.
Introduction
Chronic diseases such as endometriosis can affect health-related quality of life (HRQoL). 1 HRQoL is a multidimensional concept that refers to the patient’s general perception of the effect of her disease and treatment on physical, psychological, and social aspects of daily life.2–4 HRQoL is commonly assessed as a patient-reported outcome, that is, a clinical outcome reported directly by the patient.3,5 A patient-reported outcome measure (PROM) of HRQoL can be generic, applicable to patients with a variety of conditions, or disease-specific. 6 Disease-specific instruments may detect change in important aspects of certain conditions that generic instruments do not capture. 7 The Endometriosis Health Profile-30 (EHP-30) is a disease-specific PROM of HRQoL consisting of a core and a modular questionnaire.8,9 The original English version was developed in the United Kingdom and first presented in 2001. 8 The items, or questions, were generated from in-depth interviews of 25 patients with endometriosis visiting a gynecology clinic at a large tertiary referral hospital in Oxford. 8
The EHP-30 is available in many languages. Evaluation of measurement properties, that is, reliability, validity, and responsiveness, has been performed for several of these, although primarily for the core questionnaire.10–15 With multinational and multicultural studies increasing, adequate translation, cross-cultural adaptation, and cross-cultural validation are essential to ensure equivalence of a PROM in languages and cultures other than the original. 16 The Consensus-based Standards for the selection of health Measurement INstruments (COSMIN) group has developed user-friendly and easily applicable checklists to evaluate the methodological quality of primary studies on measurement properties. 17 According to these checklists, few, if any, of the EHP-30 validation studies have included adequate sample sizes for test–retest reliability analysis. 18 Test–retest reliability is an important aspect of reliability, ensuring that changes detected by an instrument are not random. 3 However, the analysis depends on patients being in stable condition. Although endometriosis is sometimes characterized by disease fluctuation, it is also thought to be stable for longer periods of time. Patients attending secondary and tertiary referral centers may be less likely to be in stable condition than members of patient registries and patient associations.
The aim of this study was to evaluate the measurement properties of the Norwegian version of the EHP-30 (NO-EHP-30) and thereby its suitability for future use in endometriosis research in Norway or as part of multinational studies.
Methods
Participants, study design, and data collection
Women with endometriosis were recruited from the Norwegian Endometriosis Association. Inclusion criteria were an age of 18–45 years and a surgically confirmed diagnosis. Cross-sectional data collection was performed from 2012 to 2013. A set of two anonymous postal questionnaires was sent to potential participants. Each questionnaire included questions on background information, the NO-EHP-30, and the Short Form-36 version 2 (SF-36v2). 19 Participants were asked to fill in the second questionnaire 1 month after completing the first questionnaire, for test–retest reliability analysis. A period of 1 month between the test and retest was chosen to minimize memory effects. A period of 1 month was also thought to increase the chances of the respondents being in the same phase of their menstrual cycle, which in turn may be relevant to endometriosis complaints and reporting of HRQoL.
Background information
Background information included age, height, and weight. Diagnostic delay was recorded as year receiving diagnosis minus year the participant started having symptoms. Furthermore, a multiple choice question on organs/anatomic locations affected by endometriosis and two open questions inviting free description of previous and present treatment were included. Finally, the participants were asked whether they had experienced dysmenorrhea, pelvic pain, dysuria, and/or dyschezia during the 4 weeks prior to answering the questionnaire.
EHP-30
The responses are based on patient experiences during the 4 weeks prior to answering the questionnaire. The core questionnaire is composed of 30 items grouped into five scales: pain (11 items), control & powerlessness (6 items), emotional well-being (6 items), social support (4 items), and self-image (3 items). The modular questionnaire is composed of 23 items grouped into six scales: work life (5 items), relationship with children (2 items), sexual intercourse (5 items), medical profession (4 items), treatment (3 items), and infertility (4 items). In the modular questionnaire, the patient responds only to the scales she deems relevant to her. All scales can achieve a minimum score of 0, indicating low disability, and a maximum score of 100, indicating high disability. All items of a scale must be answered for a scale score to be calculated. The only exception is the scale sexual intercourse, where each item may be relevant independently of the other items of the same scale. Thus, the scale score for the scale sexual intercourse is calculated by omitting items which are not relevant.
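The scoring rules described above can be sketched in a few lines. This is a minimal illustration only: it assumes that each item is coded from 0 to 4 and that a scale score is the sum of the item scores divided by the maximum possible sum, multiplied by 100. Neither the item coding nor this formula is stated in the text above, and the function name `scale_score` is hypothetical.

```python
from typing import Optional, Sequence

MAX_ITEM_SCORE = 4  # assumption: each item is coded 0 (lowest) to 4 (highest)


def scale_score(items: Sequence[Optional[int]],
                allow_not_relevant: bool = False) -> Optional[float]:
    """Return a 0-100 scale score, or None if it cannot be calculated.

    None entries mark items that were unanswered or not relevant.
    With allow_not_relevant=True (the sexual intercourse scale), such
    items are omitted; otherwise any missing item invalidates the scale.
    """
    answered = [x for x in items if x is not None]
    if not answered:
        return None
    if len(answered) < len(items) and not allow_not_relevant:
        return None
    return 100.0 * sum(answered) / (MAX_ITEM_SCORE * len(answered))
```

For example, under these assumptions `scale_score([2, None, 2])` yields no score, whereas the same call with `allow_not_relevant=True` scores the two answered items only.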
Translation and cultural adaptation of the Norwegian version EHP-30
The Norwegian language has two distinct written varieties, “bokmål” and “nynorsk.” 20 “Bokmål” is the most commonly used variety. The EHP-30 was therefore translated to “bokmål.” The translation and cultural adaptation of the NO-EHP-30 was conducted by Oxford Outcomes according to recommended guidelines 21 (Supplementary material 1).
SF-36v2
The Short Form-36 is a generic PROM of HRQoL composed of 36 items: one item assessing health change and 35 items assessing eight health concepts representing eight scales: physical functioning (10 items), role limitations due to physical problems (4 items), bodily pain (2 items), general health perceptions (5 items), vitality (4 items), social functioning (2 items), role limitations due to emotional problems (3 items), and mental health (5 items).19,22 All scales can achieve a minimum score of 0, indicating worst possible health, and a maximum score of 100, indicating best possible health. QualityMetric Health Outcomes™ Scoring Software 4.5 from OptumInsight Life Sciences, Inc, was used to score SF-36v2.
Sample size calculation
Correlation coefficients play a central role in this study. We used Fisher’s z transformation to estimate the 95% confidence interval for a correlation coefficient r. 23 The confidence interval for a correlation coefficient r is widest when r = 0.50. We considered a precision of ±0.10 sufficient, that is, a confidence interval for r with a length of at most 0.20. 10 For a correlation coefficient of 0.50 with a sample of 150 patients, this confidence interval will be 0.40–0.60. We therefore decided to include at least 150 women with endometriosis in our study.
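The interval calculation can be reproduced with a short sketch using only the Python standard library. It assumes the usual form of Fisher’s z transformation with standard error 1/√(n − 3); the function name `fisher_ci` is hypothetical. Because the bounds are back-transformed from the z scale, they are not exactly symmetric around r, so the printed result may differ somewhat from the rounded interval quoted above.

```python
import math
from typing import Tuple


def fisher_ci(r: float, n: int, z_crit: float = 1.959964) -> Tuple[float, float]:
    """Approximate 95% confidence interval for a correlation r from n pairs."""
    z = math.atanh(r)               # Fisher's z transformation of r
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    lower = math.tanh(z - z_crit * se)  # back-transform to the r scale
    upper = math.tanh(z + z_crit * se)
    return lower, upper


lower, upper = fisher_ci(0.50, 150)
print(f"r = 0.50, n = 150: 95% CI {lower:.2f} to {upper:.2f}")
```

Calling `fisher_ci` with other values of `n` shows how the interval narrows as the sample grows.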
Psychometric evaluation and statistical analysis
Construct validity, reliability, and interpretability of the NO-EHP-30 were assessed. We used the taxonomy, terminology, and definitions of measurement properties suggested by the COSMIN study. 24 Hypotheses-testing was specified as assessment of convergent validity where it could be misinterpreted as hypotheses-testing associated with factor analysis. Reliability was specified as test–retest reliability where it was thought to increase clarity. All analyses were performed with IBM SPSS Statistics, version 22.
Construct validity
Structural validity
Exploratory factor analysis was used to assess structural validity. 25 Principal components analysis with varimax rotation was used to identify the different potential components with eigenvalues greater than 1. 26 Items with factor loadings ⩾0.40 in a factor were included in the factor.
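The two decision rules named above (retain components with eigenvalues greater than 1, and include an item in a factor when its loading is at least 0.40) can be illustrated with a small sketch. It does not perform the principal components extraction or the varimax rotation itself, which were done in SPSS; it only applies the rules to eigenvalues and rotated loadings that are assumed to be available already. The function names and all numbers are made up for illustration.

```python
from typing import Dict, List


def retained_components(eigenvalues: List[float]) -> int:
    """Count components with an eigenvalue greater than 1."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def assign_items(loadings: Dict[str, List[float]],
                 threshold: float = 0.40) -> Dict[int, List[str]]:
    """Map each factor index to the items loading >= threshold on it.

    An item may appear under more than one factor (cross-loading),
    and an item with no loading above the threshold appears nowhere.
    """
    factors: Dict[int, List[str]] = {}
    for item, row in loadings.items():
        for k, loading in enumerate(row):
            if loading >= threshold:
                factors.setdefault(k, []).append(item)
    return factors


# Hypothetical values, not taken from the study.
eigenvalues = [3.1, 1.4, 0.8, 0.4]
loadings = {
    "item1": [0.82, 0.10],
    "item2": [0.45, 0.52],  # cross-loads on both factors
    "item3": [0.05, 0.77],
    "item4": [0.30, 0.20],  # below the threshold on both factors
}
print(retained_components(eigenvalues), assign_items(loadings))
```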
Hypotheses-testing
SF-36v2 was used for hypotheses-testing to assess convergent validity.17,27,28 We hypothesized the strongest correlations between EHP-30 pain and SF-36v2 bodily pain, and EHP-30 emotional well-being and SF-36v2 mental health. We further expected a strong correlation between EHP-30 social support and SF-36v2 social functioning, and EHP-30 work life and SF-36v2 role-physical. After obtaining the results of the factor analyses, we hypothesized a strong correlation between EHP-30 control & powerlessness and SF-36v2 bodily pain, and EHP-30 relationship with children and SF-36v2 role-physical. Associations between scales of the EHP-30 and the SF-36v2 were calculated by Spearman’s rho correlation coefficient. There are no widely accepted criteria for defining a strong versus moderate versus weak correlation. 29 Values 0.20–0.39 were considered to indicate weak correlations, values 0.40–0.59 moderate, values 0.60–0.79 strong, and values 0.80–1.00 very strong correlations.
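As an illustration of the correlation analysis, the sketch below computes Spearman’s rho as the Pearson correlation of average ranks and labels its strength with the cut-offs given above. It is a standard-library sketch, not the SPSS procedure used in the study. Two choices are assumptions: the sign is ignored when classifying (the two instruments are scored in opposite directions), and values below 0.20, which the text does not name, are labelled "unclassified".

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def strength(rho: float) -> str:
    """Label |rho| using the cut-offs stated in the text."""
    a = abs(rho)
    if a >= 0.80:
        return "very strong"
    if a >= 0.60:
        return "strong"
    if a >= 0.40:
        return "moderate"
    if a >= 0.20:
        return "weak"
    return "unclassified"  # assumption: the text defines no label below 0.20
```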
Reliability
Internal consistency
Cronbach’s alpha and corrected item-total correlations were used to measure internal consistency. Cronbach’s alpha values above 0.70 were considered to indicate acceptable internal consistency reliability for group comparisons, and values above 0.90 for individual comparisons. 28 Item-total correlations were corrected for overlap by omitting the item from the parent scale total. Item-total correlations above 0.40 were considered to indicate acceptable internal consistency. 30
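Both statistics can be written compactly. The sketch below uses the standard formula for Cronbach’s alpha, k/(k − 1) × (1 − Σ item variances / variance of the total score), and computes the corrected item-total correlation as the Pearson correlation between an item and the sum of the remaining items of its scale. It is an illustration with the Python standard library, not the SPSS output used in the study; the function names are hypothetical.

```python
import math
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """rows[i][j] is the score of respondent i on item j of one scale."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(rows: Sequence[Sequence[float]], j: int) -> float:
    """Correlate item j with the scale total computed without item j."""
    item = [row[j] for row in rows]
    rest = [sum(row) - row[j] for row in rows]
    return pearson(item, rest)
```

A scale would then be judged against the thresholds above, for example `cronbach_alpha(rows) > 0.70` for group comparisons and `corrected_item_total(rows, j) > 0.40` for each item `j`.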
Test–retest reliability
Intraclass correlation coefficients for agreement and paired t-tests were used to measure test–retest reliability. Intraclass correlation coefficients above 0.70 were considered to indicate acceptable reliability for group comparisons, and values above 0.90 for individual measurements over time.28,31 Significant differences in mean scores (p < 0.05) were considered to indicate poor reliability. No significant differences in mean scores were considered to indicate acceptable reliability.
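A minimal sketch of the two test–retest statistics follows. The text does not state which form of the intraclass correlation coefficient was used; the sketch assumes the two-way, single-measurement, absolute-agreement form, computed from the mean squares of a subjects × occasions layout. The paired t statistic is included without a p-value, since the t distribution is not part of the Python standard library. Function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def icc_agreement(t1: Sequence[float], t2: Sequence[float]) -> float:
    """ICC for absolute agreement between two occasions (assumed form)."""
    n, k = len(t1), 2
    scores = list(t1) + list(t2)
    grand = sum(scores) / (n * k)
    row_means = [(a + b) / k for a, b in zip(t1, t2)]   # per subject
    col_means = [sum(t1) / n, sum(t2) / n]              # per occasion
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def paired_t_statistic(t1: Sequence[float], t2: Sequence[float]) -> float:
    """t statistic for the mean within-subject difference (t2 - t1)."""
    diffs = [b - a for a, b in zip(t1, t2)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```

Under this form, a constant shift between the two occasions lowers the coefficient even when the ranking of subjects is unchanged, which is why an agreement coefficient is stricter than a consistency coefficient.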
Interpretability
Data completeness, mean scores and standard deviations, floor and ceiling effects, and skewness of score distribution were used to describe the distribution of item responses. 17 Floor or ceiling effects were considered present if more than 15% of respondents scored the minimum value of 0 or the maximum value of 100, respectively. 31
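The floor and ceiling criterion can be expressed directly. The sketch below computes the percentage of respondents at the minimum and maximum scale score and flags an effect when that percentage exceeds 15%, as defined above. The function name and the example scores are hypothetical.

```python
from typing import Dict, Sequence, Union


def floor_ceiling(scores: Sequence[float],
                  min_score: float = 0.0,
                  max_score: float = 100.0,
                  threshold_pct: float = 15.0) -> Dict[str, Union[float, bool]]:
    """Percentage of respondents at the scale extremes and effect flags."""
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == min_score) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == max_score) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold_pct,
        "ceiling_effect": ceiling_pct > threshold_pct,
    }


# Made-up scale scores for ten respondents, not study data.
example = [0, 0, 50, 100, 25, 30, 40, 60, 70, 80]
print(floor_ceiling(example))
```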
Ethical approval
This study was approved by the Regional Committee for Medical and Health Research Ethics, division south-eastern Norway (trial registration number: 2011/2213/Regional Committee for Medical and Health Research Ethics, division south-eastern Norway B).
Results
Initially, 150 sets of questionnaires were sent to a random sample of members of the Norwegian Endometriosis Association. Of these, 60 questionnaires were successfully completed and returned. Based on this preliminary response rate, an additional 225 sets of questionnaires were sent to a second random sample of members of the Norwegian Endometriosis Association not contacted in the first round. In total, 162 of 375 questionnaires were successfully completed and returned. Five of these were from women with endometriosis who reported that their diagnosis had not been confirmed surgically. These were excluded. Among the 157 included respondents, 94 completed and returned a second questionnaire at a later date. Of these, 10 reported change in treatment or starting new treatment since completing the first questionnaire. Excluding these, test–retest reliability of the NO-EHP-30 could be assessed in 84 of the respondents. The median number of days between answering the first and second questionnaire was 34 (range 7–168). Of the 84 respondents, 61 reported either having menstruation when answering both questionnaires or not having menstruation when answering both questionnaires. Of the 84 respondents, 15 reported having menstruation when answering one questionnaire, and not having menstruation when answering the other. The characteristics of the participants are presented in Table 1.
Table 1. Basic characteristics of the participants (n = 157).
BMI: body mass index; SD: standard deviation.
Construct validity
Structural validity
Factor analysis of the 30 items of the core questionnaire suggested three factors, explaining 70.2% of the total variance. The three-factor model resulted in 20 items loading on the hypothesized scales and 10 items loading on alternative scales (Table 2). Factor analysis of the 23 items of the modular questionnaire suggested five factors, explaining 100% of the total variance. The five-factor model resulted in 15 items loading on the hypothesized scales and 8 items loading on alternative scales (Table 3).
Table 2. Factor analysis of the 30 items of the EHP-30 core questionnaire suggesting a three-factor model.
EHP-30: Endometriosis Health Profile-30.
Principal components analysis with varimax rotation. Only factor loadings ⩾0.40 are shown.
In the original EHP-30, items 1–11 belong to the scale “pain,” items 12–17 to the scale “control & powerlessness,” items 18–23 to the scale “emotional well-being,” items 24–27 to the scale “social support,” and items 28–30 to the scale “self-image.”
Table 3. Factor analysis of the 23 items of the EHP-30 modular questionnaire suggesting a five-factor model.
EHP-30: Endometriosis Health Profile-30.
Principal components analysis with varimax rotation. Only factor loadings ⩾0.40 are shown.
In the original EHP-30, items A1-5 belong to the scale “work life,” items B1-2 to the scale “relationship with children,” items C1-5 to the scale “sexual intercourse,” items D1-4 to the scale “medical profession,” items E1-3 to the scale “treatment,” and items F1-4 to the scale “infertility.”
Hypotheses-testing
Correlations between scales of the EHP-30 and the SF-36v2 ranged from −0.63 to −0.81 (Table 4). The correlations are negative because the EHP-30 and the SF-36v2 are scored in opposite directions. All hypotheses were confirmed.
Table 4. Convergent validity. Correlations between some EHP-30 scales and relevant SF-36v2 scales.
EHP-30: Endometriosis Health Profile-30; SF-36v2: Short Form-36 version 2.
Reliability
Internal consistency
Cronbach’s alpha ranged from 0.87 to 0.96 for the original scales of the core questionnaire and from 0.78 to 0.94 for the original scales of the modular questionnaire (Supplementary material 2). The corrected item-total correlation coefficients ranged from 0.45 (item 23) to 0.91 for the original scales of the core questionnaire and from 0.55 to 0.89 for the original scales of the modular questionnaire.
Test–retest reliability
Intraclass correlation coefficients for test–retest agreement ranged from 0.80 to 0.85 for the scales of the core questionnaire, and from 0.67 to 0.91 for the scales of the modular questionnaire (Table 5). The mean scale scores did not differ significantly between the first and second measurements. Restricting the test–retest reliability analysis to the 61 respondents who reported the same menstruation status (either menstruating or not menstruating) when answering both questionnaires did not alter the general findings (data not shown).
Table 5. Test–retest reliability and intraclass correlation coefficients (ICC) with 95% confidence intervals (CI) for test–retest agreement. Comparison of mean scale scores at time 1 and time 2 with p-values.
ICC: intraclass correlation coefficient; EHP-30: Endometriosis Health Profile-30; CI: confidence interval.
Each ICC was significantly different from zero (p < 0.001).
Paired samples t-test, significance two-tailed.
Interpretability
The results are presented in Table 6. Data completeness of at least 97.5% was achieved for all EHP-30 scales. The proportion of participants to whom each scale of the modular questionnaire was relevant varied from 39.4% (the scale infertility) to 87.2% (the scale sexual intercourse). Floor effects were found only for the scale self-image (20.1%) in the core questionnaire, and for the scales work life (33.9%), relationship with children (34.2%), and medical profession (20.5%) in the modular questionnaire. No ceiling effects were observed. Skewness was low for all the scales.
Table 6. Interpretability. Data completeness, mean scores and standard deviations (SD), floor and ceiling effects, and skewness of score distribution.
N/R: not relevant; N/A: not applicable; EHP-30: Endometriosis Health Profile-30.
Number of participants for whom the scale was not relevant (only applicable for the modular questionnaire).
Discussion
Factor analysis suggested a three-factor model for the EHP-30 core questionnaire, in contrast to the original five-factor model. Items of the scales pain and control & powerlessness loaded on the same factor. A similar finding was demonstrated in the original, Portuguese, and French version EHP-30.9,14,15 As argued by the developers, it is likely that pain has considerable impact on sense of control and powerlessness. In this study, assessment of convergent validity demonstrated strong correlations between each of the EHP-30 scales pain and control & powerlessness and the SF-36v2 scale bodily pain, supporting this interpretation. Strong correlations were also demonstrated between the EHP-30 scales emotional well-being and social support and the corresponding SF-36v2 scales mental health and social functioning. Thus, the findings in this study support construct validity of four of five scales (pain, control & powerlessness, emotional well-being, and social support) of the core questionnaire.
The fifth scale of the core questionnaire, self-image, consists of three items. The first two items concern the effect of endometriosis on choice of clothing and appearance, and the last item concerns the effect of endometriosis on self-confidence. In factor analysis, the first two items loaded on the scale social support, and the last item loaded on the scale emotional well-being. Thus, the construct self-image does not seem to be measured appropriately by the NO-EHP-30. The lack of association between appearance and self-confidence is likely not exclusive to the Norwegian culture. Subtle differences in exploratory factor analysis technique, that is, performed with or without predefinition of five factors for the core questionnaire, may have masked a similar finding in other translated versions.14,25
Factor analysis suggested a five-factor model for the EHP-30 modular questionnaire, in contrast to the original six-factor model. Factor analysis of the modular questionnaire has been performed for the original and French versions of the EHP-30.9,15 In this study, items of the scales work life and relationship with children loaded on the same factor. A similar finding was demonstrated in the original version, but not in the French version.9,15 These discrepancies may be due to differences in daily patterns of work life and child care in these three countries. In this study, factor analysis could not support construct validity of the scale treatment. The three items of the scale treatment loaded on three separate factors. A tendency of the first item of the scale treatment to load on a different factor than the two latter items has been demonstrated by factor analysis with larger samples in both the original and French versions of the EHP-30.9,15
The NO-EHP-30 demonstrated acceptable test–retest reliability except for the scale relationship with children of the modular questionnaire, which demonstrated an intraclass correlation coefficient of 0.67. Although the time interval between answering the first and second questionnaire was likely long enough to minimize memory effects, it may have allowed changes in the status of the subject. 32 Exclusion of questionnaires from respondents reporting change in treatment or starting new treatment between assessments probably reduced this effect. Phase of menstruation did not seem to affect the outcome. The scale relationship with children consists of two items. The second item concerns the ability to play with child/children and implies children of younger age. In the case of children of younger age, the score of this scale may depend not only on the health status of the respondent but also on the health status of the child/children. Thus, this particular scale may be less reliable.
This study is the first to evaluate both test–retest reliability and validity of the core questionnaire of the EHP-30 with adequate sample sizes.18,33 Regarding the modular questionnaire, the varying relevance of scales to participants has likely rendered some sample sizes inadequate. To ensure adequate sample size for the least relevant modular questionnaire scale, the general sample size should have been three times larger. On the other hand, these variations in relevance of the scales of the modular questionnaire would limit the use of the modular questionnaire in most research settings. Another weakness of this study is the lack of representativeness of the endometriosis patient group. Participants were recruited from a patient association. Thus, participants with severe forms of endometriosis are likely overrepresented. 34 Recruiting a representative sample of women with endometriosis is a challenge in almost all research settings. Most, if not all, of the EHP-30 validation studies have recruited participants from patient associations and/or from secondary or tertiary referral centers.10–15 Thus, participants with severe forms of endometriosis are likely overrepresented in all studies, although to varying degrees. Moreover, patients attending secondary and tertiary referral centers are more likely to be in active disease and treatment settings, making test–retest reliability analysis difficult. Endometriosis registries would have been a preferable recruitment source to endometriosis associations. However, no endometriosis registry has been established in Norway. Furthermore, the responsiveness of the NO-EHP-30 was not evaluated.
The construct self-image does not seem to be measured appropriately by the NO-EHP-30, suggesting a lack of cross-cultural validity of the EHP-30. With multinational and multicultural studies increasing, this study underlines the importance of adequate translation, cross-cultural adaptation, and cross-cultural validation of PROMs.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The present study was funded by the University of Oslo.
