Abstract
Background
The Epistaxis Severity Score (ESS) is the gold-standard patient-reported outcome measure for evaluating nosebleed severity in patients with hereditary hemorrhagic telangiectasia (HHT). To date, the ESS has been assessed only for content validity and concurrent validity.
Objective
We evaluate the internal consistency and test–retest reliability of the ESS.
Materials and Methods
After receiving institutional review board approval, we sent an online survey battery, including the ESS survey, to 305 (39% male) English-speaking HHT patients ≥18 years old at a single center. Of those, 140 (46%) patients completed the battery, and 110/140 (79%) reported epistaxis. Cronbach's alpha, inter-item correlation, and factor analyses were used to evaluate internal consistency. For the test–retest reliability evaluation, we recruited 69 HHT patients during HHT clinic visits to complete 2 self-administered ESS surveys 2 weeks apart. Participants also completed a modified Clinical Global Impression-Improvement scale at readministration of the ESS survey. We calculated the intraclass correlation coefficient in a 2-way mixed model with absolute agreement.
Results
The ESS survey demonstrated low internal consistency (Cronbach's alpha = 0.495), suggesting that it measures multiple distinct concepts rather than a single construct. Factor analysis revealed 3 latent factors with moderate intercorrelation, suggesting that 3 related but distinct constructs underlie the ESS. However, the ESS demonstrated excellent test–retest reliability (intraclass correlation coefficient = 0.955; 95% CI, 0.91-0.98).
Conclusion
Although the ESS demonstrates high test–retest reliability, it may not adequately capture the different dimensions of nosebleed severity. Additional correlated survey items and sub-scores may be needed to increase internal consistency and to measure each component of epistaxis severity accurately. Epistaxis severity should be assessed across its different dimensions, and evaluating individual ESS items separately should be considered for a comprehensive understanding.
Keywords
Introduction
Hereditary hemorrhagic telangiectasia (HHT) is a heritable disease of aberrant angiogenesis that results in mucocutaneous telangiectasias involving the face, mouth, and nasal mucosa and in visceral arteriovenous malformations involving the lungs, gastrointestinal tract, liver, and brain. Recurrent epistaxis is the most common and earliest symptom of HHT.1,2 Up to 96% of HHT patients report epistaxis, and more than 75% experience recurrent epistaxis before the age of 18 years.3 The degree of epistaxis ranges from mild to severe; severe epistaxis can cause significant anemia and a need for regular blood transfusions. Moreover, severe epistaxis produces significant impairments in olfactory function and poor quality of life (QOL). Hence, controlling epistaxis in patients with HHT is essential for lowering illness burden and enhancing QOL.3,4
Patient-reported outcome measures (PROMs) are used to capture valid and reliable data about patient health, symptoms, experiences, and QOL from the patient's perspective, without the need for interpretation by a healthcare provider.5 PROMs thus enhance clinicians’ ability to assess patients’ well-being and provide efficient and optimal care.6,7 In recent years, the use of high-quality PROM instruments in clinical research has increased, but their use in clinical practice has been limited by variability in instrument quality, a lack of validation, and the inability to compare results across measures. Additional work is needed to increase the use and reporting of PROMs in clinical trials.7
Three primary QOL instruments for HHT-related epistaxis are the Epistaxis Severity Score (ESS) online tool, the Epistaxis Questionnaire Quality of Life,3 and the Nasal Outcome Score for Epistaxis in Hereditary Hemorrhagic Telangiectasia (NOSE-HHT) questionnaire.4,8 These instruments have undergone different degrees of psychometric evaluation, including assessments of content validity, concurrent validity, discriminant validity, responsiveness, minimal clinically important difference (MCID),4 test–retest reliability,9,10 and internal consistency (IC).3,11
The ESS is a partially validated survey that assesses individual disease severity and treatment efficacy for HHT-related epistaxis.4 Currently, the ESS is the gold standard for assessing the severity of nosebleeds in patients with HHT, and it is cited in nearly all studies of epistaxis severity in HHT.3,4,8 It uses 6 measures to assess the severity of epistaxis within the last 3 months: frequency, duration, and intensity of epistaxis episodes, need for medical attention, anemia, and dependence on blood transfusions.8 Research suggests that the ESS may serve as a good outcome measure of the physical components of epistaxis and its medical therapeutic sequelae.3 Moreover, higher ESS scores have been found to correlate significantly with lower health-related QOL (HR-QOL), as measured by the Medical Outcomes Study 36-Item Short Form (SF-36). Specifically, patients with severe epistaxis had lower physical and mental component summary scores of HR-QOL than those with mild epistaxis.4,12 To date, the ESS has been assessed for content validity, concurrent validity, and MCID.3,4 However, several other psychometric properties remain to be explored. In this study, we evaluate the IC and test–retest reliability of the ESS.
Materials and Methods
This study was approved by our institutional review board, and all patients provided informed consent.
IC
Patient Population
We distributed an online-based survey battery, which included demographic questions and the ESS survey, via email to 305 English-speaking patients with a definitive diagnosis of HHT through Curaçao criteria or genetic testing, all over the age of 18 years, at an HHT center of excellence. Survey responses were collected between January 2022 and March 2022. Those who did not respond within 4 weeks of the initial invitation received a follow-up phone call. A total of 140 patients completed the self-administered survey battery.
Survey Description
Data collected included age, gender, genetic testing results, location of visceral arteriovenous malformations, past treatment, family history, and other disease-related history. Past treatments included medical treatments (nasal packing, topical ointments, and oral medications), minimally invasive treatments (coagulation therapy, laser cauterization therapy, arterial embolization, and arterial ligation), and surgical treatments. Descriptive analyses were performed with calculations of means and standard deviations for continuous variables and proportions for categorical variables.
Statistical Analysis
Responses were assessed for IC, that is, the extent to which items in the instrument are intercorrelated and measure the same underlying construct (an abstract concept that captures aspects of patient experiences or outcomes that are not easily quantified). Cronbach's alpha was used to evaluate IC.13 This indicator is reported on a 0 to 1 scale and interpreted as follows: >0.9, excellent; >0.8, good; >0.7, acceptable; >0.6, questionable; >0.5, poor; and <0.5, unacceptable.14 Principal component analysis was used to reduce the dimensionality of the instrument to fewer components that still represent most of the variance in the data. Exploratory and comparative factor analyses were used to examine the degree to which each instrument item measured latent factors. A factor loading of >0.40 suggests a moderate correlation between the item and the underlying factor.15 Statistical analyses were performed using R (R Foundation for Statistical Computing, Vienna, Austria).
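For reference, Cronbach's alpha for k items is α = k/(k − 1) · (1 − Σ s_i² / s_T²), where s_i² is the variance of item i and s_T² is the variance of the summed score. The Python sketch below (the study itself used R) computes alpha from item columns and maps the value onto the interpretation bands listed above. The function names and toy data are illustrative assumptions, not material from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha from a list of items, each a list of scores (one per respondent)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item needs one score per respondent")
    # Total score per respondent = sum of that respondent's item scores.
    totals = [sum(item[r] for item in items) for r in range(n)]
    # Sample (n - 1) variances of each item and of the total score.
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def interpret_alpha(alpha):
    """Map alpha onto the qualitative bands quoted in the Methods."""
    bands = [(0.9, "excellent"), (0.8, "good"), (0.7, "acceptable"),
             (0.6, "questionable"), (0.5, "poor")]
    for cutoff, label in bands:
        if alpha > cutoff:
            return label
    return "unacceptable"


# Hypothetical toy data: perfectly correlated items give alpha of 1
# (up to floating-point rounding).
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))
print(interpret_alpha(0.495))  # the ESS value reported in Results -> 'unacceptable'
```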
Test–Retest Reliability
Patient Population
For the test–retest reliability evaluation, we recruited 69 consecutive English-speaking patients with a definitive diagnosis of HHT through Curaçao criteria or genetic testing, all over the age of 18 years, at an HHT center of excellence from April through October 2022.
Survey Description
Participants completed 2 self-administered ESS surveys; the modified Clinical Global Impression-Improvement (CGI-I) scale was administered with the second ESS survey. The initial survey was self-administered by participants in clinic via an online link. The follow-up ESS survey and CGI-I were sent together via email to participants 2 weeks after completion of the initial survey. Participants were asked to complete the follow-up survey within 7 days.
Statistical Analysis
Test–retest reliability (reproducibility) is the ability of an instrument to produce consistent, stable results when completed repeatedly over time by the same participant whose symptom severity has not changed. Patients with stable symptoms per the CGI-I scale who completed both the initial and the follow-up ESS were included in the analysis. The intraclass correlation coefficient (ICC) in a 2-way mixed model with absolute agreement was used to determine reproducibility. Based on a minimum acceptable ICC of 0.75, a significance level of .05, a power of 0.8, and a conservative 30% dropout rate, at least 15 participants were required.16 Statistical analyses were performed using R.
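An absolute-agreement ICC can be computed from two-way ANOVA mean squares using McGraw and Wong's ICC(A,1) for a single measurement and ICC(A,k) for the mean of k measurements; the point estimates are the same whether the occasion effect is treated as fixed (mixed model) or random. The text does not state whether single- or average-measures values were reported, so this plain-Python sketch returns both. It expects one row per participant holding the test and retest scores; the function name and toy data are hypothetical, and the study's own analysis was run in R.

```python
def icc_absolute_agreement(rows):
    """Return (ICC(A,1), ICC(A,k)) for an n-subjects x k-occasions table.

    rows: list of lists; rows[i][j] is subject i's score on occasion j.
    """
    n, k = len(rows), len(rows[0])
    if n < 2 or k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need >= 2 subjects, >= 2 occasions, and a complete table")
    grand = sum(map(sum, rows)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_error / ((n - 1) * (k - 1))

    single = (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)
    average = (ms_r - ms_e) / (ms_r + (ms_c - ms_e) / n)
    return single, average


# Hypothetical test/retest pairs: identical scores give perfect agreement,
# while a constant +1 shift at retest is penalized under absolute agreement.
print(icc_absolute_agreement([[1, 1], [2, 2], [3, 3]]))
print(icc_absolute_agreement([[1, 2], [2, 3], [3, 4]]))
```

The second example shows why absolute agreement (rather than consistency) matters for test–retest use: a systematic shift between administrations lowers the ICC even though the rank order of participants is unchanged.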
Results
Patient Characteristics
For the assessment of IC, 140/305 (46%) patients participated in the survey battery. Of these, 120 (89%) reported epistaxis, and 109 (78%) completed the ESS survey. The patients were 60% female and had a mean age of 54 ± 16 years. The mean ESS was 4.4, indicating moderate severity (Tables 1 and 2). For the test–retest reliability analysis, 69 patients completed the initial ESS survey, and 50 patients completed the follow-up survey. Thirty patients reported no change in their nosebleeds on the CGI-I scale at follow-up and were included in the analysis (Figure 1). Among these 30 patients, the mean initial ESS was 3.0, and the mean follow-up ESS was 2.9 (Table 3).

Figure 1. Participant recruitment and assessment for test–retest reliability analysis.
Table 1. Patient Characteristics for Internal Consistency Analysis.
Expressed as mean ± standard deviation.
Of 129 patients reporting visceral arteriovenous malformations, 9 patients reported 2 locations, 1 patient reported 3 locations, and 1 patient reported 4 locations.
Table 2. Patient Epistaxis Severity Scores for Internal Consistency Analysis.
Expressed as mean ± standard deviation.
Table 3. Patient Epistaxis Severity Scores for Test–Retest Reliability Analysis.
Expressed as mean ± standard deviation.
IC
The ESS demonstrated a Cronbach's alpha of 0.495, indicating unacceptable IC (Figure 2). Correlation analysis showed weak correlations between every pair of questionnaire items (all r < 0.4). The weakest correlation was between red blood cell transfusion and nosebleed intensity (r = 0.01; 95% CI, −0.16 to 0.21) (Figure 3). In the principal component analysis, the first 3 components accounted for 76% of the total variance of the ESS (Table 4). Three latent factors (occurrence and healthcare utilization, nosebleed profile, and hematologic impact) were identified using a scree plot of the eigenvalues and parallel analysis, suggesting that the ESS measures at least 3 distinct aspects of epistaxis severity. The exploratory factor analysis indicated moderate correlations between each item and its underlying factor (Figure 4).
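The component-retention step can be illustrated with Horn's parallel analysis: eigenvalues of the observed item correlation matrix (whose cumulative share is the "variance explained" of a principal component analysis) are compared with the mean eigenvalues of random normal data of the same size, and leading components are kept while they exceed that random benchmark. This NumPy sketch uses simulated two-factor data, the mean (rather than 95th-percentile) random eigenvalue as the threshold, and 500 simulations; these are illustrative assumptions, not the settings of the study, which was analyzed in R.

```python
import numpy as np


def pca_eigenvalues(data):
    """Eigenvalues (descending) of the item correlation matrix and their cumulative share."""
    corr = np.corrcoef(data, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return eig, np.cumsum(eig) / eig.sum()


def parallel_analysis(data, n_iter=500, seed=0):
    """Horn's parallel analysis: count leading components that beat random-data eigenvalues."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed, _ = pca_eigenvalues(data)
    random_eigs = np.array([
        pca_eigenvalues(rng.standard_normal((n, p)))[0] for _ in range(n_iter)
    ])
    threshold = random_eigs.mean(axis=0)
    n_keep = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break
        n_keep += 1
    return n_keep, observed, threshold


# Simulated (hypothetical) responses: 6 items driven by 2 latent factors.
rng = np.random.default_rng(1)
factors = rng.standard_normal((300, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
items = factors @ loadings + 0.3 * rng.standard_normal((300, 6))

n_keep, observed, threshold = parallel_analysis(items)
_, cumulative = pca_eigenvalues(items)
print(n_keep)                             # components retained
print(np.round(cumulative[:n_keep], 2))   # cumulative variance explained
```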

Figure 2. Cronbach's alpha results for the Epistaxis Severity Score of 140 patients with hereditary hemorrhagic telangiectasia.

Figure 3. Correlation matrix of the Epistaxis Severity Score survey items. Values closer to 1 indicate stronger positive correlation, and values closer to −1 indicate stronger negative correlation. No 2 items have a strong association; the weakest association was found between red blood cell (RBC) transfusion and nosebleed intensity (r = 0.011).

Figure 4. Factor analysis results. Factor analysis identified 3 latent factors that accounted for the covariance among the observed ESS items. Factor loadings for each item are shown on the straight arrows; results of the comparative factor analysis are shown on the curved arrows.
Table 4. Principal Component Analysis.
Test–Retest Reliability
ICC analysis was performed on the ESS scores at baseline and week 2 among the 30 participants reporting no change on the CGI-I scale. The ICC was 0.955 (95% CI, 0.91-0.98), indicating high test–retest reliability (Figure 5).

Figure 5. Test–retest reliability results. ICC = 0.96 (95% CI, 0.91-0.98).
Discussion
The ESS is a partially validated, standardized tool and the most frequently used metric for evaluating patient-reported epistaxis severity.8 It has also been used to measure response to treatment by assessing disease severity before and after interventions.3,4,8,11,12,17,18 The objective of this study was to assess the IC, inter-item correlation, factor structure, and test–retest reliability of the ESS in HHT patients. To our knowledge, this is the first report evaluating these psychometric properties of the ESS for epistaxis in patients with HHT.
IC reflects whether responses to different questions agree, that is, the extent to which items within an instrument measure the same construct, and it clarifies the role of each item and the relationships between items. The Cronbach's alpha value depends on the number of items and the strength of their intercorrelations, and it is lowered by items that are ambiguous or that measure different concepts. Our study found that the ESS had an unacceptable alpha coefficient of <0.5, indicating that the ESS likely measures multiple concepts instead of a single, unified construct. Intercorrelation analysis of the ESS items demonstrated weak correlations between all items (r < 0.4), further suggesting that the items represent multiple constructs.
These findings suggest that epistaxis severity comprises multiple distinct factors that cannot be accurately summarized by a single score, and that clinicians should consider evaluating each ESS item separately to gain a more comprehensive understanding of epistaxis severity in HHT patients. On further examination, factor analysis revealed that the ESS is composed of at least 3 underlying factors of epistaxis severity, reflecting the complexity of the condition. Moreover, our comparative factor analysis showed moderate intercorrelations between these factors, suggesting that the factors are related but represent distinct constructs.
The results of our study highlight the importance of assessing epistaxis severity across multiple dimensions. Additional items and sub-scores in the ESS could better elucidate the 3 related constructs identified. However, any further development of the ESS to include additional sub-scores would require careful evaluation of the instrument's content and construct validity to ensure that it captures the true experiences of individuals with HHT and accurately measures the underlying constructs.15,19
Our study adds to the body of literature evaluating the validity of the ESS. Although the ESS survey had low IC, it demonstrated high test–retest reliability,3,9,19 with an ICC of >0.9. Thus, the ESS produces a stable score in patients with stable symptoms. Given the intricate nature of epistaxis in HHT and its variation over time, the length of the follow-up interval can substantially affect research outcomes.6,20 This is especially true for therapeutic evaluations: after laser therapy, for instance, patients often experience increased nosebleeds during the first weeks, followed by a notable reduction over subsequent months. This consideration is particularly important when evaluating treatment effectiveness and accounting for the growth rate of mucosal arteriovenous malformations.11,21 Given these dynamics, the 3-month interval of the ESS aligns more closely with common practice in HHT-focused studies than the 2-week interval used by the NOSE-HHT.3,8,12
A drawback of the ESS is its inclusion of blood transfusion and anemia items, which may not reflect epistaxis alone. Iron-deficiency anemia in HHT can occasionally be unrelated to epistaxis and instead result from gastrointestinal bleeding.1,20 Anemia is frequently linked to symptoms such as weakness, fatigue, reduced exercise capacity, headaches, irritability, and diminished QOL. However, the extent of anemia's impact in the context of HHT remains uncertain.12,22–24
In contrast, the NOSE-HHT assesses the QOL of individuals with HHT. Although its primary focus is on epistaxis-related symptoms, it also reflects broader aspects (physical problems, functional limitations, and emotional consequences) that can be influenced by other HHT-related clinical manifestations, such as liver, gastrointestinal, or pulmonary arteriovenous malformations.3,18
Applying a different coefficient to each item before calculating the final ESS can affect the IC estimate. IC is based on the interrelations, that is, the bivariate correlations, between different items on a test (or a sub-scale of a composite test) and aims to assess whether these items measure the same underlying construct.10,25,26 Because the correlations between items often vary in magnitude, the average inter-item correlation is a straightforward way to summarize the degree of correlation among the items of an instrument.25,26
Additionally, Cronbach's alpha requires certain conditions to be satisfied for accurate estimates, including the following: (a) no range limitations of interval-level data among item scores;27 (b) linearity and homogeneity of errors; (c) minimal measurement error and correction for variance and covariance attenuation; (d) consistent item distributions; (e) unidimensionality; (f) absence of systematic errors; (g) independent item content; (h) equal factor loadings, even though congeneric measures with different relationships between items and latent variables might be common; and (i) parallel equivalence.25,26,28–30
Different coefficients for each item imply that the items may contribute unequally to the overall factor saturation. Items with substantially higher coefficients than others (e.g., 0.31 for need for blood transfusion vs 0.14 for frequency of nosebleeds) may have a greater influence on the final score.
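To see why item coefficients matter for IC: multiplying item i by a weight w_i scales its variance by w_i² and its covariances by the product of the weights involved, so Cronbach's alpha of a weighted composite generally differs from that of the unweighted items, while one common weight for every item leaves alpha unchanged. The sketch below uses two hypothetical items and hypothetical weights (1 and 3); it does not use the actual ESS coefficients or study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of equal-length score lists."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent total
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))


def weighted_alpha(items, weights):
    """Alpha of the composite after multiplying each item by its coefficient."""
    weighted = [[w * x for x in item] for item, w in zip(items, weights)]
    return cronbach_alpha(weighted)


items = [[1, 2, 3, 4], [2, 1, 4, 3]]            # hypothetical responses
print(round(weighted_alpha(items, [1, 1]), 3))  # 0.75: unweighted
print(round(weighted_alpha(items, [2, 2]), 3))  # 0.75: equal weights, same alpha
print(round(weighted_alpha(items, [1, 3]), 3))  # 0.529: unequal weights lower alpha here
```

In this toy case the unequal weights lower alpha; in general the direction of the change depends on which items receive the larger weights and how they covary with the rest.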
Although the ESS has strong content validity, concurrent validity, and test–retest reliability, its low IC and poor inter-item correlation may make it a less precise clinical tool.19,30 Further, although the ESS is most frequently used as a self-administered tool, patients cannot accurately report their current anemia status, because this question can be answered only with clinical knowledge and laboratory data.8,19,24
Based on a review of the literature, we summarized for comparison the aspects of validation that have been assessed for the NOSE-HHT and the ESS (Table 5). For the NOSE-HHT, Peterson et al3 assessed key aspects of content validity, psychometric validity, and responsiveness and established an MCID. For the ESS, content validity, convergent validity, test–retest reliability, and MCID have been assessed; however, discriminant validity and responsiveness have not. Furthermore, our study findings showed poor IC for the ESS.
Table 5. Comparison of ESS and NOSE-HHT Validity Measures Assessed in the Literature.
Abbreviations: ESS, Epistaxis Severity Score; HHT, hereditary hemorrhagic telangiectasia; ICC, intraclass correlation coefficient; MCID, minimal clinically important difference; NOSE-HHT, Nasal Outcome Score for Epistaxis in Hereditary Hemorrhagic Telangiectasia; QOL, quality of life; SD, standard deviation.
Based on discussion by Hoag et al8 with HHT patients and physicians specializing in HHT care and review of existing literature. Free-response questions from 915 survey responses were also analyzed for content validity.
Based on consultation by Peterson et al3 with HHT patients and physicians specializing in HHT care and review of existing literature until no new themes of the epistaxis illness experience were identified.
According to Merlo et al,12 Pearson's correlation tests comparing the ESS with the SF-36 physical component score (PCS) and mental component score (MCS) demonstrated moderate and weak correlations, respectively. Thus, there is moderate convergent validity between the ESS and the SF-36 PCS but weak convergent validity between the ESS and the SF-36 MCS.
In the study by Peterson et al,3 the NOSE-HHT was compared with baseline ESS, SF-36, and modified Clinical Global Impression-Severity (CGI-S) scores using analysis of variance. In that study, the CGI-S captured patients' baseline global ratings of epistaxis severity. The large effect size (η² = 0.39) between CGI-S and NOSE-HHT scores suggests a relationship between the NOSE-HHT and epistaxis severity. Pearson correlations indicated a significant relationship between the NOSE-HHT and ESS sub-scores (0.61 to 0.70) and between the NOSE-HHT and the SF-36 (−0.49 to −0.69).
These aspects of psychometric validity have not been addressed.
In the study by Peterson et al,3 average NOSE-HHT scores and the mean (± SD) change in total NOSE-HHT score were determined within each CGI-I response category.
Using the distribution-based method, an MCID of 0.46 is equivalent to 0.56 SDs of the baseline NOSE-HHT scores (mean ± SD, 1.55 ± 0.81), which is consistent with a moderate-to-large change.3
Length of the instrument and outcome measures.
As described by Hoag et al,8 the ESS contains 6 questions with different coefficients, evaluating the physical components of epistaxis and medical therapeutic sequelae. Questions 1 and 2 use 5-point Likert scales, and Questions 3 through 6 have dichotomous answers. The score of each response is multiplied by its respective coefficient, and the sum of these products gives the raw Epistaxis Severity Score.
In the study by Peterson et al,3 the NOSE-HHT is composed of 29 items rated on 5-point Likert-type scales, each ranging from 0 to 4; the total score, obtained by dividing the sum by the number of items answered, ranges continuously from 0 to 4. The instrument may take longer to complete, which may pose challenges, particularly for older individuals.
Outcome measure and components evaluated in each instrument.
Based on discussion by Hoag et al,4,8 the ESS evaluates frequency, duration, intensity of epistaxis, need for medical attention, diagnosis of anemia, and need for transfusion.
According to Peterson et al,3 the NOSE-HHT assesses 6 physical problems, 14 functional limitations, and 9 emotional consequences.
Limitations
This study has a few limitations. The patient population for the test–retest reliability analysis was derived from a single academic institution, limiting generalizability. The analytical population for IC was slightly reduced by incomplete survey battery responses, with 76% of patients completing the entire survey battery. Our questionnaire was also limited to English-speaking patients with internet access.
Conclusion
Although the ESS is the current “gold standard” for measuring nosebleed severity in HHT, several aspects of its validity have not been assessed. Furthermore, its low IC and inter-item correlation make it a less-than-ideal instrument for assessing nosebleed severity. Epistaxis severity should be assessed across its different dimensions, and evaluating individual ESS items separately should be considered for a comprehensive understanding.
Footnotes
Acknowledgments
For editorial assistance, we thank Denise Di Salvo, MS, Sandra Crump, MPH, and Rachel Box, MS, in the Editorial Services group of The Johns Hopkins Department of Orthopaedic Surgery.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: C.R.W. and A.G. have received a grant from Cure HHT.
IRB Statement
This study was approved by the Institutional Review Board, and informed consent was obtained from every patient.
