
Abstract

The Abbreviated Profile of Hearing Aid Benefit (APHAB) has been one of the most frequently used patient-reported outcome measures (PROMs) since its inception 30 years ago. For the APHAB, single-valued 95% critical differences have been presented for the identification and interpretation of meaningful benefits in research and in the clinic. A narrative literature review of studies that used the global APHAB score as a hearing-aid outcome measure showed that the average benefit varied directly with the average unaided baseline score for each measure. Next, data from 584 older adults enrolled in our recently completed randomized controlled hearing-aid trial were examined. The same dependence of benefit scores on unaided baseline scores was observed in these data. Regression to the mean made relatively minor contributions to the observed dependence of APHAB scores on baseline unaided scores. These results indicate that the application of a single value for the 95% critical difference is not valid for the interpretation of APHAB scores. Rather, baseline-specific benefit criteria are needed. Based on these results, baseline-specific Minimal Detectable Differences (MDDs; or 95% critical differences) and Minimal Clinically Important Differences (MCIDs) using both distribution-based and anchor-based approaches were generated for the APHAB-global score.

Keywords

hearing-aid outcomes, patient-reported outcome measures, minimal detectable differences, minimal clinically important differences

Introduction

Hearing aids represent the primary intervention for millions of adults with audiometric hearing loss of sensorineural origin ranging in severity from mild to severe. Many patient-reported outcome measures (PROMs) have been developed for the evaluation of hearing aid benefits over the past several decades. Several systematic reviews have consistently identified the abbreviated profile of hearing aid benefit (APHAB; Cox & Alexander, 1995) as among the most widely used hearing-aid PROMs (Granberg et al., 2014; Perez & Edmonds, 2012). In Germany, the APHAB appears to be the most widely used self-report measure due to its use in supporting insurance coverage of hearing aids (Löhler et al., 2017b).

The 24-item APHAB was derived from its longer parent PROM, the 66-item Profile of Hearing Aid Performance (PHAP; Cox et al., 1991; Cox & Gilmore, 1990; Cox & Rivera, 1992). Both PHAP-based measures use a seven-item response scale that asks how frequently the respondent experiences various hearing difficulties, with responses ranging from “never (1%)” to “always (99%)” and higher scores reflecting more frequently experienced difficulties. Several items, however, are reversed to minimize response bias (Cox & Alexander, 1995; Cox & Gilmore, 1990). Unaided and aided PHAP and Abbreviated PHAP (APHAP) scores are obtained, and the aided scores are subtracted from the unaided baseline scores to derive the Profile of Hearing Aid Benefit (PHAB) score or, for the abbreviated version, the APHAB score. In other words, PHAB = PHAP_u − PHAP_a and APHAB = APHAP_u − APHAP_a, where the subscripts “u” and “a” denote “unaided” and “aided” scores, respectively. All PHAP and APHAP scores have been expressed as percentages, tied to the percentages listed with each response. The benefit scores, PHAB and APHAB, have likewise been expressed as percentages: the simple difference in frequency-of-occurrence percentages between unaided baseline and aided scores.
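The scoring arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration rather than the published scoring procedure: the function names and the example responses are hypothetical, and reverse-worded items are assumed to have been recoded before scoring.

```python
from statistics import mean

# Percentages printed beside the seven APHAP response alternatives.
RESPONSE_PERCENT = {1: 1, 2: 12, 3: 25, 4: 50, 5: 75, 6: 87, 7: 99}


def aphap_score(responses):
    """Mean frequency-of-difficulty percentage for responses coded 1-7.

    Assumes reverse-worded items have already been recoded.
    """
    return mean(RESPONSE_PERCENT[r] for r in responses)


def aphab_benefit(unaided_responses, aided_responses):
    """Benefit = unaided score minus aided score, in percentage points."""
    return aphap_score(unaided_responses) - aphap_score(aided_responses)


# Hypothetical responses to one six-item subscale.
unaided = [5, 4, 5, 6, 4, 5]
aided = [3, 2, 3, 4, 3, 3]
benefit = aphab_benefit(unaided, aided)
print(f"unaided={aphap_score(unaided):.1f}%  "
      f"aided={aphap_score(aided):.1f}%  benefit={benefit:.1f}")
```

A positive benefit value indicates that difficulties were reported less frequently in the aided condition than at the unaided baseline.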

Factor analyses of PHAB scores led to the development of seven PHAB subscales, five pertaining to communication difficulties in a variety of listening conditions (ease of communication, EC; familiar talkers, FT; background noise, BN; reverberation, RV; and reduced cues, RC) and two pertaining to the distortion (DS) and aversiveness (AV) of environmental sounds (Cox & Gilmore, 1990). The APHAB retained four of the original seven PHAB subscales, but each scale was limited to six items, three scales pertaining to communication difficulties (EC, BN, and RV), and one focused on the aversiveness of environmental sounds (AV). Several subsequent studies found the communication subscales of the PHAB or the APHAB to be strongly correlated and these subscales were typically reduced to a single score, the PHAB-global or the APHAB-global score (Chisolm et al., 2005; Cox et al., 2007; De Sousa et al., 2023, 2024; Dornhoffer et al., 2020; Humes et al., 2003; Humes et al., 2017; Knoetze et al., 2024; Kochkin, 1997; Sabin et al., 2020). Each global score is the mean for the communication-related items only: 18 of 24 items for the APHAB-global and 47 of 66 items for the PHAB-global. Although the longer PHAB has been used frequently in clinical research, including several clinical trials comparing technologies (e.g., Haskell et al., 2002; Larson et al., 2000; Walden et al., 1998, 1999, 2000) and in recent randomized controlled trials evaluating fitting methods (e.g., Humes et al., 2017; Humes et al., 2025), the shorter APHAB PROM is widely used clinically and has been used in many more studies of hearing-aid outcomes. Hence, the rest of this article focuses primarily on the APHAP/B measures.

In recent randomized controlled trials (RCTs) using hearing-aid PROMs conducted by the authors (Humes et al., 2025; Sabin et al., 2020), we have attempted to identify individuals with substantial improvements using the APHAB-global score. Various definitions of “substantial improvement” have been advocated for use in RCTs (e.g., Klukowska et al., 2024; Norman et al., 2003; Sedaghat, 2019) with most centered on the concept of the minimal clinically important difference (MCID). The MCID is more than a “statistically significant” difference from baseline to post-intervention. It identifies the minimal difference that will likely be important to the individual and affect the individual's function (Jaeschke et al., 1989). Klukowska et al. (2024) identified and reviewed thirteen different methods and Franceschini et al. (2023) seventeen methods used in other fields to generate MCIDs for PROMs. Most approaches to calculation can be categorized into one of two types, distribution-based methods and anchor-based methods (Franceschini et al., 2023; Klukowska et al., 2024; Norman et al., 2003; Sedaghat, 2019). In anchor-based approaches, the anchor is an independent assessment of meaningful change that can be linked to the outcome measure of interest. It can be a simple and direct query about the meaningfulness of the intervention's impact or can be derived from other measures of “success,” such as intervention adherence or daily usage. Both approaches have been used frequently in clinical otolaryngology research (Tripathi et al., 2024).

In some cases, anchor-based methods have asked those completing an intervention to directly rate the impact of the intervention on their everyday function or life, typically using a single query with five- to seven-item responses (e.g., a 5-point Likert scale with responses ranging from “much worse” to “much better”). The PROM of interest is then evaluated for the 5-to-7 subgroups formed from their responses to this query to establish a clinically meaningful difference (Jaeschke et al., 1989; Sedaghat, 2019). Another approach is to use Receiver Operating Characteristic (ROC) curves to assess the sensitivity and specificity of the PROM in identifying those with successful interventions (Franceschini et al., 2023; Klukowska et al., 2024; Sedaghat, 2019). For hearing-aid PROMs, “success” has often been defined based on retention and daily use of the purchased hearing aids (Cox & Alexander, 1995; Cox & Rivera, 1992; Hickson et al., 2010; Humes, 2021; Humes et al., 2003; Humes & Humes, 2004).

When an anchor item has not been obtained in an RCT and clinical guidance has not previously established the clinical importance of differences for a given PROM, researchers have typically resorted to distribution-based methods to establish the MCID. The two most common distribution-based methods for estimating the MCID are based on the standard deviation of the baseline PROM scores (SD_b) from a representative sample of individuals. The most widely used formulas to estimate MCIDs have been MCID = 0.5 × SD_b and MCID = SD_b × (1–r_xx)^0.5, the latter being an estimate of the standard error of measurement (SEM) based on the test‒retest correlation (r_xx). The popularity of these two distribution-based methods of MCID calculation has been confirmed recently for research in otolaryngology (Tripathi et al., 2024). Another widely used distribution-based estimate of the MCID is more often referred to as the minimal detectable difference (MDD; Norman et al., 2003; Sedaghat, 2019), calculated as MDD = 1.96 × 2^0.5 × SEM. The latter has been commonly employed in audiology in the form of 95% critical differences (Cox & Alexander, 1995; Cox & Gilmore, 1990; Demorest & Walden, 1984; Weinstein et al., 1986), Demorest and Erdman (1988) describing an empirical approach to the estimation of MDDs or critical differences directly from the distributions of test‒retest differences. MDDs are the smallest differences that can be reliably detected using the PROM whereas MCIDs are the smallest detectable differences that have clinical importance to those receiving the intervention. MCIDs, therefore, are expected to be slightly to substantially larger than the corresponding MDDs for a given PROM.
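The three distribution-based quantities named above follow directly from SD_b and r_xx. The sketch below simply restates those formulas in Python; the function names and the input values (SD_b = 20, r_xx = .84) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def mcid_half_sd(sd_b):
    """MCID estimated as one-half the baseline standard deviation."""
    return 0.5 * sd_b


def sem(sd_b, r_xx):
    """Standard error of measurement: SD_b * sqrt(1 - r_xx)."""
    return sd_b * math.sqrt(1.0 - r_xx)


def mdd(sd_b, r_xx):
    """Minimal detectable difference (95% critical difference)."""
    return 1.96 * math.sqrt(2.0) * sem(sd_b, r_xx)


# Hypothetical inputs for illustration only.
sd_b, r_xx = 20.0, 0.84
print(f"0.5 x SD_b = {mcid_half_sd(sd_b):.1f}")
print(f"SEM        = {sem(sd_b, r_xx):.1f}")
print(f"MDD        = {mdd(sd_b, r_xx):.1f}")
```

With these inputs the SEM is 8 points and the MDD is roughly 22 points, which illustrates why the MDD is typically the largest of the three criteria.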

For hearing-aid PROMs, MDDs or critical differences most often have been reported as single values, and this holds for the APHAB. For the APHAB, Cox and Alexander (1995) reported 95% critical differences of 26% for unaided and aided APHAP scores and 33% for the APHAB benefit score, all for individual subscales. That is, the aided frequency of speech-communication difficulties was 26 percentage points lower than the baseline unaided frequency of occurrence. For the APHAB-global, Chisolm et al. (2005) reported 95% critical-difference values of 17.8% and 15.9% for test‒retest intervals of 2 or 10 weeks, respectively. Applying the 95% critical difference of about 17% established for the APHAB-global by Chisolm et al. (2005), individuals with APHAB-global scores ≥17% would be considered to have clinically significant benefit regardless of the original unaided baseline APHAP-global score.

The use of a single critical difference, MDD, or MCID value for hearing-aid PROMs is counterintuitive. Consider, for example, two individuals, A and B, who have baseline APHAP-global scores of 17% and 68%. Based on the 95% critical differences for the APHAB-global of 17%, Person A would be required to report an aided frequency of communication difficulties of 0%, a complete elimination of communication difficulties, whereas a change in score of 17 percentage points for Person B would reflect about a 25% reduction from baseline difficulties. Would such widely differing reductions in communication difficulties relative to unaided baseline be considered of comparable clinical importance? Our recent experiences with the APHAB in RCTs suggested that the MCID, MDD, and 95% critical differences should be tied to baseline performance, each increasing as baseline function worsened (scores increased). Baseline-dependent benefit has been noted previously in other fields (e.g., Wang, Hart, Stratford & Mioduski, 2011) as well as for hearing-aid PROMs, including the APHAB-global score (e.g., Kochkin, 1997). Noting the dependence of APHAB-global benefit scores on unaided APHAP-global scores, Kochkin (1997) suggested that the percentage reduction from unaided to aided APHAP-global scores be used rather than the difference score (APHAB-global).
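The contrast between Persons A and B can be made concrete by expressing the same 17-point change as a percentage reduction from the unaided baseline, the metric suggested by Kochkin (1997). The following is a minimal sketch; the function name is hypothetical.

```python
def percent_reduction(unaided, aided):
    """Reduction in difficulties as a percentage of the unaided baseline."""
    return 100.0 * (unaided - aided) / unaided


CRITICAL_DIFFERENCE = 17.0  # single-valued 95% critical difference

for label, baseline in (("A", 17.0), ("B", 68.0)):
    aided = baseline - CRITICAL_DIFFERENCE
    print(f"Person {label}: baseline {baseline:.0f}%, aided {aided:.0f}%, "
          f"reduction {percent_reduction(baseline, aided):.0f}% of baseline")
```

The same absolute criterion therefore demands a 100% reduction from Person A but only a 25% reduction from Person B.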

In this article, we build on this proposal of Kochkin (1997) to further explore the dependence of APHAB-global scores on the baseline unaided APHAP-global scores. The goal is to obtain a valid interpretation of the change in APHAP scores from unaided to aided conditions so that meaningful benefits from hearing aids can be better determined.

Interpretation of PHAP-Based PROMs

Despite the popularity and broad use of the APHAB, the interpretation of scores is challenging. The scores, as noted, estimate the frequency of difficulties of various types experienced by adults with hearing difficulties and are represented by percentage estimates ranging from 1% to 99%. This part of the scoring is straightforward, but how are the differences from unaided to aided, the derived-benefit measures, to be interpreted? What does a reduction in frequency of difficulties of 17 percentage points mean for individual hearing-aid wearers with different unaided baseline scores?

To assist in the interpretation of the benefit measures, Cox and colleagues identified groups of 55 to 117 successful hearing-aid users and established normative percentiles for APHAB scores among these groups of successful users (Cox & Alexander, 1995; Johnson et al., 2010). In these normative studies, “hearing aid success” was defined as the use of hearing aids for at least one year and self-reported usage of at least four hours per day. Only the most recent of these studies (Johnson et al., 2010) included percentiles for APHAB-global scores for unaided, aided, and benefit measures. Median APHAP/B-global scores were 70%, 33%, and 35% in Johnson et al. (2010; N = 117) for unaided, aided, and benefit measures, respectively. Clearly, the observed benefit for the median successful hearing-aid user in these normative guidelines greatly exceeds the reported 95% critical difference of 17%. The median APHAB-global score of 35% represents a halving of difficulties relative to the median unaided baseline global score of 70%. We argue here, however, that an APHAB-global benefit of 35% should only be the expected or targeted benefit for those successful hearing-aid users who have an unaided baseline APHAP-global score of 70%.

In all the normative studies of Cox and colleagues, scores for successful users from the 5th to the 95th percentiles were provided. This allowed each respondent's score to be compared to the distribution of scores among successful hearing-aid users to establish some sense of the likelihood of that respondent becoming a successful user. Given the relatively small sample sizes of each normative group, however, the robustness of these other estimated percentiles, perhaps even the medians, is questionable for broad application. Regardless of the specific percentile under consideration, the interpretation of the expected benefit for successful hearing-aid users is problematic when the individual compared to those normative values differs from the successful users in baseline characteristics, including in unaided APHAP-global score.

Rather than interpreting the APHAP/B-global scores relative to the percentiles from successful hearing-aid users, we propose directly interpreting the scores from the labels ascribed to the seven response alternatives used in the APHAP. The first two columns of Table 1 show the seven response categories used in the APHAP (Cox & Alexander, 1995). Using the assigned percentages, lower and upper limits to each response category were established by bisecting the differences in the assigned percentages between successive response categories. These limits are also provided in Table 1. To offer better resolution of responses, response scale steps of ½-category were then generated and appear in the four right-most columns of Table 1.

Table 1.

APHAP Response Categories and Assigned Percentage Scores with Lower (LL) and Upper (UL) Limits to Provide a Range of Percentage Scores for each Response Category. To Provide More Response Resolution, a Half-Step Response Scale was Created and this Response Scale Is Shown in the Right Four Columns.

APHAP response categories	LL (%)	UL (%)	APHAP response categories, half-step	% at midpoint	Lower limit (%)	Upper limit (%)
1 Never (1%)	1	6.5	1 Never	2.4	1	3.75
1 Never (1%)	1	6.5	1.5 Never/seldom	5.1	3.751	6.5
2 Seldom (12%)	6.501	18.5	2 Seldom	9.5	6.501	12.5
2 Seldom (12%)	6.501	18.5	2.5 Seldom/occasionally	15.5	12.501	18.5
3 Occasionally (25%)	18.501	37.5	3 Occasionally	23.3	18.501	28
3 Occasionally (25%)	18.501	37.5	3.5 Occasionally/half-the-time	32.8	28.001	37.5
4 Half-the-time (50%)	37.501	62.5	4 Half-the-time	43.8	37.501	50
4 Half-the-time (50%)	37.501	62.5	4.5 Half-the-time/generally	56.3	50.001	62.5
5 Generally (75%)	62.501	81.5	5 Generally	67.3	62.501	72
5 Generally (75%)	62.501	81.5	5.5 Generally/practically always	76.8	72.001	81.5
6 Practically always (87%)	81.501	93.5	6 Practically always	84.5	81.501	87.5
6 Practically always (87%)	81.501	93.5	6.5 Practically always/always	90.5	87.501	93.5
7 Always (99%)	93.501	99	7 Always	94.9	93.501	96.25
7 Always (99%)	93.501	99	7.5 Always+	97.6	96.251	99
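Assigning a global score to one of the half-step categories amounts to a lookup against the upper limits in Table 1. The sketch below is one possible implementation, not the authors' code: the function name is hypothetical, upper limits are treated as inclusive, and a score falling in the tiny gap between tabled limits (e.g., 3.7505) is assigned to the higher category by assumption.

```python
from bisect import bisect_left

# Upper limits (%) and codes of the half-step categories, copied from Table 1.
HALF_STEP_UPPER = [3.75, 6.5, 12.5, 18.5, 28, 37.5, 50,
                   62.5, 72, 81.5, 87.5, 93.5, 96.25, 99]
HALF_STEP_CODES = [1, 1.5, 2, 2.5, 3, 3.5, 4,
                   4.5, 5, 5.5, 6, 6.5, 7, 7.5]


def half_step_category(score):
    """Return the half-step response category for a score between 1 and 99."""
    if not 1 <= score <= 99:
        raise ValueError("APHAP scores range from 1% to 99%")
    # bisect_left finds the first upper limit that is >= score.
    return HALF_STEP_CODES[bisect_left(HALF_STEP_UPPER, score)]


for score in (17, 50, 68):
    print(score, "->", half_step_category(score))
```

For example, a global score of 68% falls in category 5 (“generally”) and a score of 17% falls in category 2.5 (“seldom/occasionally”).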

Extreme APHAP-global scores rarely occur. For example, in the NU dataset from our recently completed RCT (Humes et al., 2025; N = 584), unaided global scores falling in the range from “seldom/occasionally” (category 2.5) to “generally/practically always” (category 5.5) represented about 90% of the score categories. For the aided APHAP-global, 90% of the score categories fell between “never/seldom” (category 1.5) and “half the time” (category 4). The infrequent use of the extreme ends of the seven-point response scale effectively collapses the scale from seven responses to about four or five. This provides further justification for the use of a ½-step scale to provide better resolution of responses over the range used by most respondents. Across unaided and aided conditions, most of the data available fall in the response range 2.5 to 5.5 for the ½-step response scale (Table 1).

For seven-item Likert scales, frequently used in PROMs, the most common criteria for MCID values have ranged from 0.5 to 1 response units on the seven-item scale with broad use of 0.5 × SD_b as the MCID (Norman et al., 2003). Such a criterion for change results in an MCID that corresponds to a Cohen's-d effect size of 0.5, a medium effect size. If the criterion for change was based on ½-unit changes on a seven-point ordinal response scale, this would result in 0.25 × SD_b as the SD-based change criterion, which corresponds to a Cohen's-d effect size of 0.25. Cohen's-d values from 0.2 to 0.5 have been frequently used in establishing MCIDs of PROMs (Klukowska et al., 2024), representing either small or medium effect sizes.

In the sections to follow, we present evidence from the literature and from our recently completed RCT on the dependence of hearing-aid benefit, captured via the APHAB, on baseline performance. Such dependence suggests that baseline-dependent MCIDs, MDDs, or 95% critical differences, may be needed to appropriately determine clinically meaningful benefits for individuals. These data are then used to develop baseline-specific MDDs and MCIDs for the APHAB-global PROM. Distribution-based estimates of the MCID and MDD, both based on SD_b, are presented first, followed by exploration of an anchor-based method tied to the successful use of hearing aids 6 months after the fitting.

Evidence for the Dependence of Benefit on Baseline

Analyses

We took two main approaches to examining the dependence of APHAB scores on baseline unaided APHAP scores: (a) a targeted narrative review of the literature; and (b) analyses of individual data from an RCT recently completed by the authors (Humes et al., 2025). For the first, we sought studies from the peer-reviewed literature that reported unaided APHAP scores and at least an aided APHAP score or an APHAB score from adults. If the global scores were not provided, then scale scores for the three scales comprising the global scores (EC, RV, BN) had to be available (and were averaged to generate the global score). The goal of this review was to identify enough studies and participants to permit examination of any association between APHAB scores and the unaided baseline scores.

Twelve studies reported the requisite APHAP/B scores. Details of these 12 studies are given in Table 2. The total sample size across studies was 36,874, but this was largely due to the scores from 35,000 adults reported by Löhler et al. (2017b), the remaining 1,874 coming from the other 11 studies. The grand mean age (weighted by sample size) was 72.6 years. Most of the participants were older adults with mild-to-moderate sloping sensorineural hearing loss and most were fitted bilaterally. The percentage of new hearing-aid users varied widely (0–100%) across studies. In addition, aided APHAP-global scores were obtained at a variety of post-fitting intervals, ranging from four weeks to several years. The mean unaided baseline scores ranged from about 36 to 76.

Table 2.

Summary of Studies Identified from the Narrative Review.

Study	N	Age (M) years	When outcome measured	% Male	Severity of sloping hearing loss	% New HA user	% Bilateral HA Fit	APHAP-global unaided (M)	SD	APHAB-global benefit (M)	SD
Cox and Alexander (1995)	128	68	6 wks to >10 yrs	70	MiMo	0	42	75.7^a		38.3^a
Kochkin (1997)	521	68	1–2 yrs	59			65	62.5	19.8	26.8	23.5
Newman and Sandridge (1998)	25	69.2	4 wks	52	MiS	0	56	53.3^a		35.25^a
McArdle et al., (2005)	187	69.4	10 wks	98	MiM	100	100	47.6	16.4	29.5
Cox et al. (2007)	205	74	6 mos	77	MiS	58	95	55.9	17.3	28.7	19.3
Johnson et al. (2010)	142	74	6–18 mos	45		0	100	68		33
Löhler et al. (2017b)	35,000	72.8		49				56.5^a		29^a
Sabin et al. (2020)	75	64.1	4–6 wks	48	MiMo	81	100	31.3	15.6	13.3	16.6
Dornhoffer et al. (2020)	95	67	Avg. 7.2 yrs	65	MiMo	0	100	69.2	14.4	35.2^a
De Sousa et al. (2023)	64	63.6	6 wks	52	MiMo	73	100	35.9	24.7	25.75	21.7
Knoetze et al. (2024)	28	60.2	4 wks	50	MiMo		100	45.5		22.25
Humes et al. (2025)	532	69	6 wks	56	MiMo	100	100	39.3	17.7	18.2	17.1

Note. M = mean; SD = standard deviation; wks = weeks; mos = months; yrs = years; avg. = average; for hearing loss severity: Mi = mild; Mo = moderate; S = severe; HA = hearing aid.

^a Global scores = mean of reported mean scale scores for ease of communication (EC), reverberation (RV) and background noise (BN) scales.

For the analyses of the APHAP-global scores from the RCT of Humes et al. (2025), data from 584 adults for unaided and aided conditions, the NU dataset, were included. Forty-four percent were male. Ages ranged from 50 to 81 years (M = 69.1 and SD = 6.6 years). Better-ear four-frequency (500, 1000, 2000 and 4000 Hz) pure-tone-average, PTA4, ranged from 6.25 to 50.0 dB HL (M = 32.1 dB HL; SD = 9.1 dB HL) and baseline APHAP-global scores ranged from 3.4% to 89.7% (M = 38.8%; SD = 17.6%). The full 66-item PHAP was administered and the APHAP scores were obtained by extracting the pertinent item responses.

Results of Analyses of Baseline Dependence

The mean APHAB-global benefit scores (mean unaided APHAP-global score – mean aided APHAP-global score) from Table 2 are plotted versus the corresponding unaided APHAP-global baseline score in the top panel of Figure 1. The best-fitting line, fitted to the unweighted mean values in the top panel of Figure 1, is also shown and the linear-regression parameters are provided in that panel. It would clearly be inappropriate to apply a single magnitude of benefit, such as 17%, as a meaningful change for all. Such a fixed value would appear to be typical or average for those with baseline unaided APHAP-global scores of 30%–40% but would only represent about half the typical benefit observed for those with baseline APHAP-global scores of 70%.

Figure 1.

The top panel shows the mean APHAB-global scores plotted as a function of the APHAP-global baseline score for several studies. The middle panel shows box plots for the APHAB-global scores from the NU dataset for each of four groups representing the most prevalent baseline-response groups (N = 574 of 584 in these groups). The bottom panel shows the individual data from the NU dataset (N = 584). In all three panels, the solid line (red, online) is the best-fitting line fitted to the data: to the study means in the top panel, to the group medians in the middle panel, and to the individual data in the bottom panel. Regression parameters are shown at the top of each panel.

The middle and bottom panels of Figure 1 show the NU data. Four APHAP-global baseline groups were formed that had large samples (N = 99 to 130; response-category groups 2.5, 3, 3.5, and 4 from Table 1; total N = 574 of 584). Boxplots (middle panel) were generated for each group. As the median baseline score increased, APHAB-global also increased. The best-fitting line fitted to the median values in the middle panel shows a 0.67 change in aided benefit with every one-unit change in baseline score. Thus, the average amount of benefit decreases as the average baseline score decreases.

The bottom panel of Figure 1 shows the corresponding individual APHAP-global data from the 584 adults in the NU dataset. As expected, much higher r² values were obtained for the linear regression fit to the medians in the middle panel than for the fit to the individual data in the bottom panel. Across all three panels, slopes were less than 1, ranging from 0.56 to 0.67. The slopes of the best-fitting equations indicate that APHAB-global is about 56%–67% of the unaided APHAP-global score, consistent with Kochkin (1997, 2003). The distribution of the proportional improvements in APHAP-global provided by Kochkin (2003), however, was almost rectangular from 0.05 to 0.75. Such large individual variation in APHAB-global scores is reflected in the linear-regression fit to the individual data in the bottom panel of Figure 1, which accounts for only 42% of the variation in APHAB-global.
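The slope and r² values discussed here come from ordinary least-squares fits of benefit on baseline. A minimal sketch of that calculation follows; the five data points are hypothetical values chosen for illustration, not data from Table 2 or the NU dataset, and the function name is an assumption.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y on x: returns slope, intercept, r^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical group-level baseline and benefit scores (percentage points).
baseline = [30, 40, 50, 60, 70]
benefit = [14, 20, 27, 31, 38]
slope, intercept, r_squared = fit_line(baseline, benefit)
print(f"benefit = {slope:.2f} x baseline + ({intercept:.1f}), r^2 = {r_squared:.3f}")
```

A slope below 1 in such a fit means that benefit grows with baseline difficulty but by less than one point of benefit per point of baseline.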

When the independent variables of age and better-ear PTA4 were added to linear-regression analyses of the individual NU data, the best-fitting model was significant [F(3, 580) = 141.3, p < .001], but only the APHAP-global baseline score was a significant predictor (t = 19.1, p < .001) with p > .39 for each of the other predictors. The correlation between APHAP-global baseline and benefit was r = .65 and this decreased only slightly when controlling for the effects of the covariates (part and partial r = .62, .60). Equivalent results were obtained for each gender separately. These findings suggest that the baseline unaided APHAP-global score is the primary determinant of the benefit measured by the APHAB-global PROM, at least among the predictors considered here.

The dependence of APHAB-global scores on unaided APHAP-global baseline scores is not unlike that which can occur from regression to the mean. It is well known that when a test, such as the APHAP, is administered twice, there is a tendency for the scores on the second administration (aided scores here) to regress toward the population mean due simply to chance (Cronbach & Furby, 1970; Nunnally, 1967). As a result, when differences are calculated between the two test scores, patterns not unlike that shown in Figure 1 have been observed. The effects of regression to the mean should always be considered in studies involving pre- and post-intervention assessments with the same test (e.g., Crosby et al., 2003; Edwards et al., 1978; Speer, 1992). The greater the reliability of the test, however, the less the role played by regression to the mean. Depending on the specific measure of reliability, the reliability of the APHAP has been estimated to be .8 < r_xx < .95, all suggesting relatively minor contributions of regression to the mean in the data shown in Figure 1. Regression to the mean is considered in greater detail in the Discussion.
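A rough sense of how much regression to the mean could contribute can be obtained from the textbook classical-test-theory approximation, in which the expected retest score in the absence of any true change is mean + r_xx × (score − mean). The sketch below applies that approximation; it is not the analysis reported in the Discussion, the function name is hypothetical, and the inputs (the NU baseline mean of 38.8% and r_xx = .85) are taken from elsewhere in this article.

```python
def expected_rtm_shift(baseline, population_mean, r_xx):
    """Expected drop in score at retest attributable to regression to the mean.

    Assumes no true change and the same population mean at both
    administrations (classical test theory).
    """
    return (1.0 - r_xx) * (baseline - population_mean)


POPULATION_MEAN = 38.8  # mean unaided APHAP-global score in the NU dataset
R_XX = 0.85             # test-retest correlation assumed for the APHAP-global

for baseline in (20.0, 38.8, 70.0):
    shift = expected_rtm_shift(baseline, POPULATION_MEAN, R_XX)
    print(f"baseline {baseline:.1f}% -> expected shift {shift:+.1f} points")
```

Under these assumptions a baseline score of 70% would be expected to fall by about 4.7 points from regression to the mean alone, which is small relative to the benefit typically observed at that baseline.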

Despite a long history of use of the APHAB, there is relatively little guidance as to how one should interpret the observed unaided APHAP-global scores. Most evaluations of APHAP-global unaided scores have focused on validation of the subscales using other audiologic information, such as pure-tone thresholds and speech audiometry (Cox et al., 2003; Löhler et al., 2017a). Löhler et al. (2017a) suggested that the unaided APHAP is so dependent on pure-tone hearing loss that it could be used to screen for the presence of such hearing loss. They reported that an unaided APHAP-global score of 15% was the best cut-point for the identification of adults with audiometric hearing loss (PTA4 > 25 dB HL). There appears to have been little evaluation of the functional significance of unaided APHAP-global scores, however, beyond associations with pure-tone and speech audiometry. Rather, the focus in most analyses has been on the derived-benefit measure, the APHAB-global score.

Due to the general lack of research on the functional interpretation of unaided APHAP-global scores, which is desirable for the segregation of individuals into functionally relevant groups, we have opted to form baseline groups tied to the response categories used in the APHAP (Table 1). Others have opted to rely on percentiles or quartiles to determine how a given individual fares in comparison to similar peers but as noted above, this depends critically on the characteristics of the peer group, including unaided APHAP scores. In addition, quartiles and percentiles only provide a perspective on performance or benefit relative to that of the peer group or norms, not the absolute functional performance or benefit. We felt that categorical changes in the PHAP-based measure would better reflect functional changes resulting from intervention than the use of percentile-based norms. For example, an unaided APHAP-global score corresponding to experiencing communication difficulties “half the time” (4) that then improved to a score corresponding to “occasionally” having such difficulties (3) is more directly interpretable than the comparison of each score to normative data.

We next sought to establish baseline-dependent MCIDs for the APHAB. Distribution-based estimates of the MCIDs for the APHAB-global score were estimated first, followed by an anchor-based estimate. The anchor was based on hearing-aid “success”: the retention and daily usage of the devices 6 months after the hearing-aid fitting.

Derivation of MDDs and MCIDs

As noted in the Introduction, both the MCID and the MDD are often derived from the baseline (unaided) standard deviation, SD_b. Establishing good estimates of the baseline SD for the populations for whom hearing-aid PROMs are targeted is critical to deriving appropriate distribution-based MCID values. Figure 2 shows the SD_b values for the APHAP-global score for each of the most prevalent response-category groups. It is well known that the SD depends on the range of scores (e.g., Hozo et al., 2005; Shi et al., 2020), often referred to as the “range rule,” which simply states that the SD = range/4. The solid line in Figure 2 plots this simple approximation using the observed range of scores for each response-category group. The baseline SD is clearly driven, in large part, by the range of scores for each response group. This follows from the partitioning of the results into groups based on identical score ranges (Table 1). Importantly, the one-size-fits-all population estimates at the far right for the APHAP-global agree with the single-value estimates reported in the literature but are clearly much larger than the SD_b values for each baseline group. This too follows from the dependence of SD_b on the range of scores.
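The range-rule approximation is simple enough to check by hand. In the sketch below, the half-step category limits from Table 1 are used as a stand-in for the observed score range within each group; that substitution is an assumption made for illustration, since the solid line in Figure 2 is based on the observed ranges. The function name is hypothetical.

```python
def range_rule_sd(lowest, highest):
    """Range-rule approximation: SD is roughly one quarter of the range."""
    return (highest - lowest) / 4.0


# Lower and upper limits (%) of the most prevalent half-step categories (Table 1).
CATEGORY_LIMITS = {
    2.5: (12.501, 18.5),
    3.0: (18.501, 28.0),
    3.5: (28.001, 37.5),
    4.0: (37.501, 50.0),
    4.5: (50.001, 62.5),
    5.0: (62.501, 72.0),
    5.5: (72.001, 81.5),
}

for category, (low, high) in CATEGORY_LIMITS.items():
    print(f"category {category}: predicted SD_b ~ {range_rule_sd(low, high):.1f}")

# Full 1%-99% scale, for comparison with the one-size-fits-all estimate.
print(f"full scale: predicted SD ~ {range_rule_sd(1.0, 99.0):.1f}")
```

Because each group spans only 6 to 12.5 percentage points, the predicted within-group SD is a few points at most, far smaller than the SD predicted when the full scale is treated as a single group.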

Figure 2.

The boot-strapped (N = 1000) standard deviations and 95% confidence intervals for unaided baseline (SD_b) are plotted as a function of baseline response category (in ½-steps) over the range of the most prevalent baseline responses for the APHAP PROM. The solid line is the predicted SD_b based on the range of scores within each response-category group.

The boot-strapped SD_b values from Figure 2 are provided in Table 3 for the APHAP-global, along with the 95% confidence intervals for each estimate. (Boot-strapped estimates generated throughout this article were based on 1,000 repetitions with bias-correction and acceleration of estimates, BCa.) Next, the MCID was calculated in various ways. The simplest estimate, one-half the SD_b value, is provided in the next column of Table 3. This is equivalent to a Cohen's-d effect-size criterion of d = 0.5, a medium effect size, when the baseline SD is used in the denominator (Klukowska et al., 2024; Norman et al., 2003; Sedaghat, 2019). MCID estimates based on d = 0.25, a small-effect criterion, can be generated from SD_b estimates in Table 3 by multiplying SD_b by 0.25 instead of 0.5.
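For readers unfamiliar with bootstrapping, the sketch below shows how a bootstrapped SD and confidence interval can be obtained with 1,000 resamples. Note that it uses the simpler percentile interval, not the bias-corrected and accelerated (BCa) interval used for Table 3, and that the scores, seed, and function name are hypothetical. SciPy's `scipy.stats.bootstrap` offers a BCa option for those who need it.

```python
import random
from statistics import stdev


def bootstrap_sd_ci(scores, n_boot=1000, alpha=0.05, seed=0):
    """Sample SD with a percentile-bootstrap confidence interval.

    This is a percentile interval, a simpler substitute for BCa.
    """
    rng = random.Random(seed)
    n = len(scores)
    # Resample with replacement and recompute the SD each time.
    boot = sorted(stdev(rng.choices(scores, k=n)) for _ in range(n_boot))
    lower = boot[int((alpha / 2) * n_boot)]
    upper = boot[int((1 - alpha / 2) * n_boot) - 1]
    return stdev(scores), lower, upper


# Hypothetical unaided scores within one baseline group (percent).
scores = [38.1, 39.4, 40.2, 41.0, 42.5, 43.3, 44.8, 45.1, 46.7, 47.2,
          48.0, 48.9, 49.5, 39.9, 41.7, 43.9, 45.6, 46.1, 47.8, 49.0]
sd_b, lower, upper = bootstrap_sd_ci(scores)
print(f"SD_b = {sd_b:.1f} (95% CI {lower:.1f}, {upper:.1f})")
```

Fixing the seed makes the interval reproducible from run to run; a different seed will give slightly different limits.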

Table 3.

Boot-Strapped Estimates of Baseline SD and Derived Estimates of Minimum Clinically Important Differences (MCID) for the APHAB-Global for Each of Three Different Estimation Methods.

APHAP response categories (one-half steps)	SD_baseline (95% CI)	0.5 × SD_baseline	SEM × 2.77, where SEM = SD_b × (1 − r_xx)^0.5 and r_xx = 0.85
2.5. Seldom/occasionally	1.8 (1.6, 2.0)	0.9	2.0
3.0. Occasionally	2.9 (2.7, 3.1)	1.4	3.1
3.5. Occasionally/half time	2.8 (2.6, 3.0)	1.4	3.0
4.0. Half-the-time	3.7 (3.4, 3.9)	1.9	4.0
4.5. Half time/generally	3.3 (3.0, 3.6)	1.6	3.5
5.0. Generally	3.0 (2.7, 3.3)	1.5	3.3
5.5. Generally/practically always	3.0 (2.3, 3.3)	1.5	3.2
All	17.7 (16.9, 18.5)	8.8	18.9

Note. Estimates were derived from the APHAP-global dataset (NU, N = 584). SD = standard deviation; CI = confidence interval; SEM = standard error of measurement; r_xx = test−retest correlation.

Another estimation method, shown in the last column of Table 3, involves first calculating the standard error of measurement (SEM) from the SD_b value using the test‒retest correlation (r_xx) and then multiplying this by 2.77 (1.96 × 2^0.5; Demorest & Walden, 1984; Klukowska et al., 2024; Norman et al., 2003; Sedaghat, 2019). Because the benefit measures are difference scores (unaided minus aided), the SEM of both scores must be considered. This results in the standard error of the difference score equaling 2^0.5 × SEM. Finally, the 95% confidence interval is obtained by multiplying 2^0.5 × SEM by 1.96 (or, SEM × 2.77). The test‒retest correlation, r_xx, was estimated to be 0.85 for the APHAP-global baseline scores (Cox & Alexander, 1995); the derivation of this estimate is described in more detail in the Discussion.
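The SEM-based calculation can be sketched in a few lines. The SD_b values below are those printed in Table 3, and r_xx = 0.85 is the value assumed in the text; the function names are illustrative. Results may differ in the last digit from the table, possibly because the published values were computed from unrounded SDs.

```python
import math

def sem(sd_baseline, r_xx):
    """Standard error of measurement: SD_b * sqrt(1 - r_xx)."""
    return sd_baseline * math.sqrt(1.0 - r_xx)

def mdd_95(sd_baseline, r_xx):
    """95% critical difference for a difference score: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem(sd_baseline, r_xx)

R_XX = 0.85  # assumed test-retest correlation for the APHAP-global
sd_by_category = {  # SD_b values as printed in Table 3
    "2.5": 1.8, "3.0": 2.9, "3.5": 2.8, "4.0": 3.7,
    "4.5": 3.3, "5.0": 3.0, "5.5": 3.0, "All": 17.7,
}

for category, sd_b in sd_by_category.items():
    # Columns: category, 0.5 x SD_b, SEM x 2.77
    print(category, round(0.5 * sd_b, 2), round(mdd_95(sd_b, R_XX), 2))
```

With r_xx = 0.85 the combined multiplier is about 1.07, which is why the SEM × 2.77 column stays close to SD_b itself.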

As expected (e.g., Franceschini et al., 2023; Klukowska et al., 2024), the various MCID estimates in Table 3 do not converge on the same values. However, the SD_b and SEM × 2.77 estimates are quite similar, which arises from the use of r_xx = .85 as the test‒retest correlation for both PROMs: with this value, SEM × 2.77 = SD_b × (1 − .85)^0.5 × 2.77 ≈ 1.07 × SD_b.

Regardless of the method of computation, the distribution-based MCID estimates in Table 3 for the APHAP-global PROM clearly vary somewhat across baseline groups, as shown in Figure 2 for SD_b. To demonstrate the consequences of this variation, the SEM-based MDD estimate of 19% for the APHAB-global, a value in line with estimates in the literature, was applied to the NU dataset. This study was of interest because 83.9% of those who enrolled kept their hearing aids after a 6-week trial period. When a single MDD of 19% for the APHAB-global benefit was applied, 41.2% of the total sample exceeded the MDD with values of 5.6%, 32.2%, 63.4%, 72.7%, 68.6%, and 72.2% for unaided baseline APHAP-global response categories of 3.0 to 5.5, respectively. When the baseline-specific MDDs in Table 3 were applied to the APHAB-global scores, 84.9% demonstrated benefit with values of 80.6%, 85.2%, 92.4%, 91.9%, 94.3%, and 100% for unaided baseline APHAP-global response categories of 3.0 to 5.5, respectively. The higher prevalence of APHAB-global benefit exceeding the baseline-specific MDDs, 84.9%, is consistent with the high retention rate of the devices by these individuals, whereas the application of a single MDD value to all cases yielded only 41.2% who appeared to experience meaningful benefit. Why would 58.8% without meaningful benefit opt to keep their hearing aids?
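The comparison between a single MDD and baseline-specific MDDs amounts to applying two different thresholds to the same benefit scores. The sketch below uses the SEM × 2.77 values from the last column of Table 3 as the baseline-specific thresholds; the six (category, benefit) records are hypothetical and serve only to show the mechanics, not to reproduce the NU results.

```python
SINGLE_MDD = 19.0  # single-value MDD (%) applied to everyone

# Baseline-specific MDDs (%) from the last column of Table 3,
# keyed by unaided baseline response category.
BASELINE_MDD = {3.0: 3.1, 3.5: 3.0, 4.0: 4.0, 4.5: 3.5, 5.0: 3.3, 5.5: 3.2}

# Hypothetical (baseline category, APHAB-global benefit in %) records.
records = [(3.0, 6.0), (3.5, 2.0), (4.0, 21.0),
           (4.5, 10.0), (5.0, 25.0), (5.5, 3.0)]

def share_exceeding(records, threshold_for):
    """Proportion of records whose benefit exceeds the applicable MDD."""
    hits = sum(1 for category, benefit in records
               if benefit > threshold_for(category))
    return hits / len(records)

single = share_exceeding(records, lambda category: SINGLE_MDD)
specific = share_exceeding(records, lambda category: BASELINE_MDD[category])
print(f"single MDD: {single:.0%}, baseline-specific MDD: {specific:.0%}")
```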

Exploring Anchor-Based Approaches to MCID Estimation for the APHAB-Global

Assuming the APHAP response categories have functional significance, we evaluated whether meaningful differences could be defined using these response scales directly. As noted above, the two most commonly employed change criteria for 7-point response scales have been either a full-point change or a one-half-point change on the response scale (Norman et al., 2003). The ½-point change criterion afforded greater resolution in these exploratory analyses and was evaluated here. Based on the analyses of other 7-point Likert-scale PROMs, the authors hypothesized that a half-point change in APHAP-global scores after 6 weeks of hearing-aid use reflects clinically meaningful changes. Using the NU (N = 584) dataset, the prevalences of unaided and aided response categories, as well as changes in response categories (benefit), for the PHAP-based global scores were determined. Eighty percent of the individuals demonstrated APHAB-global values representing at least a ½-step improvement. Under the larger 1-step change criterion, only about 55%–60% would show functional change. Recall that about 84% of the adults in the NU dataset opted to keep their hearing aids after the 6-week trial.

ROC Analyses

To better evaluate the optimal response-scale change criterion for the MCID, Receiver Operating Characteristic (ROC) curves were generated. The ROC analyses were used to determine the optimal cutoff change criterion for the identification of hearing-aid “success.” The NU dataset (N = 584) included data from 465 older adults, 79.6% of study enrollees, who returned for evaluation 6 months after the hearing-aid fitting. In earlier studies, hearing-aid “success” has often been defined based on use of hearing aids for at least 4 h per day after 1 year or more of use (e.g., Cox & Alexander, 1995; Cox & Rivera, 1992; Hickson et al., 2010). This was likely a reasonable usage criterion from an era during which most candidates for hearing aids had moderate or severe hearing loss and frequent daily usage was expected. Even so, for about 10%–25% of those who had worn hearing aids for 1 year to more than 10 years, self-reported daily usage was less than 4 h per day (e.g., Cox & Alexander, 1995; Cox & Rivera, 1992; Johnson et al., 2010). As hearing-aid candidacy criteria have changed to include more adults with milder hearing loss, part-time hearing aid use may be more realistic. Hearing-aid “success” was defined here in two ways: (1) an individual opted to keep their hearing aids at 6-weeks post-fit and was wearing the devices an average of 2 or more hours daily at 6-months post-fit; or (2) the same as (1) except for an average of 4 or more hours of daily use at 6 months. Note that the present definition of success includes daily usage criteria based on datalogging. In many prior studies, including those making use of the APHAB (e.g., Cox & Alexander, 1995; Johnson et al., 2010), the definition of success was based on self-reported estimates of usage. It has been found that usage estimates based on self-report tend to be 1 to 4 h higher than those obtained with datalogging for the same individuals (Brooks, 1972; Haggard et al., 1981; Humes et al., 1996; Laplante-Levesque et al., 2014; Solheim & Hickson, 2017; Taubman et al., 1999).

Using these two success criteria, ROC curves were generated for the APHAB-global scores for nine ½-response-category steps ranging from one full response-category decline (−1) to 3.5 response-category improvements. ROC curves are typically evaluated by the Area-Under-the-Curve (AUC) metric with the measured AUC tested against AUC = 0.5, or chance performance. As shown in Figure 3, the AUC values ranged from 0.56 to 0.61 with one of the two AUC values being significant (p < .05). It should be noted, however, that the nonsignificant AUC value shown in Figure 3 had p = .05, which would often be considered an acceptable criterion for statistical significance in exploratory analyses like these. Nonetheless, even the significant AUC value is small. The best cut-off point varied somewhat across the ROCs. For the APHAB-global, a change criterion of a half-step yielded the highest Youden Index based on 4 h of daily hearing-aid use at 6 months. However, the Youden Index was highest for a change criterion of 1.5 steps for the 2-h daily usage success criterion. Thus, a single change criterion using the response scale was not found to be optimal across these two ROCs and neither of the ROCs had high AUC values.
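For readers less familiar with the Youden Index (sensitivity + specificity − 1), the following minimal sketch shows how it might be evaluated at each candidate change criterion. The response-category changes and success labels are hypothetical, and the function name is illustrative; this is not the analysis code used for Figure 3.

```python
def youden_by_cutoff(changes, success, cutoffs):
    """Youden Index (sensitivity + specificity - 1) for each cutoff.

    A case is classified as 'improved' when its change is >= cutoff.
    """
    n_pos = sum(success)
    n_neg = len(success) - n_pos
    results = {}
    for cutoff in cutoffs:
        true_pos = sum(1 for ch, ok in zip(changes, success)
                       if ok and ch >= cutoff)
        true_neg = sum(1 for ch, ok in zip(changes, success)
                       if not ok and ch < cutoff)
        results[cutoff] = true_pos / n_pos + true_neg / n_neg - 1.0
    return results

# Hypothetical response-category changes (unaided minus aided, in steps)
# and hypothetical success labels (True = met the usage criterion).
changes = [-0.5, 0.0, 0.5, 0.5, 1.0, 1.5, 0.0, 2.0, 1.0, -1.0]
success = [False, False, True, True, True, True, False, True, False, False]

# Candidate cutoffs from -1.0 to 3.5 in half steps.
cutoffs = [x / 2 for x in range(-2, 8)]
youden = youden_by_cutoff(changes, success, cutoffs)
best_cutoff = max(youden, key=youden.get)
print(best_cutoff, round(youden[best_cutoff], 2))
```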

Figure 3.

Receiver operating characteristic (ROC) curves for the APHAB-global PROM and for hearing-aid success based on either 2 or 4 h of daily usage at 6-months post-fit. The area under the curve (AUC) metrics for each ROC curve are provided in the figure legend and AUC values differing significantly (p < .05) from chance (AUC = 0.5) are marked with an asterisk.

Binary Logistic-Regression Analyses

Binary logistic-regression analyses were performed to further evaluate the ½- and 1-step change criteria for the APHAB-global and PHAB-global. In these exploratory analyses, we considered both “success” criteria. Table 4 provides the odds ratios (ORs) and their boot-strapped 95% confidence intervals for the APHAB-global and each success criterion. The left-most set of ORs in Table 4 is for the half-step change criterion and the right-most set of ORs is for the 1-step change criterion. Covariates in each analysis were gender, age, and better-ear PTA4 and, for two of the four analyses, those marked by an asterisk by the ORs, the PTA4 covariate was significant (p < .05). For the 2-h-a-day usage success criterion (top row of Table 4), neither the half-step nor the 1-step change criterion resulted in significant fully adjusted ORs. For the 4-h-per-day success criterion (bottom row of Table 4), the APHAB-based change criteria were all significant. The odds of hearing-aid success at 6-months post-fit were 1.6 to 1.8 times higher if the APHAB-global score at 6 weeks showed improvement corresponding to at least one-half or one-full step. The significant ORs were slightly higher for the ½-step change criterion than for the full-step change criterion.

Table 4.

Fully Adjusted Boot-Strapped Odds Ratios (ORs) (and 95% Confidence Intervals) From Several Binary Logistic-Regression Analyses for Two Criteria for Hearing-Aid “Success.”

Usage at 6 months	½-step response category OR (95% CI)	1-step response category OR (95% CI)
≥2 h/day	1.4 (0.6, 2.8)	1.7 (0.9, 3.5)
≥4 h/day	1.8 (1.1, 3.2)*	1.6 (1.0, 2.5)*

Note. The left-most set of ORs were for the one-half-step change and the right-most set of ORs were for the 1-step change. Covariates in each logistic regression analysis were gender, age, and better-ear PTA4. For 2 of the 4 analyses, marked with an asterisk, PTA4 was the only significant covariate. For the ½-step criterion and ≥ 4 h/day, the OR (95% CI) for PTA4 = 1.028 (1.001, 1.054; p = .038) and, for the 1-step criterion and ≥ 4 h/day, the OR (95% CI) for PTA4 = 1.027 (1.001, 1.054; p = .039). Significant (p < .05) ORs for the APHAP change criterion are shown in bold and italicized font.

Best-Supported MCIDs and MDDs

Overall, most analyses favor a change of one-half step as meaningful for the APHAB-global. The midpoints of each response category from 1 to 7.5 are shown in Table 1. The differences between successive midpoints define the change in APHAB-global score at each baseline that results in at least a one-half-step improvement in score from unaided to aided conditions. For example, a person who has an unaided APHAP-global score between 62.5% and 72%, would fall into response category 5, corresponding to a rating of “generally.” The midpoint response for those in this category is 67.3% as shown in Table 1. For an aided improvement to be considered meaningful, at least a half-step improvement, the response category would be required to improve to 4.5, which has a corresponding midpoint score of 56.3%. This represents an improvement in APHAB-global score of 11.0%. This change, along with those generated in the same manner for baseline response categories from 2.5 to 5.5, are shown as the filled circles in Figure 4. These values are also presented in Table 5 and are considered the best estimates of the MCIDs for the APHAB-global PROM. For comparison, the unfilled circles in Figure 4 show the MDDs estimated using the SD_b-based distribution method (Table 3; far-right column). The MDD is typically smaller than the MCID, although the variation of the two with baseline score is similar.

Figure 4.

Plots of the suggested minimal clinically important differences (MCIDs; filled circles) and the minimum detectable differences (MDDs; unfilled circles) for the APHAB-global PROM as a function of APHAP-global unaided response category. The MCIDs represent the differences in response-category midpoints from Table 5. These data were derived from the NU Dataset (N = 584).

Table 5.

APHAP-Global Response-Category Baselines, in One-Half Steps, Corresponding Lower and Upper Score Ranges, Response-Range Midpoints, and APHAB-Global MCIDs are given for Each Response Category. Too Few Data Were Available for the Response-Scale Extremes to Estimate the MCID.

APHAP-global baseline response categories	Half-step response scale	Lower limit (%)	% at midpoint	Upper limit (%)	APHAB-global MCID (%)
Never	1	1	2.4	3.75
Never/seldom	1.5	3.751	5.1	6.5
Seldom	2	6.501	9.5	12.5
Seldom/occasionally	2.5	12.501	15.5	18.5	6.0
Occasionally	3	18.501	23.3	28	7.8
Occasionally/half-the-time	3.5	28.001	32.8	37.5	9.5
Half-the-time	4	37.501	43.8	50	11.0
Half-the-time/generally	4.5	50.001	56.3	62.5	12.5
Generally	5	62.501	67.3	72	11.0
Generally/practically Always	5.5	72.001	76.8	81.5	9.5
Practically always	6	81.501	84.5	87.5
Practically always/always	6.5	87.501	90.5	93.5
Always	7	93.501	94.9	96.25
Always+	7.5	96.251	97.6	99
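The MCID column of Table 5 can be reproduced from the midpoint column alone, because each MCID is the difference between a category's midpoint and the midpoint one half-step below it. A minimal sketch, with illustrative names and the midpoints copied from Table 5:

```python
# Midpoint (%) of each half-step response category, from Table 5.
midpoints = {
    1.0: 2.4, 1.5: 5.1, 2.0: 9.5, 2.5: 15.5, 3.0: 23.3, 3.5: 32.8,
    4.0: 43.8, 4.5: 56.3, 5.0: 67.3, 5.5: 76.8, 6.0: 84.5, 6.5: 90.5,
    7.0: 94.9, 7.5: 97.6,
}

def half_step_mcid(category):
    """Score change (%) needed to move one half-step lower (better)."""
    return round(midpoints[category] - midpoints[category - 0.5], 1)

# Baseline categories for which the article reports an MCID.
mcids = {c: half_step_mcid(c) for c in (2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5)}
print(mcids)
```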

At first glance, it may appear that the way to use the values in Table 5 would be to identify the one-half step grade associated with the unaided PHAP-based score and then identify the one-half-step grade corresponding to the aided PHAP-based score. If the improvement in grade is at least 0.5, then the MCID will have been met or exceeded. Such a use of the MCIDs in Table 5 would be inappropriate. To illustrate the problem with this approach, assume an individual gives an unaided APHAP-global score of 65% and, after six weeks of hearing-aid use, gives an aided APHAP-global score of 54%, an APHAB-global benefit score of 11%. From Table 5, this corresponds to a change of one-half step along the response scale from “generally” to “half-the-time/generally” and would be considered as meeting the criterion for an MCID. However, consider another individual with the same unaided baseline score of 65% but an aided score of 61%, resulting in an APHAB-global (benefit) score of 4%. Once again, this example yields a change of one-half step along the response scale from “generally” to “half-the-time/generally” and would therefore be incorrectly considered as having met the criterion for an MCID. Instead, the appropriate use of Table 5 is to identify the response category corresponding to the unaided baseline score, but then to use the MCID APHAB-global values in the far-right column of Table 5. For both the preceding examples, the same MCID, APHAB-global (benefit) = 11.0%, would apply and only the first case would exceed the MCID.
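The recommended use of Table 5 (locate the unaided baseline category, then compare the benefit score with that category's MCID) can be expressed as a simple lookup. The sketch below uses the upper limits and MCIDs from Table 5; the function names are illustrative, and baselines outside the range for which MCIDs were estimated return no decision.

```python
LOWEST_LIMIT = 12.5  # baselines at or below this have no estimated MCID

# (half-step category, upper limit %, MCID %) from Table 5.
ROWS = [
    (2.5, 18.5, 6.0), (3.0, 28.0, 7.8), (3.5, 37.5, 9.5), (4.0, 50.0, 11.0),
    (4.5, 62.5, 12.5), (5.0, 72.0, 11.0), (5.5, 81.5, 9.5),
]

def mcid_for_baseline(unaided_pct):
    """Return the MCID (%) for an unaided score, or None if not estimated."""
    if unaided_pct <= LOWEST_LIMIT:
        return None
    for _category, upper, mcid in ROWS:
        if unaided_pct <= upper:
            return mcid
    return None

def meets_mcid(unaided_pct, aided_pct):
    """True if benefit (unaided minus aided) is at least the baseline MCID."""
    mcid = mcid_for_baseline(unaided_pct)
    if mcid is None:
        return None
    return (unaided_pct - aided_pct) >= mcid

print(meets_mcid(65, 54))  # benefit of 11% against an MCID of 11.0%
print(meets_mcid(65, 61))  # benefit of 4% against the same MCID
```

Both worked examples in the text share the same baseline, so both are judged against the 11.0% criterion rather than against a change in response-category label.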

A far simpler approach to determine benefit ≥ MCID would be to abandon the use of the percentages for frequency of speech-communication difficulties altogether. In this case, the responses, 1 to 7, would be averaged, rather than their associated percentages. If the unaided and aided APHAP-global scores are computed directly from the response categories and then subtracted (mean unaided minus mean aided), a difference ≥ 0.5 reflects at least a half-step change in mean global scores on this 7-point scale. This is the recommended procedure for application of the half-step MCID criterion. It is much simpler than relying on the % values in Table 5. The primary drawback with this simplified approach to scoring and interpretation of benefit scores is that it has not been used previously. All prior APHAP and APHAB scores have been reported as percentages reflecting the frequency of difficulties experienced. Global rating scores, however, can be converted to corresponding percentage scores using the following formula: APHAP score in percentage = −4.88 + 109.77/(1 + e^(−(APHAP score − 4)/1.1)), where the APHAP score included in the exponent is the mean response rating. With this form, a mean rating of 4 (“half-the-time”) maps to about 50%, and the percentage increases with the rating, as the response scale requires. This transformation is from the best-fitting four-parameter sigmoidal function (adjusted r² = .997) relating assigned percentages to the seven integer response categories. This transformation from mean rating scale score to mean percentage difficulty score might be needed, for example, when comparing results to prior studies or to previous tests, all of which most likely have been expressed in percentages of difficulties and not as mean rating scores.
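A brief sketch of both steps, the half-step check on mean ratings and the rating-to-percentage conversion, is given below. The sign of the exponent is read here so that the function increases with rating (about 2% at a rating of 1, 50% at 4, and about 98% at 7); that reading, the function names, and the item ratings are assumptions made for illustration.

```python
import math

def rating_to_percent(mean_rating):
    """Convert a mean 1-7 rating to a percentage-of-difficulty score.

    Four-parameter sigmoid with the constants quoted in the text; the
    exponent sign is chosen so the result increases with the rating.
    """
    return -4.88 + 109.77 / (1.0 + math.exp(-(mean_rating - 4.0) / 1.1))

def half_step_benefit(unaided_ratings, aided_ratings):
    """True if mean unaided minus mean aided rating is at least 0.5."""
    mean_unaided = sum(unaided_ratings) / len(unaided_ratings)
    mean_aided = sum(aided_ratings) / len(aided_ratings)
    return (mean_unaided - mean_aided) >= 0.5

# Hypothetical item ratings (1 = never ... 7 = always) for one person.
unaided = [5, 5, 4, 6, 5, 4]
aided = [4, 4, 4, 5, 4, 3]
print(half_step_benefit(unaided, aided))
print(round(rating_to_percent(sum(unaided) / len(unaided)), 1))
```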

Further research that includes more individuals with higher baseline scores is needed to enable the generation of MCIDs for such baseline score ranges. This is unnecessary for those with very low baseline scores, as it is unlikely that those who have mean global APHAP unaided baseline scores corresponding to “seldom” or “never” would take up hearing aids. At the other extreme, those who have mean baseline scores corresponding to “practically always” or “always” may be candidates for the use of other devices, such as cochlear implants.

Discussion

Summary of Findings and Recommendations

The evidence presented above supports the use of baseline-dependent change criteria for the APHAB-global. A single value of the 95% critical difference, MDD or MCID, will typically under-identify meaningful benefit for those with better (lower) baselines and over-identify benefit for those with poorer (higher) baselines. Both in the synthesis of the literature and the analyses of the NU dataset presented here, the measured hearing-aid benefit was found to vary significantly with the unaided baseline score, including in regression analyses adjusting for differences in age, hearing loss, and gender. Baseline-specific MDDs and MCIDs are required for the accurate determination of significant benefit from hearing aids using the APHAB-global.

The analyses presented above focused on the shorter APHAB outcome measure rather than the PHAB, given its broader use clinically and in clinical research. Analyses of the PHAB-global scores from the NU dataset and two other clinical trials (Haskell et al., 2002; Humes et al., 2017) yielded the same conclusions as those drawn from the analyses of the APHAB-global scores presented here. This is not surprising, given that the APHAB was developed from the analyses of the PHAB data by Cox and Alexander (1995), including the use of identical items from three scales (EC, RV, and BN) common to both instruments as well as the use of an identical response scale. The individual data for both PHAP-based measures were available in the NU dataset and the correlations between the PHAP/B and APHAP/B were 0.97, 0.96, and 0.95, for unaided, aided, and benefit scores, respectively. Given these correlations, together with the similar composition of the scales and the use of identical response scales and scoring, the MDD and MCID values described above for the APHAB-global most likely can be used with the PHAB-global as well.

Regression to the Mean

The dependence of the APHAB-global score on the unaided APHAP-global score, described in Figure 1, resembles the expected result for the effect of regression to the mean. Consideration of the effects of regression to the mean has a long history in the evaluation of test scores in education and PROMs in clinical contexts because the typical approach is to obtain a score twice, once before intervention and again following intervention (e.g., Nunnally, 1967; Speer, 1992). When a test or scale, such as the APHAP, is repeated, regression to the mean may affect the results, especially when the results are interpreted based on a change score or a difference score, as for the APHAB. The classic approach to adjust scores for regression to the mean is to generate a “true” difference score by adjusting the observed baseline score to a true baseline score with the adjustment based on the reliability of the measure (Cronbach & Furby, 1970; Nunnally, 1967). To make such adjustments, one must know the reliability of the test score, the APHAP-global, and the mean APHAP-global baseline score of the group (as an estimate of the mean for the population of interest). As applied to the APHAB difference score, the true APHAP baseline score = r × (APHAP-global baseline − M) + M, where r is a measure of reliability and M = the group mean baseline score (Cronbach & Furby, 1970; Nunnally, 1967). This correction for regression to the mean has been recommended for use in the study of clinical interventions (e.g., Crosby et al., 2003, 2004; Speer, 1992; Speer & Greenbaum, 1995).
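To make the size of this correction concrete, the sketch below assumes the classical (Kelley-type) form of the adjustment, in which the observed baseline is shrunk toward the group mean in proportion to the reliability: true baseline = M + r × (observed − M). The group mean and observed score are hypothetical values chosen only for illustration, and the function names are not from the article.

```python
def true_baseline(observed, group_mean, r_xx):
    """Estimated true score: the observed score shrunk toward the group mean."""
    return group_mean + r_xx * (observed - group_mean)

def regression_shift(observed, group_mean, r_xx):
    """Points by which the observed baseline exceeds the estimated true score."""
    return observed - true_baseline(observed, group_mean, r_xx)

GROUP_MEAN = 45.0  # hypothetical mean unaided APHAP-global score (%)
OBSERVED = 70.0    # hypothetical high unaided baseline (%)

for r_xx in (0.70, 0.85, 0.95):
    estimate = true_baseline(OBSERVED, GROUP_MEAN, r_xx)
    shift = regression_shift(OBSERVED, GROUP_MEAN, r_xx)
    print(r_xx, round(estimate, 2), round(shift, 2))
```

Under these assumptions the adjustment is (1 − r) times the distance from the group mean, so lower reliability or a more extreme baseline produces a larger shift.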

Revelle and Condon (2019) identified at least a dozen different measures of “reliability,” and a variety of reliability coefficients have been used to correct scores for regression to the mean. Most often, the test‒retest correlation, r_xx, has been recommended (Nunnally, 1967; Revelle & Condon, 2019). In the absence of test‒retest reliability studies, the split-half correlation is an alternative (e.g., Trimble & Cronbach, 1943).

The test‒retest reliability of the APHAP-global was estimated here to be r_xx = 0.85. This was determined from results for test‒retest reliability of the scale scores comprising the APHAP-global. Somewhat surprisingly, no other test‒retest correlations could be found for the unaided APHAP scales or the global score. Each scale score in the APHAP is based on six items, resulting in 18 items for the APHAP-global score. Using the lowest scale test‒retest correlation (r_xx = 0.65 for the six-item RV scale) reported by Cox and Alexander (1995) and applying the Spearman-Brown formula to estimate the test‒retest correlation for the 18-item global score yields an estimated r_xx value of .85 for the APHAP-global. For comparison, we calculated r_xx for the full PHAP-global using the raw data (scale scores only) from 334 older adults in the NIDCD/VA clinical trial (Haskell et al., 2002). The NIDCD/VA trial repeated the unaided PHAP measurements four times with an interval of about 90 days between each repetition of the unaided PHAP. The observed test‒retest correlations for unaided PHAP-global scores for these data ranged from r_xx = .80 to .87 across test intervals, values in line with the estimated APHAP-global r_xx value of .85.
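The Spearman-Brown step can be verified directly: lengthening a six-item scale with r_xx = 0.65 by a factor of three (to 18 items) gives the predicted reliability of the global score. The function name below is illustrative.

```python
def spearman_brown(r, length_factor):
    """Predicted reliability when a test is lengthened by length_factor."""
    return length_factor * r / (1.0 + (length_factor - 1.0) * r)

# Lowest six-item scale reliability (RV scale) projected to 18 items.
r_global = spearman_brown(0.65, 18 / 6)
print(round(r_global, 3))
```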

Finally, in the absence of solid r_xx values for the APHAP-global unaided scores, we also generated split-half reliability estimates using the R package psych (Version 2.4.12; Revelle, 2024) and obtained split-half correlations of r = .95, .92, and .84, for the maximum, average, and minimum split-half correlations for the unaided APHAP-global score, respectively. Based on these split-half correlations, and the test‒retest data for the unaided PHAP-global scores presented above, an assumed r_xx of .85 for the APHAP-global unaided scores appears reasonable.

The effects of the assumed r_xx value on the estimation of the true APHAP-global unaided score and the resulting estimates of the true APHAB-global scores are shown in Figure 5. The range of r_xx values included in Figure 5 is .70 to .95. For comparison, the observed APHAP-global baseline and APHAB-global benefit scores from the NU dataset are plotted as filled circles in Figure 5. The estimated true APHAB-global scores for those in the first or fifth quintiles of the baseline score suggest that regression to the mean may have resulted in the overestimation of the true APHAB-global scores for those with the highest baseline scores and underestimated the true APHAB-global scores for those with the lowest baseline scores. Using the assumed r_xx value of r = .85, these errors at each extreme amount to about 3–4 points and flatten the dependence of APHAB-global on unaided baseline slightly. Of course, if a lower r_xx value is assumed, the flattening of this function is even greater as shown in Figure 5. However, even for the worst case shown in Figure 5, there is still a dependence of APHAB-global scores on unaided APHAP-global baseline scores.

Figure 5.

Estimates of the true APHAB-global scores are shown for a range of r_xx values from 0.70 to 0.95. Estimated true APHAP-global baseline score = r_xx × (APHAP-global baseline − M) + M, where r_xx is the test-retest reliability of the APHAP-global baseline score and M = the group mean baseline score.

In summary, regression to the mean undoubtedly has some effect on the dependence of APHAB-global scores on unaided APHAP-global baseline scores. The best estimates of these effects, however, are small given the assumed r_xx of .85 for the APHAP-global scores.

Additional Considerations

The evidence presented here supports the use of a half-step change in the PHAP-based response scale as an MCID. The anchor-based MCIDs established here for the APHAB-global are shown in Figure 4 and Table 5, while Table 3 shows the MDDs established using the distribution-based method and SD_b. As noted, given the same response scale and assuming the same reliability for the PHAP and the APHAP global scores, the SD_b, MDD, and MCID estimates provided in Tables 3 and 5 can be applied to each of these PROMs. As was noted, implementation of the ½-step change criterion for MCID can be accomplished more readily by averaging the integer responses directly rather than converting them to percentages representing the frequency of difficulties experienced.

When using either PHAP-based PROM, it must be kept in mind that it is the frequency of experiencing everyday communication difficulties, from never (1%) to always (99%), that is being self-reported. This is not the same as assessing how much difficulty is experienced in those same communication contexts. To demonstrate this, data from Cox et al. (1991) were used. They administered the PHAP and the Intelligibility Rating Improvement Scale (IRIS) to 42 elderly hearing-aid users. The IRIS presented the same 66 listening situations as the PHAP, but the response scale was changed to reflect the percentage of speech understood using the same seven-response scale and values from 1% to 99%. There was little correspondence between the benefit measured on the speech-communication scales of the PHAB and those same scales on the IRIS, with correlations of 0.39, 0.34, and 0.54 for the EC, RV, and BN scales, respectively. In support of this dissociation between frequency of difficulties and severity of difficulty, Cox et al. (2003) found low-to-moderate correlations, 0.3 < r < 0.6, between APHAB scores and behavioral measures of speech understanding in noise. Thus, reducing the frequency of communication difficulties experienced does not necessarily imply that speech understanding performance has been improved with hearing aids, whether the latter is measured by self-report (IRIS) or by behavioral measures of speech-in-noise performance. Research is needed to further explore the meaning and interpretation of the benefit scores obtained with the PHAP-based PROMs.

The appropriate determination of SD_b is essential for the derivation of both the MDD and MCID for hearing-aid PROMs. Although the limited evidence available and presented in Figure A1 in the Appendix suggests that it is not critical whether the baseline score is obtained at pre-intervention or post-intervention, using the “then-test” method, additional research is needed. At a minimum, it is important that future studies using the PHAP-based PROMs specify the method used to establish baseline scores.

Until more research is available, the way in which the scale items were administered should be clearly specified. For the NU dataset reviewed here, for instance, the 66-item PHAP was administered, with APHAP scores obtained by extracting the responses to the 24 items comprising the APHAP or the 18 items comprising the APHAP-global. This is the same procedure as used by Cox and Alexander (1995) to generate the original norms for the APHAP/B and that has been followed in several studies. Subsequently, Johnson et al. (2010) compared the original PHAP-extracted APHAP norms to a separate set of norms obtained using the APHAP as a separate and independent scale (and with different hearing-aid technology compared to 1995). For the EC, RV, and BN scales comprising the APHAP-global, there were no notable differences in percentiles between these two sets of norms. This comparison suggests that the way in which the score was obtained is of little consequence. Nonetheless, it is recommended that this be reported in future research studies until this can be confirmed.

We have proposed some guidelines for MDDs and MCIDs that can be used with the APHAB-global (or PHAB-global) to identify meaningful changes following the use of hearing aids. Professional practice guidelines recommend the use of outcome measures, including self-report measures, to assess the benefit of hearing aids (ASHA; Valente et al., 2006). MCIDs are central to the interpretation of those outcomes. In addition, both MDDs and MCIDs are essential to the interpretation of the outcomes of research examining the efficacy and effectiveness of hearing-aid interventions. These values are not only of importance to the interpretation of the results at the conclusion of a study but also are needed to appropriately power studies prior to data collection at the design stage.

Finally, our focus in this article has been on the APHAB-global with implications for the parent outcome measure, PHAB-global, as well. Although the APHAB is one of the most broadly used self-report outcome measures, many others exist. It may be the case that the baseline dependence of outcomes that has been described here is also relevant for these other outcome measures. In fact, we have examined this for one of the other widely used outcome measures, the Hearing Handicap Inventory for the Elderly (HHIE; Ventry & Weinstein, 1982), and have observed a similar dependency (Humes et al., 2025b).

Footnotes

Acknowledgments

For the data reported from Northwestern University (NU), the collection of these data was supported, in part, by a research contract from PCORI (#HL-2019C1-16094).

ORCID iDs

Larry E. Humes

Jasleen Singh

Ethical Approval and Informed Consent Statement

For the Northwestern University (NU) data reported here, the study protocol, including screening of participants, was approved by Northwestern University's Institutional Review Board (STU00213710).

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: For the Northwestern University (NU) data reported here, the research to gather those data was funded, in part, through a Patient-Centered Outcomes Research Institute (PCORI) Award (HL-2019C1-16094; S. Dhar, L. Humes Co-PIs). The views, statements, and opinions in this presentation are solely the responsibility of the authors and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute (PCORI), its Board of Governors or Methodology Committee.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The NU data will be made freely available through the PCORI website following publication of the final project report on that website in the near future.

Appendix A
