Abstract
Difficulty recognizing speech in noise is a common complaint among those with sensorineural hearing loss. Yet the degree of difficulty differs widely among individuals, often unrelated to the clinical gold standard for evaluating hearing, the pure-tone audiogram. Research has isolated both auditory and nonauditory factors responsible for these differences, but these factors do not operate in isolation. In the present work, a generic computational model involving simultaneous cue sensitivity, cue reliance, and decision noise provided an integrative framework for identifying sources of between-listener variance not accounted for by the audiogram. The framework was applied to performance differences within and between normal-hearing (NH) and hearing-impaired (HI) groups in the processing of linguistic, acoustic, and statistical cues supporting speech recognition in noise. The primary source of performance differences between groups was differences in sensitivity for the subtle, but largely stationary acoustic cues required for speech recognition. The overwhelming source of performance differences within groups was differences in decision noise associated with more salient, but highly variable statistical cues for speech separation. For speech separation, HI listeners placed far greater reliance than NH listeners on the one cue for which they were most sensitive. HI listeners, but not NH listeners, benefitted by shifting all acoustic information to this most relied on cue. The results provide preliminary support for the feasibility of integrative modeling as a means of evaluating the collective influence of factors affecting speech recognition in noise.
Introduction
Difficulty understanding speech in noise is a serious problem for those with hearing loss, affecting work, social interaction, and quality of life (ASHA task force, 1996; Katz, 1994; Keilman et al., 2007; Keith, 1990). New hearing aid technologies promise major advances in treating the problem using artificial intelligence (AI) and machine learning to reduce the impact of the noise (Nielsen, 2024), but that promise depends on the quality of the data these technologies have regarding the individualized nature of the problem they are designed to treat (Sanchez-Lopez et al., 2020). So far, those data have come, by and large, from standard clinical diagnostics, which have not kept pace with modern research on difficulties listening in noise (Mealings et al., 2020; Pang et al., 2019). The audiogram, the clinical gold standard for assessing hearing loss, is not a reliable predictor of who will have difficulty listening in noise (Fitzgerald et al., 2023; Smith et al., 2024), even when hearing levels are well within normal limits (Humes, 2019, 2021; Shinn-Cunningham, 2017; Tremblay et al., 2015). Of patients presenting in the clinic with this complaint, 9% to 15% are evaluated to have normal hearing (Kumar et al., 2007; Zhao & Stephens, 2007), while many who meet clinical criteria for hearing loss perform as well as, or better than, their normal-hearing (NH) cohorts listening in noise (e.g., Alexander & Lutfi, 2004).
Research has identified a host of possible reasons for these differences (Pienkowski, 2017). Animal studies reveal a type of cochlear synaptopathy affecting hearing in noise that is not detected by conventional audiometry. The pathology is thought to be potentially widespread in humans (Kobel et al., 2017; Kujawa & Liberman, 2009; Plack & Léger, 2016). Elevations in threshold resulting from cochlear hair cell loss can be missed if measured only at the audiometric frequencies (Lee & Long, 2012), and even at the audiometric frequencies, there is wide variation in normal thresholds reflecting differences in hair cell health that could affect listening in noise (Plack et al., 2014). At the behavioral level, some listeners may compensate for the effects of a hearing loss by regularly using linguistic information to infer segments of a spoken message obscured or made inaudible by the noise (Brouwer et al., 2012; Calandruccio et al., 2014, 2017; Lutfi et al., 2021). Other listeners will leverage expectations regarding the statistical properties of speech to arrive at decisions regarding who is talking and what was said (Alexander & Lutfi, 2004; Chang et al., 2016; Cherry, 1953; Kidd et al., 2008; Lutfi et al., 2013, 2021), and still others may effectively switch their reliance on acoustic cues to compensate for cues lost to or distorted by their hearing loss (Calandruccio & Doherty, 2008; Doherty & Lutfi, 1996; Jesteadt et al., 2014; Lentz & Leek, 2002; Roverud et al., 2020, 2021; Souza et al., 2015, 2020; Thrailkill et al.). Working memory (Conway et al., 2005), lapses in attention (Bidelman & Yoo, 2020; Brungart & Simpson, 2007), and cognitive changes with age (Helfer et al., 2020) can likewise influence listening in noise in ways that can be mistaken for a hearing loss. Also, in real-world listening, speech and noise are not static. Unlike the pure tones used in audiometry, voices modulate widely in level and pitch, topics of conversation change abruptly, and different talkers speak unpredictably from different locations at different times. The uncertainty created by such moment-by-moment variation can cause both NH and hearing-impaired (HI) listeners to have great difficulty following the speech of a conversation partner; this is the lesson learned from studies of informational masking (Kidd et al., 2008, 2016; Lutfi et al., 2013). Finally, speech recognition in noise is not one problem, but two. To recognize the speech in noise, the listener must also separate the speech from the noise. The separation problem, popularly known as the cocktail-party problem, has a long history of study dating back to Cherry (1953). A defining feature of this work is the huge individual differences in performance for both NH and HI listeners (Kidd & Colburn, 2017; Shinn-Cunningham, 2017; Shinn-Cunningham & Best, 2008).
The above narrative underscores the complex and multifaceted nature of the problem of understanding individual differences in recognizing speech in noise that are not accounted for by the audiogram. Progress has been made identifying relevant factors, but the problem is unlikely to be reduced to any one of these factors, or even a combination of them. Different scenarios pose different challenges, different individuals respond differently to these challenges, and different factors take on different significance when all are simultaneously in play. The goal of the present study was to evaluate the feasibility of developing a generic model that could be applied generally in such cases to evaluate the source of individual difficulties listening in noise not accounted for by the audiogram. Three questions were posed: (a) Can what is known about the individual factors affecting speech recognition in noise be used to model their collective influence in a typical speech-in-noise recognition task, (b) can the model be used to quantify the relative impact and interaction of these factors for individual listeners in this task, and (c) can the information obtained be used to provide a benefit for HI listeners?
The approach required trading specificity for generality. We recognized that to model the specific action of every factor implicated would be too large an undertaking. Instead, we leveraged the fact that different factors often have a common effect on behavior associated with a particular stage of speech processing. By grouping factors in terms of their common behavioral effect, the problem might be reduced to a few general factors that would easily lend themselves to evaluation. This integrative approach was applied here to conditions involving the processing of three major speech cues supporting speech recognition in noise (linguistic, acoustic, and statistical) for both HI and NH listeners. The work is described in four parts. The first covers the development of the model and how its parameters are tied to specific sources of listener variance in the data. The second describes the methods involved in parameter estimation. The third describes results wherein the model is applied to the partitioning of the variance in speech separation and its relation to speech recognition. The fourth describes an interaction between stages of processing that is exploited to provide a benefit in the task for individual HI listeners.
The Integrative Model
The model is designed to evaluate the collective impact of factors on the coordinated function of different stages of auditory processing. Known factors affecting performance, those described in the introduction, are assigned to one of three stages of processing determined by whether their primary effect is on cue sensitivity (transduction), cue reliance (selective attention), or decision noise (decision processes). The relative impact of these stages on behavior, and the interaction among them, is then determined from their ties to specific sources of listener variance in the data. The focus is on the listener variance rather than the mean differences between conditions. In practice, the listener variance is partitioned into three parts using a general linear model (GLM; see Lutfi et al., 2021, for the analytic development). The GLM includes a listener fixed effect, which carries over from one condition to the next; a random effect, responsible for the error in replication of conditions; and a conditional effect, representing the unusual difficulty a particular listener may have with a particular condition or conditions. The parameters of the integrative model at each stage are then selected to capture the three sources of listener variance in the GLM recovered from the data.
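As a concrete illustration of the partition, a minimal sketch with hypothetical data follows (this is not the authors' analysis code; the full analytic development is in Lutfi et al., 2021). The three sources correspond to the listener main effect, the listener-by-condition interaction, and the replication error in a two-way decomposition:

```python
import numpy as np

# Hypothetical scores: listeners x conditions x replicates
rng = np.random.default_rng(0)
y = rng.normal(size=(12, 8, 2))       # 12 listeners, 8 conditions, 2 replicates

grand = y.mean()
listener_means = y.mean(axis=(1, 2))  # per-listener mean across conditions/replicates
condition_means = y.mean(axis=(0, 2))
cell_means = y.mean(axis=2)           # listener x condition means

# Listener fixed effect: carries over from one condition to the next
var_fixed = np.var(listener_means - grand)

# Conditional effect: unusual difficulty of a particular listener with a
# particular condition (listener x condition interaction)
interaction = (cell_means - listener_means[:, None]
               - condition_means[None, :] + grand)
var_conditional = np.var(interaction)

# Random effect: error in replication of the same condition
var_random = np.var(y - cell_means[:, :, None])

print(var_fixed, var_conditional, var_random)
```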
Figure 1 shows the stages of the integrative model and their computational formulas. The computational formulas are intended to represent generally accepted views of the behavior of each stage (common physiological processes associated with different behaviors are neither necessary nor implied). The input

Figure 1. Schematic of the generic integrative model with computational formulas for each stage (see text for details).
Model Application and Simulation
For the present application of the model, an important consideration is that the experimental conditions involve the manipulation of speech cues (linguistic, acoustic, and statistical) having different units of measure. This means that common metrics of listener performance (e.g., d′, threshold signal-to-noise ratio [SNR], or percent correct) do not allow for equivalent comparisons from one condition to the next. To permit equivalent comparisons, a measure of performance efficiency (0 ≤ η ≤ 1), specifically intended for this purpose, is adopted from signal detection theory (Tanner & Birdsall, 1958). This measure expresses performance relative to that of an ideal observer, one that optimizes decisions based on the likelihood ratio of speech signal to noise specific to each condition (Green & Swets, 1966). The measure is used both to evaluate the overall performance of listeners and to evaluate the different stages of processing across different conditions. For the different stages, η is computed using as input to each stage the output O of the previous stage.
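For reference, the Tanner-Birdsall efficiency is the squared ratio of the observer's d′ to the ideal observer's d′ for the same stimuli; a minimal sketch with hypothetical values:

```python
def efficiency(dprime_obs: float, dprime_ideal: float) -> float:
    """Tanner-Birdsall efficiency: squared ratio of observed to ideal d'."""
    return (dprime_obs / dprime_ideal) ** 2

# Hypothetical example: a listener achieving d' = 1.2 in a condition where
# the ideal observer achieves d' = 2.0
print(efficiency(1.2, 2.0))  # 0.36
```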
Before data collection, model simulations were performed for a group of 11 NH and 11 HI hypothetical listeners with all conditions exactly as described in the General Methods section. Simulations are useful for supporting comparisons with listener results, at a minimum by showing measurable expected differences among possible outcomes. The same software programs used to collect and analyze the data from listeners were used to perform the simulations. The section of the code that represented the responses from listeners was simply replaced with code representing the model. Realistic values of model parameters were selected based on previous studies (Lutfi et al., 2020, 2021, 2023) so that the dominant source of between-listener variability would be differences in (A) decision noise, ζ, (B) cue reliance, or (C) cue sensitivity (cf. Figure 2).
Ternary plots of Figure 2 show the results of the simulation, where the values of η at each stage are expressed relative to the other stages. Simulated NH and HI listeners are represented by unfilled and filled symbols, respectively. In these plots, the degree to which each stage contributes to the variance among listeners is given by the degree to which the data fall parallel to each axis. For example, in panel A of Figure 2, where the dominant source of listener variance is decision noise ζ, the data fall parallel to the decision axis and variance in η

Figure 2. Model simulations. Ternary plots give the relative performance efficiency η of each stage for three model scenarios: (A) variance due to decision process, (B) variance due to selective attention, and (C) variance due to transduction. Results for simulated HI and NH listeners are given by filled and unfilled symbols, respectively (see text for further details).
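A minimal sketch of how model responses might stand in for listener responses on a trial is given below. The decision rule and parameter values are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_trial(d_f0, d_theta, w_f0, w_theta, zeta):
    """Illustrative decision rule: cues normalized by their perturbation SDs
    (sigma = 7 in each cue's units), weighted by reliance, and summed with
    Gaussian decision noise zeta; positive evidence -> 'Jen'."""
    evidence = w_f0 * (d_f0 / 7.0) + w_theta * (d_theta / 7.0) + rng.normal(0.0, zeta)
    return "Jen" if evidence > 0 else "Jon"

# One simulated trial with Jen present: nominal cue values of +14 (Hz, deg)
# plus trial-by-trial perturbations as in the study
d_f0 = 14.0 + rng.normal(0.0, 7.0)
d_theta = 14.0 + rng.normal(0.0, 7.0)
print(simulate_trial(d_f0, d_theta, w_f0=0.6, w_theta=0.4, zeta=0.5))
```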
General Methods/Parameter Estimation
The general methods are in principle the same as those of Lutfi et al. (2021, 2023); specific details regarding stimulus generation and procedure are repeated here. Like many speech-in-noise recognition studies, we focused on two major voice cues for the speech separation task: differences in voice fundamental frequency, F0, and azimuthal location, θ, of talkers (Bronkhorst, 2000, 2015; Kidd et al., 2008). For physical reasons alone, these two cues are most diagnostic for separating natural sound sources, including speech (Lutfi, 2008). The speech stimuli were recordings of 200 naturally spoken, grammatically correct, English sentences (Zhou et al., 2021) ranging from 1.6 to 2.5 s in duration. Sentence exemplars were from a single native speaker of English whose average F0, estimated using an autocorrelation-based tracker (de Cheveigné & Kawahara, 2002), was 170 Hz, roughly midway between that of an average male and female adult speaker. This talker was identified as “Pat”. The original sentences associated with Pat were then processed to produce the sentences of “Jon” and “Jen” differing in F0 (using MATLAB code solaf; Hejna & Musicus, 1991) and θ (using KEMAR head-related transfer functions; Gardner & Martin, 1995). This ensured that only differences in F0 and θ were viable cues for the separation task. Random, independent, and normally distributed trial-by-trial perturbations in F0 and θ for each listener were selected to simulate natural variation in voice pitch (σF0 = 7 Hz, after Horii, 1975) and talker location (σθ = 7°). Note that differences in θ may have resulted in a better SNR for HI listeners in one ear. Whether HI listeners use the information in this way does not impact our estimate of the reliance weight for θ as a measure of selective attention to θ. As in real-world listening, talkers never spoke from exactly the same position with the same F0, or spoke the same sentence at the same time. Sentences were played at a 44100-Hz sampling rate with 16-bit resolution using an RME Fireface UCX audio interface. They were delivered to listeners seated in a double-wall, sound-attenuating chamber listening over Beyerdynamic DT990 headphones. The average sound level calibrated at the headphones was 72 dB SPL for both HI and NH listeners.
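Under these equal-variance Gaussian perturbations, the sensitivity available to an ideal observer can be sketched per cue and combined across cues, anticipating the nominal talker values given in the next paragraph. This is a simplified illustration; the study's ideal observer is defined by the condition-specific likelihood ratio:

```python
import numpy as np

def dprime_single_cue(delta_mu, sigma):
    """Ideal d' for a single equal-variance Gaussian cue: mean separation
    over standard deviation (a simplification of the full likelihood-ratio
    observer)."""
    return delta_mu / sigma

def dprime_combined(dprimes):
    """Optimal combination of statistically independent cues."""
    return float(np.sqrt(np.sum(np.square(dprimes))))

# Hypothetical application of the nominal values: Jon and Jen differ by
# 28 Hz in F0 and 28 deg in theta, each cue perturbed with sigma = 7
d_f0 = dprime_single_cue(28, 7)     # = 4
d_theta = dprime_single_cue(28, 7)  # = 4
print(dprime_combined([d_f0, d_theta]))  # ~5.66
```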
Listeners were presented with a series of trials in which they heard two of the three talkers speaking simultaneously. One of the talkers was always Pat; the other was equally likely to be Jon or Jen. Pat had a nominal voice pitch (F0) intermediate between Jon and Jen and was located at center front of the listener (θ = 0°). Jon had a lower voice pitch than Pat (ΔF0 = −14 Hz) and was located to the left of Pat (Δθ = −14°). Jen had a higher voice pitch than Pat (ΔF0 = +14 Hz) and was located to the right of Pat (Δθ = +14°). These values and the random perturbations added (σF0 = 7 Hz and σθ = 7°) were selected to represent typical listening situations where, for both NH and HI listeners, difficulties arise in recognizing speech in noise (Bronkhorst, 2000, 2015; Kidd & Colburn, 2017). In the speech separation task, listeners were instructed to ignore Pat and to identify by button press on each trial whether the other talker was Jon or Jen, using the voice pitch and location of Jon and Jen. Listener differences in speech separation were investigated for three major properties of the sentences: linguistic (sentences played forward or reversed), acoustic (F0 vs. θ as the only separation cue), and statistical (increase in Δ vs. decrease in σ for same signal likelihood, same
Listeners underwent a three-step voice (re)familiarization routine before each experimental block of trials. In step 1, they listened passively to a sequence of 20 trials in which Jon and Jen alternately spoke a random sentence without perturbation in F0 and θ. Corresponding left-right cartoon images of Jon and Jen were projected synchronously on the listener's monitor as they spoke. In step 2, the 20 trials were repeated with different spoken sentences and with the projected images fixed throughout. Jon and Jen spoke in random order, and listeners identified with feedback who was speaking on each trial. In step 3, step 2 was repeated with Pat added, again without perturbation in F0 and θ for any of the voices. Listeners always performed perfectly or near perfectly on practice trials. For experimental trials, perturbations in F0 and θ were added. Fixed left and right images of Jon and Jen with corresponding location and pitch labels remained projected on the listener's monitor throughout the experimental trials. Data were collected in eight blocks of 50 trials per block. Each trial block corresponded to the datum for a single condition and typically took 45 to 60 minutes to complete. Listeners were allowed, at their choosing, to take breaks between trial blocks.
Altogether, 19 NH listeners (ages 19–24 years) and 14 HI listeners (ages 23–83 years) participated in the study. Fourteen of the 19 NH listeners and all 14 HI listeners participated in the separation task. Twelve of the 19 NH listeners and 10 of the 14 HI listeners participated in the recognition task. HI listeners having a unilateral loss (S500, S513, and S514) were not included in the recognition task because of the potential to recognize sentences using their good ear alone. Hearing loss was identified as a pure-tone average (PTA) hearing level (HL) of 20 dB or greater from 0.5 to 4.0 kHz in at least one ear. Given the focus on difficulty listening in noise, 20 dB HL was chosen as the beginning of functional hearing loss rather than the more common 25 dB HL (see Humes, 2019). Averaged audiograms for each group of listeners for each ear are shown with error bars in Figure 3. All listeners had normal tympanograms and no history of middle ear disease or surgery. All were compensated for their participation with gift cards. Informed consent was obtained from all listeners, and all procedures were in accordance with University of South Florida institutional review board (IRB) approval.

Figure 3. Averaged audiograms of study participants by ear and hearing status.
Results
Partitioning of Variance
Performance for the speech separation task

Figure 4. Speech separation performance.
The first and most notable feature of the data is that, for most listeners, the properties of the speech have little if any impact on performance. The individual PFs account for 80% or more of the variance across conditions for all but a few listeners, 97% of the variance averaged across NH listeners, and 94% of the variance averaged across HI listeners. By far the largest effect in the data is the fixed effect of individual listeners across conditions on the slope λsep of the PF, nearly doubling in value from the poorest to the best performing NH listener, and more than doubling in value from the poorest to the best performing HI listener. Most of the remaining variance can be attributed to a few listeners for whom a particular condition posed unusual difficulty. These conditional effects are evident for the two poorest performing NH listeners (S181 and S184), who appear to rely heavily on linguistic cues to perform the task (cf. filled and unfilled circles). Conditional effects are also evident for two HI listeners (S506 and S514), who show greater sensitivity for the location cue alone (cf. filled and unfilled triangles). Curiously, S514 is one of the three HI listeners having a unilateral hearing loss, S500 and S513 being the other two. Notwithstanding these conditional effects, the data show that when equivalent comparisons across conditions are made by expressing listener performance relative to an ideal observer, the largest source of variance in the data is not the different properties of the speech but the fixed effect of listeners on the slopes of the PFs for those properties.
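For concreteness, a slope estimate under an assumed linear PF is sketched below; the linear form and the abscissa variable are illustrative assumptions only (see Figure 4 for the actual PFs):

```python
import numpy as np

def pf_slope(x, dprime):
    """Least-squares slope through the origin for an assumed linear PF,
    d' = lambda * x (illustrative form, not necessarily the study's)."""
    x, dprime = np.asarray(x, float), np.asarray(dprime, float)
    return float(np.dot(x, dprime) / np.dot(x, x))

# Hypothetical four-point PF
print(pf_slope([0.5, 1.0, 1.5, 2.0], [0.4, 0.9, 1.5, 1.9]))  # ~0.95
```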
Figure 5 next shows the relation between the slopes of the PFs and the PTA thresholds for NH and HI listeners, left and right ears (panels). PTA thresholds show, at best, a weak relation to the slopes within or between groups (unfilled and filled symbols) for either ear (left and right panels). An F-TOST test fell short of supporting equivalence of the within-group variances (±25% bounds, p > 0.5), not surprising given the small number of listeners, but the similarity of these variances suggests they may have a common source. The similarity of these results for NH and HI listeners is not entirely surprising. In real-world cocktail-party listening, as simulated here, differences among talkers in voice pitch (F0) and location (θ) are large and mostly conveyed by harmonicity and interaural time differences at low frequencies (Ritsma, 1967; Stevens & Newman, 1936), where most of the audiometric thresholds of our HI listeners fall in the normal range (cf. Figure 3). Hence, while the speech itself for these listeners may be somewhat distorted, even unrecognizable, differences in F0 and θ between talkers are apparently still often heard.

Figure 5. Listener fixed effect on slope λsep of the PF relating
Having identified the listener fixed effect in these conditions as the primary source of listener variance unrelated to the audiogram, we consider next its contribution to the variance in the speech recognition task. Some contribution is naturally expected, as speech separation is required for speech recognition, but that contribution may be small given the very different cues for recognition and the different impact hearing loss has on those cues. The question was addressed by determining the shared variance in the slopes of the PFs for the speech separation and recognition tasks using a regression analysis. Figure 6 shows the results, where the slopes for the recognition task λrec are plotted against those for the speech separation task λsep for NH and HI listeners. For both groups the variance between tasks is indeed largely shared, a linear regression (dashed line) accounting for 88% of the variance within and between groups. The slope of the regression line is greater than 1.0, such that the small effect of hearing loss evident in the mean difference between groups for separation is scaled up in the mean difference for recognition (dotted lines). To the authors' knowledge, these data are the first to establish a direct functional relation between speech separation and speech recognition in noise for the same listeners and same stimuli under identical listening conditions (cf. Guo et al., 2025; Holmes & Griffiths, 2023; Wasiuk, 2022). They could have turned out differently given the fundamentally different cues and processes involved in speech separation and recognition. Taken together, the results of Figures 4 to 6 suggest that the variance in speech recognition in these conditions is in large measure due to the common problem both NH and HI listeners have with speech separation.
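The shared-variance computation is a straightforward linear regression; a sketch with hypothetical slope values (not the study's data):

```python
import numpy as np

# Hypothetical slope values for the two tasks
lam_sep = np.array([0.8, 1.1, 1.4, 1.7, 2.0, 2.3])
lam_rec = np.array([0.6, 1.0, 1.5, 1.9, 2.5, 2.8])

slope, intercept = np.polyfit(lam_sep, lam_rec, 1)  # least-squares line
r = np.corrcoef(lam_sep, lam_rec)[0, 1]
print(f"regression slope = {slope:.2f}, shared variance r^2 = {r ** 2:.2f}")
```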

Figure 6. Relation between PF slopes λ for speech-in-noise recognition and speech separation. NH and HI listener data are represented by unfilled and filled symbols, respectively. Dotted lines show that the small effect of hearing loss evident in the mean difference between groups for separation is scaled up in the mean difference for recognition.
Having now the values of λsep and λrec, and λ

Figure 7. Obtained relative performance efficiency η of each stage of the integrative model, plotted as in Figure 2.
Interaction: Cue Sensitivity and Reliance
This section provides evidence for an interaction between stages. Figure 7 showed cue reliance to have little impact on the NH and HI group difference in performance, but subsequent analyses revealed a fundamental difference in the reliance weights for the two groups. Figure 8 gives these weights as relative values.

Figure 8. Relative cue reliance plotted against relative cue sensitivity for NH and HI listeners, unfilled and filled symbols, respectively. See text for details.
This brings us to the third question of the study: can the information provided by the model be used to deliver a benefit for HI listeners in the task? There are two ways the information in Figure 8 might be used to make this happen: by enhancing the cue the HI listeners rely on most, or by enhancing the cue they rely on least (e.g., Richardson et al., 2025). The two approaches were tested using the single-cue data of Figure 4 (filled and unfilled triangles). These two conditions represent the two extreme cases, where 'enhancement' corresponds to all acoustic information for the task being shifted to either the most or least relied-on cue as previously determined from the estimated reliance weights. Figure 9 shows the results of this test using the data from all 28 participants in Figure 4. The abscissa gives the values of λsep for the linguistic-cues-present condition, where both F0 and θ served as equally viable cues (filled circles in Figure 4). The ordinate gives the values of λsep for the single-cue conditions, where filled and unfilled symbols give, respectively, the values for the least and most relied-on cue. Note that the data are given as λ values, so if neither condition provides a benefit or a cost the data should fall on the diagonal of Figure 9. For the NH listeners the results are mixed, the two conditions equally often producing a benefit or a cost. By comparison, all but 3 of the 14 HI listeners derive a benefit when information is shifted to the preferred cue, and no HI listener receives a benefit when the information is shifted to the least preferred cue; most experience a cost.
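The cost/benefit classification amounts to comparing the single-cue slope against the dual-cue slope; a sketch with hypothetical values:

```python
def cost_benefit(lam_dual, lam_single):
    """Benefit if the single-cue slope exceeds the dual-cue slope (a point
    above the diagonal of Figure 9); cost if it falls below."""
    if lam_single > lam_dual:
        return "benefit"
    return "cost" if lam_single < lam_dual else "neither"

# Hypothetical HI listener: shifting all acoustic information to the most
# relied-on cue raises the slope; shifting to the least relied-on cue lowers it
print(cost_benefit(lam_dual=1.2, lam_single=1.5))  # benefit
print(cost_benefit(lam_dual=1.2, lam_single=0.9))  # cost
```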

Figure 9. Cost/benefit for HI (left panel) and NH (right panel) listeners in the speech separation task of shifting information to the least (filled symbols) vs. most (unfilled symbols) relied-on cue. See text for details.
Discussion
The present study addressed a fundamental problem in audiology: identifying why many difficulties recognizing speech in noise go undetected by the audiogram. Research over the past two decades has identified likely contributing factors but has yet to offer a means of breaking down the collective influence of these factors where, as in natural listening situations, each plays some role. The goal of the present study was to provide an initial test of the feasibility of an integrative model that could serve this purpose in different listening scenarios. The results offered preliminary support. Across tasks involving the processing of the linguistic, acoustic, and statistical properties of speech, the largest source of listener variability not accounted for by the audiogram was a listener fixed effect on speech separation. Fully 88% of this variability was shared with speech recognition. The fixed effect was tied in the model to differences in the ability of listeners at a late decision stage to separate signals based on their different statistical properties. Cue reliance associated with selective attention in the model had the greatest impact overall on performance but contributed little to the individual differences in performance. Evidence for an interaction between selective attention and transduction was also found, wherein HI listeners placed greater reliance on the cue for which they were most sensitive. The interaction contributed little to performance differences between HI and NH listeners, but when information was shifted to the most relied-on cue, HI listeners received a benefit for speech separation; NH listeners did not.
Speech separation has long been acknowledged to be an essential component of speech recognition in noise, as have the statistical properties of speech for speech separation. Cherry (1953) gave the problem its modern name, the 'cocktail-party problem', and described it fundamentally as a "statistical separation". Nowadays, accounts more often appeal to principles of perceptual organization (common fate, similarity, continuity, and the like) than to statistics, but these principles too can be shown to have statistical representations in computational approaches to the problem. In computational approaches, source separation is accomplished by maximizing the statistical independence of the internal representation of signals (Bell & Sejnowski, 1995). This provides a framework for how listeners might achieve optimal decision-making, the basis of the present approach to integrative modeling. An apposite example applied to human source separation is the information divergence hypothesis (Lutfi, 2023; Lutfi et al., 2013). Information divergence, otherwise known as Kullback–Leibler divergence, DKL, forms the basis for many objective functions used to maximize statistical independence (Kullback & Leibler, 1951). Where signal distributions are equal-variance Gaussian, as they are in the present study, DKL reduces to a simple monotonic function of the ideal observer's d′.
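For reference, the standard closed form (a textbook identity, stated here in the notation of the present discussion):

```latex
% KL divergence between two Gaussians with equal variance reduces to a
% monotonic function of the ideal observer's d':
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu_1,\sigma^2)\,\middle\|\,\mathcal{N}(\mu_2,\sigma^2)\right)
  = \frac{(\mu_1-\mu_2)^2}{2\sigma^2}
  = \frac{d'^2}{2},
\qquad d' = \frac{\mu_1-\mu_2}{\sigma}.
```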
Until now, no clear functional link had been established between speech separation and speech recognition in noise using the same listeners, procedures, and stimuli (cf. Guo et al., 2025; Holmes & Griffiths, 2023; Wasiuk, 2022). No theoretical framework, moreover, had been offered for quantifying the relative impact of factors operating collectively. This study did both for a common speech-in-noise recognition task, identifying the fixed effect for speech separation as the primary source of variability within diagnostic groups. Lutfi et al. (2021) reported early evidence of a listener fixed effect among NH listeners for conditions similar to the present study, although they did not model it or try to determine its relation to the audiogram. Their stimuli were ABA sequences of synthesized vowels rather than the spoken sentences used here; the separation task was also different for all but one condition. Overlapping PFs for the two manipulations of each speech property were found, as well as for two different versions of the separation task. Four-point PFs also showed the random effect of replications (associated with the residual noise ζ in the present model) to be quite small compared to the fixed effect. That study did not attempt to establish a relation to speech recognition, and the evidence for the fixed effect was not as compelling because different listeners participated in different conditions. The study did, however, replicate the failure of reliance weights to account for performance differences among listeners, a result that has now been widely documented in other studies involving different tasks and procedures for both speech and nonspeech stimuli (Lutfi et al., 2020, 2022, 2023; Lutfi & Liu, 2007).
The strong relation between cue sensitivity and cue reliance observed here for the HI listeners (Figure 8) is another new result but should come as no surprise. The HI listeners are behaving as they should: like ideal observers, they place the greatest reliance on the cue for which they are most sensitive. The consequential result is that shifting information to the most relied-on cue provides a benefit in speech separation only for the HI listeners (Figure 9). This outcome could have implications for the clinic. Directional microphones in hearing aids are commonly used to improve SNR, but they can also result in tunnel hearing (Best et al., 2018; Neher et al., 2017; Rallapalli et al., 2021). The benefits are highly varied, and it is not clear whether the SNR improvement alone is sufficient for the listeners, or whether they would benefit from added spatial cues, particularly at high background noise levels (Kidd et al., 2020). Real differences in reliance weights can be determined in relatively few trials (Lutfi, 1995) and might be practically used in the clinic to identify these listeners for targeted treatment.
This was a first attempt at integrative modeling. The results are considered positive given the many possible outcomes that might have rendered the approach impractical. Future tests will be required to evaluate the ultimate utility of the approach. The present test was one case, cocktail-party listening, where stimulus uncertainty σ was high, a condition that would favor a dominant fixed effect (Kidd & Colburn, 2017; Figure 2A). A different outcome would be expected if there were less stimulus uncertainty, smaller σ, or if a listener had unusually poor spectro-temporal modulation (STM) sensitivity, a factor shown to be strongly correlated with speech recognition in noise (Bernstein et al., 2013). What is the relative impact of these factors, how might they interact, and how might their role change for different listeners in different listening scenarios? Integrative modeling can potentially answer these questions through the partitioning of variance associated with these factors; individual correlations will not. This is the incentive for an integrative approach.
Acknowledgements
The authors would like to thank Gabriella Brown, Angelina Natalie, Reagan Huynh, and Alberta Tran for their assistance in collecting the data. Dr. Bernhard Laback and two anonymous reviewers provided helpful comments on an earlier version of the manuscript. This research was supported by NIDCD grant R01 DC001262-30.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon request.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Institute on Deafness and Other Communication Disorders (grant number NIH R01 DC001262-32).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
