Abstract
Stuart Gatehouse was one of the pioneers of cognitive hearing science. The ease of language understanding (ELU) model (Rönnberg) is one example of a cognitive hearing science model in which the interplay between memory systems and signal processing is emphasized. The mismatch notion is central to the ELU model and concerns how phonological information derived from the signal matches or mismatches phonological representations in lexical and semantic long-term memory (LTM). When signals match, processing is rapid, automatic, and implicit, and lexical activation proceeds smoothly. Given a mismatch, lexical activation fails, and working or short-term memory (WM/STM) is assumed to be invoked to engage in explicit repair strategies to disambiguate what was said in the conversation. In a recent study, negative long-term consequences of mismatch were found by relating hearing loss to episodic LTM in a sample of older hearing-aid wearers; STM was intact (Rönnberg et al.). Beneficial short-term consequences of a binary masking noise reduction scheme on STM were obtained in 4-talker babble for individuals with high WM capacity, but not in stationary noise backgrounds (Ng et al.). This suggests that individuals with high WM capacity inhibit semantic auditory distraction in 4-talker babble while exploiting the phonological benefits, in terms of speech quality, provided by binary masking (Wang). Both long-term and short-term mismatch effects, apparent in data sets including behavioral as well as subjective (Rudner et al.) data, need to be taken into account in the design of future hearing instruments.
Introduction
Stuart Gatehouse was a true pioneer of cognitive hearing science. He forcefully argued that cognition plays a vital role in hearing, especially when it comes to the interaction between signal processing in hearing aids and cognitive function, and that this should be reflected in the field of audiology.
This article is about cognitive hearing science. Cognitive hearing science recognizes the impact of cognitive functions on hearing and speech understanding. When Gatehouse made his major contributions to the area, it had not yet been given a name. Nevertheless, he was a pioneer in demonstrating the importance of acknowledging individual differences in certain cognitive skills when fitting hearing aids with different signal-processing rationales (see Gatehouse, Naylor, & Elberling, 2003, 2006a, 2006b).
The core research question of cognitive hearing science is the nature of the interaction between bottom-up and top-down processes that promote understanding in a variety of communication conditions (Arlinger, Lunner, Lyxell, & Pichora-Fuller, 2009). The interaction can be manifest at different levels in the brain, from the peripheral organ to the cortex; communication may be in any modality, including both spoken and signed languages as well as their visual and tactile derivatives. The focus of this article is on memory systems and how memory systems help us understand the cognitive side of dealing with signal processing in hearing aids and listening in the real world.
Memory Systems
Memory is a multifaceted concept including a number of well-defined and researched subsystems that can be described within a unifying framework (e.g., Squire, 2009). The fundamental division is between short-term memory (STM) and long-term memory (LTM). STM is characterized by its limited capacity and brief duration, whereas LTM is more capacious and of longer duration. LTM can be usefully subdivided into episodic and semantic memory.
In general, when a speech signal is made audible via amplification, effortless, automatic, fast, and precise understanding is promoted (i.e., implicit processing, see further under ELU model). It is no easy task, however, to make sounds audible without adding distortions. Modern hearing aids use wide dynamic range compression (WDRC) amplification, which amplifies weak input sounds more than loud input sounds. Thus, gain regulation systems may cause distortions of the original speech cues in different ways and may themselves generate a need for extra use of explicit WM resources by the listener (see further under ELU model). If the regulation of gain during rapid drops in the input level is too slow (long release time in the compressor), weak speech cues may be underamplified, and speech cues will be less salient. Likewise, if the regulation of gain during fast increases in the input level is too slow (long attack time), sounds may be too loud, attracting unnecessary attention, and thus consuming WM resources. However, if the regulation system is fast, secondary speech sources may also introduce artifacts in the regulation.
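To make this compression trade-off concrete, the sketch below illustrates how a WDRC gain track can be computed with separate attack and release time constants. It is our generic illustration of the principle, not the algorithm of any particular hearing aid; the threshold, ratio, and time constants are assumed values chosen only for demonstration.

```python
import numpy as np

def wdrc_gain(level_db, fs, threshold_db=50.0, ratio=2.0,
              attack_ms=5.0, release_ms=200.0):
    """Illustrative WDRC gain track with separate attack and release times.

    level_db : per-sample estimates of the input level (dB); assumed given.
    Static rule: above threshold_db, output level grows by only 1/ratio dB
    per input dB, so weak sounds end up amplified more than loud sounds.
    The attack/release constants govern how quickly the gain may drop/recover:
    a long release leaves soft speech cues under-amplified right after a loud
    segment, while a long attack lets sudden loud sounds through at full gain.
    All parameter values here are arbitrary demonstration choices.
    """
    # Static (steady-state) gain change relative to the prescribed linear gain
    over = np.maximum(np.asarray(level_db, dtype=float) - threshold_db, 0.0)
    static_gain = -over * (1.0 - 1.0 / ratio)  # dB of gain reduction

    # One-pole smoothing of the gain, with different speeds for each direction
    a_att = np.exp(-1.0 / (attack_ms * 1e-3 * fs))
    a_rel = np.exp(-1.0 / (release_ms * 1e-3 * fs))

    smoothed = np.empty_like(static_gain)
    g = static_gain[0]
    for n, target in enumerate(static_gain):
        coeff = a_att if target < g else a_rel  # gain must drop -> attack phase
        g = coeff * g + (1.0 - coeff) * target
        smoothed[n] = g
    return smoothed
```

With a long release time, the smoothed gain stays low after a loud segment, which is exactly the under-amplification of weak cues described above; very short time constants instead let competing sources modulate ("pump") the gain.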
Translated into the perceptual challenges that a hearing-impaired person faces, we report in this article cognitive data that are pertinent to online, short-term ("here and now") processing of speech in noise, for example, showing the importance of individual WM capacity under adverse conditions (Foo, Rudner, Rönnberg, & Lunner, 2007; Rudner, Foo, Rönnberg, & Lunner, 2009), especially in fluctuating noise backgrounds with fast-acting compression in the hearing aids (Lunner & Sundewall-Thorén, 2007; Rönnberg, Rudner, Lunner, & Zekveld, 2010). We will also show, as reported in a recent study, that noise reduction signal processing (i.e., binary masking) enhances memory recall of sentences presented in a competing-talker background for persons with high WM capacity, even when perceptual accuracy has been accounted for (Ng, Rudner, Lunner, & Rönnberg, 2010). We start (see Empirical Studies) by reporting on long-term ("there and then") negative consequences of hearing loss over a period of years, where we have shown that semantic and episodic LTM representations fade as a result of long-term hearing impairment, despite the use of existing hearing-aid technology (Rönnberg et al., 2011). This suggests that hearing-aid technology needs to be improved to prevent cognitive decline and ultimately the risk of dementia (Lin et al., 2011).
These examples show the significance of addressing STM/WM and LTM systems during communication under adverse conditions as well as the importance of measuring top-down cognitive processing abilities in addition to bottom-up sensory and perceptual processing abilities when the outcomes of hearing interventions are being considered.
Memory Systems Seen From a Communicative Perspective
The approach we have taken to memory systems at Linnaeus Centre HEAD in Linköping, Sweden, is from the perspective of how they contribute to facilitation or enrichment of poorly perceived elements of language. The general assumption is that a hearing impairment, a noisy environment, or even the signal processing in the hearing aid may push the perceptual system to rely more on knowledge-driven, compensatory storage and processing capacities of the individual hearing-aid user. The main reason is that under such conditions, the input signal does not automatically activate knowledge stored in semantic LTM, thus hindering lexical access. This is why we assume a continuous interaction between perception of heard linguistic input, semantic LTM, and WM, where WM plays the role of filling in missing pieces of information. This interaction is spelled out in the ease of language understanding (ELU) model.
The ELU Model
The ELU model (Rönnberg, 2003; Rönnberg, Rudner, & Foo, 2010; Rönnberg, Rudner, Foo, & Lunner, 2008) is about the flow (and bottlenecks) of information processing under adverse speech-understanding conditions. Skilled performance is characterized by rapid lexical access, precise phonological processing, and efficient use of complex WM capacity.
Thus, we have observed that high lexical access speed, phonological skills, and complex WM capacity generally are useful predictors of lip-reading skill as well as audiovisual, auditory, and tactile speech understanding (Andersson, Lyxell, Rönnberg, & Spens, 2001a, 2001b; Lunner, 2003; Lyxell et al., 1996; Lyxell & Rönnberg, 1987, 1989, 1991, 1992, 1993; Lyxell, Rönnberg, & Samuelsson, 1994; Rönnberg, 1990, 1993; Rönnberg, Andersson, Lyxell, & Spens, 1998; Rönnberg, Arlinger, Lyxell, & Kinnefors, 1989; Rönnberg et al., 1999; see review in Rönnberg, 2003).
The ELU model (see Figure 1) starts at a linguistic-cognitive level by assuming that processing of spoken input involves RApid Multimodal Binding of PHOnology (RAMBPHO; for neural correlates of binding sign and speech tokens, see Rudner, Fransson, Nyberg, Ingvar, & Rönnberg, 2007). The term emphasizes that phonological information from different modalities is rapidly and automatically bound into a unified representation that is used to access the lexicon in semantic LTM.

Figure 1. The ease of language understanding (ELU) model.
If there is a mismatch between incoming linguistic stimuli, processed in a RAMBPHO mode, and phonological representations in semantic LTM, then lexical activation fails, and explicit, WM-based, inference-making processes are assumed to come into play to reconstruct what was said. The phonological mismatch may be due to the type of hearing impairment, unfavorable signal-to-noise ratios (SNRs), distortions created by the signal processing in the hearing aid, or excessive demands on cognitive processing speed. Empirically, we have focused on rather straightforward manipulations of compression release parameters, either switching from habitually used preexperimental settings to other experimental settings (Foo et al., 2007) or training with one setting and then testing with another (Rudner et al., 2009). However, the parametric detail of which changes in, for example, signal processing (and its effects on phonological processing) constitute the minimal changes needed to elicit mismatch with phonological representations in semantic LTM remains to be investigated (see further under Mismatch and WM).
As is illustrated in Figure 1, after an initial mismatch, explicit WM resources are invoked and used, sometimes after successive retrievals of semantic LTM knowledge, to infer and reconstruct what was uttered in, for example, the dialogue. The timescale of WM-based storage and processing is in seconds, running in parallel with the much faster implicit and automatic processes (in milliseconds). By implication, a processing economy is built into the assumption of two modes of processing information, the implicit and the explicit. Implicit processing is needed to unload the cognitive system until focused attention and explicit resources are recruited. If there is a match, however, processing runs smoothly by successive lexical activations and grammatical co-construction (Rönnberg et al., 2008).
To extract meaning from a signal in noise, there is always an interaction between the implicit, automatic, and rapid perceptual processes and the more controlled, slow explicit processes (Rönnberg, 2003; Rönnberg et al., 2008). In other words, there is always an interaction and a relationship between WM/STM and semantic LTM, depending on the listener and the listening task. Context, talker, and dialogue characteristics set the frame for how the ratio of explicit over implicit processes may vary from time to time during discourse. The time scale of this change in ratio may be down to less than 400 ms because of the upper time limits of lexical access; that is, the time window for lexical access varies between 200 and 400 ms, during which a match/mismatch occurs (Stenfelt & Rönnberg, 2009). This in turn demands signal processing in hearing instruments that rapidly and adaptively follows the ratio function over time. Thus, apart from amplification, compression speed, and noise reduction issues, optimization of the hearing instrument may have to take into account factors such as the WM capacity and lexical processing speed of the individual. Also, direct neurophysiological estimates of cognitive load could be one part of a solution to how a future cognitive hearing aid should adapt its signal processing to the fluctuations in the explicit/implicit ratio function (see, for example, Lunner, Rudner, & Rönnberg, 2009).
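As a purely hypothetical illustration of this idea, the sketch below shows what a control loop for such a "cognitive hearing aid" might look like. Everything in it is an assumption made for exposition: the normalized cognitive-load estimate, the thresholds, and the specific parameter adjustments are invented, and how a reliable load estimate would be obtained (e.g., from EEG or pupillometry) is an open research question.

```python
from dataclasses import dataclass

@dataclass
class Settings:
    release_ms: float          # compressor release time
    noise_reduction_db: float  # maximum attenuation applied by noise reduction

def adapt(settings: Settings, cognitive_load: float,
          low: float = 0.3, high: float = 0.7) -> Settings:
    """Hypothetical adaptation rule for a 'cognitive hearing aid'.

    cognitive_load is assumed to be a normalized (0-1) estimate derived from
    some physiological proxy; obtaining such an estimate reliably is not
    modeled here. The rule follows the ELU reasoning: when explicit,
    effortful processing dominates, favor slower (less distorting)
    compression and stronger noise reduction; when processing is largely
    implicit, faster compression can be used to keep weak speech cues audible.
    """
    if cognitive_load > high:      # listener is mostly in the explicit mode
        return Settings(release_ms=min(settings.release_ms * 1.5, 800.0),
                        noise_reduction_db=min(settings.noise_reduction_db + 2.0, 12.0))
    if cognitive_load < low:       # listening is mostly implicit and automatic
        return Settings(release_ms=max(settings.release_ms * 0.75, 40.0),
                        noise_reduction_db=max(settings.noise_reduction_db - 2.0, 0.0))
    return settings                # moderate load: leave the settings alone
```

The point of the sketch is only the direction of the adaptation, not its parameters: the explicit/implicit ratio would drive the processing toward whichever regime unloads the listener most.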
Language specificity
We have shown in several studies comparing WM for signed and spoken language that there are sign-specific cortical representations in parietal areas and that there are speech-specific WM areas as well (Rudner et al., 2007; Rönnberg, Rudner, & Ingvar, 2004). Interestingly, however, there are no indications that inferior frontal areas subserving phonological analysis differ between the languages (Rönnberg et al., 2004 for Swedish sign language [SSL]; cf. Petitto et al., 2000 for American sign language [ASL]; MacSweeney, Waters, Brammer, Woll, & Goswami, 2008 for British sign language [BSL]). This pattern of data, translated into the language of the ELU model, suggests that the RAMBPHO function operates at a relatively abstract level, common to both sign and speech. Similar neural correlates of other relatively implicit and automatic linguistic functions, such as semantic retrieval (Emmorey et al., 2003 for ASL) and grammatical construction (i.e., Japanese sign language [JSL] compared with spoken Japanese; Sakai, Tatsuno, Suzuki, Kimura, & Ichida, 2005; see also MacSweeney et al., 2006 for BSL), have also been empirically demonstrated. Similarities in cortical activation for lipread, audiovisual, and tactilely mediated speech speak to the same possibility (e.g., Balk et al., 2010; Beauchamp, Yasar, Frye, & Ro, 2008; Calvert, Campbell, & Brammer, 2000; Lee, Truy, Mamou, Sappey-Marinier, & Giraud, 2007; Levänen, 1998; MacSweeney et al., 2004; see also Okada & Hickok, 2009). Thus, there seems to be no language or modality specificity for the implicit RAMBPHO function of language processing, excluding primary sensory processing differences, whereas language specificity is manifest in slower, explicit kinds of language processing such as WM.
Mismatch and WM
We have tested the mismatch assumption by changing the signal processing in the experimental hearing aid relative to the processing habitually used in a hearing aid (Foo et al., 2007) or by means of an intervention period during which the participants became acclimatized to a certain kind of signal processing but were then tested with another signal-processing algorithm (i.e., compression speed manipulations; Rudner et al., 2009). The prediction based on the ELU model is that in conditions of mismatch, an increased dependence on explicit, WM-based processes should occur, whereas in match conditions no dependence on WM is expected. This basic prediction has been confirmed and also generalized to another Scandinavian language, Danish (Rudner, Foo, Sundewall Thorén, Lunner, & Rönnberg, 2008).
One further piece of evidence makes this very clear: Variance in aided speech recognition in noise performance has been shown in multiple regression analyses to be mainly explained by WM performance as the predictor variable (tested by the reading span test, tapping both the storage and processing aspects of WM; see, for example, Andersson et al., 2001a; Rönnberg et al., 1989) in mismatching conditions.
Mismatch may still pertain in noisy situations for persons with aided hearing even after acclimatization to signal processing, and it seems that the ability to capitalize on the potential benefits of signal processing, such as those offered by WDRC when listening in modulating noise, is also contingent on WM capacity (Gatehouse et al., 2003; Lunner & Sundewall-Thorén, 2007; Rudner et al., 2011a). When persons with hearing impairment listen to speech in modulating noise at low SNRs without their hearing aids, good WM capacity makes a significant contribution to their ability to report what they hear (Rudner, Rönnberg, & Lunner, 2011a). These effects were independent of degree of peripheral hearing loss.
Thus, again we find that top-down, explicit processing mechanisms play a crucial role in driving speech recognition and understanding, also under adverse conditions that are independent of the phonological mismatch engendered by hearing-aid signal-processing manipulations. Therefore, it is important for the hearing-aid industry to realize that cognition counts as a relatively general determinant of listening to speech under adverse noise conditions as well as under the particular conditions created by manipulations of different types of signal processing. We will describe three recent studies that pertain to mismatch, memory systems, and effort.
Empirical Studies
Study 1: Long-Term Mismatch Effects on Memory Systems
The mismatch mechanism of the ELU model concerns, in the first place, short-term consequences of mismatch, namely, the importance of being able to switch to an explicit mode of information processing at different points in time during a conversation. The capacity to compensate for mismatch depends on, for example, WM capacity, the speed with which the lexicon is activated, and the quality of the phonological representations in LTM.
In a recent study (Rönnberg et al., 2011), we studied long-term, as opposed to short-term, consequences of mismatch. This was accomplished by studying the relationships between degree of hearing loss and memory performance in a subsample of elderly hearing-aid users.
ELU predictions
As phonological representations belong to semantic LTM, we assumed that the status of semantic LTM would not be affected by mismatch, because semantic LTM is always used in the matching process. Nevertheless, the prediction for episodic LTM was that it would decline with increasing hearing loss, because repeated mismatch reduces the number of successfully processed speech events that can be encoded into, and later retrieved from, episodic LTM.
Imagine the number of times per day that you actually encode and retrieve information from episodic LTM: Perhaps you activate your lexicon (i.e., a portion of semantic LTM) 30,000 times per day; with mismatches, perhaps you can only successfully unlock the lexicon, say, 20,000 times per day. With a relatively smaller number of lexical activations, episodic LTM will be less engaged, trained, and maintained. This relative difference, then, causes a relative decline in episodic LTM over years of living with a hearing loss.
WM or STM is, on the same account, expected to remain relatively unaffected, because it is engaged in every communicative situation, whether a match or a mismatch occurs.
Tests
Episodic memory was indexed by recall of (a) subject-performed tasks (SPTs, two-word imperatives were printed on index cards, the participants enacted the imperatives, one action per 8 s; free recall was oral for the duration of 2 min), (b) sentence recall (text + auditory presentation/encoding of imperative; free recall as for SPTs), and (c) auditorily presented word lists at a rate of one word every 2 s. Participants took the tests with their hearing aids switched on. Episodic LTM was operationally defined by a lag of >7 items intervening between presentation and recall of a certain item (including presentations and recall of other items, see Tulving & Colotla, 1970). Consequently, episodic STM was defined as having a lag of 7 or less. Semantic LTM was indexed by word fluency (i.e., initial letter fluency) and vocabulary.
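As a minimal sketch of how this lag criterion can be applied to free-recall data (our own illustration; the word lists and scoring code are made up, not those used in the study), the following function splits recalled items into STM and episodic LTM according to the Tulving and Colotla (1970) definition:

```python
def classify_recall(presented, recalled, lag_criterion=7):
    """Split recalled items into STM and episodic LTM by lag.

    The lag for a recalled word is the number of other events (remaining
    presentations plus recalls made before it) intervening between its
    presentation and its recall. Lags of lag_criterion (7) or less are
    attributed to STM, longer lags to episodic LTM.
    """
    stm, ltm = [], []
    for recall_pos, word in enumerate(recalled):
        if word not in presented:
            continue                               # intrusion: not scored
        pres_pos = presented.index(word)
        items_after_presentation = len(presented) - pres_pos - 1
        earlier_recalls = recall_pos               # recalls made before this one
        lag = items_after_presentation + earlier_recalls
        (ltm if lag > lag_criterion else stm).append(word)
    return stm, ltm

# Illustrative 10-word list: the last-presented, first-recalled words count as STM.
presented = ["sun", "book", "tree", "milk", "door", "hand",
             "lamp", "rain", "ship", "gold"]
recalled = ["gold", "ship", "sun", "tree"]
print(classify_recall(presented, recalled))   # (['gold', 'ship'], ['sun', 'tree'])
```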
Results
Structural equation modeling (SEM) demonstrated that (a) episodic LTM and semantic LTM are negatively associated with degree of hearing loss, whereas STM is not, (b) semantic LTM is strongly related to episodic LTM, and (c) age and hearing loss contribute independently to episodic LTM deficits. Thus, the ELU model receives support on all points, apart from the effect of hearing loss on semantic LTM.
Alternative accounts
Semantic LTM deficits may be predicted by an account that assumes a successive deterioration of phonological representations that influence phonological neighborhoods with increasing age (e.g., Luce & Pisoni, 1998; Sommers, 1996). The interesting part here is that our SEM models show that not only age but also the hearing loss per se contributes to a deterioration of semantic LTM. This result has been shown before with more profound impairments (Andersson, 2002; Lyxell et al., 2009) but is now shown for moderate hearing loss as well.
The episodic LTM results may also be predicted by an account that emphasizes the extra perceptual effort required when the input signal is degraded: resources spent on identifying what was said are not available for elaborative encoding into episodic LTM.
It should be noted that we did not evaluate the hearing aids per se in the current study and two comments are important to make. First, we actually have a conservative estimate of the effects of hearing impairment in the current study as all participants wore hearing aids which presumably compensate to some degree for the hearing loss—and still we find negative effects related to hearing impairment. Second, a proper evaluation of a hearing aid in this context needs a well-matched group of nonusers, where each hearing-aid user has a “twin” nonuser, matched for hearing loss, IQ, gender, and schooling. Only then can we approach a conclusion regarding the potential effects of hearing aids per se.
Study 2: Immediate Memory Effects: Noise Reduction and STM
In a recent study by Sarampalis, Kalluri, Edwards, and Hafter (2009), a noise reduction algorithm (Ephraim & Malah, 1984) was found to provide beneficial effects for participants with normal hearing. In a study by Ng et al. (2010) on 20 hearing-aid users with hearing impairment, we examined whether a noise reduction scheme based on binary masking would confer similar benefits on memory for heard speech.
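For readers unfamiliar with binary masking, the sketch below illustrates the general principle of an ideal binary mask: time-frequency units in which the local SNR exceeds a local criterion are kept, and the rest are attenuated. This is a generic illustration under assumed inputs (precomputed speech and noise power in each time-frequency unit); the filterbank, criterion, and implementation used in the actual study may differ.

```python
import numpy as np

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Ideal binary mask over a time-frequency representation.

    speech_power and noise_power are arrays of the same shape holding the
    power of the target speech and the interferer in each time-frequency
    unit (e.g., from an STFT or a gammatone filterbank). Units whose local
    SNR exceeds the local criterion lc_db are kept (mask = 1); all others
    are discarded (mask = 0). The criterion value here is illustrative.
    """
    local_snr_db = 10.0 * np.log10(np.asarray(speech_power, dtype=float) /
                                   np.maximum(noise_power, 1e-12))
    return (local_snr_db > lc_db).astype(float)

def apply_mask(mixture_tf, mask):
    """Attenuate the masked-out units of the mixture's time-frequency representation."""
    return mixture_tf * mask
```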
The procedure involved two steps. First, the participants listened to Swedish Hearing In Noise Test (HINT) sentences (Hällgren, Larsby, & Arlinger, 2006) at an individually adapted speech reception threshold (SRT) of 95% in different background conditions (stationary noise and 4-talker babble), with different signal-processing algorithms applied, and completed a perceptual speech recognition task: that is, they had to repeat the final word of each sentence immediately after hearing it (to verify audibility and speech recognition). Second, the participants recalled, in any order, as many final words as possible from a set of eight sentences after the set had been completed.
Results showed that free recall memory performance was lowered by a competing talker background but that this effect was less for persons with high WM capacity when noise reduction was applied. This effect was shown to pertain to STM recency positions (i.e., the last two positions in a list). Thus, individuals with high WM capacity seem to be able to exploit noise reduction to enhance cognitive performance.
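The serial-position analysis behind this claim can be illustrated with a short scoring sketch (ours, not the study's actual analysis code; the list length, the two-position recency window, and the data structures are illustrative):

```python
def position_scores(presented_finals, recalled_sets, recency_size=2):
    """Mean recall per serial position across sets of sentence-final words.

    presented_finals : list of lists, each holding the final words of one
    8-sentence set in presentation order. recalled_sets : matching lists of
    the words the listener reported (order-free recall). A position counts
    as correct if its word was reported anywhere in that set. The last
    recency_size positions are the recency (STM) positions discussed above.
    """
    n_pos = len(presented_finals[0])
    correct = [0] * n_pos
    for targets, reported in zip(presented_finals, recalled_sets):
        reported = set(reported)
        for pos, word in enumerate(targets):
            correct[pos] += word in reported
    means = [c / len(presented_finals) for c in correct]
    return {"by_position": means,
            "recency": sum(means[-recency_size:]) / recency_size,
            "pre_recency": sum(means[:-recency_size]) / (n_pos - recency_size)}
```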
Study 3: WM and Effort
Explicit cognitive processing mechanisms are by definition conscious and may be perceived as effortful. Thus, the perceived effort of listening in adverse conditions may be informative as to the degree of explicit demands on WM. Although the subjectively rated effort involved in listening to speech in noise increases with decreasing SNR, there is no linear relation between rated effort and WM (Zekveld, Kramer, & Festen, 2010). Rönnberg (2003) postulated that the contribution of explicit cognitive processing mechanisms to ease of language understanding can be described as a U-shaped function, whereby the greatest contribution is at moderate levels of listening challenge, where there is room for interaction between the level of clarity of the input signal and WM capacity. Too much or too little information extracted from the signal will leave less room for the bottom-up and top-down processes to interact optimally. Thus, it might be expected that perceived effort would correlate with WM capacity under moderately difficult listening conditions but not necessarily under more or less adverse conditions. In a recent study, we found that WM, rather than influencing relative rating of effort between different SNRs, modulated the relative rating of effort between different types of noise (Rudner, Lunner, Behrens, Sundewall Thorén & Rönnberg, 2011b).
There is a growing interest within the hearing-aid industry in using perceived listening effort as a measure of the efficacy of hearing aids. Thus, it is important to understand the relationship between listening effort and cognitive measures and how they relate to the ability to understand speech in noise. The ELU model provides a framework for understanding these relationships.
Summary and Future Challenges for the Hearing-Aid Industry
The findings from the Rönnberg et al. (2011), the Ng et al. (2010), and the Rudner et al. (2011b) studies suggest that there are several cognitive hearing science challenges for the hearing-aid industry:
1. To improve signal-processing options for individuals such that negative long-term consequences of hearing loss and mismatch for episodic and semantic LTM are counteracted (cf. Rönnberg et al., 2011).
2. To improve such options for the short term, such that noise reduction and related algorithms free up memory resources during listening, ideally also for individuals with lower WM capacity (cf. Ng et al., 2010).
3. To improve subjective measurements and measurement paradigms of listening effort such that the roles of individual WM capacity and type of background noise are captured (cf. Rudner et al., 2011b).
At a theoretical level, it can be stated that the ELU model provides a framework for understanding (1) to (3) above. However, future studies will target in even more detail the subcomponents of explicit WM processing (e.g., inhibition, switching) to come to grips with the mechanism(s) most sensitive to short-term and long-term mismatch effects on memory systems, and how to measure these effects both behaviorally and subjectively.
At a more general clinical and industrial level, all three studies (Ng et al., 2010; Rönnberg et al., 2011; Rudner et al., 2011b) collectively suggest that it is important for the industry to look beyond pure speech recognition in noise measures, as these measures may underestimate or misrepresent the consequences of hearing loss and of signal processing in hearing instruments for memory load, deterioration of LTM systems, and perceived effort. Furthermore, the results of these three studies indicate that cognitive consequences should always be taken into account before new hearing-aid signal-processing algorithms are introduced.
Footnotes
The article is based on an invited talk by the first author, The Stuart Gatehouse Memorial lecture, at IHCON, Lake Tahoe, California in August 2010.
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
The authors disclosed that they received the following support for their research and/or authorship of this article: The research was supported by a Linnaeus Centre HEAD grant (349-2007-8654) from the Swedish Research Council to the first author.
