Sage Journals: Discover world-class research

Abstract

The reduction in spectral resolution by cochlear implants oftentimes requires complementary visual speech cues to facilitate understanding. Despite substantial clinical characterization of auditory-only speech measures, relatively little is known about the audiovisual (AV) integrative abilities that most cochlear implant (CI) users rely on for daily speech comprehension. In this study, we tested AV integration in 63 CI users and 69 normal-hearing (NH) controls using the McGurk and sound-induced flash illusions. To our knowledge, this study is the largest to-date measuring the McGurk effect in this population and the first that tests the sound-induced flash illusion (SIFI). When presented with conflicting AV speech stimuli (i.e., the phoneme “ba” dubbed onto the viseme “ga”), we found that 55 CI users (87%) reported a fused percept of “da” or “tha” on at least one trial. After applying an error correction based on unisensory responses, we found that among those susceptible to the illusion, CI users experienced lower fusion than controls—a result that was concordant with results from the SIFI where the pairing of a single circle flashing on the screen with multiple beeps resulted in fewer illusory flashes for CI users. While illusion perception in these two tasks appears to be uncorrelated among CI users, we identified a negative correlation in the NH group. Because neither illusion appears to provide further explanation of variability in CI outcome measures, further research is needed to determine how these findings relate to CI users’ speech understanding, particularly in ecological listening conditions that are naturally multisensory.

Keywords

cochlear implant multisensory integration McGurk illusion audiovisual speech visual bias

Introduction

Naturalistic oral communication is typically an audiovisual (AV) experience that reveals striking perceptual benefits in speech comprehension when visual cues are present (Sumby & Pollack, 1954). In contrast, for cochlear implant (CI) users, talking on the phone is a difficult, and oftentimes insurmountable, challenge—one that requires parsing auditory-only speech signals that are conveyed by the CI at a greatly reduced spectral resolution. Fortunately, visual input can improve listening thresholds (Barone and Deguine, 2011; Barone et al., 2010; Grant and Seitz, 2000), and for many CI users, AV conversations are a necessity.

AV integration is the process of filtering and combining information from the different senses, like auditory speech and visual articulations, in order to create a more accurate percept than either sense could on its own. Each sensory modality encodes complementary information that is increasingly useful when the saliency/interpretability of the individual signals is relatively low. While typical, acoustic listeners may only experience the benefits of AV integration in, for example, a very noisy restaurant, electrical hearing through a CI may in itself introduce enough “noise” to necessitate the integration of visual cues as a critical compensatory strategy. In effect, many CI users rely on AV integration to successfully understand speech in most listening environments, not just particularly noisy ones. As a result, visual orofacial articulations play a crucial role in verbal communication both before and after cochlear implantation, and in order to fully describe aural speech recovery following implant surgery, characterization of both unisensory and multisensory processing is necessary.

Despite a daily reliance on the multisensory integration of speech signals, AV testing is not a part of routine audiological exams. Furthermore, as a clinical population, there are substantial individual differences (e.g., onset of hearing loss, etiology, duration of deafness, and age of implantation) that contribute to variable auditory outcome measures (Blamey et al., 2013) and may also have cascading effects on AV integration.

Arguably, the index of multisensory processing that we know the most about in CI users is the McGurk illusion (McGurk & MacDonald, 1976). First described over 40 years ago, this illusion results from hearing a bilabial syllable (e.g., “ba” or “pa”) while seeing the visually ambiguous articulation of a velar syllable (“ga” or “ka”). Together, these elicit a novel, fused percept such as “da”, “tha”, or “ta” (Tiippana, 2014). This effect is both robust and persistent for individuals who experience it, although it is well documented that a subset of individuals fails to perceive the illusion (Mallick et al., 2015). Many studies of CI users have focused on the responses of these “non-perceivers,” because they provide insight into sensory biases in the absence of fused percepts.

One very consistent finding in all studies evaluating the McGurk illusion in CI users is that those who do not perceive the illusion are biased toward the visual component (see Table 1 for brief summaries and references). This finding is in contrast to NH controls who typically report the auditory component when not experiencing the illusion (Massaro et al., 1986; McGurk & MacDonald, 1976). This discrepancy makes intuitive sense, because many CI users struggle to discriminate auditory-only syllables in their everyday lives, and may be perceptually “weighting” visual information more highly in order to improve AV estimates (Huyse et al., 2013). Indeed, adding noise or otherwise altering the saliency of one sensory modality is well-known to effectively and rapidly alter sensory weights in typical populations (Ernst & Bülthoff, 2004). Though there are shared networks involving the perception of congruent AV speech and incongruent speech like the McGurk effect, it should be noted that there are known differences in their processing mechanisms as well (Erickson et al., 2014). As such, McGurk results present a challenge for interpreting how illusory “da” percepts may (or may not) relate to naturally congruent speech, especially in the context of greater semantic information such as full sentences (Van Engen et al., 2017). Attention is another factor to consider with CI users since they typically report high listening effort (Davis et al., 2021), and greater attentional demands can also influence sensory weights (Kok et al., 2012).

Table 1.

An Overview of McGurk Studies with Cochlear Implant Users.

Citation	Study size	Study sample (ages)	McGurk Stimuli*		Group differences in fusion	Key findings
Citation	Study size	Study sample (ages)	A	V	Group differences in fusion	Key findings
Schorr et al., 2005	CI n = 36 NH n = 35	Children (5-14y)	pa	ka	CI < NH	Children with CIs are more likely to fuse syllables if they receive their implant prior to the age of 2.5y, otherwise they perceive the visual syllable “ka”.
Rouger et al., 2008	CI n = 33 NH n = 39	Adults CI_avg = 52y NH_avg = 32y	aba apa ama	aga	CI ≈ NH	CI users are biased toward the visual information, though illusion perception is similar to controls.

Desai et al., 2008	CI n = 8NH n = 14	Adults(18-80y)	ba	ga	CI ≈ NH	CI users rely more on the visual cue and their McGurk fusion rate positively correlates to duration of implant experience.
Tremblay et al., 2010	CI n = 17NH n = 12	Adults(19-69y)	ba	ga	CI ≈ NH	Proficient CI users, like NH controls, responded with either “da” or “ba”, while non-proficient CI users are more visually biased.
Huyse et al., 2013	CI n = 31NH n = 31	Children(7-17y)	afaapa	ashaaka	CI > NH	Non-proficient CI kids’ responses were more visual and less auditory than proficient kids.
Tona et al., 2015	CI n = 24NH n = 12	Children(4-10y)	bapa	gata	CI > NH	The McGurk illusion was experienced more often in Japanese children with CIs than age-matched controls.
Stropahl et al., 2017	CI n = 8NH n = 24	Adults(19-75y)	bamapa	gaka	CI > NH	CI users have higher sensory noise and a higher threshold of perceiving the illusion compared to controls in a stimulus-independent NED model.
Yamamoto et al., 2017	CI n = 31	Children(5.5-14y)	bapa	gaka	n/a	Prelingually deafened Japanese children with bilateral CIs experienced the illusion while listening with just one or with both implants.

All studies report a visual bias in CI users, while Schorr, Desai, and Tremblay also found various correlations to clinical measures.

*Only auditory bilabial stimuli and visual velar stimuli are listed, though it should be noted that four studies (Rouger, Stropahl, Tona, and Yamamoto) also tested several other syllables in up to 12 different combinations. NED = noisy encoding of disparity (Magnotti & Beauchamp, 2015). CI = cochlear implant, NH = normal hearing, n/a = not applicable.

Despite the consensus in the literature regarding visual bias among CI users in the context of McGurk stimuli, how their rate of fusion (e.g., “da” percepts) compares to NH controls is less apparent. Some studies indicate similar reports of fused percepts in CI users (Huyse et al., 2013; Rouger et al., 2008; Tremblay et al., 2010), while others find both lower (Schorr et al., 2005) and higher (Desai et al., 2008; Stropahl et al., 2017; Tona et al., 2015) reports of McGurk percepts in CI users than controls (Table 1). Given that auditory saliency is lower in CI users (i.e., less reliable), direct comparisons to NH individuals requires further calculations and may account for some discrepancies in the literature. Take, for example, an auditory error in unisensory trials like mistaking the sound of “ba” for “da”. Because this response could also mimic a McGurk percept in the incongruent trials, an adjustment is necessary to better quantify illusion perception per se (Grant et al., 1998; Stevenson et al., 2012). This need may be particularly true for patient populations with hearing impairments. Desai et al. (2008), for instance, found much higher fusion response rates in CI users than controls (60% v. 24%, respectively) that were actually not significantly different after applying an error correction. Further investigations are needed to address this issue, particularly in larger sample sizes that also capture the clinical diversity of CI users today.

In addition to this outstanding question of whether McGurk perception truly differs between CI and NH listeners, it is also unknown how these results directly relate to other metrics of AV integration (Grant & Seitz, 1998; Stevenson et al., 2017). The sound-induced flash illusion (SIFI)—sometimes referred to as the flashbeep or double flash illusion—was discovered more recently and involves simple, non-speech stimuli (Shams et al., 2000). Unlike the McGurk effect, the sound-induced flash illusion, or SIFI, asks participants to ignore what they hear and only report what they saw. Specifically, participants are asked to count the number of rings that rapidly flash on a screen while they also hear a varying number of beeps. The greater the number of beeps, the more likely that multiple, illusory flashes are perceived when only a single flash is presented. Given that deafness has been linked to perceptual enhancements in the visual periphery (Bavelier et al., 2006) and this illusion is perceived more strongly in the parafoveal visual fields of typical listeners (Shams et al., 2002), employing this task in CI users may provide new context for underlying differences in AV integration extending beyond speech stimuli.

We hypothesized that the known visual biases in CI users would lead to more veridical visual perceptions, which would correspond to a decreased likelihood of perceiving the non-existent flashes in a SIFI task or the fused “da” percept in a McGurk task (see Figure 1a schematic). Furthermore, because low McGurk fusion is effectively a metric of visual bias for CI users, we predicted a positive correlation between these two tasks; that is, when simple visual percepts are more easily biased by sound, we predicted a higher rate of fusion of auditory and visual speech signals leading CI users to perceive the McGurk illusion (Figure 1b). By the same logic, lower McGurk fusion in NH controls typically corresponds to an auditory-dominant percept, which could, in turn, influence a greater number of perceived flashes in the SIFI experiment (i.e., result in a negative correlation between the two tasks).

Figure 1.

Schematic of main hypotheses and results. Data from this study supported our hypothesis that CI users experience the McGurk and Sound-Induced flash illusions less often than controls (a). We did not find a positive correlation between these two illusions in CI users as we expected (b), nor did they predict clinical speech outcomes beyond known metrics in the CI group (not pictured). We did, however, find a negative correlation between both tasks among the NH control group in support of our reasoning that their higher perceptual “weighting” of auditory information results in perceiving more illusory flashes (b).

Our aim was to investigate how these tasks relate both to one another and to clinical outcome measures for CI users. Such work is necessary to characterize AV integration more completely in this cohort of individuals for whom it may be exceedingly important to “bind” auditory and visual information into more reliable multisensory percepts. The aforementioned hypotheses (see Figure 1) allowed us to address several outstanding questions in the literature: how proficient CI users are with integrating speech sounds compared to simple flash/beep pairings and whether these tasks have clinical implications for predicting other speech measures.

Our results indicate that CI users perceived both of these illusions less often than their NH counterparts. Within the CI group we did not find a correlation between the two illusions; however, two summary metrics [i.e., SIFI susceptibility index and p(McGurk)] correlated significantly in the NH group such that lower McGurk fusion (i.e., more auditory-dominant percepts) corresponded with more illusory flashes (SIFI). Given the absence of a correlation between these variables in the CI group, it is also possible that these two illusions may be mediated by different mechanisms of multisensory (i.e., AV) integration that are disproportionally influenced by other demographic or clinical factors in CI users. Furthermore, the lack of predictive ability of these illusions in explaining clinical variability within the CI group highlights the need for more extensive clinical characterization and perceptual modeling [e.g., (Braida, 1991; Grant et al., 2007)] to describe, and perhaps predict, AV proficiency among a population of individuals who critically rely on AV integration for speech comprehension.

Methods

Participants

We recruited 63 CI users and 69 NH controls (Table 2). There was no significant difference between groups for age (t₍₁₃₀₎ = 1.9, p = 0.06, d = 0.33). At least 3 months of experience with their implants was an inclusion criterion for CI users, and the average was 4.6 ± 3.7 years post activation (range = 3 mo – 14 y). The duration of hearing loss as defined from the known onset was 23.6 ± 17.4 years on average (range = 1.5–75 y). We also calculated an approximate “duration of deafness,” which we defined as the date of cochlear implantation minus the onset of severe hearing loss. We used a standard definition of severe hearing loss as either the date when pure tone detection thresholds exceeded 70 dB HL or else the closest estimate from clinical records. The average duration of deafness was 2.4 ± 4.2 years (range 0–27 y). The majority of CI users (n = 53) were postlingually deafened; however, a subset was prelingually deafened (n = 10). These individuals received their implants at an average age of 3.5 ± 1.8 years (range = 1.3–6.5 y; see black circles in Figure 2). Although prelingual deafness (i.e., occurring prior to language acquisition and during critical periods of development) can negatively impact speech outcomes (Fitzpatrick, 2015; Miyamoto et al., 1994), all of these 10 individuals achieved high speech proficiency (Figure 2).

Figure 2.

Clinical outcomes for the CI group. Age of implantation ranged from 1.3-75 years old (left), and all users had at least 3 months of CI use prior to participation. Clinical speech testing had a wide range of outcomes (right). We see higher than average performance in prelingually deafened individuals (filled circles) who were all implanted in early childhood (i.e., between 1.3 and 6.5 years of age). Bars indicate group means and error bars are 95% confidence intervals of the mean.

Table 2.

Participant Characterization.

Group	N	Sex (% female)	Mean age ± SD (y)	Number of CIs	Acoustic hearing	Implant manufacturer	Duration of hearing loss (y)
CI	63	56%	44.6 ± 21.1	one – 52%two – 48%	41%	60% Cochlear25% MED-EL15% AB	23.6 ± 17.4
NH	69	73%	37.8 ± 20.1	—	100%	—	—

A large cohort of cochlear implant (CI) and normal hearing (NH) listeners ranging in age from 6 to 77 years old participated in this study. Nearly half of the CI users (48%) had bilateral implants while many of the others had residual acoustic hearing (41%) in the non-implanted ear. Among the 93 implanted ears in this study, the majority (60%) were manufactured by Cochlear, Ltd, followed by Med-El and Advanced Bionics.

All CI users completed testing in their “best-aided” hearing condition, which included hearing aids in the non-implanted ear for 41% of the sample. All participants wore corrective lenses as needed and were screened for visual acuity using either a Snellen eye chart or verbal report. Speech perception in the CI group was either tested at the study visit or recorded from recent medical records from the last 6 months. This measure was not available for four individuals, and for the remaining 59 CI users, testing was carried out via best-aided, monosyllabic CNC word lists scored out of a possible 100% correct. Results indicate a wide range of proficiency from 0%-98% correct (67.1% ± 22.7%) (Figure 2), and this variability is consistent with other reports in the literature (Gifford et al., 2018).

Stimuli

Visual stimuli were displayed using Matlab 2008a and Psychophysics toolbox extensions (Brainard, 1997). These stimuli were presented on a CRT monitor positioned approximately 60 cm from participants. Visual stimuli were white circles (10 ms in duration at a 13° visual angle) on a black background and 2 s videos of a female articulating the syllables “ba” and “ga” (Stevenson et al., 2012). Auditory stimuli were 3.5 kHz tones (7 ms in duration) in the SIFI task and utterance of the syllables “ba” and “ga” in the McGurk task. When multiple flashes were presented in control conditions, they were separated by 43 ms intervals. For example, the congruent 4 flash and 4 beep trial is 169 ms in duration while a 1 flash, 1 beep trial is 10 ms. We aligned the onset of visual and auditory stimuli using a Hameg 507 oscilloscope via inputs from a photovoltaic cell and a microphone. Because it is well documented that CI users have normal auditory gap detection thresholds, generally in the range of 2–10 ms (Shannon, 1989), the 46 ms gaps between auditory stimuli in the SIFI experiment should have been easily perceptible as individual beeps for both groups. All auditory stimuli were delivered at a comfortably loud level (approximately 65 dB SPL) through a mono speaker.

Experimental Design and Analysis

The McGurk task has two blocks. The first block is unisensory testing of auditory-only and visual-only presentations of “ba” and “ga” (28 trials total). The next block is all AV presentations of both syllables (20 trials each) and the incongruent pairing of auditory “ba” with the visual articulations of “ga” (20 trials). Participants responded to each trial with one of four letters corresponding to the sounds: “ba”, “ga”, “da”, or “tha” (Figure 3). Illusory responses were considered either “da” or “tha”, which we will simply refer to as “da” throughout. Because misidentifying unisensory stimuli as one of these novel syllables could overinflate the apparent magnitude of the illusion, we made a correction using the formula:

Figure 3.

Experiment details of the McGurk illusion and the sound-induced flash illusion (SIFI). These two tasks have both congruent control trials and incongruent illusory trials where participants are asked to either report what the woman said (McGurk) or the number of flashes that they saw (SIFI).

p(“da” | McGurk trial) × [1-p(“da” | Unisensory trial)]

If a participant perceives a unisensory component as “da,” that obfuscates whether a subsequent AV perception of “da” is illusory and thereby indicative of integration. This formula adjusts for those occasions by calculating the proportion of “da” responses out of total trials in the incongruent AV condition and multiplying by (1 - proportion of “da” responses in unisensory trials). This effectively lowers the rate of fusion metric—p(McGurk)—for those who struggle to distinguish the component syllables on their own. Though this correction has the benefits of being a simple adjustment relative to a unisensory baseline, we should note that it assumes “da” errors occur at the same rate in incongruent AV conditions as they do in unisensory trials. For this reason, the p(McGurk) metric could be considered a conservative estimate of AV fusion. Even so, it is a common adjustment to make in McGurk studies such as these (Stevenson et al., 2012) and was applied here a priori. In circumstances where unisensory components elicit highly accurate responses, no such correction is needed. Instead, the raw proportion of fused percepts can be directly compared between groups [see, for example, (Grant et al., 1998)].

In the SIFI task, participants fixated on a white cross in the middle of the screen while a white ring flashed on a black background (Figure 3). They were asked to report the number of flashes while ignoring the beeps. Control trials consisted of: 1–4 flashes without beeps, congruent numbers of flashes and beeps (up to 4), and incongruent pairings of just one flash with 2–4 beeps. 25 trials were tested for each of these 11 conditions, and responses were scored as the average number of flashes reported. Additionally, we calculated a susceptibility index (SI; Stevenson et al., 2012) using the following formula where R_n is the average number of flashes for n beeps:

SI = \frac{\frac{R_{2} - R_{1}}{1} + \frac{R_{3} - R_{1}}{2} + \frac{R_{4} - R_{1}}{3}}{3}

This index collapses across all incongruent trials in order to derive a single metric for the average number of illusory flashes that are experienced per added beep. A value of 0 means that no illusion was perceived, and a value of 1, for instance, indicates that each additional beep increased the perceived number of flashes by 1.

Procedures

All protocols and procedures were approved by Vanderbilt University Medical Center's Institutional Review Board, and all subjects provided informed consent prior to participation. Experiments took place in a dimly lit, sound-attenuated room with an experimenter seated nearby. For the SIFI task, subjects were instructed to maintain fixation on the centrally-located fixation cross. All responses were collected using a standard keyboard. These experiments were part of a larger testing battery in which both trial and task order was pseudorandomized (Butera et al., 2018). Because some individuals ran out of time at the end of testing, subject numbers are included on figures to indicate those who were able to complete each task. In the McGurk experiment, one CI user and 2 NH controls did not have sufficient time to complete the unisensory testing necessary for p(McGurk) calculations. In lieu of imputation, we used the proportion of “da” responses as a proxy for p(McGurk) in these three individuals.

Statistical Approach

Between-group differences in the McGurk task were tested using resampling methods on account of several highly skewed variables that are incompatible with parametric tests even after standard transformations (e.g., log₁₀). This approach was selected for its minimal assumptions of the data's distribution in each group, and is based on 30,000 reshuffles where each shuffle reassigns data points randomly to the two groups. A two-tailed, Welch's t-test is then used as a comparison metric. We selected this metric instead of a difference in means, for instance, because it has the advantage of capturing both a central tendency and variability. Significance is expressed as the number of times a shuffled sample produces a p value exceeding what was found in the actual sample, and is expressed as a proportion of the total number of simulations. T scores and degrees of freedom are reported from the Welch's t-test of the observed data. Control McGurk trials, the p(McGurk) index, and SI were compared between groups using this resampling method in R (R Core team, 2013). After narrowing our comparisons of p(McGurk) measures to individuals who perceived the illusion on at least one trial, we tested whether proportionally more individuals were excluded from one group than the other using a chi-square test in SPSS (IBM).

Data from the SIFI task were less skewed and a square root transformation corrected for small deviations from normality. Congruent and incongruent conditions were both compared via a mixed-model ANOVA in SPSS. The repeated measure was the number of added beeps, and the group number was the between-subjects factor. Greenhouse-Geisser corrections for violations of sphericity were applied as needed, and significant interactions were followed up with pairwise t-tests.

Correlations between task metrics were done with a Spearman's rank-order correlation in SPSS software. Lastly, for a linear regression within the CI group, we included uncorrelated clinical variables with log transformations as needed to correct for deviations from normality.

Because prelingual deafness and subsequent pediatric implantation is frequently considered to be a distinct subgroup of CI users with a unique developmental time course, we also conducted all analyses excluding data from these 10 individuals. Since none of the main findings were affected by their inclusion, all statistical results and plots include all data collected, including both pre and postlingually deafened CI users. Significance for all statistical tests was defined as α < 0.05, and throughout, effect sizes are reported where applicable and means are reported ± standard deviations. De-identified data from this study are available from the corresponding author (I.M.B.) upon reasonable request.

Results

McGurk Illusion

In both the unisensory and AV control (i.e., congruent) trials, CI users had lower speech perception accuracy for both “ba” and “ga” stimuli (Figure 4). Not surprisingly, the group differences were the largest for the auditory-only conditions (Table 3). Additionally, for identifying “ga,” CI users had lower accuracy in lipreading (CI mean = 53%, NH mean = 68%) and AV listening (CI mean = 83%, NH mean = 99%) (Figure 5). Though AV identification of “ba” was also statistically lower for CI users, both groups scored close to ceiling (CI mean = 97.7% and NH mean = 99.3%). Group responses for the control conditions indicated that incorrect responses mostly corresponded to the fused, illusory stimulus. In fact, for every control condition tested, CI users had more “da” responses than controls (Figure 4). The largest group differences were in auditory only testing, though even for congruent AV presentations of “ga,” the CI group responded incorrectly with “da” on 16% of trials compared to just 4% of these errors in the NH group (Figure 4b). This finding highlights the need for a correction factor in order to better equate groups on illusion perception.

Figure 4.

Auditory, visual, and congruent audiovisual responses to McGurk control trials. The distribution of responses is shown for (a) the "ba" stimulus and (b) the "ga" stimulus. The disproportionate amount of "da" errors in the CI group prompted us to apply a correction factor for group comparisons with incongruent AV stimuli.

Figure 5.

McGurk experiment results. (a) Mean accuracy of perceiving syllables in both unisensory and AVc (congruent audiovisual) trials are shown for “ba” and “ga." (b) Mean responses to AV incongruent, “McGurk” trials are reported as the proportion of responses out of the total trials for this condition with no further adjustments. (c) Individual data for each response type is shown for AV incongruent trials. Error bars indicate 95% confidence interval of the mean. Chance performance is at 25%. * p < 0.05, **p < 0.01, ***p < 0.001.

Table 3.

Statistical Results from Unisensory, Audiovisual Congruent, and Audiovisual Incongruent Trials of the McGurk Experiment.

Modality	Syllable	Group means [%] ± SD		t statistic	Significance	Effect size
Modality	Syllable	CI	NH	t statistic	Significance	Effect size
Auditory-only	baga	55.1 ± 30.268.7 ± 34.7	96.8 ± 16.298.9 ± 3.75	t_(92.2) = −9.6t_(62.3) = −6.8	p < 0.001 p < 0.001	d = 1.72d = 1.23
Visual-only	baga	97.2 ± 7.2752.7 ± 28.8	98.7 ± 4.1168.4 ± 29.6	t_(94.3) = −1.4t_(125.9) = −3.0	p = 0.16p = 0.004	d = 0.25d = 0.54
Audiovisual (congruent)	baga	97.7 ± 5.5282.5 ± 19.5	99.3 ± 1.9399.1 ± 2.12	t_(74.1) = −2.3t_(63.4) = −6.7	p = 0.02 p < 0.001	d = 0.42d = 1.20
Audiovisual (incongruent)	ba	2.5 ± 7.61	33.6 ± 42.7	t_(70.5) = −5.9	p < 0.001	d = 1.01
	ga	43.1 ± 31.4	13.0 ± 25.4	t_(119.4) = 6.0	p < 0.001	d = 1.05
	da	54.4 ± 32.0	53.4 ± 43.6	t_(121.0) = 0.15	p = 0.87	d = 0.03

Comparisons are made between CI and NH groups using a resampling method and Welch's t-test. Values (p < 0.05) are bolded.

In the incongruent AV (i.e., McGurk) trials (Figure 5b), we analyzed the raw data of response proportions and found no significant difference in average “da” responses between groups (Table 3). However, when CI users did not fuse the syllables, they were much more likely to report the visual component “ga.” Conversely, non-fusing NH controls were most likely to report the auditory component “ba.” Individual data from these McGurk trials illustrate these biases as well as the high proportion of NH controls who did not perceive the illusion (Figure 5c, white circles at or near 0% responses). In NH populations, prior studies have described the McGurk illusion occurring as an “all-or-nothing” effect such that the majority of individuals either experience the illusion almost always or else rarely ever (Mallick et al., 2015). In the present study, we found that 72% of the NH group fell within the two extremes of perceiving the illusion either ≥ 90% of the time or ≤ 10%. In contrast, the CI group had many more intermediate responses with only 35% of individuals having very high or very low fused responses (see “da” panel in Figure 5c).

Sound-Induced Flash Illusion (SIFI)

In control trials where the number of flashes and beeps is matched (Figure 6a), there are no between-group differences in the perceived flashes (F_(3,127) = 0.002, p = 0.97, η_partial² = 0). In the incongruent pairings of just one flash with multiple beeps (Figure 6b), there is also no between-subjects effect (F_(4,127) = 0.55, p = 0.46, η_partial² = 0.004). However, there is a beep × group interaction (F_(1.4,127) = 7.6, p = 0.003, η_partial² = 0.056). Follow-up tests did not identify any group differences at specific conditions, though several approached significance (Table 4). Most notably, CI users reported slightly fewer flashes in the three-beep (avg flashes = 1.97) and four-beep conditions (avg flashes = 2.06) compared to controls (2.16 and 2.30, respectively).

Figure 6.

Sound-induced flash illusion results. Average reports of the number of perceived flashes are plotted for each group. Trials were either congruent pairings of one or more flashes and beeps (a) or 0-4 beeps paired with only one flash (b). Dotted lines represent 100% accuracy and error bars are 95% confidence intervals of the mean. Note that the 1 beep, 1 flash condition from (a) is replotted in (b) for reference.

Table 4.

SIFI Results.

Condition	Group means ± SD		t statistic	Significance	Effect size
Condition	CI	NH	t statistic	Significance	Effect size
1 flash	1.24 ± 0.29	1.17 ± 0.22	t₍₁₂₇₎ = 1.5	p = 0.13	d = 0.27
1 flash + 1 beep	1.27 ± 0.35	1.17 ± 0.26	t_(112.2) = 2.0*	p = 0.051	d = 0.35
1 flash + 2 beep	1.82 ± 0.55	1.87 ± 0.47	t₍₁₂₇₎ = −0.69	p = 0.49	d = 0.12
1 flash + 3 beep	1.97 ± 0.66	2.16 ± 0.61	t₍₁₂₇₎ = −1.8	p = 0.081	d = 0.31
1 flash + 4 beep	2.06 ± 0.72	2.30 ± 0.72	t₍₁₂₇₎ = −1.9	p = 0.066	d = 0.33

Though within-subjects effects from a mixed-model ANOVA indicated a significant condition × group interaction, no follow-up t-tests reached the p = 0.05 significance threshold.

*Fractional degrees of freedom result from unequal variances as identified by a Levene's test.

Illusion Metrics

In order to compare group differences in illusion perception between these two tasks, we calculated p(McGurk) and SI metrics (Figure 7). The p(McGurk) calculation corresponded to a unisensory baseline adjustment that lowered the probability of “da” responses for CI users by 0.161 and 0.057 for NH controls. In an initial comparison of p(McGurk) between groups, there is no significant difference (t_(109.6) = −1.7, p = 0.097, d = 0.33). However, we also did a post hoc test to compare whether the magnitude of the illusion differed strictly among individuals who did perceive the illusion in at least one trial. Thus, we excluded all “non-perceivers” who made up 13% of the CI group (n = 8) and 30% of the NH (n = 20). To test whether there was a significant difference in this ratio of non-perceivers, we did an exploratory chi-square test and found that groups were significantly different in this regard (χ²(1) = 5.65, p = 0.017, φ = 0.21). Of the remaining 55 CI users and 47 NH controls, we see a highly significant difference between p(McGurk) measures (t_(81.0) = −4.9, p = 4 × 10⁻⁵, d = 0.24). Following the same trend, there is also significantly lower SIFI susceptibility for CI users (t_(124.5) = −2.5,p = 0.014, d = 0.31) for a difference in means of 0.39 for CI users and 0.53 for NH controls (Figure 7b).

Figure 7.

Derived metrics of the probability of McGurk perception and SIFI susceptibility index. (a) The p(McGurk) metric corrects for inaccurate “da” percepts in control conditions (i.e., which only contain “ba” and “ga”). (b) Susceptibility index collapses across all incongruent conditions (i.e., 2 to 4 beeps per flash) to quantify the average increase in perceived flashes per added beep.

Figure 8.

Relationship between SIFI and McGurk. The correlation between susceptibility index (SI) and probability of McGurk perception is not significant for CI users (a), but does have a significant negative relationship among NH controls (b). Shaded areas correspond to 95% confidence intervals.

Correlation Between Illusion Tasks

Next, we asked whether these two illusions have a relationship to one another in these cohorts. We found no correlation between the illusions for the 87% of CI users who experienced the McGurk effect (r_s = 0.041, p = 0.77, Figure 7a); however, we did find a negative, albeit weak, correlation among the 70% of NH controls who perceived the McGurk illusion (r_s = −0.317, p = 0.030).

Testing for Additional Explained Variability in Clinical Measures

Lastly, we asked whether these AV illusion metrics had any predictive value for explaining variability in clinical speech scores within the CI group (Figure 2). In a stepwise regression model with CNC scores as the dependent variable, we entered the following four uncorrelated clinical and experimental measures as independent variables: duration of hearing loss, duration of deafness, p(McGurk), and SI. The only variable that was a significant predictor of CNC scores was the duration of hearing loss, explaining 13% of variability (R² = 0.130, F_(1,40) = 5.97, p = 0.019). All other variables were excluded from the model (Table 5).

Table 5.

Predictors of CNC Variability.

Predictor	B (unstandardized coefficient)	Partial correlation (pr)	p value
Duration of hearing loss	−22.606	-.36	.019
Model 1: R² = .13, F(1,40) = 5.97, p = .019
Excluded variables: Duration of deafness, p(McGurk), Susceptibility index

In a stepwise linear regression with CNC speech scores as the dependent variable, illusions metrics did not significantly contribute to the model beyond the 13% of variability predicted by duration of hearing loss.

Discussion

This study tested the McGurk illusion in a large cohort of CI users and is, to our knowledge, the first study that investigates the SIFI in this population. A key finding is that CI users perceived both of these AV illusions less often than NH controls (Figure 7). Additionally, we replicated the same general trend as others have reported in the McGurk task: that CI users display a bias toward the visual speech syllable (Figure 5). Importantly, our results also illustrate the need to correct for mistaking unisensory syllables as “da” (Figure 4). That is, in this study, like several others (Huyse et al., 2013; Rouger et al., 2008; Tremblay et al., 2010), raw data of “da” responses (i.e., without any error corrections) were the same between groups (Figure 5c). It was only in comparing the p(McGurk) measure of individuals who perceived the illusion at least once that we saw the group averages significantly diverge (Figure 7). Based on these findings, we recommend that any further investigations of this illusion in CI users take similar measures to disambiguate “da” responses and fused percepts to avoid overestimating the magnitude of the McGurk effect.

Similarly, in the SIFI task, CI users were susceptible to perceiving the illusion; however, the magnitude of the effect was lower than for controls. More specifically, we saw a significant between-group difference in a susceptibility index that collapses across all conditions into a single metric. To our knowledge, the lower CI group average in this metric is a novel finding, which is noteworthy given that relatively little is known about low-level AV stimulus detection in CI users (Stevenson et al., 2017). Based on our prior work, we do know that CI users’ temporal judgments of synchrony for these same flashbeep stimuli are indistinguishable from controls in a simultaneity judgement experiment (Butera et al., 2018). This suggests that low-level AV temporal function is likely intact in adult CI users, so any broader issues may be subtle. Although work in other clinical populations has related reduced SIFI perception to broader impairments in integration (Stevenson et al., 2014), additional work is needed to better characterize any further implications and mechanisms in CI users.

A broad interpretation of the present study is that CI users perceptually “weight” visual information more highly, which has a consequent impact on both the McGurk and SIFI illusions. That is, in incongruent tasks where illusory percepts are more heavily dependent upon sound biasing the visual cue, CI users are more likely to have strictly visual percepts. This visual bias translates to fewer illusory flashes in SIFI and a striking contrast in the higher frequency of visual “ga” percepts than auditory “ba” percepts for incongruent AV trials. In fact, CI users reported the auditory percept in just 2.5% of incongruent AV trials (Table 3), which represents merely 31 “ba” responses over 1,240 total trials. Though there appears to be lower p(McGurk) among CI users who perceived the illusion at least once, this difference in fusion rates is subtle compared with the broader difference in how each group allocates unisensory responses either overwhelmingly toward the visual component (for CI users) or the auditory component (for NH controls).

A similar pattern of more visually biased McGurk responses can also be simulated in NH users by vocoding stimuli to emulate CI processors (Desai et al., 2008). Similarly, adding a visual distractor reduces McGurk percepts (Tiippana et al., 2004), as does adding noise to reduce the auditory and/or video salience (Fixmer & Hawkins, 1998; Hirst et al., 2018). For CI users, it is likely that a combination of reduced auditory saliency and increased visual attention contribute to the greater weighting of visual information.

Our findings support a visual speech bias in CI users, and further work is needed to understand the clinical impacts of this sensory weighting. Though prior studies have suggested that more proficient CI users experience greater fusion (and less visual bias) (Huyse et al., 2013; Tremblay et al., 2010), we did not find McGurk fusion to explain any greater variability in CNC scores than other known measures—in our case, the total duration of hearing loss (Table 5). A major caveat to consider in interpreting these findings is that our outcome measure, which is standard in clinical practice, is auditory only. As a result, it is perhaps unsurprising that an AV illusion like the McGurk effect may relate more directly to AV speech measures than auditory-only ones (i.e., CNC scores). However, as others have reported in NH samples, correlations between McGurk results and AV speech measures are weak (Van Engen et al., 2017), and it is likely that many studies are underpowered to detect such relationships (Magnotti et al., 2020). As a result, larger studies may be necessary to draw the presumed connection between McGurk susceptibility and other AV speech measures, particularly those involving natural, congruent speech.

Though the stimuli in the McGurk and SIFI tasks are dissimilar, perceiving both of these AV illusions may have shared mechanisms since both require the viewer to reconcile whether incongruent auditory and visual cues do in fact originate from the same source (Körding et al., 2007). When comparing the illusion magnitudes between these two experiments, we found a negative correlation between the SI and p(McGurk) metrics in the NH group (r_s = −0.317, p = 0.030). However, we did not see a direct relationship between these measures in CI users as we had anticipated (Figure 1b). In a slightly younger and smaller NH sample size, Stevenson et al. (2012) reported a stronger correlation between p(McGurk) and SIFI (R² = 0.42, p = 0.003) that also had a negative relationship. In contrast, Tremblay et al. (2007) found no correlation between SIFI and McGurk tasks in a younger sample of 38 typical listeners between 5 and 19 years of age. While it is logical that differences in low-level AV illusions indexed by SIFI could have cascading effects on speech integration, these two tasks do not appear to have a direct relationship in CI users. Further work to model the process of cue combination in each of these tasks may provide additional context for how subject-specific parameters like sensory noise may correspond between McGurk and SIFI experiments (Magnotti et al., 2013; Magnotti & Beauchamp, 2015).

It should be noted that the illusion of multiple flashes can be elicited in other modalities as well (Violentyev et al., 2005). In an fMRI study of deaf individuals, a visual-tactile illusion that is analogous to SIFI, revealed larger visual responses in Heschl's gyrus and a greater magnitude of the illusion experienced in deafness (Karns et al., 2012). Although it's beyond the scope of the present study, it would be interesting for future work to investigate possible underlying mechanisms. For instance, given what is already known about the mechanisms behind the SIFI (Cecere et al., 2015; Kerlin & Shapiro, 2015; Watkins et al., 2006), it would be interesting to test whether, for instance, visually derived alpha oscillations play a similar role in predicting the likelihood of experiencing the illusion in CI users or if other, crossmodal mechanisms are at play (Stropahl & Debener, 2017).

One caveat to our study is that that McGurk perception is highly stimulus-specific, and in our case, we only tested responses to a single female speaker. As a result, it is possible that these results may not generalize more broadly to other syllable combinations, male speakers, languages, etc. The primary advantage for interpreting our findings, however, is that we tested a large sample of clinically diverse CI users, so CI perception of this particular stimulus is well-characterized. In future studies, applying a noisy encoding of disparity (NED) model to quantify results would be ideal for investigating the effect in a stimulus-independent manner (Magnotti & Beauchamp, 2015; Stropahl et al., 2017). Such work may also provide more context for our curious findings that CI users have lower lipreading proficiency with the syllable “ga,” as well as lower identification of the congruent, AV presentation of “ga” (Figure 5). In the past our lab has recorded and assayed several other McGurk stimuli and have continuously found the present stimulus to elicit the largest magnitude of the McGurk effect (unpublished observation). This may be due to a high ambiguity of the visual component in this particular recording, which could have disproportionately obfuscated the speech content for proficient lip readers like CI users. Thus, CI users may have performed more poorly in these control conditions for the same reason that this is a good McGurk stimulus—the articulation of “ga” (as well as the speaker's pre-articulatory movements) is particularly ambiguous and more like “da” than other speakers/recordings. Again, the methods described by Magnotti and Beauchamp (2015) would address this issue more conclusively. Another potential approach is to expand the stimulus set to include additional response options into which non-fusion percepts may fall. Otherwise, it is possible that listeners were forced to choose “da” percepts on some trials as the closest match given the limited response options, which may not have been the case with a larger stimulus bank (Rouger et al., 2008).

We should note that, as in all McGurk experiments, our stimuli were specifically selected for the high contrast between their unisensory reliabilities (i.e., the high visual ambiguity of “ga” and the low ambiguity for “ba”). For example, both CI and NH groups display V > A accuracy in identifying “ba” (Figures 4,5). In most practical circumstances this would not be the case. That is, for monosyllabic words, for example, most individuals correctly identify around 20% of words through lipreading alone. With current surgical techniques and modern CI processors, users can typically expect higher auditory-only word recognition accuracy than 20% (Blamey et al., 2013). Thus, these stimuli highlight an unnatural circumstance wherein lipreading provides better sensory estimates than the auditory information, even for NH controls.

Finally, it is not entirely clear what these results mean for CI outcomes given the lack of a relationship with CNC scores. Are more visually biased CI users less proficient with their implants? Are they worse “multisensory integrators” on broader—and more practical—AV speech tasks or simply less apt to perceive illusions? Because it's more likely that these AV illusions would relate to AV speech measures than the auditory-only ones tested in clinic (and reported here), future work comparing illusion perception to a real-world estimate of one's success in the integration of conversational, AV speech would provide more insight. Until we know how McGurk biases relate to natural speech integration of words or sentences it is unclear what utility administering McGurk testing at a broader scale might have. Either way, better identifying individual differences in multisensory integration is an important next step, particularly for the development of new AV remediation strategies.

Footnotes

Acknowledgments

We would like to thank the volunteers who participated in this study, and the assistance provided by Brannon Mangus and Amelia Schuster with recruitment as well as the statistical guidance provided by Dan Ashmead.

Declaration of Conflicting Interests

R.H.G. was a member of the Audiology Advisory Board for Advanced Bionics and Cochlear Americas and clinical advisory board for Frequency Therapeutics at the time of publication. No competing interests are declared for any other authors.

Funding

This work was supported in part by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (EKS NICHD) award No. U54HD083211 (M.T.W.), a Discovery grant No. RGPIN-2017-04656 (R.A.S.) from the Natural Sciences and Engineering Research Council of Canada (NSERC), a Social Sciences and Humanities Research Council (SSHRC) Insight Grant No. 435-2017-0936 (R.A.S.), the University of Western Ontario Faculty Development Research Fund (R.A.S.), the National Institute of Mental Health (NIMH) grant number T32 MH064913 (I.M.B.), and by the National Institute of Deafness and Communication Disorders (NIDCD) award No. 5F31DC015956 (I.M.B.). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of any of these funding sources.

ORCID iD

Iliza M. Butera

References

Barone

Deguine

(2011). Multisensory processing in cochlear implant listeners. In F.-G. Zeng, A. N. Popper, and R. R. Fay (Eds), Auditory Prostheses (Vol. 39, pp. 365–381). New York, NY: Springer New York.

Barone

Strelnikov

Deguine

(2010). Auditory recovery and speechreading in cochlear implanted deaf patients: A review. Audiological Medicine, 8(2), 100–106. https://doi. 10.3109/1651386X.2010.495243

Bavelier

Dye

M. W. G.

Hauser

P. C.

(2006). Do deaf individuals see better? Trends in Cognitive Sciences, 10(11), 512–518. https://doi.org/10.1016/j.tics.2006.09.006

Blamey

Artieres

Baskent

Bergeron

Beynon

Burke

Dillier

Dowell

Fraysse

Gallégo

Govaerts

P. J.

Green

Huber

A. M.

Kleine-Punte

Maat

Marx

Mawman

Mosnier

O’Connor,.

A. F.

, … Lazard

D. S.

(2013). Factors affecting auditory performance of postlinguistically deaf adults using cochlear implants: An update with 2251 patients. Audiology and Neurotology, 18(1), 36–47. https://doi.org/10.1159/000343189

Braida

L. D.

(1991). Crossmodal integration in the identification of consonant segments. The Quarterly Journal of Experimental Psychology Section A, 43(3), 647–677. https://doi.org/10.1080/14640749108400991

Brainard

D. H.

(1997). The psychophysics toolbox. Spatial Vision, 10(4), 433–436. https://doi.org/10.1163/156856897X00357

Butera

I. M.

Stevenson

R. A.

Mangus

B. D.

Woynaroski

T. G.

Gifford

R. H.

Wallace

M. T.

(2018). Audiovisual temporal processing in postlingually deafened adults with cochlear implants. Scientific Reports, 8(1), 11345. https://doi.org/10.1038/s41598-018-29598-x

Cecere

Rees

Romei

(2015). Individual differences in alpha frequency drive crossmodal illusory perception. Current Biology, 25(2), 231–235. https://doi.org/10.1016/j.cub.2014.11.034

Davis

Schlundt

Bonnet

Camarata

Bess

F. H.

Hornsby

(2021). Understanding listening-related fatigue: perspectives of adults with hearing loss. International Journal of Audiology, 60(6), 458–468. https://doi.org/10.1080/14992027.2020.1834631

10.

Desai

Stickney

Zeng

F.-G.

(2008). Auditory-visual speech perception in normal-hearing and cochlear-implant listeners. The Journal of the Acoustical Society of America, 123(1), 428. https://doi.org/10.1121/1.2816573

11.

Erickson

L. C.

Zielinski

B. A.

Zielinski

J. E. V.

Liu

Turkeltaub

P. E.

Leaver

A. M.

Rauschecker

J. P.

(2014). Distinct cortical locations for integration of audiovisual speech and the McGurk effect. Frontiers in Psychology, 5, 534. https://doi.org/10.3389/fpsyg.2014.00534

12.

Ernst

M. O.

Bülthoff

H. H.

(2004). Merging the senses into a robust percept. Trends in Cognitive Sciences, 8(4), 162–169. https://doi.org/10.1016/j.tics.2004.02.002

13.

Fitzpatrick

(2015). Chapter 19 - neurocognitive development in congenitally deaf children. In Aminoff

M. J.

Boller

Swaab

D. F.

(Eds.), Handbook of clinical neurology. Vol. 129, The human auditory system (pp. 335–356). Elsevier.

14.

Fixmer

Hawkins

(1998). The influence of quality of information on the McGurk effect. In AVSP’98 International Conference on Auditory-Visual Speech Processing. https://www.isca-speech.org/archive_open/avsp98/

15.

Gifford

R. H.

Noble

J. H.

Camarata

S. M.

Sunderhaus

L. W.

Dwyer

R. T.

Dawant

B. M.

Dietrich

M. S.

Labadie

R. F.

(2018). The relationship between spectral modulation detection and speech recognition: adult versus pediatric cochlear implant recipients. Trends in Hearing, 22. https://doi.org/10.1177/2331216518771176

16.

Grant

K. W.

Seitz

P. F.

(1998). Measures of auditory–visual integration in nonsense syllables and sentences. The Journal of the Acoustical Society of America, 104(4), 2438–2450. https://doi.org/10.1121/1.423751

17.

Grant

K. W.

Seitz

P.F.

(2000). The use of visible speech cues for improving auditory detection of spoken sentences. The Journal of the Acoustical Society of America, 108(3), 1197–1208. https://doi. 10.1121/1.1288668

18.

Grant

K. W.

Tufts

J. B.

Greenberg

(2007). Integration efficiency for speech perception within and across sensory modalities by normal-hearing and hearing-impaired individuals. The Journal of the Acoustical Society of America, 121(2), 1164–1176. https://doi.org/10.1121/1.2405859

19.

Grant

K. W.

Walden

B. E.

Seitz

P. F.

(1998). Auditory-Visual speech recognition by hearing-impaired subjects: consonant recognition, sentence recognition, and auditory-visual integration. The Journal of the Acoustical Society of America, 103(5), 2677–2690. https://doi.org/10.1121/1.422788

20.

Hirst

R. J.

Stacey

J. E.

Cragg

Stacey

P. C.

Allen

H. A.

(2018). The threshold for the McGurk effect in audio-visual noise decreases with development. Scientific Reports, 8(1), 1–12. https://doi.org/10.1038/s41598-018-30798-8

21.

Huyse

Berthommier

Leybaert

(2013). Degradation of labial information modifies audiovisual speech perception in cochlear-implanted children. Ear and Hearing, 34(1), 110–121. https://doi.org/10.1097/AUD.0b013e3182670993

22.

Karns

C. M.

Dow

M. W.

Neville

H. J.

(2012). Altered cross-modal processing in the primary auditory cortex of congenitally deaf adults: A visual-somatosensory FMRI study with a double-flash illusion. Journal of Neuroscience, 32(28), 9626–9638. https://doi.org/10.1523/JNEUROSCI.6488-11.2012

23.

Kerlin

J. R.

Shapiro

K. L.

(2015). Multisensory integration: How sound alters sight. Current Biology, 25(2), R76–R77. https://doi.org/10.1016/j.cub.2014.12.016

24.

Kok

Rahnev

Jehee

J. F. M.

Lau

H. C.

de Lange

F. P.

(2012). Attention reverses the effect of prediction in silencing sensory signals. Cerebral Cortex, 22(9), 2197–2206. https://doi.org/10.1093/cercor/bhr310

25.

Körding

K. P.

Beierholm

W. J.

Quartz

Tenenbaum

J. B.

Shams

(2007). “Causal inference in multisensory perception” edited by O. Sporns. PLoS ONE, 2(9), e943. https://doi.org/10.1371/journal.pone.0000943

26.

Magnotti

J. F.

Beauchamp

M. S.

(2015). The noisy encoding of disparity model of the McGurk effect. Psychonomic Bulletin & Review, 22(3), 701–709. https://doi.org/10.3758/s13423-014-0722-2

27.

Magnotti

J. F.

Dzeda

K. B.

Wegner-Clemens

Rennig

Beauchamp

M. S.

(2020). Weak observer–level correlation and strong stimulus-level correlation between the McGurk effect and audiovisual speech-in-noise: A causal inference explanation. Cortex, 133, 371–383. https://doi.org/10.1016/j.cortex.2020.10.002

28.

Magnotti

J. F.

W. J.

Beauchamp

M. S.

(2013). Causal inference of asynchronous audiovisual speech. Frontiers in Psychology, 4, 798. https://doi.org/10.3389/fpsyg.2013.00798

29.

Mallick

D. B.

Magnotti

J. F.

Beauchamp

M. S.

(2015). Variability and stability in the McGurk effect: contributions of participants, stimuli, time, and response type. Psychonomic Bulletin & Review, 22(5), 1299–1307. https://doi.org/10.3758/s13423-015-0817-4

30.

Massaro

D. W.

Thompson

L. A.

Barron

(1986). Developmental changes in visual and auditory contributions to speech perception. Journal of Experimental Child Psychology, 41(1), 93–113. https://doi.org/ 10.1016/0022-0965(86)90053-6

31.

McGurk

MacDonald

(1976). Hearing lips and seeing voices. Nature, 264, 746–748. https://doi.org/10.1038/264746a0

32.

Miyamoto

R. T.

Osberger

M. J.

Todd

S. L.

Robbins

A. M.

Stroer

B. S.

Zimmerman-Phillips

Carney

A. E.

(1994). Variables affecting implant performance in children. The Laryngoscope, 104(9), 1120–1124. https://doi.org/10.1288/00005537-199409000-00012

33.

R Core team (2013). A language and environment for statistical computing.

34.

Rouger

Fraysse

Deguine

Barone

(2008). McGurk effects in cochlear-implanted deaf subjects. Brain Research, 1188, 87–99. https://doi.org/10.1016/j.brainres.2007.10.049

35.

Schorr

E. A.

Fox

N. A.

Wassenhove

V. v.

Knudsen

E. I.

(2005). Auditory-visual fusion in speech perception in children with cochlear implants. Proceedings of the National Academy of Sciences of the United States of America, 102(51), 18748–18750. https://doi.org/10.1073/pnas.0508862102

36.

Shams

Kamitani

Shimojo

(2000). Illusions. What you see is what you hear. Nature, 408(6814), 788. https://doi.org/10.1038/35048669

37.

Shams

Kamitani

Shimojo

(2002). Visual illusion induced by sound. Cognitive Brain Research, 14(1), 147–152. https://doi.org/10.1016/S0926-6410(02)00069-1

38.

Shannon

R. V.

(1989). Detection of gaps in sinusoids and pulse trains by patients with cochlear implants. The Journal of the Acoustical Society of America, 85(6), 2587–2592. https://doi.org/10.1121/1.397753

39.

Stevenson

R. A.

Sheffield

S. W.

Butera

I. M.

Gifford

R. H.

Wallace

M. T.

(2017). Multisensory integration in cochlear implant recipients. Ear and Hearing, 38(5), 521–538. https://doi.org/10.1097/AUD.0000000000000435

40.

Stevenson

R. A.

Siemann

J. K.

Woynaroski

T. G.

Schneider

B. C.

Eberly

H. E.

Camarata

S. M.

Wallace

M. T.

(2014). Evidence for diminished multisensory integration in autism Spectrum disorders. Journal of Autism and Developmental Disorders, 44(12), 3161–3167. https://doi.org/10.1007/s10803-014-2179-6

41.

Stevenson

R. A.

Zemtsov

R. K.

Wallace

M. T.

(2012). Individual differences in the multisensory temporal binding window predict susceptibility to audiovisual illusions. Journal of Experimental Psychology: Human Perception and Performance, 38(6), 1517–1529. https://doi.org/10.1037/a0027339

42.

Stropahl

Debener

(2017). Auditory cross-modal reorganization in cochlear implant users indicates audio-visual integration. NeuroImage: Clinical, 16, 514–523. https://doi.org/10.1016/j.nicl.2017.09.001

43.

Stropahl

Schellhardt

Debener

(2017). McGurk stimuli for the investigation of multisensory integration in cochlear implant users: The oldenburg audio visual speech stimuli (OLAVS). Psychonomic Bulletin & Review, 24(3), 863–872. https://doi.org/10.3758/s13423-016-1148-9

44.

Sumby

W. H.

Pollack

I. P.

(1954). Visual contribution to speech intelligibility in noise. The Journal of the Acoustical Society of America, 26(2), 212–215. https://doi.org/10.1121/1.1907309

45.

Tiippana

(2014). What is the McGurk effect? Frontiers in Psychology, 5, 725. https://doi.org/10.3389/fpsyg.2014.00725

46.

Tiippana

Andersen

T. S.

Sams

(2004). Visual attention modulates audiovisual speech perception. European Journal of Cognitive Psychology, 16(3), 457–472. https://doi.org/10.1080/09541440340000268

47.

Tona

Naito

Moroto

Yamamoto

Fujiwara

Yamazaki

Shinohara

Kikuchi

(2015). Audio–visual integration during speech perception in prelingually deafened Japanese children revealed by the McGurk effect. International Journal of Pediatric Otorhinolaryngology, 79(12), 2072–2078. https://doi.org/10.1016/j.ijporl.2015.09.016

48.

Tremblay

Champoux

Lepore

Théoret

(2010). Audiovisual fusion and cochlear implant proficiency. Restorative Neurology and Neuroscience, 28(2), 283–291. https://doi.org/10.3233/RNN-2010-0498

49.

Tremblay

Champoux

Voss

Bacon

B. A.

Lepore

Théoret

(2007). “Speech and non-speech audio-visual illusions: A developmental study” edited by J. Harris. PLoS ONE, 2(8), e742. https://doi.org/10.1371/journal.pone.0000742

50.

Van Engen

K. J.

Xie

Chandrasekaran

(2017). Audiovisual sentence recognition not predicted by susceptibility to the McGurk effect. Attention, Perception, & Psychophysics, 79(2), 396–403. https://doi.org/10.3758/s13414-016-1238-9

51.

Violentyev

Shimojo

Shams

(2005). Touch-induced visual illusion. Neuroreport, 16(10), 1107–1110. https://doi.org/10.1097/00001756-200507130-00015

52.

Watkins

Shams

Tanaka

Haynes

J. D.

Rees

(2006). Sound alters activity in human V1 in association with illusory visual perception. NeuroImage, 31(3), 1247–1256. https://doi.org/10.1016/j.neuroimage.2006.01.016

53.

Yamamoto

Naito

Tona

Moroto

Tamaya

Fujiwara

Shinohara

Takebayashi

Kikuchi

Michida

(2017). Audio-visual speech perception in prelingually deafened Japanese children following sequential bilateral cochlear implantation. International Journal of Pediatric Otorhinolaryngology, 102, 160–168. https://doi.10.1016/j.ijporl.2017.09.022

Visually biased Perception in Cochlear Implant Users: A Study of the McGurk and Sound-Induced Flash Illusions

Abstract

Keywords

Introduction

Methods

Participants

Stimuli

Experimental Design and Analysis

Procedures

Statistical Approach

Results

McGurk Illusion

Sound-Induced Flash Illusion (SIFI)

Illusion Metrics

Correlation Between Illusion Tasks

Testing for Additional Explained Variability in Clinical Measures

Discussion

Footnotes

Acknowledgments

Declaration of Conflicting Interests

Funding

ORCID iD

References