Abstract
Envelope following responses (EFRs) may be a useful tool for evaluating the audibility of speech sounds in infants. The present study aimed to evaluate the characteristics of speech-evoked EFRs in infants with normal hearing, relative to adults, and identify age-dependent changes in EFR characteristics during infancy. In 42 infants and 21 young adults, EFRs were elicited by the first (F1) and the second and higher formants (F2+) of the vowels /u/, /a/, and /i/, dominant in low and mid frequencies, respectively, and by amplitude-modulated fricatives /s/ and /∫/, dominant in high frequencies. In a subset of 20 infants, the in-ear stimulus level was adjusted to match that of an average adult ear (65 dB sound pressure level [SPL]). We found that (a) adult–infant differences in EFR amplitude, signal-to-noise ratio, and intertrial phase coherence were larger and spread across the frequency range when in-ear stimulus level was adjusted in infants, (b) adult–infant differences in EFR characteristics were the largest for low-frequency stimuli, (c) infants demonstrated adult-like phase coherence when they received a higher (i.e., unadjusted) stimulus level, and (d) EFR phase coherence and signal-to-noise ratio changed with age in the first year of life for a few F2+ vowel stimuli in a level-specific manner. Together, our findings reveal that development-related changes in EFRs during infancy likely vary by stimulus frequency, with low-frequency stimuli demonstrating the largest adult–infant differences. Consistent with previous research, our findings emphasize the significant role of stimulus level calibration methods while investigating developmental trends in EFRs.
Envelope following responses (EFRs) may be a useful objective tool to evaluate the audibility of speech sounds in children who are unable to participate in behavioral hearing tests (Easwar et al., 2015b, 2015c, 2020a). Scalp-recorded EFRs reflect neural activity phase-locked to the stimulus envelope and have been elicited by a variety of speech stimuli, including naturally spoken vowels (Aiken & Picton, 2006; Choi et al., 2013), synthesized vowels (Anderson et al., 2015; Skoe et al., 2015), high- or low-pass filtered vowels (Easwar et al., 2015b; Vanheusden et al., 2019), individual vowel formants (Easwar et al., 2015a, 2019; Laroche et al., 2013), and modified fricatives (Easwar et al., 2015b, 2020b). The rationale for using certain speech stimuli, such as individual vowel formants and modified fricatives, has been to assess audibility, using EFRs, at a wide range of frequencies with reasonable specificity (Easwar et al., 2015c). Such frequency-specific assessment of audibility is clinically desirable in light of varying audiometric configurations in individuals, particularly children with hearing loss, both with and without hearing aids, and the contribution of individual frequency regions to phoneme and overall speech recognition with hearing aids (e.g., McCreery et al., 2017; Van Eeckhoutte et al., 2020). To date, the majority of studies in children have evaluated the characteristics of EFRs elicited by natural or synthesized (broadband) vowel stimuli at the vocal fundamental frequency (f0). Little is known about EFR characteristics for comparably more frequency-specific speech stimuli during childhood. To this end, the objective of the current study was to evaluate the characteristics of EFRs elicited by band-limited speech during infancy—the age at which objective tools such as EFRs are likely to be clinically useful in evaluating the audibility of speech sounds with and without hearing aids.
Vowel-evoked EFRs, alternatively referred to as the frequency following response (FFRenv), have been successfully recorded in infants younger than ∼1 year of age. Infants as young as 1–5 days of age demonstrate discernable EFRs (Jeng et al., 2011, 2016; Ribas-Prats et al., 2019). Although EFRs are discernable, response characteristics, defined in terms of the accuracy in tracking f0 and the strength of phase-locking, tend to be weaker during the newborn period and improve in the first 2 to 3 months of life (Jeng et al., 2016). Beyond 2 to 3 months, EFR characteristics including phase-locking strength and consistency, as well as amplitude, remain fairly steady until at least ∼10 to 12 months of age and are not significantly different compared to adults (Anderson et al., 2015; Jeng et al., 2010; Skoe et al., 2015; Van Dyke et al., 2017). Together, these vowel-evoked EFR studies in infants suggest that the encoding of the vowel envelope at f0, reflected in EFRs, is largely adult-like fairly early in life. Relatively minor changes are evident past the first 2 to 3 months; these include a reduction in between-participant variability, improvements in response consistency, and improved phase-locking during formant transitions compared to steady-state portions of vowels (Jeng et al., 2010; Skoe et al., 2015; Van Dyke et al., 2017).
While early adult-like characteristics of vowel-evoked EFRs hold promise for clinical applications, two constraints for audibility estimation with vowel stimuli exist: limited frequency specificity and bandwidth. Limited frequency specificity arises due to the presence of a single f0 throughout the vowel spectrum that enables initiation of EFRs at the same f0 from more than one cochlear region stimulated by the vowel. Some studies indicate that EFRs at f0 entail dominant contributions from the higher formants with unresolved harmonics (Easwar et al., 2018; Vanheusden et al., 2019), whereas one study postulates dominant contributions from the lower frequency first formant (Laroche et al., 2013). Although it is likely that inaudibility of parts of the vowel spectrum will influence the nature (e.g., amplitude) of the scalp-recorded EFR, multiple cochlear regions of EFR initiation make the identification of inaudible frequency regions challenging. Limited bandwidth is a concern because spectral energy in vowels tends to decrease significantly above ∼4 kHz. Reduced energy at higher frequencies renders vowel stimuli less efficient at indicating changes in audibility past 4 kHz compared to stimuli like fricatives that have greater high-frequency energy. For example, in adults with normal hearing, increasing the stimulus bandwidth from 4 to 10 kHz improved the amplitude of EFRs by ∼4 to 54% for vowel stimuli like /u/ and /i/ compared to much higher increases of ∼70 to 200% for fricative stimuli (Easwar et al., 2015b). The aforementioned findings not only support the need for more frequency-specific stimuli for clinical applications but also suggest the possibility that infant–adult differences (or lack thereof) in past studies may largely reflect the development of f0 encoding at frequencies that dominate the scalp-recorded vowel-evoked EFR. Additional investigation to quantify infant–adult differences with more frequency-specific stimuli is therefore merited.
Compared to vowels, tonal stimuli for eliciting EFRs (e.g., amplitude-modulated tones or amplitude- and frequency-modulated tones) can offer significantly better frequency specificity with flexible bandwidth. However, unless tones simulate the temporal envelope of speech (e.g., Laugesen et al., 2018), they are often ineffective in accurately representing nonlinear hearing aid function for speech (Scollie & Seewald, 2002; Stelmachowicz et al., 1996) and are, therefore, not preferred for evaluating aided speech audibility. Here, we refer to the literature on tone-evoked EFRs (commonly called auditory steady-state responses) in infants and children to gain insight into possible frequency-specific developmental patterns. We specifically consider modulation (i.e., envelope) frequencies of ∼80–120 Hz due to their similarity to the f0 of male-spoken vowels most commonly used in EFR studies. Infant–adult differences in EFRs could vary by the stimulus (or carrier) frequency because of tonotopy-dependent maturation evident in the brainstem—the dominant neural source of scalp-recorded EFRs at ∼80–120 Hz (Bidelman, 2018; Herdman et al., 2002). Cochlear-place-specific auditory brainstem responses (ABRs) demonstrate orderly maturation of pathways corresponding to 1.4 kHz, 2.8 kHz, 5.7 kHz, 0.7 kHz, and finally, 11.3 kHz (Ponton et al., 1992).
Similar to vowels, frequency-specific tones can also elicit discernable EFRs in the first few days of life (Cone-Wesson et al., 2002; John et al., 2004; Rickards et al., 1994; Riquelme et al., 2006; Savio et al., 2001). Tone-evoked EFRs show significant improvement in the first few months of life (John et al., 2004; Luts et al., 2006) and gradually improve until ∼14 years to reach adult characteristics (Pethe et al., 2004). Compared to adults, EFR characteristics in infants are weaker in terms of amplitude, coherence, signal-to-noise ratio (SNR), and detectability (Alaerts et al., 2010; Levi et al., 1993; Lins et al., 1996; Luts et al., 2006; Pethe et al., 2004; Rance & Tomlin, 2006; Savio et al., 2001; Van Maanen & Stapells, 2009). Some studies report larger adult–infant differences for frequencies <1 kHz (Lins et al., 1996; Luts et al., 2006), and some report the opposite pattern (Levi et al., 1993). Within the first few years, while one study indicates a faster growth rate for 1–4 kHz tones compared to 0.5 kHz tones (Savio et al., 2001), another study shows no age or frequency dependencies (Van Maanen & Stapells, 2009).
Although vowel and tone-evoked EFRs reflect similar neural processing (i.e., phase-locking), findings from previous vowel- and tone-evoked EFR studies agree only on certain aspects. Common findings between vowel- and tone-evoked studies include the detectability of EFRs fairly early in life and the rapid improvement in EFR characteristics in the first few months. Such a pattern generally agrees with the brainstem being the dominant generator of EFRs at envelope frequencies between ∼80 and 110 Hz (male f0 range; Bidelman, 2018; Herdman et al., 2002) and the early maturation of brainstem structures (review by Moore & Linthicum, 2007). However, the age at which adult-like characteristics are achieved appears to differ between vowel- and tone-evoked EFR studies. Moreover, discrepancies in findings exist even among tone-evoked EFR studies. Such discrepancies may, in part, arise from methodological differences including, but not limited to, the calibration method (that determines stimulus level), stimulus type, response metrics, and/or participant state (awake vs. asleep/sedated).
Calibration method is an important factor to consider in these developmental studies due to the common use of insert earphones in the smaller-than-adult ear canals of infants. Ear simulator or coupler-based calibration, intended to simulate levels in an average adult ear, often leads to higher levels in infant ear canals, especially for higher frequency stimuli (Levi et al., 1995; Rance & Tomlin, 2006). To account for such level changes between different-sized ear canals, stimuli have either been presented at lower levels in infants compared to adults (Jeng et al., 2010, 2011) or measured in-ear (Rance & Tomlin, 2006). In-ear calibration has revealed the need for higher-than-adult stimulus levels in infants to achieve similar response detectability, not only for EFRs (Rance & Tomlin, 2006) but also for the more commonly used ABRs (Sininger et al., 1997). The adoption of infant-appropriate calibration alternatives has, however, not been consistent across EFR studies. Another essential consideration in tracking developmental trends is the response metric. As noted earlier, multiple response metrics have been used in past research. The use of metrics such as EFR amplitudes could be influenced not only by infant–adult differences in head and skull characteristics, but also by variations in residual noise (Picton et al., 2005). The use of relative or normalized measures such as SNR, percent detectability, thresholds, and phase-locking value/coherence in previous studies may therefore be less susceptible to bias.
The primary aim of the present study was to evaluate similarities and differences in speech-evoked EFRs between typically developing infants under 1 year of age and adults with normal hearing. The secondary aim was to evaluate the age-dependent change in EFR characteristics within the first year of life. In an attempt to improve frequency specificity and bandwidth of vowel stimuli used previously, we used vowels that were modified to elicit individual EFRs from the first (F1) and second and higher formants (F2+) and fricatives that were modified to enable eliciting EFRs past 3 kHz (Easwar et al., 2015b, 2015c). Using both coupler-based and in-ear calibration of stimulus level, we evaluated infant–adult differences using non-normalized and normalized response metrics, including EFR amplitude, EFR SNR (ratio of EFR to noise amplitude), and phase coherence. We hypothesized that infant–adult differences in EFR characteristics and age effects are dependent on the stimulus levels used and the stimulus frequency that determines the dominant cochlear region of EFR initiation. Given the need for higher stimulus levels for similar infant–adult EFR detectability and the earlier maturation of brainstem processing at mid-frequencies found in previous studies, we predicted that infant–adult differences would (a) be larger when the stimulus level in infants is calibrated individually in-ear, and (b) be smaller for mid-frequency dominant stimuli. For age effects, because of the limited age range of participants, we predicted that weaker associations between the age at test and EFR characteristics would likely be evident for stimuli that elicit adult-like (i.e., mature) EFR characteristics during infancy and for those that mature much later in childhood.
Methods
Participants
A total of 50 infants and 24 adults participated in the study. Written consent was obtained from all adult participants. In the case of infant participants, written consent was obtained from either parent. Otoscopy in the test ear revealed no contraindications for testing, such as occluding wax. None of the adult participants or parents of infants reported any health concerns, including neurological disorders. Infants were born full-term, passed newborn hearing screening, and did not have any history of high-risk factors for hearing loss. In infants, hearing screening for the present study used distortion product otoacoustic emissions (DPOAE) at 2, 3, 4, and 5 kHz (primary tone pairs, f1 and f2, presented at an f2/f1 frequency ratio of 1.24 and L1/L2 levels of 65/55 dB SPL; AccuScreen, Otometrics, Denmark). DPOAEs with 12 dB SNR for at least three of four frequencies were required to pass the hearing screening. In adults, eligibility was determined based on a hearing screening, assessed using pure tones between 0.25 and 8 kHz at 20 dB HL presented through insert earphones (GSI-61; Grason-Stadler, Eden Prairie, MN), and middle ear status, assessed using 226 Hz probe tone tympanometry (Madsen Otoflex 100; Otometrics, Denmark). All adults, except one who was subsequently excluded, detected tones at 20 dB HL and presented type A tympanograms bilaterally. Twenty-three adults remained in the study sample.
Of the 50 infants, 2 infants failed the hearing screen and 3 other infants had to be excluded because they did not settle adequately for testing. One additional infant was excluded because there were no discernable EFRs despite an adequate number of trials (n = 442) and acceptable recording conditions (average residual noise of 13.6 nV [SD = 5.7]). Because DPOAE screening does not rule out a retrocochlear pathology, the infant’s data were excluded. Forty-four infants remained in the study sample.
The 44 infants were divided into two groups based on stimulus level. The stimulus level in the first 21 infants, henceforth referred to as the level-matched group, was adjusted to produce 65 dB SPL in the infant’s ear. The stimulus level in the next 23 infants, henceforth referred to as the higher-level group, was unadjusted (i.e., would be 65 dB SPL in an average adult ear). Details on stimulus level correction are explained in the Stimulus Presentation section. The ages of infants in the two groups did not differ statistically, t(41.99) = 0.21, p = .831 (level-matched group: mean age ± SD = 0.55 ± 0.23 years, range = 0.22–1.06 years; higher-level group: mean age ± SD = 0.54 ± 0.25 years, range = 0.16–1.15 years). The mean age of the adult group was 24.02 years (SD = 2.56; range = 20.43–29.82 years). The numbers of females were 12, 11, and 20 in the level-matched infant, higher-level infant, and adult groups, respectively. The infant groups also did not differ in the LittlEARS auditory development questionnaire score (Tsiakpini et al., 2004; Wilcoxon rank-sum test; U = 239.5, p = .816) and fell within the expected normative range (Bagatto et al., 2011; see Supplementary Figure 1). Group assignment was not randomized; all infants in the level-matched group were tested first. Nonrandom assignment is unlikely to bias the reported results due to the similarity between groups in demographic and calibration characteristics.
The study protocol was approved by the Health Science Research Ethics Board at Western University (#102557). Participants were paid for their participation at the rate of $10/hr.
Stimuli
The token /susa∫i/ (2.05 s), spoken by a 42-year-old male (average f0 = 98 Hz) from Southwestern Ontario, was used as the EFR stimulus (Easwar et al., 2015b, 2015c, 2020a, 2020b). Speech recordings were made using a studio-grade microphone (AKG Type C 4000B) and SpectraPLUS software (v5.0.26.0; Pioneer Hill Software LLC, Poulsbo, WA, USA) and further modified using Praat (Boersma & Weenink, 2017), GoldWave (v5.58, GoldWave Inc., St. John’s, Newfoundland, Canada), and MATLAB (Mathworks, Natick, MA, USA). The vowels /u/, /a/, and /i/ were 386, 447, and 435 ms long, respectively. The fricatives /∫/ and /s/ were 234 and 274 ms long, respectively. The five phonemes in /susa∫i/ were modified to elicit eight EFRs in total, from one of low, mid, or high frequencies (Figure 1).

Figure 1. EFR Stimuli Spectra. F1 and F2+ refer to the first formant and second and higher formants of vowels, respectively. ISTS refers to the International Speech Test Signal (Holube et al., 2010) and LTASS refers to the long-term average speech spectrum. The shaded gray region represents the dynamic range (30th to 99th percentile) of the ISTS matched in RMS level to the /susa∫i/ stimulus. Adapted from Easwar et al. (2020b).
Vowels were modified to carry two f0: the natural f0 in the region of the second formant (F2+) and a lowered f0 in the region of the lower frequency first formant (F1; Easwar et al., 2015b). The rationale for such a modification was to improve frequency and place specificity of responses in comparison to those evoked with vowel stimuli with a single f0 (Easwar et al., 2019). The average f0 in F1 was lower than the original f0 in F2+ by ∼8.5 Hz. The lowering of f0 by 8.5 Hz reduced the possible contamination of one EFR on the amplitude estimation of the other simultaneously recorded EFR. Modification of vowels entailed the following steps in Praat: (a) The f0 of the original vowel was lowered using the pitch shift function. (b) F1 was low-pass filtered from the lowered-f0 full-bandwidth vowel using steep filter skirts at 715, 1130, and 1120 Hz for /u/, /a/, and /i/, respectively. The cutoff frequencies were chosen halfway between the first and second formant peaks. F1 included the first seven harmonics for /u/ and the first 12 harmonics for the vowels /a/ and /i/. (c) The original vowels were high-pass filtered at 715, 1170, and 1175 Hz to extract F2+ of /u/, /a/, and /i/, respectively, with no overlap in the harmonics between F1 and F2+. (d) F1 with the lowered-f0 was summed with the high-pass filtered F2+ at the original f0 and matched in overall level with the original full-bandwidth vowel.
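As a rough consistency check on the harmonic counts reported above, the number of harmonics falling below each low-pass cutoff can be computed from the lowered f0. The sketch below assumes, purely for illustration, that the talker's average f0 of 98 Hz applies to every vowel and that f0 is constant within a vowel; the actual f0 varied over time, so this is not the procedure used in Praat.

```python
import math

AVERAGE_F0_HZ = 98.0   # average f0 of the talker, from the text
F0_SHIFT_HZ = 8.5      # approximate lowering applied to the F1 band, from the text
LOWPASS_CUTOFF_HZ = {"u": 715.0, "a": 1130.0, "i": 1120.0}  # F1 cutoffs, from the text


def harmonics_below(cutoff_hz, f0_hz):
    """Count the harmonics of f0 that lie at or below the cutoff frequency."""
    return math.floor(cutoff_hz / f0_hz)


# Assumption: a single, constant lowered f0 for all three vowels.
lowered_f0_hz = AVERAGE_F0_HZ - F0_SHIFT_HZ
f1_harmonics = {
    vowel: harmonics_below(cutoff, lowered_f0_hz)
    for vowel, cutoff in LOWPASS_CUTOFF_HZ.items()
}
print(f1_harmonics)
```

Under these assumptions the counts agree with the seven harmonics reported for /u/ and the 12 reported for /a/ and /i/. The /u/ case sits close to the boundary, so a slightly lower f0 would admit an eighth harmonic.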
Fricatives /∫/ and /s/ were high-pass filtered at 3 and 4 kHz, respectively, to improve their frequency specificity and sensitivity to changes in audible bandwidth past 4 kHz (Easwar et al., 2020a, 2020b). The cutoffs were chosen based on the lowest frequency in the prominent spectral peak of fricative productions (Boothroyd & Medwetsky, 1992; Boothroyd et al., 1994; Stelmachowicz et al., 2004). Post-filtering, the fricatives were amplitude-modulated at 100% depth at 93.02 Hz; the analysis window of both fricatives consisted of an integer number of cycles of the modulation frequency. The root-mean-square (RMS) level was matched pre- and post-amplitude modulation.
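The amplitude modulation and level matching described above can be sketched as follows. This is a minimal illustration and not the authors' MATLAB processing: the Gaussian noise carrier stands in for a filtered fricative, the sinusoidal modulator and its starting phase are assumptions, and `amplitude_modulate` is a hypothetical helper name. The window durations used for the cycle count are the fricative analysis windows given in the EFR Analysis section.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def amplitude_modulate(signal, fs_hz, fm_hz, depth=1.0):
    """Apply sinusoidal AM, then rescale so the RMS matches the unmodulated input."""
    modulated = [
        s * (1.0 + depth * math.sin(2.0 * math.pi * fm_hz * n / fs_hz))
        for n, s in enumerate(signal)
    ]
    gain = rms(signal) / rms(modulated)
    return [gain * m for m in modulated]


FS_HZ = 32000    # stimulus sampling rate, from the text
FM_HZ = 93.02    # modulation frequency, from the text

# Assumption: white Gaussian noise as a stand-in for the high-pass filtered fricative.
rng = random.Random(0)
carrier = [rng.gauss(0.0, 1.0) for _ in range(round(0.258 * FS_HZ))]
am_fricative = amplitude_modulate(carrier, FS_HZ, FM_HZ, depth=1.0)

# Number of modulation cycles within each fricative analysis window.
cycles_in_window = {"s": 0.258 * FM_HZ, "sh": 0.215 * FM_HZ}
print(cycles_in_window)
```

With a 93.02 Hz modulator, the 258 ms and 215 ms windows hold very nearly 24 and 20 cycles, respectively, which is consistent with the integer-cycle requirement stated above.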
The phonemes were originally produced in the sequence /usa∫i/. The phoneme /s/ was copied and concatenated before the phoneme /u/ to minimize the abrupt transition between two stimulus repetitions that did not entail an interstimulus interval. We chose to repeat the same /s/ to enable averaging EFRs to both stimulus iterations. Stimulus spectra are shown in Figure 1.
Stimulus Presentation
Stimulus presentation and response recording were controlled using software developed in LabView (v8.5; National Instruments [NI], Austin, TX, USA). Digital-to-analog conversion of the stimulus and analog-to-digital conversion of the electroencephalogram (EEG) were completed using an NI PCI-6289 M-series acquisition card. The stimulus was sampled at 32,000 samples per second and presented using an Etymotic ER-2 insert earphone (Etymotic Research, IL, USA) shielded by mu metal (Intelligent Hearing Systems, FL, USA). The earphone was coupled with an appropriately sized foam tip. The test ear was counterbalanced: it was the right ear in 11 adults, 12 infants in the level-matched group, and 9 infants in the higher-level group.
Stimulus level was calibrated in flat-weighted Leq using a Brüel and Kjær (B&K) Type 2250 sound level meter in an ear simulator (B&K Type 4157) when /susa∫i/ was presented continuously for 30 s. Stimulus level was controlled by a Tucker Davis Technologies PA-5 and an SA-1 amplifier (Alachua, FL). In adults and in the higher-level infant group, the stimulus was presented with a PA-5 attenuator level that produced 65 dB SPL in the ear simulator. In the level-matched infant group, the stimulus was presented with an attenuator level that produced 65 dB SPL in the infant’s test ear. In-ear stimulus level was measured as the average level of three stimulus repetitions using an ER-7C probe mic system (Etymotic Research, IL, USA), with the probe tube extending ∼3–4 mm beyond the medial end of a pediatric foam tip (Bagatto et al., 2002; Sininger et al., 1997). To achieve 65 dB SPL in-ear in the level-matched infant group, the PA-5 attenuator was adjusted by the difference between in-ear and ear simulator levels. In-ear stimulus levels were measured in both infant groups and not in adults because the ear simulator represents an average adult ear. The level difference (in-ear – ear simulator) did not differ significantly between the two infant groups—infants, level-matched group: mean ±
The levels of stimuli varied as per the original production. When the /susa∫i/ token was presented at 65 dB SPL in the ear simulator, the relative RMS levels of F1 stimuli of /u/, /a/, and /i/ were 3.9, –0.5, and –0.7 dB, respectively. The relative RMS levels of F2+ stimuli of /u/, /a/, and /i/ were –15.5, –5.4, and –14.1 dB, respectively. The relative RMS levels of the fricatives /s/ and /∫/ were –6.6 and –6.5 dB, respectively.
Response Recording
A single-channel EEG recording (sampling rate = 8000 Hz) was made using high forehead (Fz) as the noninverting electrode, ipsilateral mastoid as the inverting electrode, and lateral forehead as the ground. Electrode impedance was measured using an F-EZM5 GRASS impedance meter at 30 Hz. Impedances at each electrode site were <5 k
Testing was completed in an electromagnetically shielded double-walled sound booth with the lights turned off. Adults were seated in a reclined chair during testing and were encouraged to sleep. Most often, infants were held by their parent seated in the reclining chair in the sound booth. Occasionally, infants were tested in a stroller/car seat or crib placed in the sound booth. EEG recording in infants began only when the infants were observed to be asleep. The duration of EEG recordings in infants ranged from 30 min in those who slept continuously to about an hour in those who awoke in between. If the infant awoke, parents attempted to put them back to sleep and testing resumed. The examiner collecting data (first author) tracked epochs when the infant was observed to be asleep. EEG collected during these times was concatenated before noise rejection and averaging across trials.
Each trial (4.1045 s) consisted of /susa∫i/ in opposite polarities. In adults, the number of trials was fixed at 450, and EEG was collected over ∼30.8 min. In infants, the target number of trials was 450; however, the achieved number varied across infants. The number of trials recorded was deemed adequate for all groups because EFR characteristics stabilized by 250 trials (see Supplementary Figures 2 and 3). The average number of trials was 455.86 (SD = 67.35; range = 313–580) and 464.74 (SD = 69.51; range = 279–580) in the level-matched and higher-level infant groups, respectively; the difference was not statistically significant, t(41.8) = –0.43, p = .669.
EFR Analysis
Analysis was completed offline using MATLAB. Each trial was divided into four epochs of ∼1 s, and a noise metric was computed to set the artifact rejection threshold. The noise metric in each epoch was the average EEG amplitude between 80 and 240 Hz. For each participant, the artifact rejection threshold was based on their noise metric distribution and set at the third quartile + 1.5 × the interquartile range (IQR). Epochs with noise metric values higher than the artifact rejection threshold were excluded from further analyses. The artifact rejection threshold differed among the groups, χ²(2) = 19.66, p < .001. The Steel-Dwass non-parametric multiple comparison procedure indicated a higher threshold in adults (median = 350.82 nV; IQR = 234.9–531.87 nV) compared to both infant groups (infants, level-matched: median = 128.52 nV, IQR = 105.52–161.79 nV;
Response amplitude was estimated using a Fourier analyzer for vowel-elicited EFRs and a discrete Fourier transform for fricative-elicited EFRs (Easwar et al., 2015b, 2015c, 2020b). Analysis times or boundaries for each stimulus were preselected such that the onset and offset stimulus ramps (stimulus level up and down) were mostly excluded to achieve a steady-state stimulus level in the analysis window. For all vowel stimuli, the analysis window was 350 ms. The analysis windows for the fricatives were 215 and 258 ms for /∫/ and /s/, respectively. For both vowel- and fricative-elicited EFRs, EEG was averaged across opposite stimulus polarities to emphasize responses to the envelope (Aiken & Picton, 2008) and 10 ms was used to correct for brainstem delay (Aiken & Picton, 2006; Choi et al., 2013; Easwar et al., 2015b). Responses to the two iterations of /s/ in each sweep were averaged in the time domain.
For the Fourier analyzer, the time course of f0 in each vowel was estimated using Praat. Reference cosine and sine sinusoids were created using the f0 frequency. Once the average EEG sweep was corrected for brainstem delay, the EEG was multiplied with the reference sinusoids to obtain real and imaginary components of the EFR. Independent averages of the real and the imaginary components were obtained from the entire analysis window. These averages were combined in a complex number that was used to estimate the amplitude and phase of EFRs (Choi et al., 2013).
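The Fourier analyzer steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: `fourier_analyzer` is a hypothetical name, the reference phase is obtained by accumulating the instantaneous f0, the delay correction is applied as a simple sample shift, and the f0 glide and response values in the demonstration are made up.

```python
import math


def fourier_analyzer(eeg, f0_track_hz, fs_hz, delay_s=0.010):
    """Estimate amplitude and phase of a response following a time-varying f0.

    eeg: averaged EEG samples; f0_track_hz: f0 (Hz) at each stimulus sample,
    both at fs_hz. delay_s shifts the EEG to correct for brainstem delay.
    """
    shift = round(delay_s * fs_hz)
    aligned = eeg[shift:shift + len(f0_track_hz)]
    phase = 0.0        # running phase of the reference sinusoids
    real_sum = 0.0
    imag_sum = 0.0
    for sample, f0 in zip(aligned, f0_track_hz):
        real_sum += sample * math.cos(phase)   # product with the cosine reference
        imag_sum += sample * math.sin(phase)   # product with the sine reference
        phase += 2.0 * math.pi * f0 / fs_hz
    n = len(aligned)
    # Independent averages combined into one complex number; the factor of 2
    # recovers peak amplitude and the sign makes phase follow cos(ref + phase).
    z = complex(2.0 * real_sum / n, -2.0 * imag_sum / n)
    return abs(z), math.atan2(z.imag, z.real)


FS_HZ = 8000                      # EEG sampling rate, from the text
N = round(0.350 * FS_HZ)          # 350 ms vowel analysis window, from the text
f0_track = [95.0 + 6.0 * n / N for n in range(N)]   # made-up f0 glide

# Synthetic "response": 100 nV, 0.5 rad, arriving 10 ms after the stimulus.
true_amp_nv, true_phase = 100.0, 0.5
eeg = [0.0] * round(0.010 * FS_HZ)
ref_phase = 0.0
for f0 in f0_track:
    eeg.append(true_amp_nv * math.cos(ref_phase + true_phase))
    ref_phase += 2.0 * math.pi * f0 / FS_HZ

amplitude_nv, phase_rad = fourier_analyzer(eeg, f0_track, FS_HZ)
print(round(amplitude_nv, 1), round(phase_rad, 2))
```

Because the reference sinusoids track the f0 contour, the estimate stays accurate when f0 changes within the window, which a fixed-frequency transform would smear across bins.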
Residual noise was estimated from EEG amplitude at frequencies surrounding the response f0, in the case of vowel-elicited EFRs, and the modulation frequency, in the case of fricative-elicited EFRs. In the case of vowels, estimates of EEG noise were obtained from 14 surrounding noise bins (6 below and 8 above each f0; Easwar et al., 2015b). The bin encompassing 60 Hz was excluded to reduce line noise contamination. Further, the two bins flanking each f0 were excluded to reduce contamination from potential response leakage. In addition, the response f0 of one EFR was excluded from the noise estimation of the other simultaneously elicited EFR. In the case of fricatives, estimates of EEG noise were obtained from eight and six noise bins for /s/ and /∫/, respectively, evenly distributed on either side of the modulation frequency. EFR amplitude estimates were unbiased (Picton et al., 2005). SNR was computed as the ratio of EFR to residual noise amplitude, and phase coherence was based on the sums of the sines and cosines of EFR phase (without using amplitude) obtained in every trial (Picton et al., 2003; Stapells et al., 1987).
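The two normalized metrics can be illustrated with a short sketch. Both function names are hypothetical, and two choices are assumptions rather than details given in the text: the noise bins are combined as a root-mean-square average, and phase coherence is returned unsquared (some formulations report the squared value).

```python
import math


def phase_coherence(trial_phases_rad):
    """Length of the mean unit vector of per-trial phases (0 = random, 1 = identical)."""
    n = len(trial_phases_rad)
    mean_cos = sum(math.cos(p) for p in trial_phases_rad) / n
    mean_sin = sum(math.sin(p) for p in trial_phases_rad) / n
    return math.hypot(mean_cos, mean_sin)


def snr(response_amplitude, noise_bin_amplitudes):
    """Ratio of response amplitude to residual noise amplitude.

    Assumption: noise bins are combined as an RMS average.
    """
    noise = math.sqrt(
        sum(a * a for a in noise_bin_amplitudes) / len(noise_bin_amplitudes)
    )
    return response_amplitude / noise


# Identical phases give coherence 1; phases spread evenly around the circle give 0.
locked = phase_coherence([0.3] * 10)
spread = phase_coherence([0.0, math.pi / 2, math.pi, 3 * math.pi / 2])

# A 30 nV response against 14 noise bins of 10 nV each gives an SNR of 3.
example_snr = snr(30.0, [10.0] * 14)
print(locked, spread, example_snr)
```

Phase coherence ignores amplitude altogether, which is why it is expected to be less sensitive than EFR amplitude to infant–adult differences in head and skull characteristics.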
Data Exclusion
Participant data were excluded from further analysis based on their residual noise relative to their group (adults, infants) distribution. If a participant’s residual noise exceeded the third quartile + 1.5 × IQR for at least three of the eight stimuli, they were excluded. Using this criterion, two adults (females aged 23.8 and 20.6 years) and two infants (level-matched group: 0.36 years, female; higher-level group: 0.25 years, male) were excluded.
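The exclusion rule can be sketched as below. The function names and demonstration data are hypothetical, and the quartile interpolation method (Python's "inclusive" method) is an assumption, since the text does not specify how quartiles were computed.

```python
import statistics


def tukey_upper_fence(values):
    """Third quartile + 1.5 x IQR for one stimulus across a group."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def flag_noisy_participants(noise_by_participant, min_stimuli=3):
    """Return ids whose residual noise exceeds the fence for >= min_stimuli stimuli.

    noise_by_participant maps id -> list of residual noise values, one per stimulus.
    """
    n_stimuli = len(next(iter(noise_by_participant.values())))
    fences = [
        tukey_upper_fence([vals[i] for vals in noise_by_participant.values()])
        for i in range(n_stimuli)
    ]
    return [
        pid
        for pid, vals in noise_by_participant.items()
        if sum(v > f for v, f in zip(vals, fences)) >= min_stimuli
    ]


# Made-up group of 10 participants and 8 stimuli (values in nV).
demo = {f"p{k}": [10.0 + k] * 8 for k in range(10)}
demo["p9"] = [100.0, 100.0, 100.0] + [19.0] * 5   # noisy on three stimuli
flagged = flag_noisy_participants(demo)
print(flagged)
```

In this example only the participant who is an outlier for three stimuli is flagged; an outlier on one or two stimuli would be retained.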
Statistical Analysis
Generalized estimating equations (Hardin & Hilbe, 2003) were used to assess between-group differences for each stimulus and metric (EFR amplitude, noise amplitude, SNR, and phase coherence). Due to positively skewed distributions for all stimuli and groups, EFR amplitude, noise amplitude, and SNR were treated as following a gamma distribution with log-link, exchangeable correlation structure among stimuli, and robust (i.e., sandwich) standard errors. Phase coherence was square root transformed prior to analysis to improve linearity and stabilize variance. Inferences about between-group differences were made using two approaches. In the first approach, a point null hypothesis, the most commonly used approach, was tested with false discovery rate (FDR)-adjusted p values (Benjamini & Hochberg, 1995). This approach tests the null hypothesis of no group difference (i.e., ratio equal to 1, in the case of EFR amplitude, noise amplitude, and SNR, and a difference equal to 0, in the case of phase coherence). The null hypothesis is rejected for p values of <.05. In the second approach, an interval null hypothesis was tested using second-generation p values (Blume et al., 2018). This approach tests the null hypothesis of no group difference based on an interval, set a priori, that represents non-meaningful or non-interesting differences either due to limited precision or practicality. A ratio of ±20% for EFR amplitude, noise amplitude, and SNR and a difference of ±0.05 for phase coherence were chosen as the cutoffs for scientifically and clinically meaningful changes. The cutoffs were chosen based on both the coefficient of variation in test–retest measures (Easwar et al., 2020b) and practical step sizes. Second-generation p values range between 0 and 1 and represent a proportion—a fraction of the 95% confidence interval that overlaps with the null interval. 
Therefore, values of 0 indicate meaningful between-group differences (i.e., no overlap with the interval null), values of 1 indicate no meaningful between-group differences, and values between 0 and 1 imply an ambiguous finding where data support some fraction of effects that are in fact null effects. Second-generation p values offer better control of Type I error by reducing the likelihood of false discoveries and therefore do not require post hoc adjustments for multiple comparisons (Blume et al., 2018). It is expected that a larger proportion of adult–infant comparisons will be statistically significant when assessed using the traditional FDR-corrected approach compared to the use of second-generation p values. Although we present results from both approaches, our discussion largely refers to results obtained through the latter approach that is based on a predetermined effect size.
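For illustration, a second-generation p value can be computed from a confidence interval and an interval null as sketched below. The formula follows the definition in Blume et al. (2018), including the correction factor that caps the value at 0.5 when the confidence interval is more than twice as wide as the null interval. The function name and the example confidence intervals are made up; only the ±0.05 null interval for phase coherence comes from the text.

```python
def second_generation_p(ci_low, ci_high, null_low, null_high):
    """Fraction of the confidence interval that overlaps the interval null."""
    ci_len = ci_high - ci_low
    null_len = null_high - null_low
    overlap = max(0.0, min(ci_high, null_high) - max(ci_low, null_low))
    if ci_len == 0.0:
        # Degenerate interval: a point estimate inside or outside the null.
        return 1.0 if null_low <= ci_low <= null_high else 0.0
    # Small-sample correction for very wide (uninformative) intervals.
    return (overlap / ci_len) * max(ci_len / (2.0 * null_len), 1.0)


NULL = (-0.05, 0.05)   # interval null for phase coherence differences, from the text

# Made-up 95% confidence intervals for an adult-infant difference.
p_meaningful = second_generation_p(0.06, 0.20, *NULL)    # no overlap with the null
p_null = second_generation_p(-0.02, 0.03, *NULL)         # entirely inside the null
p_ambiguous = second_generation_p(0.00, 0.10, *NULL)     # half inside the null
p_wide = second_generation_p(-0.30, 0.30, *NULL)         # very imprecise estimate
print(p_meaningful, p_null, p_ambiguous, p_wide)
```

The four cases correspond to a meaningful difference (0), no meaningful difference (1), and two ambiguous outcomes (0.5). For the ratio metrics, a null interval such as 0.8 to 1.2 could be passed in the same way, although whether the ±20% bound was applied on the ratio or the log scale is not stated here.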
To evaluate the effects of age on EFR characteristics, individual analyses were completed for each stimulus and each infant group due to stimulus-specific adult–infant differences and level effects evident in the first analysis. The effects of age on EFR amplitude and SNR were assessed using a generalized linear model assuming a gamma distribution and log-link. For phase coherence, a square root transform was used prior to fitting by ordinary least squares with an alternative heteroskedasticity-consistent covariance matrix estimator to guard against potential non-constant variance. For analyses in the higher-level infant group, the level difference between in-ear and ear simulator (dB) was used as a covariate. The level difference was not included as a covariate for the infants in the level-matched group because the level correction equalized the stimulus level to 65 dB SPL in all test ears. Slopes were estimated for change in each response metric for every 0.3-year increase in age at test. Slopes reported for infants in the higher-level group were adjusted for the individually measured in-ear to ear-simulator level differences by entering the level difference as an additional explanatory variable in the multivariable model. FDR corrections were applied to the p values for each metric across the eight stimuli (F1 and F2+ for each of the three vowels, plus the two fricatives).
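The FDR adjustment referred to above can be sketched as follows. This is a plain-Python illustration of the Benjamini and Hochberg (1995) step-up adjustment applied to a family of eight p values (one per stimulus); the function name and the p values are hypothetical and are not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for one metric across eight stimuli.
raw = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
adj = benjamini_hochberg(raw)
n_significant = sum(p < 0.05 for p in adj)
print(adj, n_significant)
```

With these made-up inputs, several raw p values below .05 no longer fall below .05 after adjustment, which is the behavior the correction is intended to produce.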
Results
Adult–Infant Differences in EFR Metrics Were Most Evident When the Stimulus Level in Infants Matched That of an Average Adult Ear
Figure 2 illustrates stimulus-specific individual and group data for each of the four metrics of interest. Statistically significant between-group differences, based on FDR-corrected p values, are indicated in Figure 2. Figure 3 illustrates between-group differences for the same four metrics along with the second-generation p values. Although all between-group comparisons were completed in the same analysis, we present adult–infant differences for all metrics prior to between-infant group comparisons.

Figure 2. EFR amplitude, noise amplitude, SNR, and phase coherence as a function of stimulus along the x-axis (color) and group (shape). Box plots that illustrate the group median (horizontal line within each box) and the IQR (upper and lower limits) are overlaid on individual data. Error bars extend to the largest and smallest observed values that are no further than 1.5 times the IQR above and below the 75th and 25th percentile, respectively. Horizontal black lines indicate FDR-corrected significant post hoc pairwise comparisons.

Figure 3. Between-group differences for each metric (in each panel; indicated on the right) and stimulus (in color). A between-group ratio is used for EFR amplitude, noise amplitude, and SNR due to the log scale. A between-group difference is used for phase coherence due to the square root transformation. Error bars represent the 95% CI of between-group differences. The light gray area represents the interval of nonmeaningful differences used for second-generation p values (listed per comparison). The gray area spans ratios of 0.8–1.25 for the top three panels and differences within ±0.05 for phase coherence in the bottom panel. A lack of overlap between the 95% CI and the gray area implies rejection of the interval null hypothesis (second-generation p value = 0). The black dashed line represents the point null hypothesis (ratio of 1, difference of 0). A lack of overlap between the 95% CI and the black dashed line implies rejection of the point null hypothesis (FDR-corrected).
Models indicated a significant interaction between stimulus and group for EFR amplitude, χ2(14) = 84.32, p < .001; noise amplitude, χ2(14) = 30.90, p = .006; SNR, χ2(14) = 67.11, p < .001; and phase coherence, χ2(14) = 113.32, p < .001, suggesting that between-group differences varied as a function of stimulus. Relative to infants in the level-matched group, EFR amplitudes in adults were significantly and meaningfully higher for all stimuli (Figures 2 and 3, top panels). Differences were largest for /i/ F1 (adult–infant amplitude ratio = 3.28; 95% CI [2.38, 4.53]) and smallest for /s/ (ratio = 1.68; 95% CI [1.35, 2.08]). When infants received a higher stimulus level, EFR amplitudes were still significantly higher in adults for all F1 stimuli (/u/ F1: ratio = 1.79; 95% CI [1.34, 2.40]; /a/ F1: ratio = 1.58; 95% CI [1.16, 2.17]; /i/ F1: ratio = 2.13; 95% CI [1.62, 2.80]), /a/ F2+ (ratio = 1.44; 95% CI [1.17, 1.77]), /i/ F2+ (ratio = 1.39; 95% CI [1.12, 1.73]), and /∫/ (ratio = 1.29; 95% CI [1.02, 1.64]). However, meaningful differences existed only for /u/ and /i/ F1, the two lowest-frequency stimuli (spectral distribution in Figure 1).
Noise amplitudes were significantly higher in adults compared to both infant groups for most stimuli (Figures 2 and 3, second panels). The difference between adults and infants in the level-matched group did not vary in a stimulus-specific manner, χ2(7) = 5.50, p = .599; the adult mean noise amplitude was, on average, 42% higher than that of infants (ratio = 1.42; 95% CI [1.20, 1.68]). However, the difference did not meet the cutoff for meaningful differences; 1% of the 95% CI overlapped with the interval null. In contrast, noise amplitudes in adults were higher than those of infants in the higher-level group in a stimulus-specific manner; the adult–infant difference ranged from 51% larger for /a/ F1 (ratio = 1.51; 95% CI [1.26, 1.81]) to 10% larger for /u/ F2+ (ratio = 1.10; 95% CI [0.94, 1.30]). Nonetheless, the only stimulus for which the differences surpassed the meaningful cutoff was /a/ F1.
SNRs were higher in adults compared to both groups of infants for a subset of stimuli (Figures 2 and 3, third panels). SNRs in adults were significantly higher than infants in the level-corrected group for /u/ F1 (ratio = 1.91; 95% CI [1.39, 2.62]), /i/ F1 (ratio = 2.64; 95% CI [1.91, 3.64]), /u/ F2+ (ratio = 1.85; 95% CI [1.35, 2.53]), /i/ F2+ (ratio = 1.83; 95% CI [1.39, 2.42]), and /∫/ (ratio = 1.43; 95% CI [1.09, 1.88]). Differences reached the meaningful cutoff only for F1 and F2+ of both /u/ and /i/. Relative to infants in the higher-level group, SNRs in adults were significantly and meaningfully higher for only the low-frequency stimuli, /u/ and /i/ F1 (ratio = 1.59; 95% CI [1.21, 2.09] and ratio = 1.73; 95% CI [1.26, 2.37], respectively).
Adult–infant differences in phase coherence mostly paralleled SNR findings when stimulus level was matched (Figures 2 and 3, bottom panels). Phase coherence was higher in adults compared to infants in the level-corrected group for /u/ F1 (mean difference [transformed] = 0.13; 95% CI [0.06, 0.20]), /i/ F1 (difference = 0.15; 95% CI [0.09, 0.22]), /u/ F2+ (difference = 0.11; 95% CI [0.03, 0.19]), /i/ F2+ (difference = 0.12; 95% CI [0.06, 0.18]), and /∫/ (difference = 0.12; 95% CI [0.05, 0.18]). Stimuli for which the differences reached the meaningful cutoff were /u/ F1, as well as both F1 and F2+ stimuli of /i/, and /∫/. In contrast, phase coherence did not differ between adults and infants in the higher-level group for any of the stimuli.
In summary, adult–infant differences tended to be larger and spread across the frequency range when stimulus level in infants was calibrated in the ear instead of to the ear simulator representing an average adult ear. When infants received a higher stimulus level, adult–infant differences in EFR amplitude and SNR were limited to low-frequency stimuli, and phase coherence did not differ from adults.
Higher In-Ear Stimulus Level in Infants Improved Characteristics of EFRs Elicited by Mid- to High-Frequency Stimuli
A higher stimulus level in infants significantly increased the amplitude of EFRs elicited by all stimuli except /a/ F1 (Figure 2, top panel). The largest improvement was evident for /u/ F2+ (infants, higher-level to level-matched amplitude ratio = 2.51; 95% CI [2, 3.14]), whereas the smallest change was evident for /a/ F2+ (ratio = 1.21; 95% CI [1.02, 1.43]). Meaningful changes were evident only for /u/ and /i/ F2+ (infants, higher-level to level-matched amplitude /u/ F2+ ratio = 2.51, 95% CI [2, 3.14]; /i/ F2+ ratio = 1.83, 95% CI [1.48, 2.27], respectively). Although the noise amplitudes were numerically higher in the infant group that received the higher stimulus level for /u/ F1 and F2+, none of the noise ratios reached the meaningful cutoff (Figures 2 and 3, second panels). SNRs improved with higher levels for /i/ F1 (ratio = 1.53; 95% CI [1.15, 2.02]), /u/ and /i/ F2+ (/u/ F2+ ratio = 1.85; 95% CI [1.43, 2.28]; /i/ F2+ ratio = 1.59; 95% CI [1.26, 2.01], respectively), and /∫/ (ratio = 1.61; 95% CI [1.25, 2.08]); the differences reached the meaningful cutoff only for the last three stimuli (Figures 2 and 3, third panels). Similarly, phase coherence improved with higher level for /i/ F1 (difference = 0.07; 95% CI [0.02, 0.13]), /u/ and /i/ F2+ (difference = 0.18; 95% CI [0.12, 0.23]; difference = 0.12; 95% CI [0.07, 0.16], respectively), and /∫/ (difference = 0.07; 95% CI [0.02, 0.12]); the differences reached the meaningful cutoff only for the two F2+ stimuli (Figures 2 and 3, bottom panels).
EFR Characteristics Changed With Age Only for a Few Stimuli
As shown in Figure 4, the majority of slopes were not significantly different from 0—that is, the EFR characteristics were not associated with age. There were three exceptions. First, a significant positive slope of 0.27 per 0.3 years (∼4 months) in EFR SNR was found for /i/ F2+ in the infant group that did not receive a stimulus level correction (Figure 4 middle row; this would translate to a 31% increase in SNR every 0.3 years due to the use of log-link). The improvement in SNR for /i/ F2+-elicited EFRs paralleled the age-related trends in EFR amplitude and phase coherence for the same stimulus although neither reached statistical significance (Figure 4 top and bottom panels, respectively). Further, the improvement in SNR for /i/ F2+-elicited EFRs paralleled the age-related trends in EFR SNR evident for other F2+ stimuli, although they too did not reach statistical significance (Figure 4 middle panel).
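The conversion from a log-link slope to a percentage change can be illustrated with a short calculation. With a log-link, a slope b per age step multiplies the expected response by exp(b), so the percentage change per step is (exp(b) − 1) × 100. The function name below is an illustrative assumption; the slope value of 0.27 is the one reported above.

```python
import math


def percent_change_per_step(log_slope):
    """Percentage change in the response for a one-step increase in the
    predictor, given a slope estimated on the log (log-link) scale."""
    return (math.exp(log_slope) - 1.0) * 100.0


# Slope of 0.27 per 0.3 years for /i/ F2+ SNR, as reported in the text.
change = percent_change_per_step(0.27)
print(round(change))  # rounds to 31, matching the 31% stated in the text
```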

Figure 4. The three rows of scatter plots display the distribution of EFR amplitude, SNR, and phase coherence as a function of age. Infants in the level-matched group are represented by red squares, and infants in the higher-level group are represented by blue circles. The group- and stimulus-specific estimated slopes in each panel (upper-left [blue] and lower-right [red]) indicate the degree of change every 0.3 years and are color-coded by group. The slopes are adjusted for varied in-ear stimulus levels in the higher-level infant group. Significant slopes are indicated with *. An additional decimal is used for phase coherence slopes to better represent the small values.
The second and third exceptions were significant negative slopes of 0.06 per 0.3 years in phase coherence found for /u/ and /a/ F2+ in the infant group with a stimulus level correction (Figure 4 bottom row). Similar to the first instance of age effects, decreases in phase coherence paralleled decreases in EFR amplitude as well as SNR, although none of the latter measures reached statistical significance.
Discussion
The aims of the study were to evaluate similarities and differences in speech-evoked EFRs between typically developing infants and adults with normal hearing, and the effect of age on EFR characteristics in the first year of life. We found that (a) adult–infant differences were larger for low-frequency stimuli compared to higher frequency stimuli, and when the stimulus level in infant ear canals was corrected using in-ear calibration to match that of an average adult ear, (b) infants demonstrated adult-like intertrial phase consistency for all stimuli when they received a higher stimulus level, (c) the effect of level on EFRs varied by stimulus in infants, (d) residual noise tended to be lower in infants compared to adults, and (e) EFR phase coherence and SNR varied with age for only a few F2+ vowel stimuli.
EFRs Elicited by Higher Frequency Stimuli Possibly Mature Earlier Than Those Elicited by Lower Frequency Stimuli
The present findings suggest that immaturity in f0-rate EFRs likely exists in infants under 1 year of age for speech or vowel stimuli and that it may be more apparent when assessed in a stimulus frequency-specific manner (e.g., by using dual-f0 vowels, as in the present study). This finding is contrary to previous vowel-evoked EFR studies that demonstrate mature vowel-evoked EFRs by the first 3 months of life (Anderson et al., 2015; Jeng et al., 2010; Skoe et al., 2015; Van Dyke et al., 2017). However, immaturity in EFRs elicited at f0-range envelope rates in the first year of life is generally consistent with the immaturity observed in ABRs (Eggermont & Salamy, 1988; Ponton et al., 1992; Salamy, 1984), which are responses that have overlapping brainstem generators with EFRs (Bidelman, 2018). In the following, we discuss the nature of frequency- and level-specific adult–infant differences in EFRs in more detail.
Adult–infant differences in EFR characteristics tended to be larger for low-frequency stimuli (mainly /u/ and /i/ F1) than for higher frequency fricative stimuli. Such patterns persisted even when infants received a higher stimulus level (Figures 2 and 3). These results suggest that EFRs elicited by stimuli >3 kHz are likely adult-like before those elicited by lower frequency stimuli. Although earlier maturation of EFRs elicited by higher frequency stimuli has been reported in some tone-evoked EFR studies (Lins et al., 1996; Luts et al., 2006), it is not supported by some findings (Levi et al., 1993). Earlier maturation of EFRs elicited by higher frequency stimuli compared to very low-frequency stimuli is, however, supported by ABR studies (Eggermont et al., 1991; Ponton et al., 1992). Indexed by the ABR Wave I-V latency difference, adult-like values are achieved a few weeks earlier for high-frequency (5.7 kHz) compared to low-frequency (0.7 kHz) stimuli (Ponton et al., 1992). Although the high–low frequency difference resembles that found in ABR studies, the earlier maturation of EFRs elicited by high frequencies compared to mid-frequencies in the present study is not supported by them; in the study by Ponton et al. (1992), the mid-frequencies (1.4 to 2.8 kHz) were the earliest to reach adult-like values. The discrepancy in frequency-specific maturational trends in ABRs and EFRs may (a) suggest different developmental trajectories for encoding stimulus onsets and phase-locking to stimulus envelopes and/or (b) reflect the use of masking techniques in ABR studies to ascertain cochlear-place-specific mapping (Ponton et al., 1992).
One may note stimulus differences that exist between vowel and fricative stimuli; however, these are unlikely contributors to the observed development-related changes. The first stimulus factor that differentiates vowels from fricative stimuli in the present study is the number of EFRs simultaneously elicited—all vowel stimuli were designed to elicit two EFRs, whereas the fricative stimuli were designed to only elicit one EFR at a time. The simultaneous elicitation of two EFRs by vowel stimuli does not significantly influence EFR amplitudes in adults (Easwar et al., 2019); however, the impact has not been measured in infants for vowel stimuli. In a tone-evoked EFR study (Hatton & Stapells, 2011), 6- to 38-week-old infants demonstrated a significant reduction in EFR amplitude when four tones were presented simultaneously compared to when they were presented individually. The reduction, on average, ranged from 3% for 500 Hz tones to 30% for 4 kHz tones; however, the effect of frequency was statistically non-significant. Although such evidence raises the possibility that larger adult–infant differences in vowel-evoked EFRs could, in part, be due to the number of EFRs elicited at the same time, it does not explain the lack of differences for /a/—a stimulus that was designed to elicit two EFRs akin to /u/ and /i/. Therefore, the impact of two versus one EFRs on adult–infant differences is likely minor, if one exists. The second stimulus factor that differentiates vowel and fricative stimuli is the envelope rate at which EFRs are elicited. The average f0 of F1 vowel stimuli was 89.9 Hz, whereas the envelope rate of the fricatives was slightly higher (93.03 Hz). 
The difference in envelope rate is also likely not a confound because the two envelope rates are close enough to minimize the drop-off in EFR characteristics evident in the envelope-rate transfer function (Mijares Nodarse et al., 2012; Purcell et al., 2004), and phase-locking to lower-rate envelopes generally develops earlier than to higher-rate envelopes (Brugge et al., 1993).
The frequency-dependent pattern in adult–infant differences may also reflect the methodological choice of correcting for the overall level of the stimulus. It is well known that the smaller ear canal volume of infants boosts the higher frequencies by a larger amount compared to the lower frequencies (Bagatto et al., 2002, 2005; Rance & Tomlin, 2006; Sininger et al., 1997). Therefore, the use of an overall downward level correction (an approximate average of frequency-specific differences) may have led to some level boost at the higher frequencies compared to lower frequencies. Such a level correction could also have resulted in lower-than-intended stimulus levels for the low-frequency dominant stimuli. In infants who received the higher stimulus level, the overall level boost would likely have provided the most benefit to the EFRs elicited by high frequency stimuli. Because level improves EFR characteristics in general (Easwar et al., 2015c, 2021), the possibility of higher stimulus levels for higher frequency stimuli could have reduced adult–infant differences for fricative stimuli relative to lower-frequency vowel stimuli.
Higher Stimulus Level in Infant Test Ears May Obscure Some Developmental Trends in EFRs
Smaller or no differences between infants and adults when infants receive a higher stimulus level parallel previous threshold-based studies in tone-evoked EFRs and ABRs (Rance & Tomlin, 2006; Sininger et al., 1997). Increases in stimulus level lead to higher EFR amplitudes, SNR, and phase coherence (Easwar et al., 2015b, 2020a, 2020b). Likewise, development-driven changes often result in improved EFR characteristics. Therefore, when infants receive a higher stimulus level naturally arising from a smaller-than-adult infant ear, the effect of neural immaturity on EFRs is possibly hidden or offset by the effect of higher stimulus level on EFRs. In contrast to tone-evoked EFRs, vowel-evoked EFR characteristics in infants have been found to demonstrate adult-like characteristics even when the stimulus level was lowered by 5 dB in infants, a correction based on the average real-ear-to-coupler difference for 1-month-old infants (Jeng et al., 2010). Differences in findings between the present study and Jeng et al. (2010) are not readily explained by age; the average ages of infants used in the present study and Jeng et al. (2010) were 6.6 and 5.7 months, respectively. Differences may, therefore, be due to one or more of the following: (a) the smaller correction factor used in Jeng et al. (2010; 5 dB vs. an average of 14 dB in the present study), (b) the use of broadband vowel stimuli, especially with a rising f0 contour, in Jeng et al. (2010), and (c) the use of varied outcome metrics of interest, making direct comparison more challenging.
The lack of adult–infant differences for EFR phase coherence when considering infants in the higher-level group is supported by a previous study using broadband vowel stimuli (Van Dyke et al., 2017). In the study by Van Dyke and colleagues, EFRs were measured from 2- to 12-month-old infants and adults in response to /a/ in /ba/ and /ga/, presented monaurally (no level correction was used). The phase-locking value, broadly similar to phase coherence used in the present study, was adult-like in all infants for the 60 ms long steady-state portion of the vowel. Together, these results suggest that stimulus level influences the developmental patterns that are observed: with higher stimulus levels, EFR characteristics in infants begin to approximate adult values.
Level-Dependent Change in Infant EFRs Is Likely Frequency-Dependent
Although the primary purpose of the study was to evaluate adult–infant differences in EFRs, our study design permitted the evaluation of stimulus level in infants in a between-subject manner. The stimulus level in the higher-level infant group was, on average, ∼14 dB higher than the level in the level-matched infant group. Improvements in EFR characteristics were evident in amplitude, SNR, and phase coherence and for a subset of low-, mid-, and high-frequency stimuli (Figure 2). However, changes greater than our a priori cutoffs were evident mainly for F2+ vowel stimuli and the fricatives (Figure 3). The stimulus or frequency dependency for level effects on EFRs parallels our previous study in adults (Easwar et al., 2015b); however, the pattern varies. Whereas infants in the present study demonstrated larger changes for mid- to high-frequency stimuli, adults in the Easwar et al. (2015a) study demonstrated larger changes for low-frequency stimuli. While such discrepancies may indicate developmental changes, it is also possible that some of the differences are due to the range of stimulus levels (or sensation levels) used. The range of stimulus levels used matters because of the non-linear rate of growth in EFR characteristics, irrespective of stimulus or its frequency (Easwar et al., 2021). Nonetheless, comparisons with studies in adults need to be interpreted with caution as they mostly use within-subject designs. Differences in level effects between infants and adults and across stimuli of different frequencies within infants have also been reported for speech-evoked cortical potentials (Purdy et al., 2013). Given the growing interest in using cortical evoked potentials and EFRs for evaluating audibility of speech in infants with and without hearing aids, such adult–infant differences in level effects caution against generalizing findings from adults and emphasize the need for evaluation in the target population.
Residual Noise Levels Are Lower in Infants Than in Adults
Relative to adults, the residual noise levels in both infant groups tended to be lower despite a similar number of stimulus trials (Figures 2 and 3, second panel). These differences expectedly parallel the higher noise rejection thresholds in adults (see the Methods section). Lower noise levels in infants relative to adults have been reported in previous work (Luts et al., 2006) and likely arise from differences in resting state during EEG recordings. In the case of infants in the present study, EEG recordings were completed only when infants were observed to be asleep or nearly asleep. However, in the case of adults, EEG recordings continued irrespective of the sleep state as long as they appeared to be resting and minimized movements per instructions. In general, the lower residual noise in infants is clinically favorable as it will likely facilitate statistical detection of smaller amplitude EFRs (Picton et al., 2005). Although noise levels varied between infants and adults, and the differences in noise levels likely influence EFR amplitude estimates (Picton et al., 2005), the use of unbiased EFR amplitudes and SNR in the present study reduced possible confounds in interpreting between-group differences.
EFR Characteristics for the Majority of Stimuli Remain Steady Within the First Year of Life
Quantification of changes in EFR characteristics with age is useful to determine whether age-specific normative data are necessary for comparisons with clinical populations of interest. Our findings indicate no significant associations with age for infants in the first year of life for most stimuli (Figure 4). These results generally concur with previous studies, despite using more frequency-specific stimuli. Previous studies that have assessed the effect of age on EFR characteristics, either using age as a continuous variable (with regression or correlation; Anderson et al., 2015) or a categorical variable (< or >7 months; using analysis of variance; Van Dyke et al., 2017), have shown improvements in phase-locking to the fine structure or the higher order harmonics of f0 but not to f0 itself. The lack of association between age and EFR characteristics during infancy given adult–infant differences (for example, for low-frequency stimuli) suggests that development-related changes in envelope encoding and EFR characteristics likely continue beyond the first year of life.
The first exception for age effects is an SNR improvement in EFRs elicited by /i/ F2+ in the infant group that did not receive a stimulus level correction (Figure 4 middle panel). Age-related improvement in SNR during development is an expected pattern and may reflect better precision in phase-locking to the stimulus envelope due to improved myelination (Sano et al., 2007) and synaptic efficiency (Hecox & Burkard, 1982; Ponton et al., 1992), improved transfer efficiency of the conductive pathway (review by Abdala & Keefe, 2011), and possibly increased central contributions to the scalp-recorded EFR with ongoing postnatal development of cortical and thalamocortical pathways (Moore & Guan, 2001). Although an improvement is evident with age, there were no group differences compared to adults (Figures 2 and 3). Together, these results suggest that the majority of development-related changes for /i/ F2+-elicited EFRs likely occur within the first year.
The second and third instances with significant age effects in the present study were for phase coherence of EFRs elicited by /u/ and /i/ F2+. Phase coherence decreased by ∼0.18 over the age range evaluated when in-ear stimulus levels equaled 65 dB SPL. The direction of change differs from that for infants in the higher-level group, and the positive slope in the higher-level infant group persisted even when the in-ear to ear-simulator level difference was not used as a covariate. The underlying cause for such a difference is unclear. The reduction is explained neither by neural development nor by changes in the conductive pathway because age and hearing status were similar in the two infant groups. It is possible that the constant overall 65 dB SPL (in-ear) led to a disproportionate decrease in stimulus levels above ∼1 kHz (the F2+-dominant region) compared to lower frequencies, as age increased (Voss & Herrmann, 2005). Alternatively, the patterns observed may reflect nonlinear interactions between stimulus or sensation level and EFR generation, not captured with the covariate used. Nonetheless, the disparity emphasizes the importance of controlling for in-ear stimulus level calibration and the challenge in generalizing findings across studies with different level calibration methods.
Summary and Conclusions
The present study investigated the nature of EFRs elicited by band-limited vowel and fricative stimuli in normal-hearing infants younger than 1 year of age. Contrary to previous studies in vowel-evoked EFRs, adult–infant comparisons revealed frequency-specific effects of neural immaturity in phase-locking. Specifically, larger adult–infant differences in EFR characteristics were evident for low- to mid-frequency dominant vowel stimuli, especially when stimulus level in infant ear canals was controlled for. Significant adult–infant differences were evident for low-frequency stimuli in some EFR characteristics even when higher stimulus levels were used in infants. Except for three instances, there were no significant age-dependent changes in EFR amplitude, SNR, or phase coherence during infancy. Together, the present study draws attention to frequency-specific developmental trends of EFRs during infancy that are additionally influenced by in-ear stimulus level.
Supplemental Material
sj-jpg-1-tia-10.1177_23312165211004331 - Supplemental material for Characteristics of Speech-Evoked Envelope Following Responses in Infancy
Supplemental material, sj-jpg-1-tia-10.1177_23312165211004331 for Characteristics of Speech-Evoked Envelope Following Responses in Infancy by Vijayalakshmi Easwar, Susan Scollie, Michael Lasarev, Matthew Urichuk, Steven J Aiken and David W Purcell in Trends in Hearing
sj-jpg-2-tia-10.1177_23312165211004331 - Supplemental material for Characteristics of Speech-Evoked Envelope Following Responses in Infancy
sj-pdf-3-tia-10.1177_23312165211004331 - Supplemental material for Characteristics of Speech-Evoked Envelope Following Responses in Infancy
Footnotes
Acknowledgments
The authors thank Dr. Marlene Bagatto for providing normative data for the LittlEARS questionnaire and for preliminary discussions regarding in-ear stimulus level measurement in infants.
Author Contributions
V. E. designed the study, collected infant data, analyzed EFR data, and wrote the article. S. S. designed the study and reviewed the article, M. L. completed statistical analysis and reviewed the article, M. U. collected adult data and reviewed the article, S. J. A. consulted regarding study design and reviewed the article, and D. P. designed the study, analyzed EFR data, and edited the article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by a Collaborative Health Research Project grant from the Canadian Institutes of Health Research and the Natural Sciences and Engineering Research Council of Canada (grant #493836-2016; Western University) and by the Clinical and Translational Science Award (CTSA) program, through the NIH National Center for Advancing Translational Sciences (NCATS; grant #UL1TR002373; University of Wisconsin-Madison).
Supplemental material
Supplemental material for this article is available online.