Abstract
Background
The ability of the brain to integrate information from multiple sensory inputs has intrigued many researchers. Audio–visual (AV) interaction is a form of multisensory integration that we rely on to form meaningful representations of the environment around us. The literature on its underlying neural mechanisms is limited.
Purpose
Quantitative EEG (QEEG), a tool with high temporal resolution, can be used to understand the cortical sources of AV interactions.
Methods
EEG data were recorded using 128 channels from 30 healthy subjects presented with audio, visual and AV stimuli in an ‘object detection task’. Electrical source imaging was performed using sLORETA across seven frequency bands (lower alpha 1, lower alpha 2, upper alpha, beta, delta, gamma, theta) during AV versus unimodal conditions across 66 gyri.
Results
Cortical sources were activated in the theta, beta and gamma bands in cross-modal versus unimodal conditions, which, we propose, reflects neural communication within the AV interaction network. The cortical sources comprised areas involved in visual processing and auditory processing, established multisensory areas (frontotemporal cortex, parietal cortex, middle temporal gyrus, superior frontal gyrus, inferior frontal gyrus, precentral gyrus) and potential multisensory areas (paracentral, postcentral and subcallosal).
Conclusion
Together, these results offer an integrative view of cortical areas in frequency oscillations during AV interactions.
Introduction
Multisensory integration affects the way we perceive the things around us. Multiple sensory systems may detect objects and events, and inputs from several sensory channels are frequently merged to generate a coherent perception, creating an environment rich in information. Our perceptual system can effortlessly integrate the information received from different modalities, and this phenomenon is known as cross-modal interaction or multisensory integration. 1 It may enhance perception in terms of greater accuracy and faster reaction time.2, 3 Alternatively, interactions between conflicting information from different sensory modalities may diminish perception or even alter it completely, producing an illusion. 4
Since perception determines cognition and human behaviour to a great extent, it has always fascinated researchers to explore where and how information through auditory and visual sensory channels interact within the nervous system to provide us with a meaningful representation of the outer world. Though the robustness of the behavioural responses in multisensory integration is well documented, little is known about the underlying neural mechanisms. 5
One of the earliest demonstrations of multisensory interactions at primary sensory cortices was provided by visual activation of auditory cortex during lip reading using fMRI. 2 Several other researchers observed such interactions which were further confirmed by local field potential recordings from auditory cortex.6, 7 Similarly, effects of auditory stimuli were shown over visual areas suggesting that these areas were not strictly unimodal and the crossmodal interaction was initiated much before it was believed to occur.8, 9 It is known to involve hierarchical levels of brain processing, including superior colliculus, primary visual and auditory cortices, and superior temporal sulcus and intraparietal areas. 10
As the process of cross-modal interactions is greatly influenced by the nature of the stimuli and the task, various techniques and paradigms have been used to explore it. 11 Quantitative EEG (QEEG) is one of the tools that can be used to assess the cortical sources during audio–visual (AV) interactions. It offers higher temporal resolution than other neuroimaging modalities such as fMRI and PET, and the increase in the number of electrodes to 128 channels has improved its spatial resolution as well.
Thus, the present study aimed to understand the neural signatures of cross-modal interactions as compared to unimodal stimuli using QEEG. Three types of stimuli were designed for this purpose, viz. visual, auditory, and cross-modal (AV), using object detection tasks. These stimuli consisted of pictures and sounds of 20 familiar animals with characteristic voices and 20 familiar artefacts (non-living objects) with characteristic sounds. Further, single-trial 128-channel EEG was acquired, and source analysis was done across seven frequency bands (lower alpha 1, lower alpha 2, upper alpha, beta, delta, gamma, theta) in cross-modal versus unimodal conditions.
We hypothesise that specific EEG frequencies with their sources are implicated in AV integration, which is over and above respective unimodal interactions.
Methods
Participants
Data was obtained from 30 right-handed healthy volunteers of either gender (20 males and 10 females; mean age 27.2 ± 7.2 years) after written informed consent. Participants were excluded if they self-reported any past or present history of psychiatric/neurologic disorders/drug abuse. Due to the experimental requirement, all of the participants had normal or corrected-to-normal vision (none of the participants were colour-blind) and normal hearing capabilities. Ethical clearance for conducting the study was obtained from the Institute Ethics Committee (IECPG233-30-3-2016).
Stimuli and Task
An object detection task was used consisting of two kinds of blocks, that is, detect animals and detect artefacts. Pictures and sounds of 20 familiar animals with characteristic voices and 20 familiar artefacts (non-living objects) with characteristic sounds were taken from free internet databases. To ensure uniformity in terms of size, resolution and sound pressure levels, pictures were edited and standardised using Adobe Photoshop CC (resolution: 1024×768; 72 dpi) to JPEGs with a grey background, and sounds were edited and standardised using Audacity software 2.2.1 to .wav files of 2 seconds duration and 50–70 dB of sound pressure level.
Each block was subdivided into three sub-blocks of 30 trials each: visual object detection, auditory object detection and AV object detection. Each sub-block had equal numbers of Animal and Artefact stimuli. These sub-blocks were presented in a random manner across different blocks (Figure 1).
Object Detection Task Block and Sub-block. (a) Detect Animal Block. (b) Detect Artefact Block.
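The block structure described above can be sketched in a few lines of Python. This is only an illustration, not the E-Prime script used in the study: the stimulus names are placeholders, and the way 15 items per category are drawn for each sub-block (random sampling without replacement from the 20 available) is an assumption, since the text only states that each sub-block had equal numbers of Animal and Artefact stimuli.

```python
import random

MODALITIES = ["visual", "auditory", "AV"]
TRIALS_PER_SUB_BLOCK = 30


def build_block(target, rng):
    """Return one block as a list of (modality, trials) sub-blocks in random order."""
    animals = [f"animal_{i:02d}" for i in range(1, 21)]      # placeholder names
    artefacts = [f"artefact_{i:02d}" for i in range(1, 21)]  # placeholder names
    half = TRIALS_PER_SUB_BLOCK // 2
    sub_blocks = []
    for modality in MODALITIES:
        # Assumption: 15 of the 20 items per category, drawn without replacement.
        items = [(name, "animal") for name in rng.sample(animals, half)]
        items += [(name, "artefact") for name in rng.sample(artefacts, half)]
        rng.shuffle(items)
        trials = [
            {
                "stimulus": name,
                "category": category,
                "modality": modality,
                # Key 1 = 'Yes' (target present), key 2 = 'No'.
                "correct_key": 1 if category == target else 2,
            }
            for name, category in items
        ]
        sub_blocks.append((modality, trials))
    rng.shuffle(sub_blocks)  # sub-block order is randomised within the block
    return sub_blocks


rng = random.Random(0)
design = {target: build_block(target, rng) for target in ("animal", "artefact")}
```

Each entry of `design` then holds three sub-blocks of 30 trials, with 15 'Yes' and 15 'No' trials per sub-block.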
At the initiation of each block, an instruction was given to detect the block-specific object (e.g., Animal in Detect Animal Block) in each trial of presented stimuli viz. pictures, audio or AVs. The stimuli were presented for a maximum period of 2000 msec with a response terminating the stimulus. The inter-stimulus interval of 1500 msec was provided between two trials. 12 Response was obtained with the help of a response pad, where pressing key 1 was for ‘Yes’, if the subject detected the object to be identified and key 2 for ‘No’ if they did not (Figure 2).
Study Design of Object Detection Task Representing Single Trial; ISI: Inter-stimulus Interval.
Stimulus presentation and response collection were done using the E-Prime v2.0 software. The subject’s monitor was a 17” LCD screen, which was placed 62 cm in front of the participant on a monitor stand to facilitate easy visibility of the centre of the screen where the stimuli appeared. Speakers for the auditory stimuli were placed 26 cm on either side of the screen, at the same distance from the participant. 13 The experimenter used a similar monitor with a mirrored display of the subject’s monitor to verify the stimuli being presented to the subject (Figure 3). Mean reaction times and error rates were computed for all the object detection tasks under the three subtypes of auditory, visual, and AV tasks.
Experimental Setup with the Subject Performing Object Detection Task.
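As a rough illustration of the behavioural measures, the sketch below computes a mean reaction time and an error rate for one task subtype. The field names are hypothetical, the toy trials are invented purely to exercise the function, and restricting the mean reaction time to correctly answered trials is an assumption that the text does not state.

```python
from statistics import mean


def summarise(trials):
    """Return (mean RT in ms over correct trials, error rate in %)."""
    correct_rts = [t["rt_ms"] for t in trials if t["response"] == t["correct_key"]]
    n_errors = len(trials) - len(correct_rts)
    mean_rt = mean(correct_rts) if correct_rts else float("nan")
    return mean_rt, 100.0 * n_errors / len(trials)


# Toy trials for illustration only (not study data).
trials = [
    {"rt_ms": 620, "response": 1, "correct_key": 1},
    {"rt_ms": 700, "response": 2, "correct_key": 2},
    {"rt_ms": 910, "response": 1, "correct_key": 2},  # error trial
    {"rt_ms": 660, "response": 1, "correct_key": 1},
]
mean_rt, error_rate = summarise(trials)
```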
EEG Acquisition Set Up
A 128-channel Hydrocel Geodesic Sensor Net (HCGSN) from Electrical Geodesics, Inc., Oregon, USA (now MagStim) was used to acquire EEG at a 1000 Hz sampling rate, with Cz as the recording reference, while the subject was performing the task. The HCGSN had a dense array of Hydrocel high-impedance (50 kΩ) Ag/AgCl electrodes held in a tessellated geodesic structure, forming a network that stretches across the head. The HCGSN was connected to the Net Amps 300 by means of a hypertronics connector. Net Amps 300 is a differential amplifier with an input impedance of 200 MΩ, a dynamic range of ±200 mV and a 4000 Hz anti-aliasing in-line filter. It uses a 24-bit analogue-to-digital converter to digitise the acquired EEG and transfers the digital data via firewire to the data acquisition computer for storage and live display.
EEG Data Processing
EEG data were bandpass filtered at 0.1–100 Hz with a 50 Hz notch filter. The EEG data were then filtered into seven frequency bands: delta (0.1–3 Hz), theta (4–6 Hz), lower alpha 1 (7–9 Hz), lower alpha 2 (10–12 Hz), upper alpha (12–13 Hz), beta (14–30 Hz), gamma (31–100 Hz). Further, the data were segmented into 800 ms long epochs (200 ms pre-stimulus to 600 ms post-stimulus) of trials with correct responses 14 in all three categories, that is, visual, auditory and AV, from all the four blocks (detect animal & detect artefact) (Figure 4). Artefactual activity was identified and removed using Independent Component Analysis (ICA) in the EEGLAB toolbox v14.0 on MATLAB R2017b (Mathworks Inc., USA). 15
Segmentation Strategy.
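The segmentation step can be illustrated with a minimal Python sketch. It is not the EEGLAB pipeline used in the study; the function names are hypothetical, the signal is a stand-in for one channel, and the only values taken from the text are the 1000 Hz sampling rate, the band edges and the 200 ms pre-stimulus to 600 ms post-stimulus window.

```python
SFREQ_HZ = 1000            # sampling rate stated in the acquisition section
PRE_MS, POST_MS = 200, 600

# Band edges in Hz, copied from the text above.
BANDS = {
    "delta": (0.1, 3), "theta": (4, 6), "lower alpha 1": (7, 9),
    "lower alpha 2": (10, 12), "upper alpha": (12, 13),
    "beta": (14, 30), "gamma": (31, 100),
}


def ms_to_samples(ms, sfreq=SFREQ_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * sfreq / 1000))


def epoch_bounds(onset_sample):
    """Half-open sample range [start, stop) around one stimulus onset."""
    return onset_sample - ms_to_samples(PRE_MS), onset_sample + ms_to_samples(POST_MS)


def extract_epochs(signal, onsets, correct):
    """Cut one channel into epochs, keeping only correctly answered trials."""
    epochs = []
    for onset, ok in zip(onsets, correct):
        start, stop = epoch_bounds(onset)
        if ok and start >= 0 and stop <= len(signal):
            epochs.append(signal[start:stop])
    return epochs


signal = list(range(10_000))  # stand-in for one channel of EEG
epochs = extract_epochs(signal, onsets=[1000, 4500, 8000],
                        correct=[True, False, True])
```

At 1000 Hz each retained epoch is 800 samples long, and the stimulus onset sits at index 200 within the epoch.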
EEG Source Imaging
The raw EEG data were filtered into the 0.1–100 Hz band and pre-processed. Electrical source imaging was performed on the EEG data of all seven frequency bands using sLORETA in the GeoSource toolbox of Netstation 5.3. 16 It provides current source density (csd) in nano-amperes per square metre (nA/m²) at probabilistic locations, according to the average MRI atlas of the Montreal Neurological Institute (MNI). Three orthogonal dipole moments (x, y, z) were defined and solved for each source voxel. The source space was restricted to 2447 cortical voxels (7 mm³), each of which was assigned to a gyrus on the basis of the MNI probabilistic atlas, so that the 2447 voxels were clustered into 66 gyri. The final output is a data matrix of gyri × time × csd. This was done for all 30 subjects in AV versus unimodal conditions.
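To make the voxel-to-gyrus step concrete, the sketch below combines the three orthogonal dipole moments of each voxel into a magnitude and then averages the voxels belonging to each gyrus at a single time point. GeoSource's actual aggregation rule is not described in the text, so taking the vector magnitude and a simple mean per gyrus are both assumptions, and the voxel values and assignments are toy inputs.

```python
import math
from collections import defaultdict


def dipole_magnitude(x, y, z):
    """Magnitude of one voxel's current density from its three orthogonal moments."""
    return math.sqrt(x * x + y * y + z * z)


def gyrus_means(voxel_moments, voxel_to_gyrus):
    """Average voxel magnitudes within each gyrus for a single time point."""
    sums, counts = defaultdict(float), defaultdict(int)
    for voxel, (x, y, z) in voxel_moments.items():
        gyrus = voxel_to_gyrus[voxel]
        sums[gyrus] += dipole_magnitude(x, y, z)
        counts[gyrus] += 1
    return {gyrus: sums[gyrus] / counts[gyrus] for gyrus in sums}


# Toy inputs for illustration only (three voxels, two gyri).
voxel_moments = {0: (3.0, 4.0, 0.0), 1: (0.0, 0.0, 5.0), 2: (1.0, 2.0, 2.0)}
voxel_to_gyrus = {0: "gyrus_a", 1: "gyrus_a", 2: "gyrus_b"}
csd = gyrus_means(voxel_moments, voxel_to_gyrus)
```

Repeating this for every time point and subject would give an array with one axis for the 66 gyri and one for time, matching the shape of the output described above.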
Statistical Analysis
Thirty-five subjects were interviewed and screened using the Edinburgh handedness inventory, out of which three subjects were excluded as they were left-handed. Thirty-two subjects performed the task, and their EEG was recorded. Data of two subjects were discarded due to noise in the EEG data. Thirty subjects (20 males, 10 females, mean age = 27.2 years) were analysed further.
Results
Behavioural Data
The Kolmogorov-Smirnov normality test was applied, and the data were found not to be normally distributed. Reaction times and error rates of all the categories were therefore compared using the related-samples Friedman’s two-way analysis of variance by ranks. Post hoc pairwise comparisons were performed with Bonferroni’s correction, taking alpha = 0.05.
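For readers unfamiliar with the Friedman test, the following is a minimal sketch for three related conditions. It is not the statistical software used in the study: it omits the tie-correction factor, relies on the chi-square approximation (which for three conditions, that is, two degrees of freedom, has the closed-form survival function exp(-Q/2)), and uses invented reaction times. The pairwise threshold of 0.05/3 assumes that the Bonferroni correction was applied over the three possible pairwise comparisons.

```python
import math


def rank_row(values):
    """Rank one subject's values (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def friedman_three_conditions(rows):
    """Friedman chi-square and p-value for rows of (visual, auditory, AV) scores."""
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(rank_row(row)):
            rank_sums[j] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    p = math.exp(-q / 2.0)  # chi-square survival function for 2 degrees of freedom
    return q, p


# Invented reaction times in ms (visual, auditory, AV), for illustration only.
rows = [(600, 900, 620), (580, 870, 610), (640, 950, 700),
        (610, 880, 650), (590, 910, 630)]
q, p = friedman_three_conditions(rows)

# Assumption: Bonferroni over the three pairwise post hoc comparisons.
PAIRWISE_ALPHA = 0.05 / 3
```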
The behavioural data was calculated as reaction time and error rates for each of the three tasks, as represented in Table 1.
Behavioural Data (Reaction Time [ms] and Error Rates [%]) for Visual, Auditory and AV Tasks.
The error rates were significantly higher in the unimodal auditory and visual object detection tasks compared to the AV task. The reaction time in the unimodal auditory task was significantly higher than in the visual and AV tasks.
EEG Cortical Source Analysis
The mean current source densities across 66 gyri during the different task categories for the correct responses were obtained for the whole 800 ms epochs, but statistical comparison was done only for the first 300 ms post-stimulus, divided into three temporal frames of 100 ms each, as this period has been reported to be the most robust temporal window for AV interaction. 17 The corresponding temporal frames of different categories (AV vs. visual and AV vs. auditory) were compared in all 30 subjects. Statistical analysis was performed using the Statistics and Machine Learning Toolbox of MATLAB R2017b (MathWorks). The Kolmogorov-Smirnov goodness-of-fit test was performed for normality. As the data were not normally distributed, the Wilcoxon signed-rank test was performed using the signrank function in MATLAB to compare brain region activity during AV versus unimodal (visual or auditory) conditions. To account for multiple comparisons, Bonferroni correction was performed by dividing the critical P value (alpha) by the number of hypotheses (66 pairs of current source density of gyri were tested for each condition), that is, 0.05/66. The same was done for all seven frequency bands. Significantly activated gyri were plotted on orthogonal views of the cerebral cortex using the LORETA software. The results were obtained in the form of statistically significant activation/deactivation (changes in the current source densities) of each frequency band across 66 gyri during AV versus unimodal (visual or auditory) conditions.
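The windowing and the corrected threshold can be sketched as follows. The function names and inputs are hypothetical, the p-values are invented to exercise the threshold, and averaging the current source density within each 100 ms frame is an assumption about how each frame was summarised. With a 1000 Hz sampling rate and a 200 ms pre-stimulus baseline, one sample corresponds to 1 ms and the stimulus onset falls at index 200 of the epoch.

```python
N_GYRI = 66
ALPHA = 0.05
BONFERRONI_ALPHA = ALPHA / N_GYRI  # about 0.00076


def frame_means(epoch_csd, pre_samples=200, frame_len=100, n_frames=3):
    """Mean CSD of one gyrus in consecutive post-stimulus frames (1 sample = 1 ms)."""
    means = []
    for frame in range(n_frames):
        start = pre_samples + frame * frame_len
        window = epoch_csd[start:start + frame_len]
        means.append(sum(window) / len(window))
    return means


def significant_gyri(p_values, alpha=ALPHA, n_tests=N_GYRI):
    """Names of gyri whose p-value survives the Bonferroni-corrected threshold."""
    threshold = alpha / n_tests
    return sorted(name for name, p in p_values.items() if p < threshold)


# Toy 800-sample epoch: baseline, three frames with distinct levels, then the rest.
epoch_csd = [0.0] * 200 + [1.0] * 100 + [2.0] * 100 + [3.0] * 100 + [0.0] * 300
means = frame_means(epoch_csd)

# Invented p-values for illustration only (not study results).
p_values = {"gyrus_a": 0.0001, "gyrus_b": 0.004, "gyrus_c": 0.0007}
survivors = significant_gyri(p_values)
```

Note that a p-value of 0.004, which would pass an uncorrected 0.05 threshold, does not survive the corrected threshold of 0.05/66.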
The cross-modal (AV) changes were compared to unimodal (audio or visual) across the seven frequency bands (mentioned above). Further, the cortical areas of the frequencies predominantly involved in AV processing were ascertained. The results are depicted as:
Frequency Responses
Higher activations (higher current source densities) were seen in theta, beta and gamma frequency bands during AV tasks over and above unimodal auditory and visual tasks (Figure 5).
The cortical source activation during AV versus unimodal task conditions, that is, visual and auditory, has been plotted in the form of orthogonal views in sLORETA. Red colour denotes activation of the cortical gyri during three time windows, that is, (a) 0–100 ms, (b) 101–200 ms and (c) 201–300 ms, across (A) Theta, (B) Beta and (C) Gamma frequency bands.
Cortical Source Responses
Cortical source imaging was performed on EEG data using sLORETA on GeoSource toolbox of Netstation 5.3. The source plots showing mean difference in the current source density (csd) for gyri with significant activation were plotted for the above-mentioned frequency bands in AV versus unimodal interactions (Figure 5).
The significant activations are listed in Table 2 and represented in orthogonal plots for the theta (Figure 6), beta (Figure 7) and gamma (Figure 8) bands.
Cortical Areas with Increased Current Source Density (csd) in AV Versus Auditory and Visual.
Cortical sources showing differential theta activation during AV conditions over and above the unimodal conditions. Red colour denotes activation of the cortical gyri.
Cortical sources showing differential beta activation during AV conditions over and above the unimodal conditions. Red colour denotes activation of the cortical gyri.
Cortical sources showing differential gamma activation during AV conditions over and above the unimodal conditions. Red colour denotes activation of the cortical gyri.
Discussion
The current study was conducted to look into the cortical sources of AV (cross-modal) versus unimodal (visual or auditory) processing during an object detection paradigm. 128-channel QEEG was acquired, and source analysis was done using sLORETA across seven frequency bands. Average visual reaction time was significantly lower than average auditory reaction time in unimodal conditions. Normally, auditory processing is faster than visual processing for simpler stimuli due to faster mechano-transduction and fewer synaptic connections in the auditory sensory pathway.18, 19 For complex stimuli, however, the visual modality dominates the auditory one 20; this visual dominance underlies phenomena such as visual capture or the ventriloquist effect.21, 22 Bimodal (AV) stimuli had significantly lower error rates, suggesting that AV integration facilitates object recognition.1, 23
The theta, beta and gamma frequency band responses showed significant activity to cross-modal as compared to unimodal stimuli. We propose that the activity in these frequency bands reflects the neural communication attributed to the crossmodal AV network. Similar observations were reported by some previous studies. 24 Many studies suggest that gamma facilitates feature binding during recall of multisensory objects.25, 25 Gamma oscillations represent active processing in attentional networks, while theta-band activity may be related to top-down regulation. 26 Theta activity in frontal areas indicates multisensory divided attention. Freise et al. investigated attention-modulated cortical oscillations in a novel multisensory paradigm using magnetoencephalography and found higher gamma band activity in the right medial frontal, inferior frontal, precentral and left middle temporal gyri, and decreased beta band activity in the cingulate and superior temporal cortices. 24 These studies are in line with the findings of the current study. Contrary to our findings, Van et al. reported changes in the alpha phase dynamics between the sensory cortices, which may help to modulate cross-modal interactions. 27
Some researchers opine that it is not the individual frequency bands but their coactivation that serves as a neural mechanism for crossmodal processing. Theta and gamma coactivation is possibly responsible for successful deployment of attention during goal-directed behaviour. It also facilitates segregation of relevant from irrelevant information in an AV scene. 28 Daume et al. documented higher theta-beta phase-amplitude coupling during AV tasks than during unimodal tasks. 29 In contrast, Sakowitz et al. implicated theta and alpha band oscillations, especially in the fronto-central parietal areas. 30 Based on the literature, it is suggested that the theta band is involved in cognitive control, while alpha oscillations aid in preservation of sensory information and reduction of distractions during multisensory integration. 31 However, Schelenz et al. reported suppression of alpha oscillations in the bilateral occipital lobe with multisensory stimuli, in contrast to the earlier reports and to our findings. 32 This discrepancy could result from the emotional stimuli and distinct paradigms used in their study. Keil et al. recently stated that bottom-up processing engages local networks in the gamma band (>30 Hz), whereas top-down control through long-range integrative processing engages lower frequency bands (<30 Hz). 33 Our findings are in line with the above observations, wherein there is an interplay of low- and high-frequency EEG bands for cross-modal interactions.
Further, we aimed to elucidate the cortical sources activated in AV tasks in comparison to unimodal tasks, as shown in Table 2. Numerous researchers have proposed that synchronisation of neuronal signals in multiple brain regions has a role in establishing connections across different sensory modalities. 33 Previous literature has provided evidence of cross-modal integration in traditional sensory-specific cortices with the help of direct anatomical connections as well as functional connectivity.2, 34 In the current study, for better understanding, these areas could be divided into areas of visual processing (middle occipital gyrus, insula, medial frontal gyrus) and areas of auditory processing (supramarginal gyrus, transverse temporal gyrus, superior parietal lobule and inferior parietal lobule). 35 Recently, Park & Kayser reported, using whole-brain magnetoencephalography during a ventriloquist task, that the superior parietal cortex is a hub of AV interactions. 36 Several other activated areas were multisensory areas previously described in the literature. These include the frontotemporal cortex, superior frontal gyrus and inferior frontal gyrus. The frontal lobe has been implicated in the integration of multisensory (auditory and visual) modalities by various researchers using neuroimaging tools.37, 38 Similarly, in our study, activation in the superior frontal gyrus, inferior frontal gyrus and frontotemporal cortices emphasises their role in multisensory integration. Further, activation in the parietal cortex, middle temporal gyrus and precentral gyrus was also seen. These have been reported as multisensory areas previously in the literature.39, 40 In fact, Brang et al. conceptualised a multisensory convergence model wherein information is relayed from primary sensory areas to temporal, parietal and frontal cortices, leading to behavioural facilitation in the motor cortex. 41
Additionally, we found certain cortical areas showing crossmodal activity that have not been identified in the literature so far. These were the paracentral gyrus, postcentral gyrus and subcallosal gyrus. Thus, these may be potential cross-modal areas having a role in AV interactions. These areas may be explored further using region-of-interest analysis to establish their role in multisensory integration.
Conclusion
So, we conclude that different frequency domains originating in distinct subsets of the cortex are correlated with distinct connectivity patterns of these cortical areas. This complex interplay of frequency and cortical areas forming a communication network possibly determines the dynamic role of cortex during cross-modal AV processing.
To the best of our knowledge, this is the first study reporting cortical sources of AV processing using 128 channels (high spatiotemporal resolution) in all seven frequency bands. Potential areas of cross-modal interaction that have not previously been described in the literature have also been reported using EEG. These crossmodal areas may further be used for rehabilitative purposes in patients with sensory disabilities.
The limitations of our study are that other EEG analyses, such as connectivity and microstate changes during cross-modal AV interactions, were not investigated. Further, causality analysis of the EEG data needs to be performed to understand the directionality of functional brain connectivity.
Footnotes
Acknowledgements
The authors would like to thank the volunteers for participating in the current study.
Authors’ Contribution
MD: Recruitment of subjects, conduct of the study, analysis and drafting of manuscript, approved the final version of the manuscript; VC: EEG analysis, drafting of manuscript, approved the final version of the manuscript; PT: Approved the final version of the manuscript; RS: Overall supervision of the study, approved the final version of the manuscript; SK (corresponding author): Conceived the idea, interpretation of results, drafting of manuscript, approved the final version of the manuscript.
Data Sharing
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors received no financial support for the research, authorship and/or publication of this article.
Statement of Ethics
The current study was undertaken on human volunteers. Informed consent was obtained from all participants, and ethical clearance was obtained from the Institute Ethics Committee (IECPG233-30-3-2016). The procedures used in this study adhere to the tenets of the Declaration of Helsinki.
