Abstract
There is a substantial body of evidence for shared brain mechanisms between imagery and perception. However, psychophysiological evidence concerning possible mechanisms underlying modality-independent mental imagery, and the extent to which it involves or depends on other cognitive processes such as attention and memory, is still scant. To address this, we performed an electroencephalographic (EEG) study comparing brain activity related to auditory/musical and visual imagery in order to identify shared and specific activity patterns. The electrical brain activity of 12 musicians was measured while they perceived or imagined either auditory or visual stimuli with two levels of complexity. Mean power differences were calculated between tasks and conditions for the alpha, beta, and gamma bands. For the alpha band, an increase in mean relative power was observed in occipital and parietal areas throughout the paradigm. Regarding the level of complexity of the imagery tasks, same-modality contrasts (i.e., auditory simple vs. auditory complex) showed small-magnitude differences in power that were significant only at a few locations along the midline. However, when comparing across imagery modalities (i.e., auditory simple vs. visual simple), the significantly different locations increased in number and were distributed mainly in central and occipital regions, where differences were positive for mean alpha power and negative for the beta and gamma bands. With respect to resting (baseline) power, the main difference was a wider distribution of beta power increments during the visual imagery tasks; shared features of the auditory and visual imagery tasks included increased gamma power at right frontal locations, increased frontal and prefrontal beta power, and some bilateral parietal increases in beta power.
These results seem to support the hypothesis that mental imagery, to some extent, involves areas related to memory retrieval, attention, semantic processing, and motor preparation, in addition to the primary areas corresponding to each sensory modality.
Introduction
In a historical review on the subject, Kosslyn et al. (1995) described how mental imagery constituted the basis of the first theories of mental activity. Beginning from a philosophical approach (e.g., see the description of phantasia in Aristotle’s De Anima iii 3, 414b33–415a3) and eventually through the methods of cognitive neuroscience, research into mental imagery has made substantial progress. However, the private and subjective nature of this cognitive process poses considerable methodological difficulties, particularly in evaluating its structure and function. This is especially the case for auditory/musical imagery, given the temporal nature of sound and hence the supposed difficulty of manipulating it as “easily” as visual objects. As stated by Halpern, “in studying any mental imagery, the challenge is to externalise what is essentially an internal experience to examine what it means to have, in the case of musical imagery, a ‘tune inside the head’” (Halpern, 2001, p. 180). Nevertheless, it is precisely because of the complex nature of sound, and specifically music, that studying it can yield great benefits for understanding human cognition (e.g., Hubbard, 2010; Kraemer et al., 2005). There are several definitions of music imagery (Halpern, 2001; Intons-Peterson et al., 1992; Kraemer et al., 2005; Liikkanen, 2012; Navarro Cebrian & Janata, 2010; Zatorre & Halpern, 2005), but despite some differences, all of them coincide in one aspect: its phenomenological resemblance to music perception, without external stimulation. Through evidence from several studies, it is now well established that music imagery is created in real time and contains precise information about musical features, such as tempo, pitch, and melodic and harmonic relations, with sensory qualities similar to those of the perceptual experience (Brodsky et al., 2003; Jakubowski, 2020).
There is also evidence that musicians perform better than non-musicians in tasks involving auditory imagery (Aleman et al., 2000; Janata & Paroo, 2006; Lotze et al., 2003). Earlier, and relatedly, Carl Seashore (1937) had stated that music imagery is necessary for crucial aspects of music practice: learning, recognition, memory, recall, and anticipation of musical elements. Neuroimaging and electrophysiological studies have shown that brain activity related to music imagery resembles that of music perception regarding the participation of temporal lobe areas (e.g., Bunzeck et al., 2005; Ding et al., 2019; Herholz et al., 2012; Kraemer et al., 2005; Martin et al., 2018; Schaefer et al., 2011, 2013; Zatorre & Halpern, 2005). However, Zhang et al. (2017) noted that music imagery recruited a much more complex pattern of brain activations than music perception, positively correlated with the attention network and the motor control network in the prefrontal cortex and negatively correlated with the default-mode network. Other brain areas identified as being involved in music imagery are the frontal, parietal, occipito-parietal, and supplementary motor areas (Ding et al., 2019; Foster et al., 2013; Halpern & Zatorre, 1999; Hickok et al., 2003; Meister et al., 2004; Meyer et al., 2007; Schaefer et al., 2011; Schürmann et al., 2002; Yoo et al., 2001). From these results, although musical imagery was first considered a kind of auditory imagery, “it is now clear that musical imagery also often comprises components of motor and visual imagery” (Jakubowski, 2020, p. 187). Across the evidence cited above, specific differences have been reported not only in brain activations possibly related to different types of stimuli (i.e., isolated frequencies, naturalistic music, etc.) (Schaefer et al., 2011, 2013) but also between individuals (Hubbard, 2010; Schaefer et al., 2011).
Furthermore, although evidence has shown that mental imagery widely shares the recruitment of brain areas involved in the processing of the corresponding sensory modality, it is still unclear whether it is possible to identify neural networks accounting for imagery processes independent of modality. In a functional magnetic resonance imaging (fMRI) study comparing auditory and visual imagery with perception in both modalities, Daselaar et al. (2010) reported the identification of what they called a “core” imagery network, independent of imagery modality. This network includes the posterior cingulate, lateral parietal, and medial and superior prefrontal cortices. The authors also suggest that the default mode network (DMN) may be involved in a “multimodal constructive process underlying mental imagery.”
Zvyagintsev et al. (2013) carried out another fMRI study aimed at investigating supramodal and modality-specific brain networks related to mental imagery of auditory and visual information. The authors designed an experimental paradigm with longer imagery periods (28 s) compared to those used by Daselaar et al. (2010) (3 s), considering that in such a short period some relevant brain activation related to the several stages of mental imagery (Sack et al., 2002, 2005, 2008) might have been missed. Their results were consistent with those reported by Daselaar et al. (2010), but they also reported activation in brain areas related to memory processing (parahippocampal gyrus and inferior frontal gyrus bilaterally), attention-related areas (left superior gyrus and medial frontal gyrus), semantic processing areas (left inferior frontal gyrus), motor preparation areas (supplementary motor area), and multisensory integration areas (left medial temporal gyrus and left angular gyrus). Overall, they reported that imagery in both the visual and auditory modalities was associated with changes in activity in widely distributed areas, including frontal, parietal, and temporal areas, and the cerebellum. To gather more information on this issue, we decided to compare electrical brain activity during auditory imagery and visual imagery of equivalent stimuli using an electroencephalographic (EEG) paradigm. On the basis of the above evidence, we not only expected that each imagery modality would maintain its own specific behavior, namely activity in temporal areas related to music imagery and activity in occipital areas related to visual imagery, but we also hypothesized that brain activity shared by both modalities (auditory and visual), which could reflect modality-independent imagery activation, would be observed mainly in frontal, prefrontal, and parietal areas.
Second, as specific differences have been reported in relation to different types of stimuli (Schaefer et al., 2011, 2013), we decided to include stimuli with two different levels of complexity, to identify specific brain activity related to the features of the stimuli. Schaefer et al. (2013) suggested that the overlap between perception and imagery may differ depending on the cognitive processing of different types of stimuli. To address this, they performed a meta-analysis of four EEG experiments using principal components analysis (PCA), comparing four sets of perception and imagery tasks of four different types of stimuli (accents, rhythms, isochronous melodies, and naturalistic music). They reported central, fronto-central, and parietal activation during both tasks (perception and imagery); in terms of complexity, they found that activation was substantially similar with subtle differences—i.e., some components were absent or did not show common components in some experiments. Their results suggest that there is some aspect of brain activation during imagery that does not depend on stimulus complexity but that aspects of imagery that are common to perception do vary with stimulus type.
In a previous study, Schaefer et al. (2011) reported that there do not appear to be specific responses related to the task or stimulus processing but that the same network appears to be differentially modulated related to interpersonal differences of music processing or different imagery strategies. To explore this, for our study we decided to include only participants with a musical background, considering that they would have more experience implementing systematic auditory imagery strategies that could reflect some consistency in brain activation patterns.
Based on both studies, we hypothesized that an overall increase in alpha power would be observed in parieto-occipital areas during all music imagery tasks compared to perceptual tasks. Regarding the comparisons between low- and high-complexity tasks, we hypothesized that alpha activation might show inter-individual specific behavior and that different activation patterns would be observable in fronto-central and central areas related to complexity levels during imagery tasks.
Materials and Method
Participants
An invitation to participate in the study was published on a social media platform; after expressing their interest in participating, volunteers were asked to respond to an online questionnaire (Google Forms) in order to assess their suitability for the study. Sixteen individuals participated in the study: 10 female, 6 male. Ages ranged from 28 to 52 years (M = 37.25; SD = 6.77). While both musicians and non-musicians can evoke auditory/music images, as we were interested in assessing possible individual effects related to imagery strategies, musical expertise was chosen as an inclusion criterion. It has been argued that differences in listening biographies (Altenmüller, 2001) and imagery strategies, which are probably related to musical expertise (Schaefer et al., 2011), could account for individual differences in brain activation during music imagery. Furthermore, professional musicians report the implementation of different types of mental representations during their daily practice (Pérez-Acosta, 2018). Therefore, participants were required to have at least 10 years of musical practice, either amateur or professional. Participants included in the study reported having had between 12 and 38 years of musical practice (M = 22.5; SD = 9.01). All participants reported having no sensory or neurological disorders, except one who reported having suffered a stroke and a slight hand-coordination disability; however, we decided not to exclude him from the study as cerebellar involvement was not expected. Audiometric evaluation was performed, and all participants had normal hearing between 500 and 8,000 Hz, with thresholds between 10 and 50 dB (Brüel & Kjær type 1800 audiometer with 20-dB dimmer). All participants gave their written informed consent for inclusion, collection/use of data, and publication, having been given assurance of full anonymity.
Equipment
For the EEG recordings, a g.USBamp (g.tec) system was used with 64 channels (16 × 4 amplifiers) placed according to the 10–20 system. Signals were band-limited to 0.1–100 Hz, with an additional 60-Hz notch filter, and sampled at 256 samples per second. The paradigm was programmed and run with the BCI2000 software platform (Schalk et al., 2004). Auditory stimuli were recorded and processed using Audacity 2.2.2 and played to the participants via commercial in-canal earphones. Simple visual stimuli were created using PowerPoint for Mac 16.28. Both types of stimuli were integrated into the stimulation program through the BCI2000 software API. In the visual condition, stimuli were displayed to the participants on a computer monitor (Dell, 70-Hz refresh rate). Paradigm cues and pacing were also presented visually on this monitor. The procedure was carried out in an anechoic chamber, and EEG data were transferred to a computer outside the chamber.
Stimuli
To assess effects related to stimulus features and sensory modality, we included auditory and visual stimuli with two levels of complexity designed to be as equivalent as possible between modalities (Figure 1). For the auditory stimuli labelled “simple” (AS), two pairs of two musical pitches were presented in sequence (one ascending: C4–A♭4; one descending: B♭4–E4) with the same duration for each pitch (4 s). The stimuli were recorded using a sampled piano sound in GarageBand ’09 software. For the simple visual stimuli (VS), two pairs of colored squares were presented in sequence; each square was displayed on the monitor for 4 s, with the second one in a different position emulating an ascending or descending motion. For the complex auditory stimuli (AC), two fragments were extracted from the Mexican tune “Marcha de Zacatecas.” 1 This tune was selected as it is well known by the Mexican population, and we considered it to be a solid example of natural music that would be easy to imagine and could avoid the involvement of short-term memory, as suggested by Schaefer et al. (2011). These authors propose that using short, overlearned musical phrases as stimuli minimizes working memory operations. The melodic line was recorded with a flute. The complex visual stimuli (VC) were two fragments of two different video clips. For one of them we selected a scene from an old film called “Vámonos con Pancho Villa,” with its auditory content removed. We chose this film because it could be related in some way to the selected tune, allowing us to evaluate possible context-related effects in an additional analysis. Following that idea, for the second stimulus a fragment of a non-context-related video clip was selected: a computer screen saver with a ball bouncing on the walls and the floor (no auditory content). All the stimuli had a duration of 8 s.

Stimuli. These were designed according to two levels of complexity. (a) Simple auditory (AS): two pairs of musical pitches with the same duration. Simple visual (VS): two pairs of colored squares were presented in sequence. (b) Complex auditory (AC): two fragments of the Mexican tune “Marcha de Zacatecas.” Complex visual (VC): two fragments of two different video clips. Each of the eight stimuli was presented randomly three times.
Procedure
Participants were instructed not to consume coffee or other stimulants for 24 hr prior to the recording session. A PowerPoint presentation with a description and a short simulation of the paradigm (with different example stimuli) was sent to them via email so that they could become familiar with the procedure beforehand. In the recording session, after placement of the electrode cap, participants were seated in the anechoic chamber in front of the monitor, with a plastic mat under their feet to avoid direct contact with the metal grid of the chamber floor. The experiment was explained to them once again, and they were asked if they had any questions. Participants remained seated alone and with the lights off during the whole experiment. They were instructed to move as little as possible during the tasks and to use the resting period to stretch if they needed to. The experimental paradigm consisted of the following procedure (Figure 2): The trial began with a baseline recording of 1 min, with no associated task. After this, a visual cue was displayed indicating the modality of the task, whether auditory or visual, followed by an expectancy period of 8 s before the stimulus was presented. The expectancy period was included to observe whether brain activity associated with an expectant state in relation to the upcoming task could be identified. Then, the stimulus was either played through the earphones for the auditory tasks or displayed on the monitor for the visual tasks (8 s). After 3 s had elapsed, an auditory cue, accompanied by a 3-s visual countdown display (3 … 2 … 1 …), indicated that they had to initiate the imagery task. For this task, they were instructed to imagine what they had heard or seen as accurately as they could (8 s). Because visual tasks were being measured, and in order to avoid artifact activity related to opening and closing the eyes, participants kept their eyes open during the whole experiment.
Accordingly, a cross (+) was displayed on the monitor during the imagery tasks, and participants were instructed to fixate their gaze on it. A resting period followed (10 s), during which participants were instructed to rate out loud, with a number from 1 to 6, the vividness of the image they had just evoked (1 = I couldn’t imagine the stimulus; 6 = exactly as the stimulus). This procedure was chosen because the participants were asked to move as little as possible and were seated with the lights off and with no access to any kind of keyboard. Their responses were recorded using Audacity 2.2.2 software via a microphone connected to a computer located outside the chamber. Participants then waited for the next cue to be displayed. The 8 stimuli were presented randomly 3 times each, for a total of 24 trials (18 min, including the 1-min baseline). Each participant did the experiment twice, with a 5-min break in between. At the end of the session, participants were asked to answer a post-test questionnaire (Google Forms), in which they were asked to rate, overall, the difficulty of the imagery tasks on a 5-point Likert scale (1 = very easy; 5 = very difficult). They were asked if they were familiar with the selected tune (Marcha de Zacatecas), if they had recognized the fragment of the film, and, if so, what they had identified. For both modalities, they were asked to indicate the strategies they had implemented for the imagery tasks. In the last section, they were asked to rate, also on a Likert scale, general aspects of the experiment: how tiring they had found the experiment (1 = not tiring at all; 6 = very tiring) and their level of attention during the trials (1 = very distracted; 6 = absolutely concentrated), and to state whether they had identified any relation between the stimuli and, if so, to describe it. Finally, they were asked to describe in general their performance in the experiment and anything else they considered relevant to mention.

Experimental paradigm. The trial started with a 60-s baseline period. Then, participants received a visual cue (3 s) indicating the modality of the task (auditory or visual), followed by a period of “expectation” (expectation task) for the stimulus to be presented (8 s). Perception task: the stimulus was played via earphones or displayed in the monitor according to the indicated modality (8 s). Imagery task: after 6 s, participants had to imagine the stimulus that had just been presented (8 s), followed by a resting period of 10 s where they had to rate how well they thought they had been able to recreate the image of the stimulus, and then wait for the next cue. Stimuli were presented randomly. EEG recordings were measured during the whole session.
EEG Signal Processing and Analysis
Preprocessing
Both visual inspection of the multichannel EEG plots and specific incident notes taken throughout the experiments were used to identify occasional badly recorded channels, which were replaced using spherical interpolation (Perrin et al., 1989) before any further ensemble processing was performed; in general, at most one channel per subject was interpolated. The complete recording was re-referenced to the common average reference (CAR) (Yao et al., 2019). Task epochs were segmented based on the state flags from the BCI2000 files, and the epochs were inspected for relevant artifacts and restored using independent component analysis (ICA) decomposition and semi-automatic removal of artifact sources. Processing and analysis were performed using custom-written scripts in Python 3.
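As a concrete illustration of the common-average-reference step, a minimal NumPy sketch is shown below; the function name and toy data are ours, not the authors' scripts, and the full pipeline (interpolation, ICA) is omitted:

```python
import numpy as np

def common_average_reference(eeg):
    """Re-reference a (channels x samples) EEG array to the common
    average: at every sample, subtract the mean across all channels
    from each channel."""
    return eeg - eeg.mean(axis=0, keepdims=True)

# Toy example: 4 channels, 5 samples of synthetic data.
rng = np.random.default_rng(0)
raw = rng.standard_normal((4, 5))
rereferenced = common_average_reference(raw)
# After CAR, the instantaneous mean across channels is (numerically) zero.
print(np.allclose(rereferenced.mean(axis=0), 0.0))  # True
```

CAR makes each channel's value relative to the head-wide average at that instant, which removes signal components common to all electrodes.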
Bandpower Estimation
Linear-phase, 100th-order FIR filters were designed for the alpha, beta, and gamma bands and applied individually to the relevant epochs from each channel. Total power per band was computed from each filtered signal using the Welch periodogram, followed by numerical integration. Task-based average power was computed from the corresponding epochs for each channel, and relative power indices were computed by normalizing to the sum of the three band-power estimates per channel, per task (hereafter, all references to “power” in this paper should be interpreted as “relative power”). Using normalized values “decreases the effects of total power changes on spectral power of sub-bands” (Bandarabadi et al., 2015), which is relevant here given that the orders of magnitude of each band vary widely. Topographic maps for each subject and experimental condition were produced and inspected for abnormalities that could have remained from deficient preprocessing. In addition, population maps for mean power were also produced.
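The band-power chain just described (linear-phase FIR band-pass, Welch periodogram, numerical integration, and per-channel normalization over the three bands) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; parameters such as `nperseg` are assumptions:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.signal import firwin, lfilter, welch

FS = 256  # sampling rate of the recordings (samples per second)
BANDS = {"alpha": (8, 13), "beta": (13, 30), "gamma": (30, 50)}

def band_power(x, lo, hi, fs=FS, order=100):
    """Band-pass one channel's epoch with a 100th-order linear-phase
    FIR filter, then estimate total in-band power via the Welch
    periodogram followed by numerical integration."""
    taps = firwin(order + 1, [lo, hi], pass_zero=False, fs=fs)
    filtered = lfilter(taps, 1.0, x)
    freqs, psd = welch(filtered, fs=fs, nperseg=min(len(filtered), 512))
    return trapezoid(psd, freqs)

def relative_powers(x):
    """Relative power per band: each band's power normalized by the
    sum of the three band-power estimates for this channel/epoch."""
    absolute = {name: band_power(x, lo, hi) for name, (lo, hi) in BANDS.items()}
    total = sum(absolute.values())
    return {name: p / total for name, p in absolute.items()}

# Synthetic 8-s epoch dominated by a 10-Hz (alpha-band) oscillation.
t = np.arange(8 * FS) / FS
epoch = (np.sin(2 * np.pi * 10 * t)
         + 0.1 * np.random.default_rng(1).standard_normal(t.size))
rel = relative_powers(epoch)
print(max(rel, key=rel.get))  # alpha
```

By construction, the three relative indices sum to 1 for each channel and task, so a change in one band is always expressed relative to the other two.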
Task/Condition Comparisons
Non-parametric statistical hypothesis tests (Wilcoxon signed-rank for paired samples) were performed to determine electrode- and band-wise statistically significant population differences between pairs of conditions and tasks. Rejection of the null hypothesis (power differences between conditions are symmetric about zero) was defined at a significance level of 0.05. Rest-vs-task, expectation-vs-perception, expectation-vs-imagery, perception-vs-imagery, and auditory-vs-visual imagery mean power difference maps were produced and marked at electrodes where statistically significant differences were found.
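The electrode-wise testing procedure can be sketched with SciPy's `wilcoxon`; the function name and toy data below are illustrative assumptions, not the authors' analysis code:

```python
import numpy as np
from scipy.stats import wilcoxon

def significant_electrodes(power_a, power_b, level=0.05):
    """Electrode-wise paired Wilcoxon signed-rank test between two
    conditions. power_a, power_b: (subjects x electrodes) arrays of
    relative power. Returns the mean difference per electrode and a
    boolean mask marking electrodes where the null hypothesis
    (differences symmetric about zero) is rejected at `level`."""
    diffs = power_a - power_b
    pvals = np.array([wilcoxon(diffs[:, e]).pvalue
                      for e in range(diffs.shape[1])])
    return diffs.mean(axis=0), pvals < level

# Toy data: 12 subjects, 3 electrodes; only electrode 0 carries a
# consistent condition effect.
rng = np.random.default_rng(2)
cond_a = rng.standard_normal((12, 3))
cond_b = cond_a + 0.05 * rng.standard_normal((12, 3))  # no real effect
cond_b[:, 0] -= 1.0  # condition B systematically lower at electrode 0
mean_diff, significant = significant_electrodes(cond_a, cond_b)
print(significant[0])  # True
```

The signed mean difference gives the direction and strength shown in the colormaps, while the boolean mask corresponds to the larger location markers in the figures.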
Results
Electroencephalographic Findings
Of the 16 participants in the study, only 12 successful recordings were obtained, since in the remaining four cases the relevant epochs could not be adequately restored through artifact removal and/or local interpolation. Mean relative powers for the alpha (8–13 Hz), beta (13–30 Hz), and gamma (30–50 Hz) bands were computed for each electrode and condition (expectation, perception, and imagery), and topographic maps were produced to assess the overall behavior across tasks. These maps are presented in Figure 3.

Bandpower grand averages per condition (expectation, perception, and imagery, corresponding to each of the three quadrant rows of alpha, beta, and gamma relative power topographic maps) and task (above: auditory simple (left) and complex (right); below: visual simple (left) and complex (right)).
The largest power is observed for occipital alpha, with observable alpha suppression during the visual perception and imagery tasks. Beta and gamma powers are predominantly frontal, and a slight overall increase in these band powers is observed on entering the perception or imagery conditions, more so in the case of the visual tasks. Comparison of task/condition power distributions with respect to the rest (baseline or reference) power distribution is shown in the form of mean difference maps in Figure 4. The direction and strength of the power change is encoded in the colormap, while sites with statistically significant differences are shown with larger location markers. All task/condition maps show a widespread positive increment in mean beta and gamma powers, with statistically significant differences mostly at fronto-central sites; in addition, many significant sites for beta power appear during the visual tasks. In the case of alpha power, for the auditory tasks mean power changes only slightly with respect to the rest condition, irrespective of the condition, with a small and typically non-significant reduction at only a few sites. In contrast, a widespread alpha suppression is clear and significant for the visual tasks under the perception and imagery conditions.

Mean power differences with respect to resting power (baseline) for each condition (expectation, perception, and imagery, corresponding to each of the three quadrant rows of alpha, beta, and gamma relative power topographic maps) and task (above: auditory simple (left) and complex (right); below: visual simple (left) and complex (right)). Electrode locations where statistically significant differences were found are indicated with larger location markers.
In addition to the task/condition-vs-baseline comparisons just presented, condition contrasts within each task were also explored. Expectation-vs-imagery, expectation-vs-perception, and perception-vs-imagery mean power differences per task are presented in Figure 5. Again, the direction and strength of the power change is encoded in the colormap, while sites with statistically significant differences are shown with larger location markers. Power changes are mostly negative for the beta and gamma bands in all task/condition combinations, except for the visual tasks when comparing the expectation and imagery conditions, where mean beta and gamma powers in expectation are larger than in imagery. Statistically significant differences are found mostly for the alpha band in various task/condition combinations. In general, however, statistically significant differences arise when comparing the expectation condition with the other two, whereas differences between perception and imagery are scarce.

Mean power differences for combinations of task conditions (expectation-vs-imagery, expectation-vs-perception, and perception-vs-imagery, corresponding to each of the three quadrant rows of alpha, beta, and gamma relative power topographic maps) and task (above: auditory simple (left) and complex (right); below: visual simple (left) and complex (right)). Electrode locations where statistically significant differences were found are indicated with larger location markers.
Finally, in the interest of exploring the patterns of activity for the imagery condition, mean power was compared between simple and complex tasks of the same modality (auditory, visual) and between simple or complex tasks across modalities. Using the same representations as in the previous maps, mean power differences and locations with statistically significant differences are depicted in Figure 6. Same-modality contrasts are shown in the first two rows (AS-vs-AC, VS-vs-VC); while there are small-magnitude differences in power, these are significant only at a few midline locations. For this reason, possible contextual effects related to the two complex visual stimuli will be further explored in a within-subject analysis. However, when comparing across imagery modalities in the next two rows (AS-vs-VS, AC-vs-VC), the distribution of significantly different locations changes markedly, increasing in number and concentrating in the central and occipital regions. In these regions, more alpha power was observed for auditory than for visual imagery, most markedly for the complex stimuli. Conversely, less beta power was observed for auditory than for visual imagery, predominantly for the simple stimuli, as was less gamma power for the complex stimuli.

Mean power differences between imagery conditions. For each row, topographic maps of alpha, beta, and gamma relative power differences are presented. Electrode locations where statistically significant mean power differences were found are indicated with larger location markers. From top to bottom: simple auditory (AS) vs. complex auditory (AC), simple visual (VS) vs. complex visual (VC), AS vs. VS, and AC vs. VC.
Behavioural Data
Post-test Questionnaire Results
As seen in Figure 7, participants rated the difficulty of the imagery tasks as relatively low, with mostly higher ratings for the two complex visual stimuli (fragments of video clips). Regarding how tiring participants experienced the whole experiment, ratings were also low (M = 2.56; SD = 0.81), and for how attentive they managed to be during the tests, the ratings were high (M = 4; SD = 0.52). Fifteen of the participants reported being familiar with the selected tune, and only one reported to have recognized some specific aspects of the film (the historic period and the main character).

Difficulty ratings (mean and SD) for the imagery tasks for each pair of stimuli: AS = simple auditory; AC = complex auditory; VS = simple visual; VC1 = screen saver; VC2 = film.
Strategy Reports
The participants reported the strategies they implemented for both imagery tasks in an open-ended question in the post-test questionnaire. For the visual tasks, contrary to what was expected given the complexity of the stimuli, participants reported that both videos—but more so the fragment of the film—had been easier to “recreate” than the geometric figures. This report seems to contradict the difficulty ratings seen in Figure 7, where the ratings for the two VC stimuli are higher than those for the VS stimuli; it is important to remember that difficulty ratings were given immediately after each task, when having fewer elements at hand could be perceived as “easier.” As the experiment progressed, it seems that participants found it easier to reconstruct the “narrative” of the scenes with more elements than to “display” the figures with fewer elements in their minds, and this is what they reported in the post-test questionnaire. In general, for the geometric figures (VS), they reported having paid attention to their color, shape, and position on the screen, and some of the participants reported that later on they had related them to the pairs of musical pitches. One of them reported having started out thinking only of “concepts related to the figures,” but that it was not until she could associate them with the pairs of pitches that she had been able to “see them in her mind.” For the videos (VC), they reported having paid attention to the details: For the screensaver (the bouncing ball), they reported having counted the lines of the background, the times the ball bounced on the floor, etc. For the film, they reported having focused on counting the number of people and horses in the scene, their clothing, their actions, and where on the screen they appeared.
There are two elements that become relevant: 1) participants reported that, from the beginning, stimuli that integrated movement had been easier to imagine and that it had been one of the elements that helped them to perform the task. Several of them reported that they used the “narrative” of the scene to recreate it in their minds; in the case of the bouncing ball, they used the “pulse” at which it bounced on the floor; 2) almost all of the participants reported having created associations with elements outside the visual images in order to perform the tasks—mainly auditory associations. For example: “… I sometimes associated the ball with sounds from Nature …”; “… I associated the geometric figures with sounds of specific duration …”; “… for the video I tried to ‘see’ the details and at some point also ‘musicalizing’ it …”; “… I associated the geometric figures with sound intervals (pitches and tonalities) …”; “… the bouncing of the ball with a synchronized sound …” “… the sequence of horses and soldiers with the sound that would result from their actions …” Three quotes summarize key aspects of the strategies reported: “… I had to concentrate on different details at once to remember …”; “… as memory happened, I could pick up more details …”; “… mental images usually overlap …”
For the auditory imagery tasks, strategies or associations remained within the auditory/musical domain. Only one subject reported having used bi-directional associations with the visual stimuli; that is, she reported that the geometric figures appeared in her mind when she imagined the pair of pitches (AS) and the video of the film (VC2) when she imagined the fragments of the tune (AC). For the pair of pitches (AS), participants reported having “just remembered” the intervals, the tones, their duration and timbre; trying to “sing them in the mind”; associating them with familiar melodies or “symphonic fragments”; thinking of their direction and type of interval, and whether they were high or low pitches; using memory and ear training; “keeping the sound in the mind”; analyzing their interval relations; using melodic analysis; “imitating” the sound in the mind; and having “applied” harmony to the intervals. Just one subject reported having associated them with a spatial location, “right” or “left.” For the fragments of the tune (AC), participants generally reported that imagery had been fairly easy and quite vivid, due to the tune’s familiarity. Descriptions of the strategies in this case include having “followed” the melody, the sound of the breathing of the flutist, and the sounds of the strokes of the keys of the flute; imagining “someone playing” it; “… the memory of the tune remained in my mind during the whole session …”; and using memory of the phrase and other musical aspects. Only two participants reported having experienced extra-musical associations to specific episodes from their lives, e.g., being at elementary school or working for the military.
In summary, for the visual stimuli, participants reported using more elaborate strategies to perform the imagery tasks—for example, associating the pairs of pitches with the pairs of geometric figures for the simple stimuli, or imagining the sounds “produced” by the elements of the scenes for the complex stimuli; or elaborating on the information from the videos to facilitate their mental recreation, e.g., counting the number of people, the trajectories of the movements of the objects, forming a “narrative” of the scenes, etc. In contrast, for the auditory stimuli, some participants reported having elaborated on the stimuli in order to imagine them, but using only musical/auditory elements, e.g., music analysis (i.e., intervallic, melodic, or harmonic associations), timbre quality, etc. Other participants reported recreating the stimuli directly as they had perceived them, without any difficulty.
Discussion
The purpose of this EEG study was to compare brain activity associated with auditory and visual imagery tasks, with the idea that identifying similarities might provide evidence for a possible neural substrate underlying modality-independent—or possibly cross-modal—mental imagery. In agreement with Zvyagintsev et al. (2013), our results showed that both auditory and visual imagery involve widespread brain activity, but with distinct behavior for each band analyzed: alpha, beta, and gamma. In terms of the band-power grand average, the greatest power was observed in right occipital alpha for all conditions, with observable suppression for the visual tasks, especially the perceptual task. For this discussion, we will focus on the analysis of the results related to the auditory and visual imagery tasks. Activity in the alpha band has been linked to internally driven mental operations and top-down processing (Cooper et al., 2003) and found to reflect cortical activations in the absence of sensory input (Senkowski et al., 2008). Specifically for music imagery, several studies (Iwaki et al., 1997; Ruiz et al., 2009; Schaefer et al., 2013; Van Dijk et al., 2010) have shown alpha-band activity related to different aspects of music processing, both in music perception and imagery (for a review see Schaefer et al., 2011). This activity has been reported mainly in parietal and occipital areas. Our results for the mean relative power for each electrode and condition are consistent with these suggestions. The observed suppression in primary visual areas agrees with similar findings reporting deactivation in these areas during visual imagery tasks (Cui et al., 2007; Daselaar et al., 2010; Mellet et al., 2000; Zvyagintsev et al., 2013).
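For readers unfamiliar with the measure discussed here, the mean relative power per band can be sketched as follows. This is an illustrative single-channel computation, not the authors’ actual pipeline; the band cutoffs and Welch parameters are our assumptions.

```python
import numpy as np
from scipy.signal import welch

# Illustrative band limits in Hz (assumed, not necessarily the study's cutoffs).
BANDS = {"alpha": (8.0, 13.0), "beta": (13.0, 30.0), "gamma": (30.0, 45.0)}

def relative_band_power(signal, fs, bands=BANDS):
    """Power in each band as a fraction of total spectral power (one channel)."""
    freqs, psd = welch(signal, fs=fs, nperseg=min(len(signal), 2 * int(fs)))
    total = psd.sum()
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum() / total
            for name, (lo, hi) in bands.items()}

# Toy check: a noisy 10 Hz oscillation should be dominated by alpha power.
fs = 250.0
t = np.arange(0, 10, 1 / fs)
eeg_like = np.sin(2 * np.pi * 10 * t) \
    + 0.2 * np.random.default_rng(0).standard_normal(t.size)
rel = relative_band_power(eeg_like, fs)
```

Repeating this per electrode and condition, then averaging across epochs, yields the kind of topographic mean relative power values compared throughout this discussion.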
The most relevant difference observed was the larger number of significant locations for beta power during the visual tasks. As mentioned above, beta-band activity has also been associated with top-down processing (Engel & Fries, 2010; Villena-González et al., 2018). In a review article on the potential role of different frequency bands, Engel & Fries (2010) suggested that beta-band activity “appears to be associated with a continuation of the cognitive set and the dominance of endogenous top-down influences that override the effect of potentially novel or unexpected external events” and hypothesized that mental imagery should be associated with this band. Indeed, we observed increased beta power for both auditory and visual imagery tasks, but with a wider distribution for the visual tasks, involving virtually all cortical regions: frontal, prefrontal, parietal, temporal, and occipital. This type of behavior has been described in studies of multisensory or cross-modal conditions. Degerman et al. (2007), in an audiovisual attention fMRI experiment in which participants had to attend to either visual or auditory stimuli or both, observed that audiovisual attention modulated activity in the same frontal, temporal, parietal, and occipital cortical regions as auditory and visual attention. In a similar EEG study, Senkowski et al. (2005) reported beta responses associated with multisensory (audio-visual) interactions over frontal, occipital, central, and sensorimotor regions. Moreover, in a subsequent study, Senkowski et al. (2008) found that increased low-beta activity could be considered an index of simultaneous processing of visual and auditory information. Considering the above, the analysis of the strategies reported by the participants could explain the differences we found in the beta-power distributions, as they showed a clear differentiation between the two modalities.
For the visual imagery tasks, participants reported two elements as particularly helpful in performing the tasks: the presence of movement or temporal sequences in the stimuli and the use of different types of auditory associations. In other words, they were not actually performing a “pure” visual imagery task but what could be described as a multisensory imagery task. For the auditory imagery tasks, the reported associations remained mainly in the auditory/musical domain. In this case, increased beta power was mainly observed in fronto-central, left temporal, and some parietal areas. It is important to note that the participants were all musicians, and this may explain the “dominance” of auditory/musical elements even when performing visual tasks. In future studies, it would be interesting to replicate the experiment with, for example, visual artists, to see if they use the reverse strategy—visual associations to perform auditory imagery tasks—and to observe whether the beta-activity distribution appears similar to our findings, supporting the hypothesis that this behavior is related to multisensory imagery.
In contrast, gamma-band activity showed almost identical behavior between tasks, with enhanced right-frontal power for both auditory and visual imagery tasks. Gamma-band activity has also been associated with top-down processes, i.e., anticipatory modulation (Schadow et al., 2009), attention (Debener et al., 2003), and working memory and anticipatory ability (Bhattacharya & Petsche, 2001). Fitzgibbon et al. (2004), in a study designed to monitor sustained gamma oscillations induced by a variety of complex mental tasks, reported widespread gamma activation during mental activity and noted that tasks requiring active subject involvement showed the most significant increase in gamma power. Gamma-band activity has also been described as a “mechanism for binding various complex aspects of object perception into a unified whole” (Bhattacharya & Petsche, 2001; Engel & Singer, 2001). This function is related to the comparison of perceived object features with content stored in memory (Herrmann et al., 2010). With respect to musical images, Ding et al. (2019), in an electrocorticography experiment comparing music listening and recall, reported a time-dependent sequential behavior for gamma-band activity related to bottom-up/top-down processing. That is, during the music listening task they observed high-gamma responses propagating from the temporal to the frontal lobe, and vice versa during the music recall tasks: Frontal high-gamma-band activity preceded that of the temporal lobe. It is important to note that they also reported a slight right lateralization of stronger high-gamma power during music recall, which relates to previous studies of right lateralization in pitch perception or spectral processing (Zatorre et al., 2002) and tonal processing (Tervaniemi & Hugdahl, 2003), and to reports of interactions with the frontal lobe, especially in the right hemisphere.
They conclude that their results “confirm the right hemisphere lateralization of music-related cognitive functions and, in particular, reveal a stronger lateralization during music recall.” With respect to visual imagery, Sviderskaya et al. (2006), in an EEG study aimed at identifying EEG correlates of the experience of visual imagery in humans, compared changes in measures of spatial organization of biopotentials (spatial synchronization, spatial disorder, coherence, and spectral power) between “experts” (graphic design students) and “non-experts,” and found increases in spatial disorder of biopotentials, i.e., non-linear processes with increasing task difficulty, particularly in anterior areas of the right hemisphere, which were more pronounced in experts than in non-experts. They explain this as corresponding to an increased workload in the creative process. Based on all the above, and the strategies reported by the participants, we can hypothesize that the increased gamma power observed in our results may be related to the extraction of features of the stimuli and their retention in memory in order to recreate the image. The question remains as to the interaction between the frequency bands (beta and gamma) associated with the two imagery tasks; further temporal analyses may shed light in this direction.
The aim of identifying a possible neural substrate underlying modality-independent mental imagery cannot be fully addressed due to the strategies used by the participants—especially for the visual imagery tasks—as they used additional associations, especially in the auditory domain but also based on temporal aspects, to perform the tasks. However, we can identify some common activations in both modalities that are not related to primary processing: increased right frontal gamma power, increased bilateral frontal and prefrontal beta power, and some (less consistent) increased bilateral parietal beta power. These findings are consistent with previous reports of evidence supporting the hypothesis of a modality-independent imagery network, including areas related to memory retrieval, attention, semantic processing, and motor preparation (Daselaar et al., 2010; Olivetti Belardinelli et al., 2004; Zvyagintsev et al., 2013).
The discussion about the extent to which mental imagery, a complex cognitive process, is related to or dependent on other processes such as memory (Küssner et al., 2023) and attention (Pearson et al., 2015) remains active. Meanwhile, several studies aimed at observing brain activity related to different types of mental imagery have provided evidence that imagery engages not only some of the brain areas involved in perceiving stimuli of the corresponding sensory modality—as mentioned previously—but also areas and neural networks related to memory and attention (Zvyagintsev et al., 2013). The question remains whether brain behavior specifically related to modality-independent mental imagery processes can be identified. Assessing this question poses serious methodological difficulties given the phenomenological nature of mental imagery itself. This is evident in EEG studies of imagery tasks that report identifiable general behavior across participants but with some level of individual differences, attributed not only to stimulus properties but also to participants’ personal strategies for solving the tasks (Fitzgibbon et al., 2004; Schaefer et al., 2011). Having said that, it is important to point out that recent music-related imagery EEG studies have provided consistent evidence of brain activity patterns, which strengthens EEG’s potential as a powerful tool for identifying neural correlates of particular features of mental imagery—namely, its dynamic behavior. This evidence includes—in addition to previously observed posterior alpha-band suppression (Fachner et al., 2019)—beta and gamma modulation in various areas (Villena-González et al., 2018; Hashim et al., 2024), all consistent with our results.
Finally, no significant differences were found between the perceptual and imagery tasks—which was expected based on previous reports—or in relation to the complexity of the stimuli, with a few exceptions. We believe that the latter issue could be addressed with a larger sample, as some indications of complexity-related differences are mainly observed in parietal areas.
Limitations and Recommendations for Future Work
After analyzing our results, we identified some limitations in the design and implementation of the experimental paradigm. We decided to include only musicians in the sample, as this population is used to implementing different auditory/musical strategies in their daily practice, and we aimed to identify possible effects of different imagery strategies on specific patterns of brain activation. Moreover, we reported only mean power differences. Regarding electrical brain activity evoked by complex cognitive tasks, it is important to analyze individual behavior, as problem-solving strategies are related to individual biographies, and this may explain some apparently conflicting results. Within-subject analyses should be performed to assess possible individual differences and effects related to the different stimuli—mainly their levels of complexity—that are not apparent in the group average. However, as mentioned above, our population was inherently biased toward the auditory domain, which may explain why participants used auditory elements even to perform the visual imagery tasks and which prevented us from observing brain activity related to “pure” visual imagery. This could have been prevented by instructing them to avoid auditory/musical associations during the visual tasks. We also aimed to have equivalent stimuli in both modalities (visual and auditory), but we did not counterbalance familiarity with the complex stimuli. This may also explain why the difficulty ratings for the visual complex stimuli were slightly higher than for the auditory complex stimuli and why participants needed more associations to perform the visual tasks. For future work, in addition to the above, it would also be important to include a group of non-musicians to assess possible effects related to musical expertise.
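As one concrete form the within-subject analysis suggested above could take, the sketch below runs a per-electrode paired comparison of band power between two conditions. The data shapes, electrode count, and choice of a Wilcoxon signed-rank test are illustrative assumptions on our part, not the study’s reported procedure.

```python
import numpy as np
from scipy.stats import wilcoxon

def per_electrode_paired_test(power_a, power_b):
    """Paired Wilcoxon signed-rank test per electrode.

    power_a, power_b: arrays of shape (n_subjects, n_electrodes) holding one
    band-power value per subject and electrode for two conditions.
    Returns one p-value per electrode.
    """
    n_electrodes = power_a.shape[1]
    pvals = np.empty(n_electrodes)
    for e in range(n_electrodes):
        _, pvals[e] = wilcoxon(power_a[:, e], power_b[:, e])
    return pvals

# Toy data: 12 subjects, 8 electrodes; condition B differs only at electrode 0.
rng = np.random.default_rng(0)
a = rng.normal(size=(12, 8))
b = a + 0.1 * rng.normal(size=(12, 8))
b[:, 0] += 1.0  # consistent within-subject shift at one electrode
p = per_electrode_paired_test(a, b)
```

In practice, a correction for multiple comparisons (e.g., FDR across electrodes) would be needed before interpreting such p-values.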
Conclusions
Characterization of the neural activity underlying mental imagery remains a very complex methodological challenge, mainly because, in addition to experimental design challenges, individuals use different strategies to perform the tasks, and even spontaneous associations tend to occur. In this study we aimed to observe and compare neural activity related to auditory/musical and visual imagery in order to identify differences and similarities, considering that identifying similarities could shed light on a neural network common to mental imagery regardless of sensory modality. Alpha-band behavior was consistent with previous reports, with observable increases in mean relative power in occipital and parietal areas. The main difference observed was a widespread distribution of increased beta power during the visual imagery tasks, which we attribute to the cross-modal (audiovisual) associations that participants reported using during these tasks. For the auditory/music imagery tasks, participants also reported using associations, but these remained in the auditory domain. Similar activations for both auditory and visual imagery tasks were observed, particularly an increase in gamma power at right frontal sites, increased frontal and pre-frontal beta power, and some (less consistent) increased bilateral parietal beta power. These results support the hypothesis that mental imagery involves areas related to memory retrieval, attention, semantic processing, and motor preparation, in addition to the primary areas corresponding to each sensory modality.
Footnotes
Action Editor
Mats Küssner, Humboldt-Universität zu Berlin, Department of Musicology and Media Studies.
Peer Review
One anonymous reviewer; Liila Taruffi, Hong Kong Baptist University, Academy of Music.
Contributorship
Gabriela Pérez-Acosta conceived and designed the study, ran the experiments, and wrote the first draft of the manuscript. Oscar Yanez-Suarez ran the experiments, carried out data analysis, and wrote the results section of the manuscript. Miguel Ángel Porta-García carried out data analysis and produced the figures. José-Luis Díaz guided the study design. All authors reviewed and edited the manuscript and approved the final version.
Ethical Approval
The Ethics and Research Committee of the Faculty of Music of the National Autonomous University of Mexico approved this study.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Data Availability Statement
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
