Abstract
Musically evoked narrative imaginings (MENI) are mental stories that listeners construct in response to shifts in musical features such as dynamics, tempo, and timbre. Event segmentation – the cognitive process of identifying transitions in continuous stimuli – shapes these narratives. In music-colour synaesthesia (MCS), sounds involuntarily evoke colours and visual forms, offering perceptual anchors that may enhance event segmentation and narrative coherence. This article presents a novel theoretical framework integrating ideasthesia – the view that synaesthetic experiences are conceptually mediated rather than purely sensory-driven – with event segmentation theory to explain how MCS shapes MENI. It argues that synaesthetic percepts align with musical transitions to increase segmentation precision, strengthen narrative continuity, and deepen emotional and thematic engagement. Rather than treating MCS as an isolated phenomenon, the article situates it within a broader continuum of music cognition and narrative construction, suggesting that it magnifies processes shared across listeners. This perspective calls for further research into how multimodal mental imagery influences music perception, emotional response, and memory for musical narratives.
Keywords
The central claim is that music-colour synaesthesia (MCS) enhances musically evoked narrative imaginings (MENI) by improving their structure, narrative flow, and thematic consistency. It does so by providing sensory anchors that aid event segmentation and align with the underlying musical framework. Rather than viewing MCS as a niche or anomalous phenomenon, this article treats it as a model system that magnifies implicit cognitive processes – offering insight into how conceptually mediated perception shapes musical meaning and narrative construction. Building on the premise that some synaesthetic responses are conceptually driven (Curwen, 2018, 2022; van Leeuwen et al., 2015; Ward, Huckstep, et al., 2006) rather than purely perceptual (Grossenbacher & Lovelace, 2001; Ramachandran & Hubbard, 2001), this article integrates ideasthesia (Nikolić, 2009) with event segmentation theory to explore how conceptually mediated synaesthetic experiences may enhance the perception and interpretation of narrative events in music.
MENI are fluid stories listeners construct in response to music, shaped by interpretations of its emotional and structural elements (Margulis, 2017; McAuley, Wong, Bellaiche, et al., 2021). Although most listeners construct musically evoked narratives, synaesthetes experience an additional layer of engagement through visual imagery triggered by sound. Music-colour synaesthesia, or ‘chromesthesia’, is a form in which musical sounds evoke stable and involuntary percepts of colour, and sometimes shapes, textures, and moving landscapes (Eagleman & Goodale, 2009; Ward, Huckstep, et al., 2006; Ward & Mattingley, 2006). The relationship between music-colour synaesthesia and narrative coherence is best understood by situating it within broader theories of music cognition and mental imagery.
Through a synthesis of music cognition, event segmentation theory, and synaesthesia research, this article frames MCS within a continuum of imaginative capacities, responding to Margulis and McAuley’s (2023) call for a broader examination of music-evoked imaginings. Additionally, I build upon Glasser’s (2023) work on the relationship between music-related synaesthesia and mental imagery. Glasser’s study found that synaesthete musicians reported complex, detailed visual imagery in response to music, including scenes, shapes, and colours. Notably, these experiences were spontaneous and consistent with synaesthetic criteria, even when participants were not specifically prompted about synaesthesia. This suggests that synaesthetic imagery may not be a distinct kind of experience but rather a distinctive manifestation of typical mental imagery, shaped by synaesthesia. To clarify, typical mental imagery is usually voluntary and flexible, as when one imagines a red apple or a remembered place. Synaesthetic imagery, by contrast, is automatic, stable, and resistant to change. Building on this, the current article proposes that MCS enhances the structure and unfolding of MENI by providing perceptual anchors that support event segmentation. These anchors align with musical transitions, helping to organise the narrative in response to changes in musical features.
The first section of this article introduces MENI, examining how listeners construct dynamic and culturally influenced narratives in response to music. The second section explores MCS and introduces ideasthesia (Nikolić, 2009), showing how listeners derive meaning from auditory cues and conceptual associations, which can elicit perception-like experiences. The third section examines how synaesthetic responses anchor event segmentation and narrative coherence, forming the basis of the model linking MCS and narrative imagining. The article concludes by discussing cognitive processes involved in musical narrative construction, highlighting those shared by synaesthetes and non-synaesthetes, as well as those unique to synaesthetes. It also evaluates methodological limitations and offers recommendations for future research.
Musically evoked narrative imaginings
MENI refers to dynamically unfolding mental stories or imagined scenarios that arise in response to music. Most research has focused on instrumental music to explore narrative construction without the influence of lyrics (Margulis et al., 2019; Margulis, Miller, et al., 2022; McAuley, Wong, Bellaiche, et al., 2021). This reflects the suitability of instrumental music as a neutral canvas for examining narrative engagement, free from the semantic and cultural cues embedded in lyrics.
What shapes narrative engagement?
Narrative events are guided by shifts in musical features, such as dynamics, tempo, and timbre, which listeners interpret as moments of tension and resolution (Deliège, 1989; Margulis, Miller, et al., 2022; McAuley, Wong, Mamidipaka, et al., 2021). These imaginings emerge in response to musical changes and are often shared across listeners within the same cultural framework (Margulis, Williams, et al., 2022). Listeners’ engagement is also shaped by their familiarity with the music and by its emotional expression. Jakubowski et al. (2024) asked participants to listen to unfamiliar instrumental excerpts and describe their thoughts. Greater familiarity with an excerpt’s musical style increased thought occurrence, often triggering autobiographical memories, whereas unfamiliar styles fostered more novel imaginings. Emotional expression also shaped thought content, with positive music evoking positive thoughts and negative music prompting introspection or memory-related responses.
To measure imaginative engagement, Margulis et al. (2019) developed the Narrative Engagement (NE) scale. It invites listeners to rate four aspects of their imagined story: (a) how easily they imagined a story while listening; (b) how vivid it appeared; (c) whether it included a clear setting, characters, and events; and (d) whether the events occurred during the music rather than afterward. Building on this, Margulis, Williams, et al. (2022) demonstrated that music with heightened contrast and dynamic shifts enhances narrativity by creating salient moments of tension and resolution. Using real-time event-marking and tension-rating tasks, they showed that listeners perceive narrative events as unfolding dynamically in response to changes in musical features. These contrasts served as narrative cues, prompting listeners to imagine distinct events and perceive significant transitions in the music, akin to narrative structures in literature or film (Jakubowski et al., 2024; Margulis, Miller, et al., 2022; McAuley, Wong, Mamidipaka, et al., 2021).
Why listeners’ narratives differ
However, the characteristics of MENI are not fixed and can vary depending on factors such as cultural background, individual experience, musical context (McAuley, Wong, Bellaiche, et al., 2021; McAuley, Wong, Mamidipaka, et al., 2021), and the music’s emotional expression (Jakubowski et al., 2024; Loui et al., 2023). Some musical pieces evoke shared, intersubjective narratives, whereas others evoke highly personal ones that lack intersubjective agreement (McAuley, Wong, Bellaiche, et al., 2021; McAuley, Wong, Mamidipaka, et al., 2021). Personality traits may also shape narrative engagement. Greenberg et al. (2015) found that individuals high in openness to experience – one of the Big Five traits (McCrae & John, 1992) – tend to engage more deeply in imaginative activities.
Two key factors in narrative engagement are topicality and contrast. As discussed by McAuley, Wong, Bellaiche, et al. (2021), topicality refers to culturally learned associations between specific musical patterns and themes. For instance, descending harp arpeggios are often associated in Western media with dream-like states. These associations, shaped by repeated exposure to cultural conventions, tend to produce more consistent narrative interpretations within a cultural group. Contrast involves shifts in musical elements such as dynamics, instrumentation, or tempo. Unlike topicality, contrast appears to be a more universal cue for narrative interpretations across cultures. However, it leads to greater variability in listener interpretations, whereas topicality results in more consistent engagement. This suggests that learned cultural associations play a stronger role in shaping coherent musical narratives (McAuley, Wong, Mamidipaka, et al., 2021).
The variability in MENI highlights how listeners construct unique stories in response to music, shaped by cultural, emotional, and contextual influences (Margulis & Jakubowski, 2024). Whereas non-synaesthetes rely on musical contrasts and emotional cues to mark narrative events, music-colour synaesthetes experience an additional layer of engagement (Küssner & Orlandatou, 2022). Their visual experiences often align with changes in pitch, timbre, and dynamics, producing shifts in colour, brightness, and spatial form (Gallace & Spence, 2006; Küssner et al., 2014; Küssner & Leech-Wilkinson, 2014). This suggests that synaesthesia may amplify MENI by providing consistent, multimodal cues that guide narrative interpretation.
Music–colour synaesthesia
Historically described as ‘a union of the senses’ (Cytowic, 2002, p. 325), synaesthesia involves the automatic, involuntary triggering of an experience in one sensory modality (the concurrent) by stimulation in another (the inducer) (Grossenbacher & Lovelace, 2001). It affects an estimated four percent of the population, with around one third of these identified as music–colour synaesthetes (Glasser, 2023; Ward, Huckstep, et al., 2006). Although MCS is frequently characterised as perceiving colours in response to sounds or notes, it can be triggered by a broader range of musical features, including compositional style, timbre, tonality, and pitch (Peacock, 1985). The condition is highly idiosyncratic, and synaesthetes may disagree on the colours evoked by the same stimulus. The phenomenon commonly extends beyond colour to include shapes, textures, and spatial landscapes (Eagleman & Goodale, 2009). Importantly, an individual’s associations remain consistent over time (Cytowic, 2018; Simner et al., 2006). Synaesthetic experiences also extend beyond individual pitches or timbres, reflecting consistent, holistic mappings between extended musical passages and complex synaesthetic images (Mills et al., 2003).
Shared cross-modal tendencies in the general population
Notwithstanding the unique nature of synaesthetic experience, similar cross-modal correspondences are evident in the general population. Sensory modalities such as sound and colour are frequently linked, with certain auditory features reliably evoking specific visual impressions. For example, in studies with Western participants, high-pitched notes often evoke brightness, whereas low-pitched notes suggest darkness (Gallace & Spence, 2006; Marks, 1987, 2004; Spence, 2011; Spence & Di Stefano, 2023; Walker et al., 2010; Ward, Huckstep, et al., 2006). Cross-cultural research suggests that these associations may be universal, though specific metaphors and expressions vary by culture (Eitan & Timmers, 2010). However, rather than mapping sound directly to colour, MCS, like MENI in non-synaesthetes, is often shaped by contextual influences such as the mood or thematic progression of a musical piece (Curwen, 2018; Glasser, 2023; Mroczko-Wąsowicz & Nikolić, 2014; Ward, 2004).
Stronger emotional responses frequently intensify synaesthetic experience, suggesting that affective and narrative elements shape the colours perceived (Isbilen & Krumhansl, 2016; Marks, 2004; Ward, 2004). Although stronger emotions may heighten the brightness or vividness of colours, there is limited evidence that they consistently change the colour category itself (e.g., from red to blue). Instead, emotional valence often modulates saturation (colour intensity or vividness) and lightness (brightness or luminance), with positive emotions aligning with lighter, more saturated hues and negative emotions with darker, muted tones (Ward, 2004). Similar emotion-mediated patterns appear in non-synaesthetes, whose music–colour associations correspond to emotional tone, suggesting a shared conceptual basis (Isbilen & Krumhansl, 2016; Palmer et al., 2013).
Variability in synaesthetic experiences
Two significant distinctions emerge among synaesthetes: the associator–projector dichotomy and the higher–lower classification.
Associators experience synaesthetic perceptions internally, as mental constructs or visualisations (Dixon et al., 2004; Dixon & Smilek, 2005). For example, an associator might ‘know’ the colour blue when hearing a specific note. In contrast, projectors externalise their experiences, perceiving colours or shapes as overlaid on the external environment (Smilek et al., 2001).
Higher synaesthetes rely on conceptual cues, such as recognising musical motifs, to trigger their synaesthetic experiences (Curwen, 2018, 2022). Lower synaesthetes respond directly to sensory input, such as specific pitches or timbres (Nikolić, 2009; Ramachandran & Hubbard, 2001). These categories are not mutually exclusive, and many synaesthetes exhibit overlapping traits. Such variability calls for a framework that moves beyond fixed sensory mappings towards one that incorporates conceptual understanding and meaning.
Ideasthesia
Ideasthesia proposes that synaesthetic responses can be triggered by conceptual activation, not just sensory stimuli (Mroczko-Wąsowicz & Nikolić, 2014; van Leeuwen et al., 2015). Nikolić (2009) defines it as ‘a phenomenon in which a mental activation of a certain concept or idea is associated consistently with a certain perception-like experience’ (p. 28). The term – literally ‘sensing concepts’ – proposes that conceptual understanding, rather than sensation alone, can drive the response (Mroczko-Wąsowicz & Nikolić, 2014, p. 4; van Leeuwen et al., 2015). For example, synaesthetes exhibit Stroop (1935) interference when responding to written musical notation or key signatures, even without hearing the sound (Curwen, 2022; Ward, Tsakanikos, et al., 2006). This suggests that conceptual processing blends with perception, shaping synaesthetic experience (Curwen, 2018, 2022; Mroczko-Wąsowicz & Nikolić, 2014). Nanay’s (2020) theory of synaesthesia as multimodal mental imagery reinforces this idea, framing synaesthetic experiences as percepts shaped by top-down conceptual associations. For example, a synaesthete might experience the key of A minor as a ‘greenish melancholy’ through a meaningful, internally encoded association.
Evidence for conceptual priming comes from several studies. Nikolić et al. (2011) found that synaesthetic swimmers experienced colour associations during both physical activity and vivid mental imagery of swimming strokes. Dixon et al. (2000) reported that arithmetic problems (e.g., ‘5 + 2’) triggered the colour of the answer (‘7’) even when ‘7’ was not shown. de Thornley Head (1985) described a tone–colour synaesthete who retained colour associations to the original key despite surreptitious transposition, indicating a response anchored in conceptual understanding rather than acoustic input. Similarly, Ward, Tsakanikos, et al. (2006) showed that responses to musical notation were shaped by pitch and meaning: in a reverse Stroop task, participants were slower to name colours when their synaesthetic colour conflicted with the printed one. Curwen (2022) found that written key signatures alone could trigger colour responses in music–colour synaesthetes.
Rapid adaptation and conceptual flexibility
The theory of ‘practopoiesis’ (Nikolić, 2015) supports this perspective, proposing that rapid neural adaptation facilitates conceptual learning. Mroczko-Wąsowicz et al. (2009) found that grapheme–colour synaesthetes quickly extended associations to unfamiliar Glagolitic characters, indicating concept-driven mechanisms. Although ideasthesia emphasises conceptual mediation, it does not exclude rapid or automatic responses. Reaction time studies show that synaesthetic concurrents can emerge within milliseconds (e.g., Mroczko-Wąsowicz et al., 2009; Nikolić et al., 2007). Stroop interference has been observed within minutes of training new pairings, supporting the view that even fast responses may be conceptually primed. Rather than undermining ideasthesia, such automaticity reflects the efficiency of top-down semantic decoding (Nikolić, 2016). Sensory-driven and conceptually mediated experiences may coexist along a continuum shaped by familiarity, attention, and context.
Conceptual mediation in synaesthesia and MENI
Ideasthesia also accounts for variability in synaesthetic responses by recognising the impact of emotion, cultural background, and other conceptual factors on inducer–concurrent pairings. For example, a musical theme may evoke different colours depending on whether it is interpreted as joyful or nostalgic, shaped by personal experience and cultural understanding (Day, 2005; Mroczko-Wąsowicz & Nikolić, 2014; Ward, 2004). This flexibility invites comparison with MENI, which also emerge from top-down conceptual processes. However, the two phenomena differ in form and phenomenology. Ideasthesia produces automatic, percept-like experiences, whereas MENI involve the reflective construction of imagined scenes, characters, or emotional arcs that unfold over time (Jakubowski et al., 2024; Margulis, Williams, et al., 2022). Both reflect the influence of internalised meaning but differ in intentionality, immediacy, and sensory vividness.
Why connect MCS and MENI? Both involve rich, multimodal engagement with music, but MCS uniquely adds stable perceptual anchors that may enhance the salience of musical change. These vivid, consistent responses may help listeners detect event boundaries, which are essential for constructing musical narratives. MCS thus offers a model system for examining how perception and imagination interact, shedding light on the broader processes by which music is understood, segmented, and brought to life in the mind’s eye.
Event segmentation
A key process underpinning MENI (Margulis, Miller, et al., 2022; Margulis, Williams, et al., 2022) is event segmentation. This is not a listening style or interpretative strategy, but a domain-general cognitive mechanism through which the brain organises continuous information into discrete, meaningful units (Tversky & Zacks, 2013; Zacks & Swallow, 2007). It plays a central role in perception and memory across domains, from everyday activities to interpreting narratives in visual media or music. In storytelling, daily life, or music perception, segmentation enables the brain to process complex experiences by marking boundaries where one event ends and another begins (Margulis, Miller, et al., 2022).
This article highlights segmentation as a key structuring mechanism in MENI, not as a singular or exclusive listening mode but as one aspect of a diverse range of listening styles. As Weining et al. (2025) note, listeners may engage in various ways, including diffuse, bodily, emotional-immersive, associative, structural, and reduced modes. Even in intuitive or emotionally driven listening, segmentation may function implicitly, influencing how events are experienced and recalled without requiring deliberate analysis. This cognitive capacity is not unique to synaesthetes or to those experiencing ideasthesia, but it may be more salient in some individuals or contexts.
In Margulis, Williams, et al.’s (2022) study, listeners identified event boundaries in both visual and auditory domains by detecting noticeable changes in the stimuli. Using a mouse click, participants marked perceived events in the music, often coinciding with shifts in tension. In the auditory domain, this included changes in tempo, harmony, dynamics, or rhythm that helped segment the experience into distinct events. In the visual domain, listeners marked boundaries in their imagined scenes, such as character movement or environmental shifts. Musical excerpts rated high in narrativity on the NE scale (Margulis et al., 2019) – indicating vivid, structured, and temporally aligned stories – prompted more narrative events and greater intersubjective agreement on when those occurred. This suggests that high-narrativity music guides listeners’ imaginations, with event segmentation unfolding as the music progresses. For instance, a sudden crescendo or a melodic shift often marked the onset of a new imagined event for many listeners at the same time point. This aligns with Tversky and Zacks (2013) and Zacks and Swallow (2007), who found that people naturally divide continuous experiences based on perceptual and conceptual cues.
Zacks and Swallow (2007) also demonstrated the link between effective event segmentation and memory in a study in which participants watched movies depicting everyday activities, such as setting up a tent. After viewing, participants identified event boundaries, which were compared to a normative standard based on other viewers’ judgements. Participants then completed a memory test in which they distinguished still images taken from the movies from similar but new images. Those whose boundaries aligned more closely with the normative standard – those who segmented ‘well’ – had significantly better memory for visual details. This suggests that effective segmentation anchors information in memory and enhances later recall; this link between consistency and memory parallels the stability observed in synaesthetic experiences.
The notion that event segmentation structures memory resonates with Margulis and Jakubowski’s (2024) view that memory and imagination shape how listeners engage with music. They propose that prior experiences with familiar patterns, structures, and cultural associations enable listeners to form expectations about how music will unfold. Recognising familiar musical cues allows listeners to anticipate changes in dynamics, harmony, or rhythm, enriching engagement and interpretation.
Synaesthesia and event segmentation
In MCS, perceptual anchors such as colour, sound, and movement (Zacks & Swallow, 2007, p. 81) may sharpen the precision of event segmentation by aligning with structural and emotional shifts in the music. For example, a synaesthete in Curwen et al.’s (2023) study described a passage in Grieg’s Piano Concerto as ‘green with yellow ochre hue’, linking A minor to yellow. Such colour specificity is common among synaesthetes. Simner et al. (2006) found that grapheme-colour synaesthetes use more colour terms than non-synaesthetes: whereas controls might describe a colour as ‘green’, synaesthetes use precise labels like ‘pea green’, ‘jade green’, or ‘lime green’. Simner et al. suggest that this granularity arises because synaesthetes describe vivid, ‘actual’ visual sensations rather than abstract associations. In the context of MCS, this capacity for detailed and precise sensory description may enhance event segmentation by providing consistent, finely tuned markers that align with shifts in musical dynamics, harmony, or timbre.
Consistency and predictability
The consistency of synaesthetic responses may aid not only event segmentation but also the interaction between memory and imagination during music listening. Drawing on Margulis and Jakubowski’s (2024) model, music-evoked imagination involves the dynamic recombination of semantic, contextual, and emotional associations. For music–colour synaesthetes, consistent cross-modal associations – such as recurring colours or textures – can function as perceptual anchors. These anchors stabilise evolving imagery and help structure MENI. For instance, if a chord progression consistently evokes a specific shade of blue, a modulation prompting a different hue may intuitively signal a narrative or emotional change.
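The intuition that a change of hue can signal a new event can be illustrated with a toy model. The sketch below is hypothetical and is not drawn from the studies cited: it assumes that a synaesthete’s colour reports are available as onset times paired with RGB values in the range 0–1, and it marks an event boundary wherever the Euclidean distance between consecutive colours exceeds a chosen threshold. The colour values are arbitrary illustrative choices, and RGB distance is used only for simplicity, since it is not a perceptually uniform colour metric.

```python
import math

# Hypothetical colour report: (onset in seconds, (r, g, b)) with channels in 0..1.
ColourReport = tuple[float, tuple[float, float, float]]


def colour_boundaries(reports: list[ColourReport], threshold: float = 0.4) -> list[float]:
    """Return the onsets at which the reported colour jumps by more than `threshold`.

    Distance is plain Euclidean distance in RGB space, chosen for simplicity;
    it is not a perceptually uniform colour metric.
    """
    boundaries = []
    # Compare each report with the one immediately before it.
    for (_, prev_rgb), (onset, rgb) in zip(reports, reports[1:]):
        if math.dist(prev_rgb, rgb) > threshold:
            boundaries.append(onset)
    return boundaries


# Invented reports moving from dark blues to pale tones to warm hues.
reports: list[ColourReport] = [
    (0.0, (0.29, 0.00, 0.51)),   # indigo
    (8.0, (0.00, 0.00, 0.50)),   # navy blue
    (20.0, (0.75, 0.75, 0.75)),  # silver
    (32.0, (0.80, 0.70, 0.90)),  # pale violet
    (45.0, (0.90, 0.10, 0.10)),  # red
    (50.0, (1.00, 0.40, 0.00)),  # orange
]

print(colour_boundaries(reports, threshold=0.4))   # [20.0, 45.0]
print(colour_boundaries(reports, threshold=0.15))  # [8.0, 20.0, 32.0, 45.0, 50.0]
```

With the stricter threshold, only the two large colour shifts count as boundaries; lowering it also registers small shifts in shade, showing how a more sensitive criterion yields more, finer-grained, and potentially excessive boundaries.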
Margulis and Jakubowski (2024) propose that music uniquely supports the interaction of memory and imagination: remembered fragments – whether episodic or semantic – are recombined into unfolding narrative imagery. Synaesthetic colour experiences, being vivid and stable, may sharpen this process by offering moment-to-moment interpretive cues. This may enable synaesthetes to segment music into emotionally meaningful events and perceive imagined scenes with greater clarity.
Consider a music–colour synaesthete listening to Beethoven’s Moonlight Sonata. The minor-key opening evokes deep, swirling shades of indigo and navy blue, connoting sadness or introspection and prompting the image of a solitary figure walking through a dark forest. As the music transitions to a passage with a higher register and softer dynamic contrast, the colours shift to silver and pale violet, mirroring a change in mood. The narrative evolves in tandem: the lonely figure finds a moonlit clearing, signifying hope and reflection. A crescendo brings reds and oranges, signalling tension or urgency. The figure now faces a storm, representing inner turmoil or emotional conflict. For the synaesthete, these visual experiences are not merely decorative but serve as stable, cross-modal markers that guide narrative construction. The perceptual anchors created by colour shifts help the listener segment the music into distinct narrative events, enhancing the coherence and emotional resonance of their MENI.
Exploring how ideasthesia’s concept-driven mechanisms interact with anticipation, shaped by memory and imagination, could provide fresh perspectives on how synaesthetic listeners construct musical narratives. The consistent pairing of colours with emotional shifts may allow synaesthetes to segment music into meaningful units with greater precision while linking colour perceptions to formative emotional memories. This heightened segmentation may result in richer, more detailed MENI, as synaesthetes connect their sensory experiences to evolving musical storylines.
Mechanisms and conceptual model
Figure 1 presents a conceptual model of the hypothesised relationships between MCS, event segmentation, and MENI. The figure builds on Margulis and Jakubowski’s (2024) framework, which outlines how music elicits semantic, contextual, and emotional associations that support memory and imagination. This adaptation proposes that structural musical features (e.g., key modulations, timbral shifts) trigger stable percepts via ideasthesia (e.g., colours, textures, spatial forms). These perceptual anchors enhance the salience of event boundaries, supporting segmentation and shaping dynamic narrative experiences.

Conceptual Model Illustrating How MCS Enhances Event Segmentation in MENI Through Ideasthesia.
The model also captures individual variability. MENI may emerge through multiple interacting pathways (memory, emotion, or cross-modal perception), depending on the listener. Cultural background, musical expertise, and personality influence interpretation, with or without MCS. The model clarifies the hypothesised causal roles of segmentation, perceptual vividness, and narrative development across listeners.
Although the model illustrates a general mechanism linking MCS, event segmentation, and MENI, it does not represent subtype-specific variations in synaesthetic experience. Projectors, attuned to vivid sensory changes, may delineate musical events through sensory cues such as shifts in timbre or dynamics. Associators, drawing on internal visualisations, may focus on thematic or conceptual shifts. Higher synaesthetes might segment music guided by narrative or thematic coherence, whereas lower synaesthetes respond more to immediate sensory features. Although direct evidence for segmentation differences is lacking, Nikolić’s (2009) distinction between top-down (higher) and bottom-up (lower) processing offers a plausible explanatory framework.
Drawing on Nanay’s (2020) account of synaesthesia as multimodal mental imagery, these subtypes may be interpreted as variations in the phenomenology of synaesthetic imagery (e.g., in spatial presence or conceptual abstraction), rather than as distinct causal mechanisms. This framing allows the model to account for experiential diversity while maintaining a shared functional architecture. From this perspective, both sensory-driven and concept-driven processes can support event segmentation and MENI, shaped by the individual’s synaesthetic subtype. Thus, although subtypes primarily influence the qualitative experience, they do not alter the core mechanisms represented in the model, which foregrounds the shared cognitive processes underpinning segmentation and narrative imagining.
Although synaesthetic responses offer vivid perceptual markers that may enhance segmentation precision, these experiences are highly idiosyncratic. As a result, segmentation strategies can vary considerably between individuals. In some cases, heightened sensitivity to subtle changes may lead synaesthetes to ‘over-segment’, identifying excessive boundaries that disrupt the coherence of the musical narrative. Despite these differences, both synaesthetic and non-synaesthetic listeners rely on shared cognitive mechanisms to segment events. Perceptual cues and conceptual frameworks support the organisation of music into meaningful units, with synaesthetes contributing a distinctive multimodal perspective.
Model generalisability and scope
The proposed model offers a structured account of how vivid, conceptually mediated synaesthetic percepts may enhance event segmentation and narrative construction. Yet, it is important to acknowledge its potential limitations. The model may best apply to cases of MCS involving exceptionally vivid, stable, and emotionally salient percepts. Individuals with less perceptually rich or more abstract synaesthetic experiences may not derive the same segmentation support from their concurrents. Additionally, the model presumes a degree of structural clarity in the musical input – such as dynamic shifts, modulations, or thematic development – that may be absent in ambient, minimalist, or aleatoric musical forms. In such contexts, the alignment between synaesthetic percepts and musical structure may be weaker or more diffuse, potentially limiting the model’s explanatory power. Future work should explore how the proposed mechanisms operate across varying levels of synaesthetic vividness and musical complexity, and whether alternative forms of perceptual anchoring emerge in less structured listening environments.
Discussion
The interaction between music–colour synaesthesia and event segmentation offers valuable insights into how multimodal experiences enrich narrative construction in music perception. Synaesthetic responses may act as vivid and consistent cross-modal anchors, enabling listeners to more effectively segment musical events and enhance the coherence and structure of MENI. However, narrative construction is not purely sensory. Both synaesthetic and non-synaesthetic listeners rely on a combination of sensory and conceptual cues to interpret and segment music. This shared reliance suggests that synaesthetic experiences lie along a broader continuum of imaginative capacities that span diverse listener profiles (Margulis et al., 2019; McAuley, Wong, Mamidipaka, et al., 2021). Situating music–colour synaesthesia within this continuum aligns with Margulis and McAuley’s (2023) call to investigate the shared mechanisms underlying music perception and narrative engagement, particularly the roles of multimodal mental imagery, conceptual processing, and individual variability.
Despite the unique perceptual advantages synaesthesia may offer, some researchers question whether it confers distinct cognitive advantages. Deroy and Spence (2013) argued that synaesthesia reflects atypical cross-modal correspondences rather than enhanced cognitive processing. From this perspective, synaesthetic experiences do not represent a unique cognitive capacity but rather an idiosyncratic expression of universal mechanisms. Similarly, Küssner and Orlandatou (2022) characterised synaesthesia as an intensified form of universal cross-modal correspondences, an amplified interaction between sensory and conceptual modalities rather than a distinct cognitive category. Yet, even if synaesthesia is seen as an extension of typical cognition, its heightened specificity and consistency may still support distinct modes of multimodal engagement. These vivid and reliable associations may enhance event segmentation, facilitate pattern recognition, and intensify emotional responsiveness, potentially offering functional advantages in music perception.
Taken together, these perspectives suggest that although music–colour synaesthesia provides distinct perceptual anchors, narrative engagement depends on a broader array of sensory, cognitive, and contextual factors. Synaesthetic responses may enhance segmentation precision and narrative coherence, particularly in complex or ambiguous musical passages, but their role should be understood within a larger framework of universal cognitive processes.
Limitations and challenges
Synaesthesia research
There are several methodological challenges in synaesthesia research. Many studies rely on small sample sizes or single-case designs, limiting the generalisability of their findings. Additionally, the reliance on self-reported data introduces subjectivity, and diagnostic tools like the Synaesthesia Battery (Eagleman et al., 2007), although useful for identifying synaesthetic experiences, may oversimplify the complexity of music–colour synaesthesia. These tools tend to emphasise consistent, isolated concurrents – sensory experiences triggered by specific inducers (e.g., perceiving a particular colour in response to a musical note). This focus risks neglecting the dynamic and context-dependent nature of synaesthetic experiences, particularly in relation to musical narratives.
However, the assumption that consistency is a defining feature of synaesthesia has been questioned. Ward and Mattingley (2006) argued that variability in synaesthetic responses over time does not necessarily preclude a diagnosis, suggesting that consistency should be considered an ‘associated characteristic’ rather than a strict criterion. As Mills et al.’s (2003) case study of GS demonstrates, synaesthetic experiences can vary depending on contextual factors. GS reported that ‘sometimes the same note would be played, but [it would be a] different color. I don’t know why, even on the same instrument it’ll be a different color’ (p. 1364). This suggests that synaesthetic responses are not fixed but may instead adapt to changing contexts, including the structure of the music, the listener’s mood, and the listening environment. These dynamic qualities align with the argument that synaesthesia provides flexible perceptual anchors that enhance event segmentation.
Boundary identification
The methods used to study event segmentation also present challenges. Boundary identification tasks, such as the mouse-click methods used to track event segmentation, may fail to capture implicit or nuanced forms of segmentation, particularly for non-synaesthetic listeners. These tasks tend to focus on explicit, overt boundaries, potentially overlooking the continuous, evolving nature of narrative construction in music. Furthermore, it is important to distinguish between perceptual and conceptual segmentation strategies. Whereas synaesthetic listeners may rely more heavily on sensory markers, non-synaesthetic listeners may prioritise narrative coherence based on thematic or emotional cues. Without accounting for these differences, studies risk conflating distinct cognitive processes that contribute to musical engagement.
The role of musical expertise
Although studies of MENI have considered variables such as cultural background, emotional engagement, and familiarity, musical expertise remains underexamined. Research shows that musicians are more skilled at recognising melodic and harmonic structures due to their familiarity with musical syntax and conventions (Neuhaus et al., 2006). This expertise enables them to predict and identify phrase boundaries and cadential markers, even in unfamiliar musical contexts (Zhang et al., 2017). Burunat et al. (2024) found that musicians rely on specialised auditory processing circuits to decode the formal grammar of music, whereas non-musicians depend on broader, general networks to identify perceptually salient features, such as changes in tempo or loudness. Despite these differences, both groups identified event boundaries at similar moments in the music, suggesting that although musical expertise enhances segmentation granularity, some perceptual markers are universally accessible. Nonetheless, musicians may construct MENI with greater structural precision and coherence, aligning narrative events more closely with the compositional structure of the music. Their sensitivity to formal elements like harmonic progressions or thematic development allows for detailed and organised narratives. By contrast, non-musicians may rely more on broad emotional or perceptual cues, producing narratives that are less precise but potentially more flexible and emotionally focused. These differences indicate that although event segmentation is universally supported by perceptual cues, the depth and organisation of MENI may depend on the listener’s musical expertise.
Although musical expertise may enhance the ability to identify and interpret structural elements in music, it may also shape how expert musicians engage with narratives. Musicians, particularly those accustomed to analytical listening, may prioritise technical aspects – such as the quality of performance, stylistic fidelity, or compositional innovation – over imaginative or emotive engagement. For instance, rather than interpreting a musical passage as a dramatic narrative shift, a musician might focus on the precision of a challenging transition or evaluate harmonic complexity. This analytical orientation can result in MENI that are more descriptive or evaluative, potentially at the expense of emotive, story-driven interpretations. In contrast, non-musicians may respond more freely to the music’s emotional contours, constructing narratives that are affectively rich but less anchored in formal structure. Recognising these divergent tendencies is key to capturing the full range of narrative engagement across listeners with varying levels of expertise.
Although musical expertise is not the primary focus of this paper, it relates directly to the framework proposed here through its impact on event segmentation and narrative coherence. The ability to anticipate structural features such as key changes (modulations) or phrase boundaries may influence the salience and timing of segmentation markers, whether those are formal (in musicians) or sensory (in synaesthetes). In this way, comparing MENI in individuals with and without formal musical training offers a meaningful test case for the broader hypothesis: that synaesthetic percepts provide an alternative route to segmentation-based narrative construction, especially for listeners who lack explicit knowledge of musical structure.
Synaesthetic experience versus narrative imaginings
A key methodological and conceptual challenge is the inability to disentangle synaesthetic experiences from narrative imaginings in music–colour synaesthetes. The vivid colours, shapes, and textures elicited by musical stimuli are central to a synaesthete’s experience of music, shaping its phenomenal character – the subjective ‘what it is like’ to hear music (Nagel, 1974, p. 437). For synaesthetes, these cross-modal percepts are not supplementary to their engagement with music but constitutive of it, forming a fundamental part of their lived experience (Chalmers, 1996; Shoemaker, 1994).
Synaesthetes often express surprise upon discovering that non-synaesthetes do not share these perceptual experiences. Many report that they cannot ‘turn off’ their synaesthetic responses or imagine engaging with music in any other way. As such, synaesthetic responses are not merely add-ons to the musical experience; they shape and define how synaesthetes interpret and segment music. Without a clearer understanding of how synaesthesia shapes narrative construction at both perceptual and conceptual levels, research risks underestimating the unique contribution of synaesthetic experiences to event segmentation and MENI.
Future directions
To advance the understanding of MCS in narrative construction, future studies could explore:
Narrative Depth and Segmentation Clarity: Comparing the vividness and structure of MENI in synaesthetic and non-synaesthetic listeners using real-time event marking and narrative recall tasks.
Musical Features: Examining how tempo, timbre, and genre influence the alignment of synaesthetic responses with narrative shifts.
Neural Dynamics: Investigating whether MCS engages distinct multimodal networks, and whether synaesthetes – regardless of musical training – exhibit neural efficiencies similar to those observed in expert musicians.
One testable hypothesis is that synaesthetes – regardless of musical expertise – may demonstrate more vivid or structurally coherent MENI than non-synaesthetes. This advantage may be particularly pronounced in non-musicians, for whom synaesthetic responses might function as perceptual anchors that help delineate event boundaries in the absence of formal knowledge of musical structure.
Another promising line of enquiry involves exploring whether synaesthetic and non-synaesthetic listeners converge on shared narrative events when listening to music, particularly at moments of dynamic or timbral contrast. If synaesthetic colour changes correspond to perceived narrative shifts in non-synaesthetes, this may suggest a deeper cross-modal consistency in narrative construction than previously recognised.
These hypotheses could be explored using a mixed-methods design combining real-time event marking with qualitative narrative reports. By varying musical features such as tempo, genre, and timbre, researchers can test whether synaesthetic cues enhance segmentation and promote shared narrative engagement across listeners.
Broader implications
MCS provides a uniquely rich context for examining the construction of musically evoked narratives. Beyond these theoretical insights, the proposed link between MCS and MENI has wider applied significance. MCS offers a naturally occurring case in which cross-modal associations are stable, vivid, and meaning-laden, allowing us to examine how such associations shape the structure and affective contours of narrative imagination in music. As such, MCS can function as a magnifying lens through which to study general processes of event segmentation, imaginative engagement, and multimodal perception. Studying MENI in this population may help uncover principles of perceptual-affective integration that are likely present – but less overt – in the general population. These insights carry potential implications for several applied domains:
Music therapy: Synaesthetic cues could support clients with cognitive or emotional barriers by facilitating narrative-based interventions that aid memory, emotional processing, or identity reconstruction through music.
Music education: For neurodivergent or early learners, colour–emotion–shape mappings drawn from synaesthetic experiences might offer alternative routes to understanding abstract musical concepts such as mood, phrase structure, or key.
Creative arts: MCS-informed models of sensory integration could inspire new practices in multimedia composition, dance, and film scoring, enabling richer cross-sensory storytelling.
In this way, the relationship between MCS and MENI not only deepens our understanding of musical imagination and segmentation but also contributes to broader models of perception, creativity, and applied musical engagement.
Conclusion
This paper has explored how MCS can enhance MENI by providing vivid cross-modal anchors that align with salient musical boundaries. These perceptual markers, experienced as consistent and conceptually meaningful colours, shapes, and textures, strengthen synaesthetes’ ability to segment music into discrete events, thereby enriching their narrative engagement. By integrating ideasthesia with event segmentation theory, this paper has argued that synaesthetic responses are more than mere sensory quirks; they provide a structured, conceptual framework that enhances both the coherence and vividness of imagined musical narratives.
However, the construction of musically evoked narratives is not unique to synaesthetic listeners. Both synaesthetic and non-synaesthetic listeners engage in narrative processes shaped by shared cognitive mechanisms, such as the segmentation of musical structures into meaningful events. By situating synaesthetic experiences along a continuum of imaginative capacities, this article challenges the traditional binary distinction between synaesthetic and non-synaesthetic listeners, arguing that although sensory modalities may differ, the cognitive processes underpinning narrative construction are widely shared (Küssner & Orlandatou, 2022).
The notion that synaesthetic responses can be conceptually driven further underscores the interplay between perception and meaning-making in music listening. Unlike traditional models that view synaesthetic associations as purely perceptual phenomena, ideasthesia positions synaesthetic experiences as arising from the activation of conceptual knowledge. In this way, synaesthetic event segmentation can be understood as a process that operates at both perceptual and conceptual levels, enhancing listeners’ capacity to track musical changes, anticipate narrative shifts, and construct coherent, emotionally resonant stories in response to music.
Acknowledgements
I would like to thank Professor Kelly Jakubowski for her expert advice and guidance, and Dr Jacob Kingsbury Downs for his comments and language editing on an earlier draft of this paper.
Funding
The author disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This work was supported by a Leverhulme Early Career Research Fellowship awarded to Dr Caroline Curwen (Grant no. ECF-2024-663).
