Abstract
Linguistic synesthesia, characterized by cross-modal sensory mappings, is frequently associated with metaphor and neurological synesthesia. However, while prior research has emphasized these latter phenomena, the cognitive and neural underpinnings of linguistic synesthesia remain poorly understood. This study employs the divided visual field (DVF) paradigm to examine the hierarchical organization and hemispheric lateralization of synesthetic transfers in Chinese word pairs, while establishing theoretical links between linguistic synesthesia mechanisms, metaphorical cognition, and neurological synesthesia. Results demonstrated that linguistic synesthesia in Chinese involves unidirectional, bidirectional, and biased-directional transfer patterns, with no consistent dominance of forward mappings (e.g., touch→sight) or backward mappings (e.g., sight→touch). Participants exhibited enhanced accuracy in taste-smell transfers, likely due to established neuroanatomical overlap, and reduced performance in auditory-visual transfers, consistent with hierarchical models of sensory processing. Hemispheric analysis revealed no significant lateralization, indicative of distributed bilateral neural networks. These findings position linguistic synesthesia as a phenomenon rooted in sensory perception, sharing cognitive foundations with metaphor and neurological synesthesia while retaining distinct characteristics. The study illuminates the interplay between sensory perception, language, and cognition, offering insights for neuro-linguistic frameworks and cross-modal research.
Plain Language Summary
This study explores how Chinese words can describe one sense using terms from another—like saying a color is “warm” or a voice is “sweet.” While past research linked this to metaphors or brain conditions (like seeing colors when hearing sounds), little was known about how it works in Chinese. Using a visual test that separates left and right brain processing, the study found that Chinese sensory word pairs don’t always follow a fixed direction (e.g., touch words describing sight or vice versa). Some sense combinations, like taste and smell, were easier to process—likely because these senses overlap in the brain. Others, like sound and vision, were harder, matching known sensory hierarchies. Interestingly, both brain hemispheres worked equally in processing these word pairs, suggesting that the brain uses a network across both sides to mix these senses. The findings highlight how sensory experiences shape language and connect to broader thinking patterns, offering new insights for language and brain research.
Introduction
The term synesthesia derives from the Greek words “syn” (connection) and “aesthesia” (feeling). In linguistics, it refers to the use of concepts from one sensory domain to describe experiences in another (Williams, 1976), a phenomenon termed linguistic synesthesia (Zhao et al., 2022) or synesthetic metaphor (Strik Lievers, 2015; Winter, 2019b). For instance, in John Donne’s poem The Perfume, the phrase “a loud perfume” uses an auditory adjective (“loud”) to describe an olfactory noun (“perfume”). The systematic study of linguistic synesthesia traces back to Ullmann (1957), who proposed a linear hierarchy of sensory transfers based on poetic texts: touch → taste → smell → hearing → sight. Expanding on Ullmann’s framework, Williams (1976) further subdivided the domain of sight into color and dimension, and proposed that linguistic synesthesia might display universal patterns across languages. The general directionality has since been supported by cross-linguistic studies spanning both Indo-European and non-Indo-European languages (Fishman, 2022; Strik Lievers, 2015; Winter, 2019a; Yu, 2003; Zhao et al., 2018), reinforcing the idea that sensory transfers are not arbitrary but governed by underlying neurocognitive principles.
However, cross-linguistic research has also revealed inconsistencies in synesthetic transfer patterns, as documented in English and Italian (Strik Lievers, 2015), Chinese (Zhao et al., 2018), Korean (Jo et al., 2022), and Turkish (Kumcu, 2021). These findings challenge the unidirectional nature of Ullmann’s hierarchy. Strik Lievers (2015) identified not only forward mappings (e.g., touch → sight) but also backward transfers (e.g., hearing → touch) in English and Italian corpora. Forward mappings run from more accessible senses (e.g., touch) to less accessible ones (e.g., sight), while backward mappings run in the opposite direction (e.g., hearing to touch; see Figure 1). Backward mappings, however, are markedly less frequent than forward ones. Further evidence from Chinese (Zhao et al., 2019) and Turkish (Kumcu, 2021) corpora supports this observation and introduces the concept of biased-directional transfers. As shown in Figure 2, while touch → vision mappings dominate, reverse mappings (vision → touch) also occur. Zhao et al. (2019) additionally identified bidirectional transfers (e.g., touch ⇄ taste), where no clear directional preference exists. These three patterns (unidirectional, biased-directional, and bidirectional mappings) were subsequently replicated in Korean by Jo (2022), who argued that linguistic synesthesia operates as a rule-constrained yet frequency-sensitive metaphorical phenomenon. In a more recent study, Jo (2024) demonstrated that the directionality of synesthetic expressions in Chinese–Korean loanwords tends to follow the patterns of the source language (Chinese) rather than those of the recipient language (Korean). This observation suggests that the directionality of synesthetic mapping is shaped not by universal principles alone but also by language contact and historical factors. In brief, these findings reconcile cross-linguistic variations with underlying universal tendencies and highlight the need for further empirical investigation to refine our understanding of directional preferences in synesthetic transfers.
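The forward/backward distinction described above can be made concrete with a minimal sketch. It assumes only Ullmann's linear ordering as quoted in the text; the constant `ULLMANN_ORDER` and the function `transfer_direction` are hypothetical names introduced for illustration, not part of any cited study.

```python
# Ullmann's (1957) linear hierarchy as quoted in the text,
# ordered from the lowest (touch) to the highest (sight) modality.
ULLMANN_ORDER = ["touch", "taste", "smell", "hearing", "sight"]


def transfer_direction(source: str, target: str) -> str:
    """Label a source -> target mapping as 'forward' or 'backward'.

    Forward: the source sits lower in the hierarchy than the target.
    Backward: the source sits higher in the hierarchy than the target.
    """
    s = ULLMANN_ORDER.index(source)
    t = ULLMANN_ORDER.index(target)
    if s == t:
        raise ValueError("source and target must be different modalities")
    return "forward" if s < t else "backward"


print(transfer_direction("touch", "sight"))    # forward
print(transfer_direction("hearing", "touch"))  # backward
print(transfer_direction("hearing", "smell"))  # backward ("a loud perfume")
```

Under this ordering, Donne's “a loud perfume” (hearing → smell) counts as a backward mapping, which illustrates why a strictly unidirectional reading of the hierarchy is hard to maintain.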

Figure 1. Linguistic synesthesia transfers in English (Strik Lievers, 2015).

Figure 2. Transfer directionalities of linguistic synesthesia based on Mandarin corpus data (Zhao et al., 2019, see p. 9).
To explain these directional patterns, earlier studies attributed them to embodiment asymmetry among sensory modalities. This account posits that synesthetic mappings generally proceed from more accessible sensory domains (e.g., touch) to less accessible ones (e.g., sight), as in warm color (touch → sight), while reverse mappings, such as round flavor (sight → taste), are rare (Popova, 2005; Y. Shen, 1997). However, recent empirical advances challenge this modality-based hierarchy. C.-R. Huang et al. (2025) introduced the Perceived Strength of Embodiment (PSE) model, showing that lexical-conceptual embodiment, rather than a fixed modality order, better predicts synesthetic directionality. Similarly, Winter and Strik Lievers (2025) demonstrated through a large-scale meta-analysis that apparent hierarchical tendencies largely stem from a few dominant mappings (e.g., touch → hearing, vision → hearing). They proposed that a network-based model of sensory relations more effectively captures the asymmetric connections among sensory modalities.
Building on these empirical findings on directionality and sensory connectivity, another line of inquiry concerns the cognitive and semantic nature of linguistic synesthesia, specifically whether synesthetic expressions should be regarded as metaphorical. Many studies have characterized synesthetic expressions as metaphorical due to their cross-domain mappings, typically from more embodied to less embodied sensory domains (Popova, 2005; Y. Shen, 1997; Strik Lievers, 2017; Yu, 2003; Zhao et al., 2022). Within this framework, Zhong et al. (2024) applied Conceptual Metaphor Theory in a questionnaire study and found that novel synesthetic metaphors following conventional mapping directions are perceived as more common, appropriate, and comprehensible than those that violate established principles. These findings suggest that the directionality of linguistic synesthesia is governed by embodied concepts, rendering the mapping patterns at least partially predictable. In contrast, Winter (2019b) challenges the metaphorical nature of linguistic synesthesia, arguing that cross-modal lexical usages (e.g., sweet fragrance) may reflect literal rather than metaphorical meaning, particularly when involving closely integrated modalities like taste and smell. This controversy underscores the need for further empirical evidence, particularly through investigations of the cognitive mechanisms underlying literal versus metaphorical meaning processing. To account for these mechanisms, several theoretical frameworks have been proposed, including the Standard Pragmatic Model (Searle, 1979), which posits sequential processing of literal meanings prior to metaphorical interpretations; the Graded Salience Hypothesis (Mashal et al., 2007), which emphasizes hierarchical processing based on semantic prominence; and the Parallel Hypothesis (Glucksberg & Keysar, 1990), which advocates simultaneous computation of literal and metaphorical meanings.
To evaluate these models, researchers have employed diverse experimental paradigms, including the Sentence-final Word Paradigm (Coulson & Van Petten, 2002), the Word Pair Paradigm (Arzouan et al., 2007), the Divided-Visual-Field (DVF) Paradigm (Faust & Mashal, 2007), and the Stimulus1-Stimulus2 Paradigm (Yang et al., 2013). Among these, the DVF paradigm has been particularly influential in metaphor research due to its ability to isolate hemispheric contributions to figurative language processing. By leveraging the contralateral organization of the visual system, in which stimuli in the left and right visual fields (LVF/RVF) are processed by the right and left hemispheres (RH/LH), respectively, studies have provided valuable insights into hemispheric specialization in figurative language processing. M. Huang et al. (2022) found right-hemisphere dominance in the comprehension of novel metaphors, a pattern consistent with Noufi and Zeev-Wolf (2024), who demonstrated that left-hand muscle contractions (induced by a grip action) activated perceptual–motor regions in the right hemisphere. This activation enhanced participants’ accuracy in processing literal expressions as well as conventional and novel metaphors, with the strongest facilitation observed for metaphorical language. Taken together, these findings emphasize the right hemisphere’s critical role in coarse semantic coding, which enables the integration of distant or novel semantic relations. In contrast, Zhu et al. (2022) reported bilateral involvement during metaphor comprehension, suggesting that both hemispheres may cooperate depending on metaphor familiarity and processing demands. Moreover, Hauptman et al. (2023) conducted a large-scale meta-analysis of fMRI studies and found that non-literal language processing (including metaphor, irony, and indirect speech) relies on the joint activation of the language-selective network and the Theory of Mind (ToM) network. These results challenge the traditional dichotomy between literal and non-literal processing. Despite these advances, the lateralization of metaphor processing remains an open question, and linguistic synesthesia, a potentially similar phenomenon, has received even less empirical attention.
Furthermore, the nature of linguistic synesthesia has also been examined through a neurobiological lens, with some scholars proposing that it may share underlying mechanisms with neurological synesthesia—a condition in which stimulation of one sensory modality automatically triggers perception in another without direct external input (Harrison & Baron-Cohen, 1996). This hypothesis is supported by observed parallels between linguistic synesthesia and cross-sensory integration in neurological synesthesia, as demonstrated through comparative analyses of linguistic and neurophysiological data (Ronga et al., 2012). Such findings suggest that linguistic synesthesia could represent a behavioral or peripheral manifestation of the same cross-modal perceptual processes that characterize neurological synesthesia (Marks & Mulvenna, 2013). However, there is significant scholarly debate regarding the neuro-cognitive mechanisms of linguistic synesthesia. Rothen and Meier (2013) observed metaphor-like abstract associations in neurological synesthesia, suggesting potential links to linguistic synesthesia, whereas Winter (2019b) argued that linguistic synesthesia reflects lexicalized sensory encoding rather than true synesthetic or metaphorical processes. Zhao et al. (2022) recently characterized linguistic synesthesia as involving conceptual metaphors of lexicalized sensory concepts, distinct from real-time neurological synesthesia. This debate highlights the need for more empirical studies that bridge linguistic patterns with neurocognitive frameworks to elucidate whether these phenomena arise from shared or distinct neural architectures.
Despite extensive research on cross-linguistic synesthetic mappings, sensory hierarchies, and metaphor processing, there is still a need for further investigation. First, although corpus studies have documented unidirectional, biased-directional, and bidirectional mappings across languages (Jo, 2022; Zhao et al., 2019), it is unclear whether these patterns are reflected in real-time behavioral processing. Second, while neurocognitive studies have identified right-hemisphere dominance and bilateral network involvement in metaphor comprehension (Hauptman et al., 2023; Huang et al., 2022), the neural mechanisms underlying linguistic synesthesia remain largely unexplored. Moreover, as linguistic synesthesia inherently combines sensory grounding with cross-modal metaphorical mapping, examining its potential role in linking conceptual metaphor and neurological synesthesia may provide important theoretical insights. Building on these considerations, the present study employs the Divided-Visual-Field (DVF) paradigm to address two key questions:
(1) Do behavioral patterns of linguistic synesthesia conform to frequency-based hierarchies of synesthetic transfer across sensory modalities?
(2) What neurocognitive mechanisms underlie the comprehension of linguistic synesthesia?
By adopting this multifaceted approach, we seek to elucidate how sensory perception, linguistic representation, and cognitive processing interact in shaping linguistic synesthesia.
Methods
Participants
An a priori power analysis using G*Power 3.1 (Faul et al., 2009) indicated that a sample size of 18 was required, based on an effect size f of 0.25, a significance level (α) of 0.05, and power (1−β) of 0.95. We recruited 46 postgraduates (16 males) aged 22 to 26 years (M = 23.5, SD = 1.17) from a university. One female participant was excluded from the analysis because her accuracy fell below 75%, resulting in a final sample of 45 participants (16 males; mean age = 23.4, SD = 1.13). All participants were right-handed native Chinese speakers. They were physically and mentally healthy, with no history of brain trauma or neurological disease, normal or corrected-to-normal vision, no dyslexia, and normal language ability. Informed consent was obtained from all participants prior to the experiment, and appropriate remuneration was provided after the experiment was completed. The study was approved by the Ethical Committee of Sichuan International Studies University.
Materials
The stimulus materials consisted of three lexical types: synesthetic word pairs (semantically related sensory adjective + sensory noun), literal word pairs (semantically related adjective + noun), and semantically unrelated word pairs (semantically unrelated adjective + noun). Following the procedure for extracting synesthetic expressions from corpora proposed by Strik Lievers (2015), the experimental materials were generated through the following steps:
First, a large set of adjectives and nouns was randomly selected from the Modern Chinese Dictionary (Institute of Linguistics, Chinese Academy of Social Sciences, 2005) and manually combined into literal and semantically unrelated word pairs. For the synesthetic word pairs, we first compiled a list of perception-related lexemes. A set of 100 high-frequency adjectives was selected as source-domain words from Embodied Conceptualization or Neural Realization: A Corpus-Driven Study of Mandarin Synaesthetic Adjectives, in which Mandarin synesthetic adjectives were comprehensively extracted and identified using a corpus-driven approach (Zhao, 2022). The target-domain words were extracted by querying collocations of the source-domain adjectives with nouns in the Beijing Language and Culture University Corpus (BCC), a large-scale Chinese corpus comprising billions of characters. The BCC covers diverse text types (e.g., news, literature, science, and legal documents) across an extended temporal span, ensuring a representative sample of Mandarin linguistic features. Then, following the perception-related vocabulary summarized by Strik Lievers (2015), candidate synesthetic expressions, which are characterized by the presence of words referring to different sensory modalities, were retrieved from all adjective-noun structures, and the remaining terms were judged manually. Finally, 78 synesthetic word pairs were selected as the initial experimental materials according to frequency. Synesthetic expressions, including sensory adjectives (source domain) and nouns (target domain), are shown in Table 1. Notably, the experimental materials in the present study were restricted to disyllabic words. This restriction matters in light of Zhao et al.’s (2022) findings in the Sinica corpus, where Mandarin olfactory adjectives were primarily monosyllabic (e.g., 香 xiang “fragrant,” 臭 chou “stinking”), with only three disyllabic auditory adjectives identified. Consequently, our experimental stimuli included no olfactory adjectives and only one instance from the auditory domain. Additionally, our BCC corpus analysis revealed an absence of tactile nouns and a scarcity of gustatory nouns (only “滋味-flavor” and “口感-taste”) eligible for synesthetic mappings with cross-domain adjectives.
Table 1. Examples of Synesthesia Expressions Including Source Domain (SA) and Target Domain (TN).
Note. SA = source adjectives; TN = target nouns; “—” signifies the absence of expressions within the category (Zhao, 2022).
Second, questionnaires were distributed to evaluate the candidate materials. Initially, we screened 78 synesthetic word pairs, 70 literal related word pairs, and 140 semantically unrelated word pairs for the efficacy evaluation test. The pilot surveys were completed by 35 participants (mean age = 23, SD = 1.35) who did not take part in the behavioral experiment. They were informed of the evaluation criteria in advance and, if necessary, received basic training in synesthesia judgment. Participants evaluated the familiarity and synesthetic degree of word pairs using a 5-point Likert scale, ranging from 1 (extremely difficult to understand or unfamiliar) to 5 (effortlessly comprehensible or highly familiar). Based on the evaluation results, we obtained the final experimental stimuli, comprising three categories: 60 synesthetic pairs (e.g., 甜蜜-微笑, sweet–smile), designed to evoke cross-sensory associations; 60 literal semantically related pairs (e.g., 可爱-小猫, cute–cat), sharing direct conceptual or categorical relationships; and 120 semantically unrelated pairs (e.g., 锋利-开水, sharp–boiling water), with no apparent semantic or sensory linkage. The rationale for keeping the ratio of semantically related pairs (60 + 60) to unrelated pairs (120) at 1:1 was to eliminate response bias caused by unequal probabilities of the two response types. Table 2 shows the statistics of the three categories. The synesthesia degree scores and familiarity differed significantly among the three kinds of word pairs (F(2, 118) = 449.81, p < .001; F(2, 118) = 1599.15, p < .001). Repeated contrast tests indicated that the synesthesia degree scores of synesthetic word pairs were significantly higher than those of the other two categories (MD = 1.69/1.71). There was no significant difference in the synesthesia degree scores between literal related word pairs and semantically unrelated word pairs (MD = 0.02). In addition, the stroke numbers of the different word pairs showed no significant difference, F(1, 59) = 1.657, p = .185.
Table 2. Statistics of Three Categories on Synesthesia Degree, Familiarity and Number of Strokes.
p < .001.
Procedure
A three-factor within-subject design was employed, with the following factors: 2 lexical positions (source domain first [甜蜜 微笑] vs. source domain second [微笑 甜蜜]), 2 target visual field conditions (left vs. right visual field), and 3 word types (synesthetic, literal related, and semantically unrelated word pairs). Participants performed a lexical semantic judgment task, deciding whether the two consecutively presented words formed a meaningful pair. For example, word pairs such as 甜蜜-微笑 (sweet-smile) and 可爱-小猫 (cute-kitten) are considered meaningful, whereas pairs like 锋利-开水 (sharp-boiling water) are not.
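As an illustration of the 2 × 2 × 3 factorial structure described above, the following minimal sketch enumerates the design cells. The level labels are hypothetical names chosen for readability; how the stimuli were distributed across cells is not stated in the text and is therefore not modeled here.

```python
from itertools import product

# Factor levels as described in the text (labels are illustrative assumptions).
lexical_positions = ["source_first", "source_second"]
visual_fields = ["LVF", "RVF"]
word_types = ["synesthetic", "literal", "unrelated"]

# Contralateral mapping exploited by the DVF paradigm:
# the left visual field projects to the right hemisphere and vice versa.
hemisphere_for_field = {"LVF": "RH", "RVF": "LH"}

# Every combination of the three factors is one design cell.
cells = list(product(lexical_positions, visual_fields, word_types))

for position, field, word_type in cells:
    print(position, field, hemisphere_for_field[field], word_type)

print(len(cells))  # 12 cells in total
```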
Prior to the formal experiment, a practice block of 20 word pairs familiarized participants with the experimental requirements, procedures, and operations. Participants were required to keep their gaze stable throughout the experiment, and their behavioral responses during the semantic judgment task were recorded. In the formal experiment, participants were seated in a comfortable chair in a soundproof, well-lit laboratory, with their head resting on a chin rest. Seated 60 cm from the center of a 23.8-inch Xiaomi LED monitor operating at a 100 Hz refresh rate, they were asked to read each word carefully and judge the semantic relatedness of each word pair. The words were displayed sequentially in black font on a white background. As shown in Figure 3, each trial proceeded as follows: a central fixation point (“+”) appeared for 200 ms, followed by the prime word for 600 ms, a second fixation point (“+”) for 200 ms, and the target word for 180 ms. A question mark (“?”) was then presented for 3,000 ms, followed by a blank screen for 800 ms. Once the question mark appeared, participants pressed a key to indicate whether the pair of words they had just read was semantically related: “F” for related pairs and “J” for unrelated pairs.
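The trial timeline can be checked against the monitor's refresh rate with a short sketch. It assumes that each event is shown for its full stated duration (including the 3,000 ms question mark, regardless of when the response occurs); the event labels and the helper `ms_to_frames` are hypothetical and not taken from the authors' experiment script.

```python
REFRESH_HZ = 100  # monitor refresh rate reported in the text

# (event label, duration in ms), in the presentation order described in the text.
TRIAL_EVENTS = [
    ("fixation_1", 200),
    ("prime", 600),
    ("fixation_2", 200),
    ("target", 180),
    ("question_mark", 3000),
    ("blank", 800),
]


def ms_to_frames(duration_ms: int, refresh_hz: int = REFRESH_HZ) -> int:
    """Convert a duration in ms to a whole number of screen refreshes."""
    if (duration_ms * refresh_hz) % 1000 != 0:
        raise ValueError(
            f"{duration_ms} ms is not a whole number of frames at {refresh_hz} Hz"
        )
    return duration_ms * refresh_hz // 1000


frames = {label: ms_to_frames(ms) for label, ms in TRIAL_EVENTS}
total_ms = sum(ms for _, ms in TRIAL_EVENTS)

print(frames["target"])  # 18 frames for the 180 ms target
print(total_ms)          # 4980 ms per trial under the stated assumption
```

At 100 Hz one frame lasts 10 ms, so every stated duration, including the 180 ms lateralized target, corresponds to a whole number of frames.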

Figure 3. Experimental flow chart.
Results
Overall, participants’ mean accuracy in judging the word pairs was 88.93% (SD = 3.7%), with a mean reaction time of 654 ms (SD = 109 ms).
Different Synesthetic Transfers on ACC and RTs
Figure 4 presents the accuracy results for various synesthetic mapping patterns across different sensory modalities. Overall, participants achieved accuracies exceeding 60% for most types of synesthetic mappings, with notable exceptions being touch-smell (44.6%) and hearing-sight (37.3%) mappings.

Figure 4. Heatmap of accuracy results in different synesthetic transfers.
According to the mapping hierarchy of sensory transfers, synesthetic transfers with sight as the source domain were classified as backward transfers, whereas all other mappings were considered forward transfers. Mean accuracy by source modality was ranked as follows: Sight > Taste > Touch > Hearing > Smell. That is, accuracy was numerically highest when sight served as the source domain. By target domain, the accuracy ranking was: Taste > Hearing > Smell > Sight > Touch. Taste was the most accurate target domain, with a mean accuracy of 76.4%, while sight was lower at 58.6%, which suggests that taste is particularly receptive as a target modality for synesthetic transfer. Notably, backward transfers (sight as the source) generally yielded higher accuracy than forward transfers.
Figure 5 illustrates the reaction times (RTs) associated with synesthetic transfers across various sensory modalities. The RTs for these transfers generally ranged from 667 to 948 ms.

Figure 5. Heatmap of reaction times in different synesthetic transfers.
As with accuracy, the RTs vary with both the source and the target modality. By source domain, RTs ranked from fastest to slowest were: Taste, Sight, Touch, Hearing, Smell. Taste and Sight were thus processed more rapidly as source domains than the other senses. By target domain, the ranking from fastest to slowest was: Hearing, Smell, Taste, Sight, Touch, so the shortest RTs were observed when Hearing served as the target. The data suggest that the efficiency of synesthetic transfer depends on both the source and the target modality: Taste and Sight are more efficient as source domains, while Hearing and Smell are more efficient as target domains. Additionally, backward transfers, in which Sight serves as the source, generally yielded shorter RTs than most forward transfers, in line with the accuracy results and indicating a consistent performance pattern across sensory modalities.
ACC and RTs of Semantic Judgment Across LVF and RVF
Table 3 summarizes the means (M) and standard deviations (SD) of accuracy and reaction times in the left visual field (LVF) and the right visual field (RVF), with the source-domain word placed in front (SDF) or at the back (SDB), across three conditions: synesthetic word pairs, literal related word pairs, and semantically unrelated word pairs.
Table 3. Accuracy and reaction times of semantic judgment under different word type conditions.
The results in Table 3 indicate that across all conditions, the accuracy of synesthetic word pairs was approximately 74%, with average reaction times ranging from 715 to 725 ms. In contrast, literal word pairs exhibited higher accuracy of 91.8% and the shortest average reaction times, around 603 to 609 ms. Unrelated word pairs achieved the highest accuracy, approximately 95%, though their average reaction times were slightly longer than those for literal word pairs, ranging from 646 to 659 ms. Despite the longer reaction times for unrelated stimuli compared to literal stimuli, they remained shorter than those for synesthetic stimuli.
As shown in Figure 6, a repeated-measures ANOVA on accuracy revealed a highly significant main effect of word pair type, F(2,88) = 121.316, p < .001. Pairwise contrasts indicated that accuracy for synesthetic word pairs was significantly lower than for both literal word pairs (MD = 17.8%) and semantically unrelated word pairs (MD = 20.9%). However, there was no significant main effect of lexical collocation position, F(1,44) = 1.295, p = .261, or of visual field, F(1,44) = 0.01, p = .920. The interactions between word pair type (semantic correlation) and lexical collocation position, F(2,88) = 1.197, p = .303, and between word pair type and visual field, F(2,88) = .058, p = .944, were not significant, nor was the three-way interaction, F(2,88) = .352, p = .638. Overall, accuracy did not differ reliably across lexical collocation positions or visual fields, providing no evidence that word order or hemisphere of presentation affected participants’ ability to respond accurately.
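The degrees of freedom reported for these tests follow from the fully within-subject design with n = 45 participants. As a brief check, assuming uncorrected degrees of freedom (i.e., no sphericity correction, which the text does not mention), an effect involving factors with k_i levels each has:

```latex
\[
df_{\text{effect}} = \prod_i (k_i - 1), \qquad
df_{\text{error}} = (n - 1)\prod_i (k_i - 1)
\]
\begin{align*}
\text{Word pair type } (k = 3):&\quad df = (2,\; 44 \times 2) = (2,\, 88)\\
\text{Position or visual field } (k = 2):&\quad df = (1,\; 44 \times 1) = (1,\, 44)\\
\text{Word pair type} \times \text{visual field}:&\quad df = (2 \times 1,\; 44 \times 2) = (2,\, 88)\\
\text{Three-way interaction}:&\quad df = (2 \times 1 \times 1,\; 44 \times 2) = (2,\, 88)
\end{align*}
```

These values match the F(2,88) and F(1,44) statistics reported for the accuracy and reaction time analyses.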

Figure 6. The accuracies of the three word-type conditions (*p < .05, **p < .01).
The repeated-measures ANOVA on reaction times, as shown in Figure 7, revealed a highly significant main effect of word pair type, F(2,88) = 71.157, p < .001. Pairwise comparisons showed that the reaction time for synesthetic word pairs was significantly longer than for both literal word pairs (MD = 116.96 ms, p < .001) and semantically unrelated word pairs (MD = 70.12 ms, p < .001). However, there was no significant main effect of lexical collocation position, F(1,44) = 1.089, p = .302, or of visual field, F(1,44) = 2.97, p = .092. Furthermore, no significant interaction was found between word pair type (semantic correlation) and lexical collocation position, F(2,88) = 1.616, p = .211, nor was there a three-way interaction among word pair type, visual field, and lexical collocation position, F(2,88) = .181, p = .828. Importantly, a significant interaction was observed between word pair type and visual field, F(2,88) = 3.296, p = .048.

Figure 7. The reaction times of the three word-type conditions (***p < .001).
As shown in Figure 8, simple effect analysis found no significant difference between the right visual field (predominantly left-hemisphere processing; RVF/LH) and the left visual field (predominantly right-hemisphere processing; LVF/RH) for literal related word pairs (MD = 3.622, S.E. = 4.828, p = .457) or for synesthetic word pairs (MD = 10.833, S.E. = 6.164, p = .086). For semantically unrelated word pairs, however, the visual field difference was significant (MD = 11.767, S.E. = 4.165, p = .021): stimuli presented to the LVF/RH elicited significantly longer reaction times (RTs) than those presented to the RVF/LH.

Figure 8. The reaction time of the left and right hemispheres under three word-type conditions (*p < .05, NS = no significance).
Discussion
This section primarily revisits the hierarchical nature of Chinese linguistic synesthesia by analyzing participants’ accuracies and RTs in judging synesthetic word pairs. It further examines the neuro-cognitive mechanism of synesthesia across different visual fields and word types. Building on these findings, the discussion extends to the relationship among linguistic synesthesia, neurological synesthesia, and metaphor, aiming to clarify how these phenomena interact within broader sensory and linguistic processing systems.
Hierarchical Transfers in Chinese Linguistic Synesthesia
Based on the accuracy and RT results, the synesthetic transfer tendencies in Chinese exhibit three key patterns. First, the experimental hierarchy of synesthetic transfers is touch/taste → sight → smell/hearing, with touch and taste being most suitable as source domains and hearing and smell frequently serving as target domains. Second, participants exhibited comparable processing efficiency across all three synesthetic transfer types (unidirectional, biased-directional, and bidirectional) in Chinese linguistic synesthesia, showing no consistent preference for forward mappings over backward mappings, a result that contrasts with the directional bias predicted by Ullmann’s linear hierarchical model. Third, participants performed best in taste-to-smell synesthesia, while their performance was weakest in hearing-sight synesthesia among all tested mappings.
As illustrated in Figure 9, the accuracy and reaction time (RT) results reveal distinct patterns in synesthetic mappings. The arrows indicate the direction of transfer (source → target domain), with solid arrows denoting superior performance (ACC ≥ 50%, RTs ≤ 750 ms) and dotted arrows indicating inferior performance (ACC < 50%, RTs > 750 ms). The data demonstrate that synesthetic transfer in Mandarin follows a hierarchical structure: touch/taste → sight → smell/hearing, with the former domains predominantly functioning as sources and the latter as targets. This finding accords with Zhong et al. (2024), who demonstrated that synesthetic expressions conforming to the conventional directional principles are perceived as more common, appropriate, and comprehensible than those that deviate from them. Nevertheless, this pattern should not be interpreted as a strictly linear progression, as Ullmann originally suggested. Instead, as Winter and Strik Lievers (2025) contend, linguistic synesthesia is more accurately characterized as a network-based phenomenon, in which sensory modalities are dynamically interconnected rather than rigidly ordered. Within this network, sight occupies a flexible position, functioning as both a source and a target domain. This prominence likely stems from vision’s dominance in sensory processing, supported by its rich lexical encoding in Chinese compared to other modalities. For instance, visual perception has dedicated terms for sensory qualities (e.g., color) and intensities (e.g., brightness), whereas olfaction often relies on adverbs to express intensity (Che et al., 2010). This lexical disparity may reflect the disproportionate reliance on visual input, which constitutes over 70% of human sensory perception (Wang, 2017). Furthermore, the dominance of vision is evident across both monosyllabic and disyllabic lexical structures (Zhao et al., 2019), reinforcing its pivotal role in cross-modal associations. These findings highlight the intricate interplay between sensory hierarchies and language-specific features in linguistic synesthesia, underscoring the need for further research to elucidate how modality-specific lexical gaps influence cross-modal mappings.

Figure 9. Directional patterns of Chinese synesthetic transfer based on experimental data (ACC = accuracy; RT = reaction time).
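The arrow convention described for this figure (solid for ACC ≥ 50% and RT ≤ 750 ms, dotted for ACC < 50% and RT > 750 ms) can be expressed as a simple decision rule. The following is a minimal illustrative sketch, not the authors' analysis code: the names `Mapping` and `arrow_style` and all numeric values in the examples are hypothetical, and the "mixed" label for cases that meet only one criterion is an assumption, since the text does not specify how such cases are drawn.

```python
from dataclasses import dataclass

# Thresholds taken from the description of the figure.
ACC_THRESHOLD = 0.50      # accuracy of at least 50%
RT_THRESHOLD_MS = 750.0   # reaction time of at most 750 ms


@dataclass(frozen=True)
class Mapping:
    """One directional transfer (source -> target); a hypothetical container."""
    source: str
    target: str
    acc: float    # mean accuracy as a proportion between 0 and 1
    rt_ms: float  # mean reaction time in milliseconds


def arrow_style(m: Mapping) -> str:
    """Return the arrow style implied by the two thresholds."""
    accurate = m.acc >= ACC_THRESHOLD
    fast = m.rt_ms <= RT_THRESHOLD_MS
    if accurate and fast:
        return "solid"    # superior performance
    if not accurate and not fast:
        return "dotted"   # inferior performance
    # Assumption: only one criterion is met; the source text leaves this case open.
    return "mixed"


# Hypothetical values for illustration only; these are not the study's data.
examples = [
    Mapping("taste", "smell", acc=0.80, rt_ms=700.0),
    Mapping("hearing", "sight", acc=0.40, rt_ms=800.0),
    Mapping("sight", "touch", acc=0.60, rt_ms=790.0),
]

for m in examples:
    print(f"{m.source} -> {m.target}: {arrow_style(m)}")
```

Making the undefined case explicit, rather than folding it into "solid" or "dotted", keeps the sketch faithful to the two conditions as stated.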
The aforementioned results demonstrate that synesthetic mapping patterns encompass all three types of transfers, consistent with findings on the directionality of linguistic synesthesia in both Indo-European languages and Chinese (Jo, 2022; Zhao et al., 2019). Three points stand out. First, in taste-sight synesthesia, forward (taste-sight) and backward (sight-taste) transfers exhibit comparable accuracy and reaction times. Second, when touch and taste served as source domains, as in touch-hearing, taste-hearing, and taste-smell mappings, participants performed at a high level. This indicates that touch and taste frequently serve as source domains, adhering to a rule-based unidirectional pattern. Third, the backward transfers of sight-hearing synesthesia were identified with higher accuracy and shorter reaction times than forward transfers (hearing-sight). This pattern is consistent with the biased-directional transfers documented by Zhao et al. (2019). The current study thus resembles previous corpus-based studies in finding that linguistic synesthesia follows directional tendencies, whether unidirectional (Y. Shen & Aisenman, 2008; Ullmann, 1957), biased-directional (Kumcu, 2021; Strik Lievers, 2015), or bidirectional (Jo, 2022; Zhao et al., 2019). However, while the present study demonstrates bidirectional mappings between taste and sight, Zhao et al. (2019) reported a biased-bidirectional pattern (e.g., more transfers from taste to sight than vice versa). This distinction highlights the complexity and variability of sensory interactions, suggesting that the nature of these transfers may be influenced by specific conditions or parameters not previously considered.
Notably, participants performed best with the taste-to-smell mapping, likely due to the sensory and anatomical proximity of these modalities, which can be perceived as intra-domain rather than cross-domain mappings. This result is consistent with neuroanatomical and perceptual evidence: despite their distinct receptor locations, taste and smell share common neural pathways that facilitate integration (Stevenson, 2009). For instance, the perception of sweetness can be similarly elicited by both gustatory and olfactory stimuli. This interconnection is further supported by clinical observations showing that taste perception impairments often co-occur with deficits in odor-induced taste perception (Stevenson & Miller, 2013). Corpus-based evidence also supports this tendency: Fishman (2022) reported that smell–taste mappings account for more than half of all cross-sensory associations. Moreover, the prominence of taste as a source domain resonates with C.-R. Huang et al. (2025), who found that taste—rather than touch—received the highest perceptual strength estimate (PSE) score in Mandarin, indicating a stronger degree of lexical-conceptual embodiment. Taken together, these findings indicate that synesthetic transfers emerge from the interaction of physiological, cognitive, and experiential factors, reflecting an embodied organization of the senses rather than a strictly hierarchical one.
In contrast, participants showed the weakest performance in auditory-visual mappings. This finding supports the hierarchical model of sensory transfer (Popova, 2005; Yu, 2003), which posits that mappings typically progress from lower (e.g., taste, touch) to higher (e.g., hearing, vision) sensory modalities. Auditory-visual transfers represent an inverse relationship that violates this conventional pattern, potentially explaining their cognitive difficulty. As hearing is considered a higher-level sense (Popova, 2005), its cross-modal mapping onto vision likely entails additional cognitive processing compared with lower-to-higher sensory transfers. These findings collectively suggest that while certain synesthetic mappings benefit from natural sensory affinities, others may be constrained by hierarchical cognitive patterns.
Hemispheric Processing of Chinese Linguistic Synesthesia
A significant main effect of word pair type was found on both RTs and accuracy, with synesthetic word pairs showing lower accuracy and longer RTs than literal and semantically unrelated word pairs. Furthermore, no significant differences in synesthetic processing accuracy or RTs were observed between the left and right hemispheres. These findings suggest that linguistic synesthesia involves specialized processing mechanisms, with both the left and right hemispheres playing crucial roles.
The first finding revealed that the processing of linguistic synesthesia requires more cognitive resources than that of either literal or semantically unrelated word pairs. As the pre-test questionnaire data in this study indicated, synesthetic word pairs had significantly lower familiarity than literal word pairs. The large semantic distance between the source and target domains in linguistic synesthesia complicates the integration of sensory and conceptual experiences from daily life. These findings are consistent with previous research (M. Huang et al., 2022; L. Shen et al., 2022) that supports the Graded Salience Hypothesis (GSH), demonstrating that non-literal meanings require more cognitive resources, as the literal meaning must first be processed before non-literal interpretations are inferred through contextual and semantic mapping mechanisms. In addition, regardless of the processing sequence, the extraction of both literal and synesthetic meanings makes the processing of linguistic synesthesia more difficult. Understanding non-literal meanings requires retrieving experiential image schemas and analyzing them in specific contexts (Tang et al., 2017). This process involves three types of information: the sensory input itself, evidence from other sensory modalities, and prior knowledge stored in the associative cortex (Leptourgos et al., 2022). When new concepts, such as those related to linguistic synesthesia, enter the brain, they are re-evaluated for their similarities, continuities, or causal relationships with pre-existing concepts. This continuous re-evaluation process updates the brain’s conceptual knowledge network, shaping how we perceive and interact with the world.
The second key finding shows that synesthetic processing engages both cerebral hemispheres, indicating that linguistic synesthesia arises from coordinated, rather than strictly lateralized, neural activity. This interpretation aligns with behavioral and neurophysiological evidence. For example, grapheme–color synesthesia activates not only localized regions such as V4 and the parietal lobe but also a distributed cortical network (Li & Zhao, 2014). Meta-analytic data likewise reveal no consistent right-hemisphere (RH) dominance in nonliteral language comprehension, indicating bilateral engagement: the left hemisphere (LH) supports literal language mechanisms, while the RH contributes to social inference and pragmatic integration, potentially exerting a stronger overall influence (Hauptman et al., 2023). However, some studies report an RH advantage during later or more complex semantic integration (M. Huang et al., 2022; Zhu et al., 2022). Such inconsistencies likely stem from methodological differences in stimulus type, contextual richness, and timing parameters, particularly shorter stimulus onset asynchronies (SOAs), which may constrain semantic mapping. They highlight the need for greater experimental comparability in future research.
Notably, our results revealed a significant left-hemisphere processing advantage for unrelated word pairs (p < .048). Two plausible interpretations emerge for this unexpected result. First, the pronounced semantic disparity between unrelated word pairs precludes the formation of meaningful connections, obviating the need for right-hemisphere engagement in non-literal semantic integration. Second, the left hemisphere’s domain-specific specialization in literal meaning processing dominates in the absence of salient interstimulus semantic relationships. This interpretation suggests that when processing semantically unrelated pairs, the cognitive system prioritizes left-hemisphere-mediated literal meaning extraction over recruiting right-hemisphere mechanisms for metaphorical or associative computations. However, inherent limitations of the divided visual field (DVF) paradigm—including interhemispheric inhibition and confounding variables such as eye movements and attentional allocation—may complicate interpretations of hemispheric lateralization.
The Connections of Linguistic Synesthesia With Metaphor and Neurological Experience
The relationship between linguistic synesthesia and metaphor has been a focal point in cognitive linguistics, with many scholars considering linguistic synesthesia a subtype of metaphor due to their shared characteristics, such as cross-domain mappings (Popova, 2005; Strik Lievers, 2017; Zhao et al., 2022). Both phenomena involve domain transfers that may be unidirectional or bidirectional (Anaki & Henik, 2017), suggesting common cognitive foundations in embodied experience and sensory-motor interactions (Zhao et al., 2022). Like metaphor, linguistic synesthesia demonstrates creative potential in generating novel expressions that contribute to literary quality (Zhang et al., 2022). Behavioral evidence further supports this connection, as processing patterns for both linguistic synesthesia and metaphor fall between literal and unrelated word pairs in terms of reaction times and accuracy rates, reflecting the cognitive demands of cross-domain interpretation.
However, our findings also reveal differences that distinguish linguistic synesthesia from conventional metaphor. Unlike metaphors’ consistent concrete-to-abstract progression (Lakoff & Johnson, 1980), linguistic synesthesia shows no stable preference for forward over backward sensory mappings (e.g., hearing→sight vs. sight→hearing). This distinction highlights that synesthetic transfers remain grounded in sensory perception, whereas metaphors bridge sensory and abstract conceptual domains. In terms of hemispheric processing, while metaphor comprehension shows right-hemisphere dominance for novel instances (M. Huang et al., 2022), linguistic synesthesia engages bilateral networks without hemispheric preference. This distributed processing likely reflects the automatic and sensory-based nature of linguistic synesthesia, contrasting with the higher-level cognitive integration required for metaphor comprehension.
The relationship between linguistic synesthesia and neurological synesthesia further elucidates the cognitive and neural underpinnings of cross-modal mappings. The taste-smell mapping advantage in our study—attributable to anatomical and perceptual overlaps—parallels neurological synesthesia’s reliance on distributed neural networks (Rouw, 2011). It is also likely that both synesthetes and non-synesthetes exhibit cross-modal correspondences in everyday life, bridging the gap between linguistic and neurological synesthesia. For example, English-speaking adults commonly associate letters like X and Z with black and O and I with white, regardless of their synesthetic status (Spector & Maurer, 2009). Similarly, the universal “bouba-kiki effect,” linking round shapes to “Bouba” and angular shapes to “Kiki,” demonstrates shared cognitive foundations of cross-modal associations (Bremner et al., 2013). These findings suggest that both linguistic and neurological synesthesia rely on a combination of perceptual and cognitive processes. However, some differences also exist between linguistic synesthesia and neurological synesthesia. The former, often considered a marginal form of neurological synesthesia, relies on incidental soft-wired connections based on conceptualization (Marks & Mulvenna, 2013), while the latter stems from automatic hard-wired cross-activation between brain regions (Banissy et al., 2013). Furthermore, the former may represent a universal capacity of mankind while the latter occurs in only 1% to 4% of the population (Simner et al., 2006).
Taken together, the present study underscores the intricate interplay between linguistic synesthesia, metaphor, and neurological synesthesia, suggesting that bridging these domains could yield significant theoretical and clinical advances. For example, investigating whether individuals with neurological synesthesia exhibit enhanced—or alternatively, constrained—metaphorical and linguistic synesthetic abilities could shed light on the plasticity of cross-sensory cognition. Notably, research indicates that people with neurological synesthesia often employ unconventional linguistic synesthetic descriptions (e.g., “tasting the shape”; Turner & Littlemore, 2023). Moreover, experimental approaches to identifying linguistic markers of synesthesia could not only aid in diagnosing synesthesia-related conditions but also provide early predictive indicators through language analysis. Our findings emphasize the need for unified theoretical frameworks capable of simultaneously addressing the automaticity of neurological synesthesia, the conceptual basis of linguistic synesthesia, and the abstract mappings characteristic of conventional metaphor.
Conclusion
This study investigated the directional preferences and hemispheric processing of Chinese linguistic synesthesia, together with its connections to conceptual metaphor and neurological synesthesia, using the divided visual field (DVF) paradigm. Three key findings emerge. First, Chinese synesthetic transfers encompass unidirectional, bidirectional, and biased-directional mappings, challenging conventional strict hierarchical models (Ullmann, 1957; Williams, 1976) but aligning with more recent models (Jo, 2022; Zhao et al., 2019). Second, linguistic synesthesia engages bilateral neural networks without lateral dominance, underscoring its automatic, sensory-grounded nature. Third, the tripartite relationship among linguistic synesthesia, metaphor, and neurological synesthesia calls for unified models that account for perceptual, linguistic, and neurological dimensions. This study elucidates the intricate interplay between sensory perception, language, and cognition, advancing our understanding of the neuro-cognitive mechanisms underlying linguistic synesthesia and its theoretical intersections with metaphor and neurological synesthesia.
Limitations and Future Work
While this study provides novel insights into the cognitive and neural mechanisms of linguistic synesthesia in Chinese, several limitations should be acknowledged, which also pave the way for future research.
First, the scope of our experimental materials presents a constraint. Due to the lexical characteristics of Chinese linguistic synesthesia, our stimuli included no olfactory adjectives and only a limited number of auditory adjectives and gustatory nouns. This lexical imbalance, while reflective of real-world usage, inevitably restricted the range of sensory mappings examined, particularly those involving smell as a source domain. Future studies could employ more flexible linguistic units, such as sentences, to circumvent these lexical gaps and achieve more comprehensive coverage of all potential cross-modal combinations.
Second, while the DVF paradigm effectively captures hemispheric tendencies, it also entails methodological limitations. Factors such as interhemispheric transfer, uncontrolled eye movements, and attentional shifts may obscure true lateralization effects. The observed bilateral involvement in synesthetic processing, though informative, requires confirmation through more direct neural measures. Future research should employ neuroimaging methods such as fMRI to identify spatial activation patterns, or high-density EEG and MEG to trace the temporal dynamics of synesthetic integration. Additionally, paradigms like the unilateral hand-muscle contraction method (Noufi & Zeev-Wolf, 2024) could actively modulate hemispheric activation, enabling causal tests of each hemisphere’s contribution to linguistic synesthesia.
Third, emerging frameworks and computational models provide promising tools for advancing research on linguistic embodiment and synesthetic mapping. C.-R. Huang et al. (2025) introduced the Perceived Strength of Embodiment (PSE) as a quantitative measure of modality-based conceptual grounding, offering an empirical index for assessing embodiment in linguistic contexts. Integrating the PSE framework with neuroimaging could reveal how lexical-conceptual embodiment supports synesthetic transfer across behavioral and neural levels. Meanwhile, Zhao et al. (2025) developed a neural network model that automatically detects linguistic synesthesia by leveraging culturally embedded linguistic features, such as character radicals and part-of-speech tags. This model underscores the significance of contextual and lexical cues that extend beyond simple sensory modality mappings. Future research could adopt this approach, incorporating culturally enriched linguistic features into automatic detection, with promising applications in clinical diagnostics and cross-linguistic cognitive studies.
Finally, the proposed theoretical triangulation among linguistic synesthesia, metaphor, and neurological synesthesia remains speculative, as it rests on behavioral evidence alone. Future research could directly compare individuals with neurological synesthesia and non-synesthetic participants using the same linguistic paradigms to examine whether heightened cross-modal perception yields enhanced or qualitatively distinct linguistic synesthetic abilities. Moreover, cross-linguistic investigations involving languages with richer lexical inventories in underrepresented sensory domains (Evans & Wilkins, 2000) could further clarify the balance between universal cognitive mechanisms and language-specific influences.
Footnotes
Acknowledgements
We extend our sincere gratitude to all the participants who generously contributed their time and effort to this study. Their involvement was essential to the success of this research.
Ethical Considerations
This study was approved by the Ethics Committee of Sichuan International Studies University (Ethics Code: 202300008) on May 10th, 2023. All participants provided informed consent prior to participation. The research was conducted ethically in accordance with the World Medical Association Declaration of Helsinki.
Author Contributions
Conceptualization, K.C. and S.C.; data curation, S.C.; original draft preparation, S.C.; review and editing, K.C. and Y.C. All authors have read and agreed to the published version of the manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Chongqing Social Science Planning Project (Grant No. 2023NDYB167), the 2023 Annual Research Project of the American Studies Center in Southwest Jiaotong University (Grant No. ARC2023006), and the General Project of the Humanities and Social Sciences Research Fund of the Chinese Ministry of Education (Grant No. 25YJA710007).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
