Abstract

Conventional studies indicate that the strength of crossmodal correspondence, which represents the connection between multiple senses and actual human perception, may influence pleasant or unpleasant emotions in some combinations of sensory modality and stimulus type. In this study, sensory evaluation experiments were conducted to evaluate the crossmodal correspondence between the visual and auditory senses mediated by ‘complexity’, using closed curve shapes and tone sequences generated based on curvature entropy and tone entropy. The relationship between the sensory evaluation values of ‘aesthetic preference’ for the shape/tone sequence combinations and the weakness of crossmodal correspondence, that is, the difference between curvature entropy and tone entropy, was fitted to the Wundt curve, one of the models of pleasant emotion. As a result, a strong relationship (coefficient of determination R² = 0.64) was confirmed between the two. Bias due to participants' musical experience was confirmed in the sensory evaluation values of some tone sequences.

Keywords

crossmodal correspondences, complexity, aesthetic preference, closed curve shape, tone sequence

How to cite this article

Osugi, K., Hayashi, J., Kato, T., & Yanagisawa, H. (2026). Relationship between aesthetic preference and ‘complexity’ mediated crossmodal correspondence of shape and tone sequence. i–Perception, 17(1), 1–17. https://doi.org/10.1177/20416695261420290

Introduction

In recent years, product design strategies that take into account multiple sensory responses have emerged. Examples of this include food packaging designs that improve the significance of selection through shapes that correspond to taste (Spence & Ngo, 2012), and soap container designs whose colour corresponds to its fragrance to improve the sensory evaluation value of the soap's expected cleaning effect (Gatti et al., 2014). This connection between multiple senses and human perception is known as crossmodal correspondence, which is defined as ‘a compatibility effect between attributes or dimensions of a stimulus (i.e., an object or event) in different sensory modalities (be they redundant or not)’ (Spence, 2020). Crossmodal correspondence can be seen in a variety of stimuli, such as brightness and size (Maurer et al., 2006), taste and sound (Bronner et al., 2012; Crisinel & Spence, 2009), shape and texture (Juravle & Spence, 2024), and form and smell (Gal et al., 2007; Seo et al., 2010). The strength of crossmodal correspondence has also been suggested to influence emotion. Specifically, the combination of a blue colour and a smooth shape, which is considered a strong correspondence, has been shown to increase pleasant emotion more than the combination of a blue colour and a complex shape, which is considered a weak correspondence (Bar & Neta, 2007; Lin et al., 2021; Spence, 2011; Wilson & Brewster, 2017). On the other hand, combinations of red visual stimuli and superheated stimuli, which are considered to have a strong correspondence, have been found to increase unpleasant emotions more than combinations of blue visual stimuli and heated stimuli, which are considered to have a weak correspondence (Wilson & Brewster, 2017). These studies indicate that the strength of crossmodal correspondence may influence pleasant and unpleasant emotions, despite differences in sensory modality and stimulus type.

Crossmodal correspondence is interpreted in the predictive coding theory of neuroscience as ‘a phenomenon where, when two stimuli from different modalities are experienced simultaneously (hereafter, multimodal learning), the presentation of a stimulus from one modality predicts a stimulus from the other modality. Consequently, the actual stimulus is perceived as slightly closer to the predicted stimulus’ (Huang et al., 2024; Talsma, 2015). Furthermore, audiovisual reaction time measurements have demonstrated that crossmodal correspondence strengthens as the number of multimodal learning sessions increases (Huang et al., 2024). Predictive coding theory posits that the brain, based on a Bayesian model, treats the product of the prior distribution (the probability distribution of predicted stimuli) and the likelihood function (the probability distribution of actual stimuli) as the posterior distribution (the probability distribution of stimulus perception). It further assumes that the prior distribution is updated to the posterior distribution (learning) (Berniker et al., 2010). Here, the shift of perception toward the predicted stimulus implies that the posterior distribution, being the product of the prior distribution and the likelihood function, becomes biased toward the prior distribution. This bias is known to decrease as the variance of the prior distribution increases, thereby increasing free energy (a measure of information representing the difference between the actual stimulus and its prediction), which is parameterized by the difference between the prior and posterior distributions (Yanagisawa, 2021). Sekoguchi and Yanagisawa (2020) mathematically hypothesized that this free energy corresponds to Berlyne's (1960) arousal potential (i.e., positive affect follows an inverted U-shaped Wundt curve as a function of free energy).
This hypothesis has been verified in several studies concerning aesthetic preferences for car or butterfly shapes, and the pleasantness of music deviating from musical rules (Miyamoto & Yanagisawa, 2021; Sasaki et al., 2024; Schoormans & Robben, 1997; Van de Cruys & Wagemans, 2011; Yanagisawa, 2021; Yanagisawa et al., 2019). Here, the variance of the prior distribution increases with insufficient learning iterations (Berniker et al., 2010). Therefore, the variance is considered to increase when the number of multimodal learning iterations, which is a factor in crossmodal correspondence, is low (i.e., it decreases when crossmodal correspondence is strong). Furthermore, this increase in prior variance has been shown to increase free energy (Yanagisawa, 2021). Based on the series of research results described above, it can be inferred that positive emotion takes an inverted U-shaped form as a function of the strength of crossmodal correspondence. However, most studies on crossmodal correspondence have qualitatively analysed the presence or absence of crossmodal correspondence using two levels of stimuli; few studies have used multilevel parameterized stimuli. To fill this gap, Hayashi et al. (2024) conducted a quantitative study of crossmodal correspondence mediated by ‘complexity’ for visual and auditory stimuli using two-dimensional closed curve shapes (2D shapes) as visual stimuli and tone sequences consisting of seven pitches as auditory stimuli. The results revealed that shapes with low (high) curvature entropy are more readily selected for tone sequences with low (high) tone entropy. This suggests that the smaller (larger) the difference between the two entropies, the stronger (weaker) the crossmodal correspondence becomes. 
Therefore, similarly to Sekoguchi and Yanagisawa (2020), by using the relationship between free energy and pleasant emotion, it may be possible to clarify the relationship between the weakness of crossmodal correspondence and pleasant emotion using combinations of multilevel visual and auditory stimuli selected with the difference between the tone and curvature entropies as a parameter. Here, as the difference between the two entropies increases (indicating weaker crossmodal correspondence), the level of arousal also increases. Therefore, to align the direction of the entropy difference with the level of arousal, we express the difference in entropy as the ‘weakness’ of crossmodal correspondence.
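As a minimal illustration of the Bayesian account summarized above, consider the special case in which both the prior and the likelihood are Gaussian. The Gaussian form and the symbols $\mu_{p}$, $\sigma_{p}^{2}$ (prior mean and variance), $x$ (actual stimulus), and $\sigma_{l}^{2}$ (likelihood variance) are assumptions introduced here for illustration; they are not taken from the cited studies.

$$\mu_{post} = w \mu_{p} + (1 - w) x, \quad w = \frac{\sigma_{l}^{2}}{\sigma_{p}^{2} + \sigma_{l}^{2}}, \quad \sigma_{post}^{2} = \frac{\sigma_{p}^{2} \sigma_{l}^{2}}{\sigma_{p}^{2} + \sigma_{l}^{2}}$$

Under these assumptions, the weight $w$ placed on the prior mean shrinks as $\sigma_{p}^{2}$ grows, so a broad prior (few multimodal learning iterations) pulls the percept less strongly toward the prediction, whereas a narrow prior (many iterations) pulls it more strongly.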

The present study aims to examine the influence of crossmodal correspondence weakness on pleasant emotions. To this end, the Wundt curve is fitted to the relationship between this weakness and the sensory evaluation values of ‘aesthetic preference’ for combinations of shape and tone sequence. If the relationship between the weakness of crossmodal correspondence and pleasant emotion is clarified, the findings could be applied to diverse fields such as product design, multimedia art, user experience, and multisensory marketing. For example, in product design, they could provide guidelines for deriving specifications related to multiple senses (shape, surface characteristics, sound, etc.) to achieve pleasant emotion toward the product (such as beauty or aesthetic preference).

The remainder of the paper is organized as follows. Section Indices of Stimuli describes the curvature and tone entropies, which are indices of ‘complexity’ of shapes and sequences, and the index of weakness of crossmodal correspondence, which is expressed as the difference between them. Section Experiment describes the experiment in detail and analyses the results of the experiment, and Section Conclusion presents the conclusions and limitations of this study.

Indices of Stimuli

‘Complexity’ of Shapes

Fractal dimension (Spehar et al., 2003) and curvature entropy (Ujiie et al., 2012) have been proposed as quantifiers of 2D shape ‘complexity’. The former is applied to shapes with discrete features such as straight lines and texture patterns, while the latter is applied to shapes with continuous properties such as closed curve shapes. In this study, we used curvature entropy because it is appropriate for the ‘complexity’ of closed curve shapes (Biederman & Ju, 1988; Matsumoto et al., 2019), which are considered to have a significant impact on the impression of a product. The method for calculating curvature entropy is described in Equation (1).

$$H = - \frac{1}{\log_{2} V} \sum_{i = 1}^{V} \sum_{j = 1}^{V} q_{i} q_{i, j} \log_{2} q_{i, j} \quad (0 \leq H \leq 1) \tag{1}$$

First, the closed curve shape is divided equally and the curvature at each division point is calculated. Next, since the value of curvature varies with the size of the curve shape, the curvature is non-dimensionalized by multiplying the maximum radius from the centre of gravity of the closed curve shape by the curvature to obtain the non-dimensionalized curvature. The calculated non-dimensionalized curvature is then used to obtain the non-dimensionalized curvature function, which is a function of curve length and non-dimensionalized curvature, and the range of the non-dimensionalized curvature function is discretized by dividing it by the number of states V. Finally, using the probability of occurrence $q_{i}$ of state i and the transition probability $q_{i, j}$ from state i to state j, the curvature entropy H is calculated. The value of H is larger when the probability of occurrence of each state and the transition probability between states are equal. In other words, curvature entropy quantifies irregular changes in curvature as ‘complexity’.
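The calculation described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the equal-width binning in `discretize`, and the wrap-around transition from the last division point back to the first (because the curve is closed) are all assumptions made here.

```python
import math
from collections import Counter


def discretize(values, num_states):
    """Map non-dimensionalized curvature values to states 0..V-1 (equal-width bins, an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / num_states
    # The maximum value would fall just outside the last bin, so clamp it.
    return [min(int((v - lo) / width), num_states - 1) for v in values]


def curvature_entropy(states, num_states):
    """Curvature entropy H of Equation (1) for a cyclic sequence of curvature states."""
    if num_states < 2:
        raise ValueError("num_states (V) must be at least 2")
    n = len(states)
    # Transitions wrap around because the curve is closed (assumption).
    pairs = [(states[k], states[(k + 1) % n]) for k in range(n)]
    occurrences = Counter(i for i, _ in pairs)   # how often state i occurs
    transitions = Counter(pairs)                 # how often i is followed by j
    h = 0.0
    for (i, j), count in transitions.items():
        q_i = occurrences[i] / n                 # occurrence probability of state i
        q_ij = count / occurrences[i]            # transition probability i -> j
        h -= q_i * q_ij * math.log2(q_ij)
    return h / math.log2(num_states)
```

With two states, a strictly alternating sequence such as `[0, 1, 0, 1]` gives H = 0 because every transition is fully predictable, while `[0, 0, 1, 1]` gives H = 1 because each state occurs equally often and is followed by either state with equal probability. This matches the statement that H quantifies irregular changes in curvature.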

‘Complexity’ of Tone Sequences

Fractal dimension (Beauvois, 2007) and tone entropy (Delplanque et al., 2019) have been proposed as quantification measures of the ‘complexity’ of tone sequences. The former is applied only to long tone sequences of at least 64 tones (Beauvois, 2007) to quantify repeated representations. The latter, in contrast, is calculated from the probability of occurrence of individual tones or intervals in a more localized manner and is applied to short tone sequences of around seven tones. In this study, we used tone entropy because it is suitable for the ‘complexity’ of tone sequences as short as seven tones (Hsu et al., 2021), where the repetition effect does not cause changes in aesthetic preferences. For tone entropy, a preliminary experiment (Appendix 2) was performed to determine the most suitable of three types. Specifically, first-order entropy is based on the probability of occurrence of each tone, second-order entropy is based on the probability of occurrence of each transition between tones, and mean entropy is the average of the two. The results of the preliminary experiment supported the use of mean entropy, which had the strongest correlation with the perceived ‘complexity’ of tone sequences. The mean entropy $E_{A}$ is the average of the first-order entropy $E_{1}$ and the second-order entropy $E_{2}$ and is calculated as follows.

$$E_{A} = \frac{E_{1} + E_{2}}{2}, \quad E_{1} = - \sum_{k = 1}^{N} p (x_{k}) \log_{2} p (x_{k}), \quad E_{2} = - \sum_{l = 1}^{N - 1} p (\Delta x_{l}) \log_{2} p (\Delta x_{l}) \tag{2}$$

First-order entropy $E_{1}$ is the entropy of the probability $p (x_{k})$ of the occurrence of each tone $x_{k}$ in a tone sequence consisting of N different tones $x_{k}$ (k = 1, 2, …, N). Second-order entropy $E_{2}$ is the entropy of the probability $p (\Delta x_{l})$ of occurrence of the $N - 1$ different intervals (the pitch separation between two tones) $\Delta x_{l}$ (l = 1, 2, …, $N - 1$), which can be formed from any combination of the N different tones. It must be noted that the intervals are calculated as differences in the ascending order of the pitches, without distinguishing between semitones and whole tones. To describe this difference in the ascending order of the pitches, we denote E3, F3, G3, …, E4 by the integers 1, 2, 3, …, 8. The value of first-order entropy $E_{1}$ is larger the more equal the probabilities of occurrence of the tones are, and the value of second-order entropy $E_{2}$ is larger the more equal the probabilities of occurrence of the intervals between tones are. Therefore, first-order entropy quantifies the irregularity of the occurrence of tones as ‘complexity’, while second-order entropy quantifies the irregularity of the transitions between tones as ‘complexity’. The mean entropy, which is the average of these, therefore quantifies the irregularity of both the occurrence and the transitions of tones as ‘complexity’.
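The definitions above can be sketched in a few lines. The function names and the use of signed intervals (an upward and a downward step of the same size are counted as different intervals) are assumptions made here, not details stated in the text. The logarithm is left as a parameter: Equation (2) is written with base 2, but evaluating the same formula with the natural logarithm appears to reproduce the $E_{A}$ values listed in Table 2.

```python
import math
from collections import Counter

# Ascending-order indices for the eight tones E3..E4, as described in the text.
TONE_INDEX = {"E3": 1, "F3": 2, "G3": 3, "A3": 4, "B3": 5, "C4": 6, "D4": 7, "E4": 8}


def shannon_entropy(items, log=math.log2):
    """Entropy of the empirical distribution of `items`."""
    counts = Counter(items)
    total = len(items)
    return -sum((c / total) * log(c / total) for c in counts.values())


def mean_tone_entropy(tones, log=math.log2):
    """Mean entropy E_A = (E_1 + E_2) / 2 of a tone sequence given as names such as 'F3'."""
    indices = [TONE_INDEX[t] for t in tones]
    # Signed differences between consecutive tones (assumption: direction matters).
    intervals = [b - a for a, b in zip(indices, indices[1:])]
    e1 = shannon_entropy(indices, log)    # first-order: occurrence of tones
    e2 = shannon_entropy(intervals, log)  # second-order: transitions between tones
    return (e1 + e2) / 2


# Tone sequence I-b from Table 2; with the natural logarithm this gives about 0.430.
example = mean_tone_entropy("F3 F3 F3 F3 F3 F3 A3".split(), log=math.log)
```

A sequence that repeats a single tone gives 0 under either base, and a sequence in which all seven tones and all six intervals differ gives the largest value.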

Weakness of Crossmodal Correspondence

As mentioned above, a previous study (Hayashi et al., 2024) has confirmed that crossmodal correspondence occurs when tone sequences with low (high) tone entropy and shapes exhibiting low (high) curvature entropy are combined. The relative magnitude of these two entropies is determined not by absolute evaluation, but by relative evaluation within the entropy range of the presented visual or auditory stimuli. In this study, the weakness of crossmodal correspondence is defined as $|H' - E_{A}'|$, the absolute value of the difference between the normalized curvature entropy $H'$ of the shape and the normalized mean entropy $E_{A}'$ of the tone sequence. Normalization was performed so that the maximum and minimum entropy values within the presented stimulus group became 1 and 0, respectively.
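A minimal sketch of this index is given below. The tone entropy values are the $E_{A}$ column of Table 2; the curvature entropy values are hypothetical placeholders, since the values of Table 1 are not reproduced in this text, and the function names are likewise introduced here for illustration.

```python
def min_max_normalize(values):
    """Rescale values so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; normalization is undefined")
    return [(v - lo) / (hi - lo) for v in values]


def weakness(h_norm, ea_norm):
    """Weakness of crossmodal correspondence |H' - E_A'|."""
    return abs(h_norm - ea_norm)


# E_A values of the ten entropy levels a-j (Table 2).
tone_entropy = [0, 0.430, 0.639, 0.716, 0.832, 1.056, 1.257, 1.435, 1.654, 1.869]
# Hypothetical curvature entropy values for five shapes (placeholders only).
shape_entropy = [0.2, 0.4, 0.6, 0.8, 1.0]

tone_norm = min_max_normalize(tone_entropy)
shape_norm = min_max_normalize(shape_entropy)

# Weakness for every shape/tone sequence combination (5 x 10 = 50 pairs).
weakness_table = [[weakness(h, e) for e in tone_norm] for h in shape_norm]
```

Because both entropies are rescaled within their own stimulus group, the simplest shape paired with the simplest tone sequence has a weakness of 0, and the simplest shape paired with the most complex tone sequence has a weakness of 1.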

Experiment

Stimuli

Shapes

The presented closed curve shape was created using a cubic Bézier curve. Fourteen connection points were defined that could be moved within a certain range. The movable range was defined as half the distance to the nearest connection point. The curvatures at the 14 connection points were used as the design variables, and the absolute difference between the curvature entropy calculated from the 14 curvatures and the target value was used as the target characteristic.

The maximum and minimum values of curvature entropy were obtained by using particle swarm optimization to search for the shapes that maximize and minimize the curvature entropy without setting a target value. The range between these two curvature entropy values was divided into four equal parts, and three intermediate levels were added to obtain a total of five levels of curvature entropy values. These were then set as the target values for particle swarm optimization, and a search was conducted to obtain shapes at the five levels of curvature entropy. The circle shape with the minimum (0) curvature entropy was excluded to account for the mere exposure effect (Graf & Landwehr, 2015; Hekkert et al., 2013; Zajonc, 1968). This is because pleasant emotions tend to be overestimated for shapes that are encountered frequently compared with shapes that are encountered rarely (Bornstein, 1989; Shimizu et al., 2024).

In conventional studies, the variability in sample shapes and tone sequences was greater than the variability among participants. Therefore, this study prioritized confirming that similar results could be obtained across different samples by conducting verification experiments divided into three groups. Consequently, the above procedure was repeated three times, and a total of 15 shapes with five levels of curvature entropy were created, three each (Table 1). The shapes were presented as closed curve shapes, which are considered to be correlated with the sensory evaluation of ‘complexity’ (Ujiie et al., 2012).

Table 1.

Generated 2D shapes and their curvature entropy value.

Tone Sequences

A tone sequence consisting of seven tones was randomly created from the eight tones from E3 to E4 (Delplanque et al., 2019). The seven tones of the created tone sequence were used as design variables, and the absolute value of the difference between the mean entropy calculated from the seven tones and the target value was used as the target characteristic. The design variables that minimize the target characteristics were searched for by GRG nonlinear programming with integer constraints using the following procedure to create a sequence of tones.

Without setting a target value for tone entropy, the maximum and minimum values of tone entropy were obtained by using GRG nonlinear programming to search for the tone sequences that maximize and minimize the tone entropy. The range between these two tone entropy values was divided into nine equal parts, and eight intermediate levels were added to obtain a total of 10 levels of tone entropy values. A search was then conducted by setting these 10 levels as the target values of GRG nonlinear programming, and tone sequences with 10 levels of tone entropy were obtained. Among the generated tone sequences, those whose last three tones were predictable, that is, whose last tone corresponded to the tone predicted from the two preceding tones, were excluded. This is because preference tends to increase when the end of a tone sequence is predictable, even if the tone sequence is complex (Graf & Landwehr, 2015).
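The level-setting and target-matching steps above can be illustrated with a small sketch. Note that it swaps in a seeded random search for the GRG nonlinear programming used in the study, purely to keep the example self-contained; the function names, the number of trials, and the signed-interval convention are assumptions, and the predictability-based exclusion step is not implemented.

```python
import math
import random
from collections import Counter

TONES = ["E3", "F3", "G3", "A3", "B3", "C4", "D4", "E4"]


def entropy(items):
    """Entropy (base 2) of the empirical distribution of `items`."""
    counts = Counter(items)
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mean_tone_entropy(indices):
    """Mean of first-order (tones) and second-order (signed intervals) entropy."""
    intervals = [b - a for a, b in zip(indices, indices[1:])]
    return (entropy(indices) + entropy(intervals)) / 2


def target_levels(lowest, highest, count):
    """Equally spaced target values from lowest to highest, inclusive."""
    step = (highest - lowest) / (count - 1)
    return [lowest + k * step for k in range(count)]


def random_search(target, trials=2000, length=7, seed=0):
    """Return the sampled tone sequence whose mean entropy is closest to `target`."""
    rng = random.Random(seed)  # seeded so that the search is reproducible
    best, best_error = None, float("inf")
    for _ in range(trials):
        candidate = [rng.randrange(len(TONES)) for _ in range(length)]
        error = abs(mean_tone_entropy(candidate) - target)
        if error < best_error:
            best, best_error = candidate, error
    return [TONES[i] for i in best], best_error
```

For example, `target_levels(e_min, e_max, 10)` yields the 10 target values once the extreme entropies `e_min` and `e_max` are known, and `random_search(level)` returns a seven-tone sequence together with its remaining distance from that target.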

For the presented tone sequences, for the same reason as for the presented shapes, we created three tone sequences at each of the 10 levels of tone entropy, for a total of 30 tone sequences (Table 2). The tone sequences were presented using the sound of a grand piano, which is considered to be less likely to cause differences in recognition of chord structures and chord progressions as a result of differences in musical experience (Bigand et al., 1996; Miyamoto & Yanagisawa, 2021; Palmer et al., 2013, 2016).

Table 2.

Generated tone sequences and their tone entropy values.

Entropy level	Group I	Group II	Group III	E_A
a	F3 F3 F3 F3 F3 F3 F3	A3 A3 A3 A3 A3 A3 A3	B3 B3 B3 B3 B3 B3 B3	0
b	F3 F3 F3 F3 F3 F3 A3	A3 A3 A3 A3 A3 A3 F3	D4 B3 B3 B3 B3 B3 B3	0.430
c	F3 F3 F3 F3 G3 F3 F3	B3 B3 B3 A3 B3 B3 B3	A3 A3 A3 A3 A3 D4 A3	0.639
d	E3 E3 E3 E3 E3 F3 G3	C4 A3 F3 F3 F3 F3 F3	G3 A3 B3 B3 B3 B3 B3	0.716
e	A3 A3 A3 A3 A3 G3 E3	E3 A3 A3 A3 A3 A3 B3	F3 F3 F3 F3 F3 A3 G3	0.832
f	F4 A3 G3 B4 A3 G3 B3	E3 F3 G3 A3 B3 B3 B3	B3 C4 D4 E4 F3 F3 F3	1.056
g	C4 D4 E3 A3 E3 D4 D4	B3 C4 D4 E4 F3 G3 E3	C4 G3 B3 D4 G3 B3 F3	1.257
h	C4 E3 F3 E3 E3 A3 A3	D4 G3 D4 D4 F3 G3 F3	A3 C4 F3 A3 D4 F3 C4	1.435
i	C4 A3 D4 B3 F3 G3 G3	G3 A3 G3 B3 F3 C4 D4	F3 C4 G3 A3 F3 A3 D4	1.654
j	A3 G3 C4 E3 B3 F3 D4	C4 E3 B3 F3 D4 G3 A3	B3 D4 G3 F3 C4 E3 A3	1.869

Procedure

Thirty participants were divided into three groups, corresponding to the three groups of shapes and tone sequences, to ensure an appropriate experimental load and generalization performance. Group I included 10 participants (6 males and 4 females) aged 21 to 24, Group II included 10 participants (6 males and 4 females) aged 20 to 26, and Group III included 10 participants (5 males and 5 females) aged 20 to 24. The procedure for the sensory evaluation of crossmodal correspondence between the shapes and the tone sequences performed by each participant was as follows.

Participants were seated in front of a monitor (BenQ GW2760) used to present the shapes while wearing headphones (SHURE SRH440A) used to present the tone sequences (Figure 1).

Before the start of the sensory evaluation, five sample shapes (Table 1) and 10 sample tone sequences (Table 2) were all presented to the participant.

Only the shapes were presented to the participants. Seven-point scales were used to rate ‘complexity’ (1: strongly simple, 7: strongly complex) and ‘aesthetic preference’ (1: strongly disagree, 7: strongly agree). The order in which the shapes were presented was randomized.

The participants were presented with only the tone sequences and asked to rate the ‘complexity’ and ‘aesthetic preference’ of the tone sequences in the same way as they rated the shapes.

Ten seconds after the shape was presented, the tone sequence was also presented, and the participants were asked to evaluate their ‘aesthetic preference’ for the combination of the shape and the tone sequence.

Participants were asked in a post-experiment interview about their impressions of the shapes and tone sequences used in the experiment.

Figure 1.

Experimental environment.

Results and Discussion

Figure 2 shows the relationship between the sum of the aesthetic preferences when the shape and tone sequences are presented individually and when they are presented simultaneously. The plots in the figure are divided into five groups, from Group 1 (low) to Group 5 (high), according to the magnitude of the difference between curvature entropy and tone entropy, the measures of the ‘complexity’ of the shapes and tone sequences (Gap group). If the weakness of crossmodal correspondence did not affect aesthetic preferences, the plots would lie near the upward-sloping straight line. Figure 2 also shows that Gap groups with small differences in ‘complexity’ between shapes and tone sequences, such as Groups 1 and 2, are placed higher than Groups 4 and 5, which have larger differences. This suggests that the smaller the weakness of crossmodal correspondence is, the greater the aesthetic preference when the shapes and tone sequences are presented simultaneously.

Figure 2.

Relationship between the sum of aesthetic preferences.

We calculated the difference between the sum of the aesthetic preferences for the combination of shape and tone sequences and the sum of the aesthetic preferences for the shape and tone sequences alone (Difference of aesthetic preferences). This index represents the amount of change in aesthetic preferences due to weak crossmodal correspondence. We conducted a two-way ANOVA with the experimental group (Group I vs. Group II vs. Group III) and Gap group (Group 1 vs. Group 2 vs. Group 3 vs. Group 4 vs. Group 5) as factors. The results showed that the main effect of the experimental group was not significant (p = 0.99), while the main effect of Gap group was significant (p < 0.01).

Next, a Bonferroni multiple comparison test was performed on Difference of aesthetic preferences across the Gap groups; the results are shown together with a box-and-whisker plot (Figure 3).

Figure 3.

Relationship between complexity difference and deviation of aesthetic preferences (mean of least squares error). * indicates p < 0.05; **** indicates p < 0.0001.

Figure 3 shows small (p < 0.05) and very small (p < 0.0001) p-values between some groups. The effect size η² is 0.41, confirming a large effect. Difference of aesthetic preference was positive and tended to increase from Group 1 to Group 2, where the differences in ‘complexity’ between the shapes and tone sequences were small, whereas it decreased from positive to negative values across Groups 3, 4, and 5. Focusing on Groups 1 and 2, Difference of aesthetic preference is smaller in Group 1, where the Gap is smallest, than in Group 2, where the Gap is slightly larger. These results suggest the same characteristics as the Wundt curve: pleasant emotions are obtained when the Gap is small and are not obtained when the Gap is large, but they are also reduced when the Gap is too small. In other words, the relationship between the weakness of crossmodal correspondence and aesthetic preferences is similar to the inverted U-shaped relationship of the Wundt curve.

Figure 4 shows a scatter plot of the sensory evaluation values for crossmodal correspondence weakness and aesthetic preference, with an approximation of the Wundt curve added. The explanation of the Wundt curve function and parameter selection is shown in Appendix 3.
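For readers unfamiliar with the Wundt curve, the sketch below shows one common way to model it, as the difference between a reward sigmoid and an aversion sigmoid, together with the coefficient of determination used to judge a fit. This functional form and all parameter values are assumptions chosen for illustration only; the function and parameters actually used in this study are those described in Appendix 3.

```python
import math


def sigmoid(x, gain, threshold):
    """Logistic function rising from 0 to 1 around `threshold`."""
    return 1.0 / (1.0 + math.exp(-gain * (x - threshold)))


def wundt(x, reward=(1.0, 20.0, 0.2), aversion=(1.5, 10.0, 0.6)):
    """Illustrative Wundt curve: reward sigmoid minus aversion sigmoid.

    Each tuple is (height, gain, threshold); the values are hypothetical.
    """
    (h_r, g_r, t_r), (h_a, g_a, t_a) = reward, aversion
    return h_r * sigmoid(x, g_r, t_r) - h_a * sigmoid(x, g_a, t_a)


def r_squared(observed, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

With these illustrative parameters the curve is low at x = 0, peaks at a moderate x, and becomes negative as x approaches 1, which is the inverted U shape against which the evaluation values are compared.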

Figure 4.

Relationship between weakness of crossmodal correspondence and aesthetic preference.

Some tone sequences (Group I tone sequence f (denoted as tone sequence I-f), tone sequence I-g, tone sequence II-g, and tone sequence III-f) deviate from the Wundt curve, although the inverted U-shaped trend of the Wundt curve can be confirmed with a high coefficient of determination (R² = 0.64). Three reasons for the deviation of these tone sequences from the inverted-U trend can be cited. First, as in tone sequences I-g, II-g, and III-f, the magnitude of the change between tones affects the evaluation of ‘complexity’. This was evident in the post-experiment interviews, where many participants commented that a tone sequence felt more complex when the transitions between tones were large. Second, there are individual differences in the assessment of ‘complexity’ and ‘aesthetic preference’ among participants. This was inferred from the fact that the variation in sensory evaluation scores for ‘complexity’ and ‘aesthetic preference’ differed between participants (e.g., some participants gave extreme evaluations while others gave moderate evaluations). Third, participants felt that tone sequences with pitches not predicted by music theory models (Narmour, 1992), such as tone sequences I-f and I-g, were complex. This was evident in the post-experiment interviews, where many participants commented that when a tone that differs from the prediction appears in the first half, the subsequent tones become unpredictable, and the entire sequence is perceived as complex. This aligns with the implication-realization model, which determines whether the prediction of the third tone that occurs after hearing two tones matches the third tone actually heard (Zbikowski & Narmour, 1993). According to this model, tone sequence I-f in Table 2 differs from the prediction of the fourth tone from the first three tones, and tone sequence I-g differs from the prediction of the third tone from the first two tones.
Since such model-based predictions are more likely to occur in participants with higher sensitivity to music, it is possible that these participants overestimated the ‘complexity’ of such tone sequences more than other participants did (Margulis & Beatty, 2008; Roeder, 1993). In fact, during the post-experiment interviews, several participants with over 3 years of experience with musical instruments commented: ‘After A3 and G3, F3 would have been natural, but B3 came instead, making the tone sequence I-f feel complex’, and ‘Based on major and minor scale rules, after C4 and D4, E4 would have been natural, but E3 came instead, making the tone sequence I-g feel complex’.

In order to exclude the overestimation of ‘complexity’ due to musical knowledge, we analysed sensory evaluation values of crossmodal correspondence weakness and aesthetic preference when subjects with musical experience (more than 3 years of instrumental experience) were excluded (Figure 5).

Figure 5.

Relationship between weakness of crossmodal correspondence and aesthetic preference without participants with musical experience.

Figure 5 shows that the sensory evaluation values for tone sequences I-f, I-g, II-g, and III-f, which previously did not follow the trend, are now located near the Wundt curve, confirming the inverted U-shaped trend of the Wundt curve with a high coefficient of determination (R² = 0.69).

Conclusion

In this study, we examined the effect of the weakness of crossmodal correspondence on pleasant emotions using shapes and tone sequences generated based on a ‘complexity’ index. First, we proposed a quantification index for the weakness of crossmodal correspondence using the difference between the ‘complexity’ indices of the shapes and tone sequences. Next, we fitted the Wundt curve to the relationship between this index and the sensory evaluation values of ‘aesthetic preference’ for combinations of shape and tone sequences. The results suggest that the weakness of crossmodal correspondence may influence pleasant emotions, and that this relationship is consistent with the qualitative trend of the Wundt curve.

Specifically, the relationship between the weakness of crossmodal correspondence and pleasant emotion was found to be consistent with the qualitative trend of the Wundt curve, except for some tone sequences. Additionally, it was confirmed that the magnitude of the pitch change between tones affected the evaluation of ‘complexity’. The corresponding tone sequences deviated from the trend of the Wundt curve in the relationship between the weakness of crossmodal correspondence and pleasant emotion. Overall, the greater the weakness of crossmodal correspondence, the smaller the aesthetic preference when the shape and tone sequence are presented simultaneously.

Regarding music knowledge, experienced musicians perceived tone sequences with tones not predicted by the model based on music theory as more complex. Therefore, it was confirmed that the corresponding tone sequences deviated from the trend of the Wundt curve in the relationship between the weakness of crossmodal correspondence and pleasant emotions.

The following limitations, relating to the index and to participant characteristics, should be considered when interpreting the results of this study. Regarding the index, two points are noted. First, this study does not consider indices of ‘complexity’ in tone sequences other than tone entropy. Therefore, it is necessary to verify whether results similar to those of this study can be obtained using other ‘complexity’ indices, such as the sum of pitch magnitudes in piano (Prince & Pfordresher, 2012) or the total number of deviations from strong beat in the hi-hat within drums (Mezza et al., 2023). Second, there is no guarantee that the proposed index of weakness of crossmodal correspondence (the difference between two entropies), which was based on the experimental results of a single prior study, is optimal. Therefore, it is necessary to compare the proposed index with other indices. Regarding participant characteristics, three points are noted. First, this study does not account for participants’ cultural background (such as nationality, age, occupation, customs, history, and education), cognitive style (directions of mental information processing such as empathizing or systemizing), or emotional state (such as energy or depression). These factors have been shown to influence aesthetic preferences (Greenberg et al., 2015; Lee et al., 2025; She et al., 2025). Therefore, experiments with participants other than Japanese students and experiments collecting data on cognitive styles and emotional states are necessary to investigate their influence on aesthetic preferences. Second, while we excluded participants with a high level of music theory understanding based on the criterion of ‘three or more years of instrumental experience’, there is no guarantee that this criterion is optimal. Therefore, it is necessary to verify criteria that include factors such as the type of instrument played and the intensity of practice.
Third, while the experiment was conducted with a sample size comparable to that of prior studies on crossmodal correspondence (Krugliak & Noppeney, 2016; Vi et al., 2020), the number of participants was limited compared to other conventional studies (Bonetti & Costa, 2017; Gurman et al., 2021). Therefore, to achieve greater generalizability, it is necessary to conduct experiments with a larger number of participants.

Footnotes

Author Contribution(s)

Kohei Osugi: Conceptualization; Data curation; Formal analysis; Investigation; Writing – original draft.

Jumpei Hayashi: Conceptualization; Project administration; Writing – review & editing.

Takeo Kato: Conceptualization; Formal analysis; Supervision; Writing – review & editing.

Hideyoshi Yanagisawa: Conceptualization; Writing – review & editing.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Japan Society for the Promotion of Science (grant numbers 23K11746 and 25H01132).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

Bar & Neta (2007). Visual elements of subjective preference modulate amygdala activation. Neuropsychologia, 45(10), 2191–2200. https://doi.org/10.1016/j.neuropsychologia.2007.03.008

Beauvois, M. W. (2007). Quantifying aesthetic preference and perceived complexity for fractal melodies. Music Perception, 24(3), 247–264. https://doi.org/10.1525/mp.2007.24.3.247

Berlyne, D. E. (1960). Conflict, arousal, and curiosity. McGraw-Hill Book Company. https://doi.org/10.1037/11164-000

Berniker, Voss, & Kording (2010). Learning priors for Bayesian computations in the nervous system. PLoS ONE, 5(9), e12686. https://doi.org/10.1371/journal.pone.0012686

Biederman (1988). Surface versus edge-based determinants of visual recognition. Cognitive Psychology, 20(1), 38–64. https://doi.org/10.1016/0010-0285(88)90024-2

Bigand, Parncutt, & Lerdahl (1996). Perception of musical tension in short chord sequences: The influence of harmonic function, sensory dissonance, horizontal motion, and musical training. Perception & Psychophysics, 58(1), 124–141. https://doi.org/10.3758/bf03205482

Bonetti & Costa (2017). Pitch-verticality and pitch-size cross-modal interactions. Psychology of Music, 46(3), 340–356. https://doi.org/10.1177/0305735617710734

Bornstein, R. F. (1989). Exposure and affect: Overview and meta-analysis of research, 1968–1987. Psychological Bulletin, 106(2), 265–289. https://doi.org/10.1037/0033-2909.106.2.265

Bronner, Frieler, Bruhn, Hirt, & Piper (2012). What is the sound of citrus? Research on the correspondences between the perception of sound and flavour. ICMPC. https://www.researchgate.net/publication/231226052_What_is_the_Sound_of_Citrus_Research_on_the_Correspondences_between_the_Perception_of_Sound_and_Flavour

Crisinel, A. S., & Spence (2009). Implicit association between basic tastes and pitch. Neuroscience Letters, 464(1), 39–42. https://doi.org/10.1016/j.neulet.2009.08.016

Delplanque, De Loof, Janssens, & Verguts (2019). The sound of beauty: How complexity determines aesthetic preference. Acta Psychologica, 192, 146–152. https://doi.org/10.1016/j.actpsy.2018.11.011

Gal, Christian, W. S., & Shiv (2007). Modal influences on gustatory perception. SSRN. http://ssrn.com/abstract=1030197

Gatti, Bordegoni, & Spence (2014). Investigating the influence of colour, weight, and fragrance intensity on the perception of liquid bath soap: An experimental study. Food Quality and Preference, 31, 56–64. https://doi.org/10.1016/j.foodqual.2013.08.004

Graf, L. K. M., & Landwehr, J. R. (2015). A dual-process perspective on fluency-based aesthetics: The pleasure-interest model of aesthetic liking. Personality and Social Psychology Review, 19(4), 395–410. https://doi.org/10.1177/1088868315574978

Greenberg, D. M., Baron-Cohen, Stillwell, D. J., Kosinski, & Rentfrow, P. J. (2015). Musical preferences are linked to cognitive styles. PLOS ONE, 10(7), e0131151. https://doi.org/10.1371/journal.pone.0131151

Gurman, McCormick, C. R., & Klein, R. M. (2021). Crossmodal correspondence between auditory timbre and visual shape. Multisensory Research, 35(3), 221–241. https://doi.org/10.1163/22134808-bja10067

Hayashi, Kato, & Yanagisawa (2024). Complexity mediated cross-modal correspondence between tone sequences and shapes. International Journal of Affective Engineering, 23(2), 95–107. https://doi.org/10.5057/ijae.IJAE-D-23-00048

Hekkert, Thurgood, & Whitfield, T. W. A. (2013). The mere exposure effect for consumer products as a consequence of existing familiarity and controlled exposure. Acta Psychologica, 144(2), 411–417. https://doi.org/10.1016/j.actpsy.2013.07.015

Hsu, Y. F., Darriba, Á., & Waszak (2021). Attention modulates repetition effects in a context of low periodicity. Brain Research, 1767, 147559. https://doi.org/10.1016/j.brainres.2021.147559

Huang, Y. T., C. T., Fang, Y. X. M., C. K., Koike, & Chao, Z. C. (2024). Crossmodal hierarchical predictive coding for audiovisual sequences in the human brain. Communications Biology, 7(1), 1–15. https://doi.org/10.1038/s42003-024-06677-6

Juravle & Spence (2024). Beauty is context-dependent: Naturalness, familiarity, and semantic meaning influence the appreciation of geometric shapes. i-Perception, 15(6), 20416695241303004. https://doi.org/10.1177/20416695241303004

Krugliak & Noppeney (2016). Synaesthetic interactions across vision and audition. Neuropsychologia, 88, 65–73. https://doi.org/10.1016/j.neuropsychologia.2015.09.027

Lee, Geert, Celen, Marjieh, van Rijn, P. M., & Jacoby (2025). Visual and auditory aesthetic preferences across cultures. arXiv. https://doi.org/10.48550/arxiv.2502.14439

Lin, Scheller, Feng, Proulx, M. J., & Metatla (2021). Feeling colours: Crossmodal correspondences between tangible 3D objects, colours and emotions. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (pp. 1–12). ACM. https://doi.org/10.1145/3411764.3445373

Margulis, E. H., & Beatty, A. P. (2008). Musical style, psychoaesthetics, and prospects for entropy as an analytic tool. Computer Music Journal, 32(4), 64–78. https://doi.org/10.1162/comj.2008.32.4.64

Matsumoto, Sato, Matsuoka, & Kato (2019). Quantification of “complexity” in curved surface shape using total absolute curvature. Computers & Graphics, 78, 108–115. https://doi.org/10.1016/j.cag.2018.10.009

Maurer, Pathman, & Mondloch, C. J. (2006). The shape of Boubas: Sound-shape correspondences in toddlers and adults. Developmental Science, 9(3), 316–322. https://doi.org/10.1111/j.1467-7687.2006.00495.x

Mezza, A. I., Zanoni, & Sarti (2023). A latent rhythm complexity model for attribute-controlled drum pattern generation. EURASIP Journal on Audio, Speech, and Music Processing, 2023(1). https://doi.org/10.1186/s13636-022-00267-2

Miyamoto & Yanagisawa (2021). Modeling acceptable novelty based on Bayesian information. International Journal of Affective Engineering, 20(4), 265–274. https://doi.org/10.5057/ijae.IJAE-D-21-00001

Narmour (1992). The analysis and cognition of basic melodic structures: The implication-realization model. Notes, 49(2), 588. https://doi.org/10.2307/897927

Palmer, S. E., Langlois, T. A., & Schloss, K. B. (2016). Music-to-color associations of single-line piano melodies in non-synesthetes. Multisensory Research, 29(1–3), 157–193. https://doi.org/10.1163/22134808-00002486

Palmer, S. E., Schloss, K. B., & Prado-León, L. R. (2013). Music-color associations are mediated by emotion. Proceedings of the National Academy of Sciences of the United States of America, 110(22), 8836–8841. https://doi.org/10.1073/pnas.1212562110

Prince, J. B., & Pfordresher, P. Q. (2012). The role of pitch and temporal diversity in the perception and production of musical sequences. Acta Psychologica, 141(2), 184–198. https://doi.org/10.1016/j.actpsy.2012.07.013

Roeder (1993). The analysis and cognition of basic melodic structures: The implication-realization model Eugene Narmour. Music Theory Spectrum, 15(2), 267–272. https://doi.org/10.2307/745819

Sasaki, Kato, & Yanagisawa (2024). Quantification of “novelty” based on free-energy principle and its application for “aesthetic liking” for industrial products. Research in Engineering Design, 35(1), 21–41. https://doi.org/10.1007/s00163-023-00422-6

Schoormans, J. P. L., & Robben, H. S. J. (1997). The effect of new package design on product attention, categorization and evaluation. Journal of Economic Psychology, 18(2–3), 271–287. https://doi.org/10.1016/S0167-4870(97)00008-1

Sekoguchi & Yanagisawa (2020). Modelling acceptable novelty transitions with emotional habituation: Effects of uncertainty and prediction error on preference changes. bioRxiv. https://doi.org/10.1101/2020.08.27.269811

Seo, H. S., Arshamian, Schemmer, Scheer, Sander, Ritter, & Hummel (2010). Cross-modal integration between odors and abstract symbols. Neuroscience Letters, 478(3), 175–178. https://doi.org/10.1016/j.neulet.2010.05.011

She, Huang, & Bao (2025). Emotional state as a key driver of public preferences for flower color. Horticulturae, 11(1), 54. https://doi.org/10.3390/horticulturae11010054

Shimizu, Okamoto, Ieda, & Kato (2024). Index for quantifying “order” in three-dimensional shapes. Symmetry, 16(4), 381. https://doi.org/10.3390/sym16040381

Spehar, Clifford, C. W. G., Newell, B. R., & Taylor, R. P. (2003). Universal aesthetic of fractals. Computers & Graphics, 27(5), 813–820. https://doi.org/10.1016/S0097-8493(03)00154-7

Spence (2011). Crossmodal correspondences: A tutorial review. Attention, Perception & Psychophysics, 73(4), 971–995. https://doi.org/10.3758/s13414-010-0073-7

Spence (2020). Simple and complex crossmodal correspondences involving audition. Acoustical Science and Technology, 41(1), 6–12. https://doi.org/10.1250/ast.41.6

Spence & Ngo, M. K. (2012). Assessing the shape symbolism of the taste, flavour, and texture of foods and beverages. Flavour, 1(1). https://doi.org/10.1186/2044-7248-1-12

Talsma (2015). Predictive coding and multisensory integration: An attentional account of the multisensory mind. Frontiers in Integrative Neuroscience, 9, 1–13. https://doi.org/10.3389/fnint.2015.00019

Ujiie, Kato, Sato, & Matsuoka (2012). Curvature entropy for curved profile generation. Entropy, 14(3), 533–558. https://doi.org/10.3390/e14030533

Van de Cruys & Wagemans (2011). Putting reward in art: A tentative prediction error account of visual art. i-Perception, 2(9), 1035–1062. https://doi.org/10.1068/i0466aap

Vi, C. T., Marzo, Memoli, Maggioni, Ablart, Yeomans, & Obrist (2020). Levisense: A platform for the multisensory integration in levitating food and insights into its effect on flavour perception. International Journal of Human-Computer Studies, 139, 102428. https://doi.org/10.1016/j.ijhcs.2020.102428

Wilson & Brewster, S. A. (2017). Multi-moji: Combining thermal, vibrotactile & visual stimuli to expand the affective range of feedback. ACM, 1743–1755. https://doi.org/10.1145/3025453.3025614

Yanagisawa (2021). Free-energy model of emotion potential: Modeling arousal potential as information content induced by complexity and novelty. Frontiers in Computational Neuroscience, 15, 1–13. https://doi.org/10.3389/fncom.2021.698252

Yanagisawa, Kawamata, & Ueda (2019). Modeling emotions associated with novelty at variable uncertainty levels: A Bayesian approach. Frontiers in Computational Neuroscience, 13, 2. https://doi.org/10.3389/fncom.2019.00002

Zajonc, R. B. (1968). Attitudinal effects of mere exposure. Journal of Personality and Social Psychology, 9(Pt.2), 1–27. https://doi.org/10.1037/h0025848

Zbikowski & Narmour (1993). The analysis and cognition of basic melodic structures: The implication-realization model. Journal of Music Theory, 37(1), 177. https://doi.org/10.2307/843949