Abstract
The examination of cross-modal correspondences between auditory and olfactory senses opens up an intriguing perspective into the study of extra-musical meaning. In a behavioral experiment, musically trained participants were presented with 26 complex synthetic tones and 12 aromatic stimuli. Their task was to report potential associations between the two. The data analysis revealed that the majority of scents featured at least one association with a sound that was above chance. The salient acoustical correlates of basic aromatic categories could be summarized as follows: both fruity (e.g., cherry, melon, and pomegranate) and sour aromas exhibited a positive correlation with pitch. Fruity scents, in addition, were more likely to be associated with sounds featuring pronounced low harmonic partials (1st–4th), low noise content, low roughness, and a greater number of distinct pitches. Conversely, sour aromas were linked with stronger energy in the higher frequencies. Sweet scents correlated with sounds characterized by a lower spectral centroid, whereas aromas in the spicy/other category were associated with weak lower partials (fundamental frequency in particular) and stronger noisy components.
Introduction
It is generally accepted that sound and music are capable of conveying meaning that may take different forms. One common differentiation is made between intra-musical and extra-musical meaning (Patel, 2008). The former represents meaningful information that arises from statistical learning and the subsequent formation of expectations due to long-term musical exposure (e.g., Krumhansl, 2015). The latter concerns referential associations between musical entities and extra-musical concepts such as, for example, density (e.g., Noble et al., 2020), tension (e.g., Farbood, 2012; Farbood & Upham, 2013; Huron, 2006), intimacy (Huovinen & Kaila, 2015), or various kinds of visual imagery (e.g., Hashim et al., 2023). Koelsch (2011) further organizes extra-musical meaning into three subcategories, namely, iconic meaning in the case where musical qualities are linked to objects or abstract concepts through similes and metaphors; indexical meaning when musical patterns indicate a psychological state; and symbolic meaning that is conveyed through cultural and social associations (e.g., Christmas carols or a national anthem).
The fundamental building block of music is the musical tone, which may or may not have pitch but is certainly associated with some loudness, perceived duration, and timbral qualities. There is now ample evidence, collected from both neuroimaging methods (e.g., Painter & Koelsch, 2011) and behavioral approaches (e.g., Zacharakis et al., 2014), that even out-of-context isolated musical tones can carry extra-musical semantic information. Arguably, the study of timbral semantics can be traced back to the pioneering work of von Helmholtz (1877), but interest in the subject has grown substantially over the past 20 years. This has led to an accumulation of knowledge around the extra-musical concepts that we tend to associate with timbre. The subject has been approached from several viewpoints, such as comparing data from different languages (Zacharakis et al., 2014), comparing perceptual with semantic ratings (Samoylenko et al., 1996; Zacharakis et al., 2015), analyzing descriptions from corpora on orchestration (Wallmark, 2019a), and creating semantic profiles of imagined instrumental timbres (Reymore & Huron, 2020; Reymore, 2022). The increased interest in the extra-musical meaning of timbre is reflected by the recent book chapter on the semantics of timbre by Saitis and Weinzierl (2019), which offers a comprehensive review of the current literature. In the closing remarks of this chapter, the study of cross-modal correspondences is proposed as a promising path for further investigation into the extra-musical meaning conveyed by timbre, potentially providing deeper insight into mechanisms of human semantic processing.
Associations between auditory attributes and other modalities have already been well documented. For example, pitch and loudness have both been related with spatial height (Ben-Artzi & Marks, 1995; Bernstein & Edelstein, 1971), visual brightness (Klapetek et al., 2012; Marks et al., 1987; Marks, 1987, 1989), size (Eitan et al., 2014; Gallace & Spence, 2006), tactile sensation (Eitan & Rothschild, 2011), and even gustatory (Mesz et al., 2011; Wang et al., 2015) or olfactory qualities (Belkin et al., 1997; Crisinel & Spence, 2012a). In addition, higher-level musical characteristics such as harmonic dissonance have been associated with visual roughness (Giannos et al., 2021).
At the same time, cross-modal metaphors for musical timbre description have long been utilized, as shown by the analysis of past orchestration treatises (Wallmark, 2019a). Behaviorally accumulated evidence has further supported the widespread existence of cross-modal adjectives in timbre semantics. In particular, the use of metaphors inspired by touch and vision is now strongly backed by a multitude of findings that originate from several different languages (Lichte, 1941; Pratt & Doak, 1976; Rosi et al., 2023; Štĕpánek, 2006; von Bismarck, 1974; Zacharakis et al., 2014; Zacharakis & Pastiadis, 2016). Besides these indirect indications, several more recent works have directly investigated cross-modal associations with respect to timbre. Adeli et al. (2014) extended the kiki-bouba paradigm (Köhler, 1929; Ramachandran & Hubbard, 2003; Ramachandran et al., 2020) to musical timbres and observed that softer timbres corresponded to rounded shapes with blue or green colors, while harsher timbres were linked to angular shapes with red or yellow colors. Wallmark and colleagues explored cross-modal associations of timbre in a series of recent studies. Combining neuroimaging and behavioral methods with acoustic analysis, Wallmark et al. (2018) initially showed that noisy timbres, which are often described through tactile metaphors (e.g., rough, harsh, coarse), activated somatosensory regions responsible for tactile processing. Then, Wallmark (2019b) applied a Stroop-type speeded classification and reported interference between word–timbre presentations of roughness and brightness. Subsequently, Wallmark et al. (2021) extended this finding by identifying interference between visual and timbral brightness. In addition, strong haptic–auditory correspondences were demonstrated for preschool children, with visual–auditory associations proving to be less systematic and more dependent on the developmental stage (Wallmark & Allen, 2020).
The above works support the existence of correspondences between timbral characteristics and tactile or visual attributes. The mechanisms proposed to explain the origin of such effects include activation of common brain structures in response to properties of cross-modal stimuli, statistically learned environmental associations, affective similarities, and mediation through shared semantic descriptions (Spence, 2011, 2020a). Out of all the senses, vision seems to gather the highest number of descriptive terms (at least in the English language) (Majid & Kruspe, 2018). In contrast, Winter (2019) argues that olfactory experiences are the most difficult to express lexically (i.e., ineffable), followed closely by gustatory experiences, and then auditory experiences.
The apparent ineffability of smell, taste, and sound sets an intriguing context for the exploration of crosstalk between them. Indeed, over the past 15 years, there has been a growing interest in the correspondences and interactions between the chemical senses (i.e., gustation and olfaction) and musical parameters. Knöferle and Spence (2012) summarize the state of the art regarding identified mappings between musical parameters and basic tastes up until 2012 (some of the most notable studies being Bronner et al., 2012; Crisinel & Spence, 2009, 2010a, 2010b, 2012b; Mesz et al., 2011; Simner et al., 2010; Wang et al., 2021). Despite some expected inconsistencies between experimental data, there seems to be a consensus that sweetness is related to soft consonant sounds and chords, legato articulation, slow tempo, and low roughness. The pitches reported for sweetness vary from average to high. Sourness, in contrast, is associated with high pitch, staccato articulation, fast tempo, and dissonance. Bitterness is consistently associated with low pitch and high roughness. The latter is also positively linked with saltiness, which additionally correlates with sound discontinuities, long decay times, and regular rhythmic patterns. It must be pointed out that timbre is not explicitly mentioned in most of these studies and is often treated either as a source category (i.e., piano, brass, and woodwind) or as a semantic dimension (i.e., roughness and sharpness). The findings of a subsequent study by Guetta and Loui (2017) are in accordance with the above and confirm the existence of systematic associations between auditory and gustatory stimuli of varied complexity. 
The findings from these studies have significantly influenced subsequent research on sound–taste correspondences, which has explored the potential impact of background music on taste perception (e.g., Carvalho et al., 2015; Carvalho et al., 2017; Crisinel et al., 2012; Spence, 2021c; Wang et al., 2015; Wang & Spence, 2016), a phenomenon that Charles Spence has termed “sonic seasoning.”
The existing evidence on sound–taste mappings largely concerns the four (or five with occasional inclusions of umami) basic tastes and not specific flavors. When it comes to olfaction, however, such basic categories for odor classification are less prominent. Instead, odor naming is mostly based on resemblance to a certain source (e.g., smells like freshly cut grass), and the precision of odor identification is low (Agapakis & Tolaas, 2012; Majid & Burenhult, 2014; Speed et al., 2021). It has been suggested that the ability of humans to talk about odors in abstract terms may have been lost in urbanized societies, since communities of hunter-gatherers are more capable of it in comparison to English speakers (Majid & Burenhult, 2014; Majid & Kruspe, 2018). That said, there do exist general categories of scents that are based on properties of the source (e.g., fruity, floral, spicy, earthy, sweet, and green), analogous to how source characteristics largely dictate timbre perception. Efforts to organize the wealth of odor descriptions into more parsimonious models for odor classification are intended to facilitate communication and have been attempted in perfumery (e.g., Zarzo & Stanton, 2009), wine (Noble et al., 1984), beer (Meilgaard et al., 1982), whiskey (Piggott & Jardine, 1979), and even wastewater (Burlingame et al., 2004) among other disciplines. It is worth mentioning that odor perception employs similar methods to timbre perception, such as similarity ratings and semantic differential (Kaeppler & Mueller, 2013), and interestingly, odor semantics feature some overlap with timbre semantics (e.g., warm, rich, smooth, clear, in Zarzo & Stanton, 2009). In yet another analogy with timbre, odor perception is multidimensional (often represented through two to four-dimensional spaces), and existing classification systems are still imperfect (Kaeppler & Mueller, 2013).
The above commonalities between sound and odor perception raise the question of whether they are also reflected in some type of cross-modal associations. The 19th-century perfumer Piesse (1891) was probably the first to attempt a connection between essential oil scents and the pitch of musical notes. Some empirical backing came many decades later from the pioneer music scholar von Hornbostel (1931), who reported a match between odors and tuning fork tones. At the end of the 20th century, Belkin et al. (1997) pursued an empirical odor–pitch association for odor classification purposes. Their experimental data suggested that odor–pitch associations were systematic and could not be attributed to the olfactory dimensions of intensity or pleasantness. Instead, the authors speculated that the identified pitch–odor mappings could have been based on underlying semantic dimensions of olfaction (e.g., hard–soft, heavy–light, or bright–dark) and called for a subsequent investigation into potential correspondences between timbral and odor qualities. A more recent study by Crisinel & Spence (2012a) confirmed the existence of systematic associations between odors (associated with wine) and pitch while providing some evidence for a link with source-cause timbral properties. In particular, they reported a tendency of people to associate fruity aromas with higher-pitched sounds and confirmed that aromatic intensity did not seem to correspond to pitch but may instead correspond to timbre (e.g., a trend between higher aromatic intensities and brass instruments was observed).
Considering the semantic commonalities between timbral and aromatic qualities, and drawing on indications provided by the work of Crisinel & Spence (2012a), this study seeks to examine potential timbre–aroma correspondences more closely. The goal of the current approach is to minimize the coupling between timbre and source-cause, acknowledging that total decoupling may not be entirely feasible. To this end, I have created complex synthesized sound stimuli instead of familiar instrumental samples. I report here the results of an experiment whereby the task of the evaluation panel was to match (if possible) each sound stimulus with any of the 12 provided aromas in the form of essential oils. Through this experimental design, I aimed to address two fundamental questions: Is it possible to come up with systematic timbre–odor associations in accordance with previous evidence on pitch–odor connections? And if so, can an interpretation of cross-modal relationships in terms of acoustic properties (represented through audio descriptors) be achieved? That is, can acoustical correlates of scents be identified?
The motivation for these pursuits originates from one long-term goal: to gain a deeper understanding of the already documented influence of sounds and music on gustatory and olfactory experiences, such as, for example, wine tasting (Spence & Wang, 2015a, 2015b, 2015c; Spence, 2020b). The underlying premise is that multisensory congruence positively contributes to the brain's processing fluency, thereby enhancing overall pleasure (Spence, 2021b). Therefore, being able to define congruence or incongruence between the modalities under question is of paramount importance for such an investigation.
The following section elaborates on methodological decisions, including the selection of sound stimuli and aromatic variables, and outlines the experimental procedure. The results section introduces the assessment of statistical significance for sound–odor correspondences, underscores noteworthy occurrences, and adduces a few representative aromatic profiles of sounds. Additionally, this section presents the identified acoustical correlates of both specific aromas and aromatic families. The paper concludes by contextualizing the primary findings within the existing literature, discussing the limitations of this study, and suggesting potential future directions.
Method
The experiment described below aimed to explore potential associations between auditory and olfactory stimuli by asking participants to report potential correspondences between presented sound stimuli and a number of aromas.
Sound Stimuli and Apparatus
The sound stimulus set consisted of 26 complex synthetic tones that were created through various combinations of sound synthesis (frequency modulation, amplitude modulation, wavetable, additive, and granular synthesis) and/or sound processing (filtering, reverb, delay, phasing, etc.) implemented using Ableton Live. I created these sounds attempting to sonically represent a wide range of aromas that are typically found in wine. This approach was favored over the use of familiar instrumental timbres, firstly to minimize (as much as possible) source-cause category influences on the auditory end (Siedenburg, 2017) and secondly to maintain the freedom to create timbres with desired characteristics. During the sound synthesis process, I followed some of the basic guidelines offered by the literature (Crisinel & Spence, 2012a; Crisinel et al., 2013; Deroy et al., 2013; Spence, 2021a) concerning identified correspondences between timbral qualities and olfactory properties. However, as presented in the introduction, the majority of existing evidence concerns pitch–odor correspondences, while timbre–odor relationships are largely approached as associations between aromas and specific source-cause categories (i.e., musical instrument classes), with limited suggestions for acoustical correlates. As a consequence, numerous sound synthesis decisions were guided by personal impressions formed during exposure to the designated aromatic stimuli, introducing a considerable degree of subjectivity. The number of sound stimuli (26) was selected to be larger than the number of aromatic variables (12, see below) to avoid a one-to-one correspondence experimental setup. In addition, the higher the timbral diversity within the stimuli, the likelier it should be that unexpected auditory–olfactory associations are obtained. As a result, many of the sounds within the stimulus set were designed with scents in mind that were not represented by the aromatic variables.
At the same time, due to the exploratory nature of this study, two of the aromas (melon and pomegranate) had two sonic candidates. The stimulus duration ranged from 6 to 12 s, while the pitch also varied, ranging from G2 (98 Hz) to G5 (784 Hz). Several of the stimuli comprised tone combinations and complex temporal fluctuations.
The sound stimuli were delivered via a MacBook Pro laptop (Apple Computer, Inc., Cupertino, CA), utilizing a custom-built graphical user interface in Max/MSP for stimulus playback and data acquisition. Listeners were presented with the sound stimuli binaurally using Beyerdynamic DT-880 PRO headphones (250 Ohm). Loudness was equalized across all stimuli at a comfortable playback level through informal listening tests. This resulted in RMS levels between 65 and 75 dB SPL (A-weighted, slow response). The sound stimuli are available in the supplementary material. At this point, it should be noted that the sound labels employed throughout the manuscript derive directly from the names of their intended aromatic counterparts. Given that these sounds lack a physical source, this methodology was considered more advantageous than simply labelling the stimuli as S1–S26. This approach allows the reader to be informed about the intended aromatic target corresponding to each sound.
Aromatic Variables
Twelve aromatic variables were selected to reflect distinctive aromas present in the profiles of three different wines (one white, one rosé, and one red). This selection served the future objective of composing congruent music tailored to specific wine profiles as a means to investigate multisensory perception. The aromatic variables were introduced using small glass bottles (5 ml) sealed with a plastic screw cap, each containing a piece of cotton on the inside. The cotton in each bottle was moistened with 3 drops from a selection of 11 different essential oils, namely, vanilla, honey, caramel, cinnamon, coffee, (black) pepper, lemon, lemon blossom, pomegranate, melon, and cherry. No satisfactory essential oil representative was found for the 12th selected aromatic variable, tobacco; therefore, tobacco leaves enclosed in a small plastic container with a screw cap (8 ml) were provided as a stimulus. The presentation of real aromatic stimuli ensured that all participants shared the same olfactory references, as past research has shown that imagining a stimulus may result in different cross-modal associations compared to actually experiencing it (Bronner et al., 2012; Zarzo & Stanton, 2009). In contrast to the approach for the sound stimuli, I opted to facilitate the source identification of aromatic variables due to the large number of provided options. Thus, a label indicating the aroma contained in each glass bottle was provided on its placement base.
Participants
A convenience sample of 29 participants with formal musical training (mean age: 22.5 years, age range: 19–41 years, 19 females) took part in a listening experiment. The majority were students at the Aristotle University of Thessaloniki and received course credit compensation for their participation.
Procedure
Each participant listened to the 26 stimuli in random order and their task was to associate —if possible— each sound with any of the 12 provided aromas. The participants were completely naive regarding the intended aromatic associations for each sound stimulus. Each sound could be associated with as many aromas as desired by providing a strength-of-association value (hidden scale: 0–100). Participants were initially instructed to experience each scent by sequentially opening the corresponding glass bottles before listening to the sound stimuli. They were encouraged to revisit specific scents only if they deemed it necessary while forming associations between scents and sounds. This instruction aimed to protect their olfactory system from sensory overload by limiting the overall number of times they experienced each different scent. An additional instruction was to ignore possible influence stemming from conscious higher-level connections between sounds and concepts related to the source of the aromas (e.g., “this sounds like a buzzing bee therefore it has to be associated with the scent of honey”). Participants were instead encouraged to base their judgements strictly on potential sensory connections as much as possible.
Results
The analysis of the data had two primary objectives. The first was to examine whether above-chance associations could be identified between certain sound stimuli and some of the aromatic variables. The second was to uncover acoustical correlates of general aromatic categories.
Analysis of Responses
To examine which of the observed effects were statistically significant, a bootstrapping approach that created random distributions through computational simulation of the experimental conditions was adopted. This included 29 virtual raters evaluating 26 objects on 12 variables. The acquired behavioral data indicated that real participants associated each auditory stimulus (i.e., object) with 2.13 aromatic variables on average. Therefore, for each computational evaluation, the virtual raters were set to randomly select 2 out of the 12 aromatic variables and indicate the strength of association by randomly assigning a value drawn from a uniform distribution with a range between 0 and 100. The above scenario was permuted 1000 times, and the distributions of selection frequencies and descriptive statistics for a variable under these circumstances were obtained. The 95th percentile of the number of raters that associated a variable with a certain stimulus (even with a minimal value) through this simulation was 8 (out of 29). This translates into an above-chance effect (

Example of the data gathered for two indicative sound stimuli. Horizontal axes depict the 29 participants, rank-ordered based on the magnitude of their responses as shown in the vertical axes (range 0–100). Three of the variables (vanilla, caramel, and melon) for the sound intended to resemble the caramel aroma (on the left) were selected above chance (
Returning to the statistical significance of the effect size, in essence, every percentile above the 73rd (
Figure 2 shows the 85th percentiles of the scores of the 26 stimuli on each of the 12 aromatic variables. With the exception of cinnamon and pomegranate, the remaining 10 aromatic variables featured at least one statistically significant association with one of the sound stimuli.
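The chance model described in Analysis of Responses can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the analysis code actually used: the function names, the seed, and the nearest-rank percentile definition are illustrative choices. The uniformly drawn strength values are left out because only the selection counts enter the rater-count threshold.

```python
import math
import random


def simulate_chance_counts(n_raters=29, n_stimuli=26, n_aromas=12,
                           picks_per_stimulus=2, n_permutations=1000, seed=0):
    """Return, for every simulated (stimulus, aroma) pair, how many virtual
    raters linked that aroma to that stimulus purely by chance."""
    rng = random.Random(seed)  # seed is an illustrative assumption
    counts = []
    for _ in range(n_permutations):
        # tally[s][a] = number of virtual raters linking stimulus s to aroma a
        tally = [[0] * n_aromas for _ in range(n_stimuli)]
        for _ in range(n_raters):
            for s in range(n_stimuli):
                # Each virtual rater picks 2 distinct aromas out of 12 at random.
                for a in rng.sample(range(n_aromas), picks_per_stimulus):
                    tally[s][a] += 1
        for row in tally:
            counts.extend(row)
    return counts


def percentile_nearest_rank(values, q):
    """Nearest-rank percentile: the smallest observed value such that at
    least q percent of the observations are less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    chance_counts = simulate_chance_counts()
    print(percentile_nearest_rank(chance_counts, 95))
```

Under these assumptions each count follows a binomial distribution with 29 trials and a selection probability of 2/12, so the simulated 95th percentile should agree with the reported value of 8 raters out of 29.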

Bar graphs depicting the 85th percentiles of the acquired associations (vertical axes) between the 12 aromas and each of the 26 sound stimuli (horizontal axes). The horizontal red line at 62 signifies the threshold for statistical significance (
Acoustical Correlates of Aromatic Categories
Since several statistically significant correspondences between the sound stimuli and the provided aromas were observed, I proceeded to identify acoustical correlates of specific scents and groupings of more general scent categories. A correlational analysis (Spearman's
Table 1. Spearman's rank correlation coefficients between ratings of stimuli on aromatic variables and audio features extracted using the Timbre and MIR Toolboxes. The ratings were represented as the median value of the non-zero ratings for each sound–aroma pair weighted by the number of raters. The aromatic variables are grouped into more general categories to facilitate interpretation.
Effect size: (
The behavioral ratings were represented as the median value of the non-zero registrations for each sound–aroma pair weighted by the relative number of raters (i.e., no. of registrations
Figure 3 presents four representative sound stimuli, one for each of the four general aromatic categories of Table 1, that received statistically significant ratings. Broadly in line with Table 1, the stimulus that received a statistically significant rating in the sweet category (i.e., vanilla) features energy concentrated in the lower partials (i.e., lower spectral centroid). In contrast, the stimulus that was rated highly in the spices/other category (i.e., pepper) features stronger high-frequency energy and noisy content, lower pitch, and longer duration (i.e., higher temporal centroid). The representative of the sour category (i.e., lemon) is a sound with both high pitch and strong high-frequency content, some modulation, and the presence of non-harmonic partials. Finally, the representative of the fruit category (i.e., melon) is a sound with its highest energy concentrated in the low frequencies and in the 2nd harmonic in particular (i.e., Tristimulus 2). There is a complete lack of non-harmonic components (i.e., strong harmonic-to-noise energy), but this specific sound also features a major third interval (note a distinct harmonic series initiating from the 3rd visible partial), consistent with the weak positive correlation with the number of distinct pitches listed in Table 1.
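The correlations in Table 1 rest on two steps: collapsing the panel's responses for each sound–aroma pair into a single score, and rank-correlating those scores with an audio feature. The sketch below illustrates both in plain Python. It assumes that "weighted by the relative number of raters" means multiplying the median of the non-zero ratings by the fraction of the panel that gave a non-zero rating; the exact expression is not recoverable from the text here, so this weighting is an assumption. The function names and the toy numbers are hypothetical and are not data from the study.

```python
import statistics


def weighted_median_score(ratings, n_raters=None):
    """Median of the non-zero ratings, scaled by the share of raters who
    gave a non-zero rating (ASSUMED form of the weighting)."""
    n_raters = len(ratings) if n_raters is None else n_raters
    nonzero = [r for r in ratings if r > 0]
    if not nonzero:
        return 0.0
    return statistics.median(nonzero) * len(nonzero) / n_raters


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors.
    Raises ZeroDivisionError if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented toy panel: 4 raters, 3 sounds, one aroma (NOT data from the study).
toy_panel = [[0, 0, 40, 60], [0, 80, 90, 70], [0, 0, 0, 20]]
toy_scores = [weighted_median_score(r) for r in toy_panel]
toy_feature = [98.0, 784.0, 196.0]  # hypothetical audio feature values
toy_rho = spearman_rho(toy_scores, toy_feature)
```

Using ranks rather than raw values keeps the analysis insensitive to the scale of each audio feature, which matters when descriptors with very different units are compared against the same 0–100 ratings.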

Figure 3. Four examples of correspondences between the aromatic profiles and the spectrograms of highly rated sound stimuli in each of the four aromatic categories of Table 1. The radar plots display the 85th percentile of the response distributions. The red line corresponds to the 62/100 level of statistical significance, as detailed in the subsection Analysis of Responses. The labeling of the stimuli stems from the intended aroma that each synthetic sound was meant to resemble.
Figure 4 shows the dendrogram resulting from a hierarchical cluster analysis (Ward's method, distance metric: Spearman's

Figure 4. Dendrogram from hierarchical cluster analysis (Ward's method, distance metric: Spearman's correlation) applied to the 12 aromatic variables. One major cluster encompasses the majority of fruity and sweet aromas. Spices and others form a distinct group, while lemon and lemon blossom form a two-member cluster that is loosely linked to the fruit/sweet one.
Discussion
The present study sought to enhance our understanding of correspondences between timbre and scent. This subject has not yet received extensive attention despite its potential applications in the sonic representation of odors for marketing, the well-being industry, and even artistic expression. Current findings suggest that it was possible to obtain reliable cross-modal associations between complex timbres and aromas. Ten out of the twelve aromas under study featured a statistically significant association with at least one sound stimulus (see Figure 2). This outcome is particularly impressive given the complexity of the task at hand. It is worth noting that not only were there numerous aromatic variables to choose from (12 in total), but some of them were closely perceived by participants, such as lemon with lemon blossom or cherry with melon (see Figure 4). Taken together, these results provide several adequate sonic representations for many of the scents under study. This was one of the major objectives of this work for informing future cross-modal experimental designs. Obtaining validated sonic representations is crucial for distinguishing congruent from incongruent properties between audition and olfaction—an essential step in investigating how sound and music may influence the perception and appreciation of scents. This result supports the notion that iconic extra-musical meaning, as proposed by Koelsch (2011), may also encompass an olfaction-related component. Moreover, given that pleasantness is a prominent olfactory characteristic, iconic and indexical (i.e., signaling an affective state) forms of extra-musical meaning may get intertwined in the context of sound–aroma correspondences, in line with the proposed mechanism of affective similarity for cross-modal correspondences. Thus, a sound may also be associated with an aroma based on the emotion (e.g., pleasantness) it commonly evokes.
At the same time, the correlational analysis between aromatic variables and audio features extracted from the sound stimuli provided more general insight into possible acoustical correlates of olfactory properties. To facilitate a more general interpretation of findings, the aromatic variables were organized into four categories, i.e., sweet, spices/other, sour, and fruit. It might have been apparent from the introduction that odor classification is a non-trivial task, and consequently, there is no universally accepted taxonomy (Kaeppler & Mueller, 2013). Therefore, the source cause classification I opted for could be challenged. After all, lemon is a fruit, spices constitute a very broad category, tobacco and coffee are—strictly speaking—not spices, and lemon blossom is a flower. However, both the cluster analysis of the aromatic variables and the audio features correlated with members of each category portray a relatively cohesive picture (see Figure 4 and Table 1) and support the selection of this classification.
In agreement with existing evidence (Crisinel & Spence, 2012a; Ward et al., 2021), both sour and fruit categories were positively correlated with pitch. They were, however, differentiated acoustically through several features. Fruity aromas (e.g., cherry, melon, and pomegranate) were more likely to be associated with sounds featuring higher energy concentration between the 2nd and 4th harmonics (i.e., Tristimulus 2), stronger harmonic-to-noise energy ratios, lower roughness, and a stronger presence of distinct pitches. In contrast, sour aromas exhibited a positive correlation with the distribution of energy at the higher frequencies (i.e., higher spectral centroid values), aligning with previous findings in Knoeferle et al. (2015) and Simner et al. (2010). Sweet scents were associated with a stronger concentration of energy in the lower harmonic partials (i.e., lower spectral centroid and positive spectral skewness) and indications of smoother attacks (i.e., smaller attack slopes), supporting Mesz et al. (2011) and Bronner et al. (2012). The spices/other category was characterized by lower pitch, energy concentration toward the higher harmonic partials (i.e., negative correlations with Tristimulus 1 & 2 as opposed to positive correlations with Tristimulus 3), noisier timbres (i.e., lower harmonic-to-noise energy ratio), higher roughness, less smooth spectral envelopes (i.e., higher odd-to-even ratio and harmonic spectral deviation), slower attacks (i.e., positive correlations with attack time and slope), and longer-lasting sounds (i.e., higher temporal centroid). Many of these findings align with prior research (Knöferle & Spence, 2012; Ward et al., 2021), especially when considering the spices/other category as a potential equivalent to bitterness, given the inclusion of coffee and black pepper.
In particular, the combined observation of an association with lower pitch and weaker concentration of energy in the low partials may offer a partial explanation for certain discrepancies highlighted in the literature concerning the acoustic correlates of bitterness. Indeed, Knoeferle et al. (2015) reported a negative association between bitterness and spectral centroid, while more recent work by Wang et al. (2019) observed a positive correlation between perceived bitterness in wine tasting and the spectral centroid of background music. The current findings suggest that a combination of lower spectral centroid (due to low pitch) with a spectral balance skewing toward the higher partials may contribute to the correspondence between sound characteristics and bitterness perception. The preceding discussion interchangeably refers to studies that have examined either sound–taste or sound–odor associations. Combining evidence from both perspectives is warranted since taste and smell often share common properties due to associative learning formed by past experiences (Stevenson et al., 1995; Stevenson & Boakes, 2003). Thus, a lemon scent may partially evoke a sour sensation, a caramel scent may evoke sweetness, and so on. In addition, similarly to previous literature on sound–aroma correspondences, the majority of the aromatic stimuli used in this experiment had a straightforward link with taste. Therefore, the current data cannot be used to either reject or confirm the “indirect hypothesis” proposed by Deroy et al. (2013), according to which the origin of observed olfactory–auditory mappings may lie in a common connection to gustatory properties.
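The earlier point about bitterness and the spectral centroid can be illustrated numerically. The sketch below compares two hypothetical harmonic tones (the amplitudes and fundamental frequencies are invented for illustration and are not taken from the stimuli of this study): a low-pitched tone whose energy leans toward the upper partials, and a higher-pitched tone dominated by its lower partials. The centroid is computed both in Hz and in units of harmonic number, the latter being an assumed pitch-normalized variant.

```python
def centroid_hz(f0, amps):
    """Amplitude-weighted mean frequency (Hz) of a harmonic spectrum."""
    total = sum(amps)
    return sum(f0 * (k + 1) * a for k, a in enumerate(amps)) / total


def centroid_harmonic(amps):
    """Amplitude-weighted mean harmonic number (pitch-normalized centroid)."""
    total = sum(amps)
    return sum((k + 1) * a for k, a in enumerate(amps)) / total


# Hypothetical tones: (fundamental in Hz, amplitudes of harmonics 1..6).
LOW_TONE = (110.0, [0.2, 0.4, 0.6, 0.8, 1.0, 1.0])   # low pitch, weak low partials
HIGH_TONE = (440.0, [1.0, 0.8, 0.6, 0.4, 0.1, 0.1])  # higher pitch, strong low partials

for name, (f0, amps) in (("low", LOW_TONE), ("high", HIGH_TONE)):
    print(
        f"{name}: centroid = {centroid_hz(f0, amps):.1f} Hz, "
        f"{centroid_harmonic(amps):.2f} in harmonic-number units"
    )
```

Under these assumptions the low-pitched tone has the lower centroid in Hz but the higher centroid relative to its own fundamental, which is one way in which both negative and positive associations between bitterness and spectral centroid could arise depending on how pitch is handled.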
Limitations and Future Work
While contributing to and augmenting the limited literature on sound–scent correspondences, the present study does have certain limitations. First, the auditory stimuli were presented under more controlled conditions and analyzed more thoroughly (i.e., extraction of audio features) compared to the olfactory ones. Indeed, the sound stimuli were equalized for loudness, but the same was not the case for the intensity of the aromatic variables. Although three drops were used from each essential oil, this did not guarantee equal perceived intensity (due to potentially different concentrations and/or sensitivities), not to mention that tobacco was presented in the form of leaves. Based on the findings by Crisinel and Spence (2012a), there seems to be no clear reflection of aromatic intensity in scent–pitch correspondences. However, a potential association with timbral qualities in the form of source-cause categories was implied. In general, the same scent may feature different perceived qualities because of differences in concentrations (Kaeppler & Mueller, 2013). Expanding on this, future work should not only adopt a more controlled approach toward equalized perceived aromatic intensity but should also seek to link acoustic characteristics of sound stimuli with chemical substances for each scent as opposed to the elementary aromatic categorization of the current approach. A similar approach has been adopted to offer chemical correlates for a range of cross-modal associations centered around olfaction through the use of an electronic nose (Ward et al., 2020, 2022). Hence, the source-cause category caveat may be mitigated not only for auditory objects but also for olfactory ones.
In the same vein, the current design introduces the possibility of semantic mediation effects, since aromatic variables were labelled to explicitly indicate their source. The degree of influence of source semantics on odor perceptual processing and classification is a subject of open debate (see Kaeppler & Mueller, 2013), while it has also been reported that knowledge of an odor's identity can affect its association with emotions and music genres (albeit not with shape angularity, pitch, smoothness of texture, or perceived pleasantness) (Ward et al., 2021). Thus, it is not unlikely that such effects may have somewhat affected the reported associations. However, the number of provided scents (12) was high enough, and some of them were similar enough, to warrant the inclusion of labels to reduce noise in the acquired behavioral data. In any case, participants were instructed to focus on the provided aroma and not on a prototypical representation of this category that they may have had in mind, even if these two were not entirely aligned. While acknowledging all these caveats, the presentation of actual olfactory stimuli as a common reference still leads to a more robust experimental design compared to the alternative of simply asking participants to imagine scent sensations based on memory, according to current evidence (Bronner et al., 2012; Zarzo & Stanton, 2009).
Overall, this study not only corroborated certain prior findings regarding sound–scent correspondences but also contributed a more nuanced perspective by examining timbre features. Nevertheless, there is substantial untapped potential for further exploration and application of cross-modal correspondences between audition and olfaction. For one thing, the cross-modal data reported here originated from an auditory-oriented population. It has been suggested that odor familiarity influences perceived pleasantness (Crisinel & Spence, 2012a), a salient dimension in odor perception. Furthermore, Wang et al. (2015) found that taste perception associated with certain musical characteristics can vary between musically trained participants and musical novices. While the aromas examined in this context were relatively common, it would be intriguing to compare the perceptual aromatic organization (that resulted from sonic correspondences) identified by musicians with an equivalent organization established by a panel specializing in olfaction.
Several diverse experimental paradigms could also be considered. A slight variation would be to limit the aromatic variables to a significantly smaller number without source labelling to attenuate possible semantic mediation effects. Moreover, given the acquired sound–odor relationships, a comparison of perceptual spaces resulting from pairwise dissimilarity ratings for both sound and scent groups would offer a purely perceptual standpoint along with acoustical and chemical correlates. This approach could also be complemented by a semantic description of sounds and odors to facilitate interpretation and to explore possible common forms of iconic and indexical meaning. Alternatively, a different approach altogether would be to use the priming experimental paradigm to assess potential effects on odor perception induced by listening to a prior sound and vice versa. Priming with sound has already been demonstrated to affect chocolate taste ratings, albeit concurrent presentation was found to exert even greater influence (Wang et al., 2020). Finally, an experimental set-up resembling the approach of musical improvisation in response to olfactory stimuli by Mesz et al. (2023) could also be adopted for timbre. In such a scenario, participants would be able to adjust the parameters of a sound synthesizer in real time to achieve the best possible matching of a timbre with a target scent. A combination of all the above research paths could lead to a more comprehensive understanding of the rules governing the crosstalk between audition and olfaction. This may, in turn, contribute to a stronger scientific foundation for the burgeoning real-world applications of auditory–olfactory interplay, which are already underway in domains such as marketing (Spence et al., 2021; Spence & Keller, 2024), rehabilitation, education, art, and entertainment (Spence, 2021a).
Supplemental Material
Supplemental material, sj-zip-1-mns-10.1177_20592043241274258 for Sonic Bouquet: Decoding Cross-Modal Correspondences Between Timbre and Scent by Asterios Zacharakis in Music & Science
Supplemental material, sj-zip-2-mns-10.1177_20592043241274258 for Sonic Bouquet: Decoding Cross-Modal Correspondences Between Timbre and Scent by Asterios Zacharakis in Music & Science
Footnotes
Acknowledgments
I am grateful to Ioulieta Michail for her assistance in conducting the experiments and for carrying out a thorough literature review on correspondences between taste/aromas and sound. I would also like to thank the participants involved in the experiments, as well as the Action Editor and both reviewers for their valuable assistance in improving this manuscript. Finally, I appreciate Vasilis Paras' help in creating the sound stimuli using Ableton Live.
Action Editor
Zachary Wallmark, School of Music and Dance, University of Oregon.
Peer Review
Charles Spence, Department of Experimental Psychology, Oxford University; Caroline Traube, Faculté de Musique, Université de Montréal.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
This experiment received ethical approval from the Aristotle University ethics board. Participants provided written informed consent acknowledging their voluntary participation in the study, understanding the purpose, procedures, potential risks and benefits, confidentiality measures, and their right to withdraw at any time. They also agreed to the storage, management, and, if applicable, anonymized sharing of their data for research purposes.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Data Availability Statement
The raw data collected for this experiment are available in the supplementary material as 29 .xlsx files. Each file represents the associations reported by one of the 29 participants between each sound stimulus and the 12 aromatic variables. The supplementary materials also include the 26 sound stimuli as .wav files.
Supplemental Material
Supplemental material for this article is available online.