Abstract
There has long been interest in the nature of the relationship(s) between hue and pitch or, in other words, between colour and musical/pure tones, stretching back at least as far as Newton, Goethe, Helmholtz, and beyond. In this narrative historical review, we take a closer look at the motivations that have lain behind the various assertions that have been made in the literature concerning the analogies, and possible perceptual similarities, between colour and sound. During the last century, a number of experimental psychologists have also investigated the nature of the correspondence between these two primary dimensions of perceptual experience. The multitude of different crossmodal mappings that have been put forward over the centuries are summarized, and a distinction drawn between physical/structural and psychological correspondences. The latter being further sub-divided into perceptual and affective categories. Interest in physical correspondences has typically been motivated by the structural similarities (analogous mappings) between the organization of perceptible dimensions of auditory and visual experience. Emphasis has been placed both on the similarity in terms of the number of basic categories into which pitch and colour can be arranged and also on the fact that both can be conceptualized as circular dimensions. A distinction is drawn between those commentators who have argued for a dimensional alignment of pitch and hue (based on a structural mapping), and those who appear to have been motivated by the existence of specific correspondences between particular pairs of auditory and visual stimuli instead (often, as we will see, based on the idiosyncratic correspondences that have been reported by synaesthetes). Ultimately, though, the emotional-mediation account would currently appear to provide the most parsimonious account for whatever affinity the majority of people experience between musical sounds and colour.
Introduction
For centuries now, commentators working in a wide variety of research fields have speculated on the possible (privileged) relationship that might exist between colour and sound (e.g., Goethe, 1840; Helmholtz, 1867; Newton, 1704). In fact, the intuition that there was a meaningful correlation between the musical scale and the rainbow spectrum of hues can be traced all the way back to ancient Greek philosophers such as Pythagoras (Moritz, 1997) and to Ptolemeus Alexandreus (Sebba, 1991). In his theorization concerning the senses, Aristotle also hypothesized the existence of a link between colour harmonies and musical proportions, speculating that the effect of colour combinations to the eye might depend upon the same numerical proportions as the musical sounds (Aristotle, 1908, III 439b–440a). In the contemporary era, such audiovisual crossmodal correspondences, as they are now more commonly known (see Spence, 2011), have started to fascinate a new generation of experimental psychologists (e.g., Adeli et al., 2014) and music theorists (e.g., Eitan, 2017). There has also been interest in colour-sound correspondences from those interested in the design of more intuitive sensory substitution systems (e.g., for those who are visually impaired; Hamilton-Fletcher et al., 2016; Marks, 1983). 1 However, while some commentators have chosen to emphasize the similar structure of the perceptual spaces carved out by simple visual and auditory sensory experiences (Pridmore, 1992), others have instead focused their attention on the putative perceptual similarity between specific combinations of sensory inputs (e.g., see Kandinsky, 1977). The latter highlighted, for example, by the title of an article by Ione and Tyler (2004): ‘Is F-sharp coloured violet?’
The psychological affinity between simple auditory and visual stimuli can be further sub-divided into at least two distinct categories, namely perceptual and affective similarity (the latter, possibly linking to Kandinsky’s, 1977, notion, of ‘inner harmony’; see Spence & Di Stefano, 2022). While interest in the crossmodal correspondences between sound and colour initially appears to have been motivated by intuitions based around the structural and physical similarity (these so-called analogical mappings) between the two senses, the absence of any obvious phenomenological sense of perceptual similarity has led to the emergence of emotional mediation as a likely ‘glue’ linking (if not necessarily binding) the stimuli presented in this particular pair of sensory modalities. At the outset, though, it is important to consider the various perceptual dimensions underlying our experience of auditory and visual stimuli, as this has also proven to be a major motivator for empirical research, as well as a significant constraint on the associations that have been proposed to date. At the same time, however, and as we will come back to later in this narrative historical review, when considering the topic of ‘colour music’, it is important to recognize that as soon as auditory and visual stimuli are embedded within spatial and/or temporal patterns, as they typically are in works of art (and music), the likelihood is that intramodal perceptual grouping will completely override any weak crossmodal perceptual alignment that might otherwise have been detectable (see Sebba, 1991, on this point; and see Spence, 2015, for a review). In much the same way, the affective/mood associations that have been established between specific simple sensory stimuli when presented in isolation will likely also change as a result of any intramodal perceptual grouping that might occur. For instance, consider here only how the affective meaning of musical scales has been shown to depend on direction, i.e., whether they are ascending or descending (e.g., Collier & Hubbard, 2001).
Metathetic, Prothetic, and Circular Dimensions
More than half a century ago now, Stevens (1957) highlighted an important distinction between ‘metathetic’ and ‘prothetic’ stimulus dimensions. Loudness, brightness, lightness, heaviness, duration, roughness, area, and apparent length etc., are all prothetic dimensions, with a clear ‘more than’ and ‘less than’ end. 2 Stevens refers to these as quantitative perceptual continua (i.e., having to do with how much) with ratio properties. According to Panek and Stevens (1966), the saturation of red can also be arranged as a prothetic continuum. Metathetic dimensions, by contrast, obey a well-structured organization without necessarily having a ‘more than’ or ‘less than’ end. In adults, prothetic dimensions tend to possess a unitary and well-ordered psychophysics, whereas metathetic stimulus dimensions do not. Stevens classified the latter perceptual continua as ‘what kind’ or ‘where (position)’. Relevant here, both auditory pitch and hue are metathetic dimensions, as are visual position, inclination, and proportion, according to Stevens. Smith and Sera (1992) subsequently added shape to the list. Interestingly, however, pitch and hue can also both be represented as circular dimensions. In fact, according to Pridmore (1992), they are the only perceptual dimensions that can be so represented (see Table 1 for a summary).
Putative organizational principles underpinning various dimensions of perceptual experience, and supporting references.
Putative organizational principles underpinning various dimensions of perceptual experience, and supporting references.
According to Marks (1987), while visual lightness is a prothetic dimension, it is extremely unusual in as much as which end of the dimension is associated with 'more', and which with 'less’, varies between individuals. Note that a given dimension of perceptual experience (e.g., brightness or pitch) can appear in more than one organizational category.
Polar dimensions should also be considered here. According to Clark (1970), the latter are fundamental to children’s development of relational concepts. Dimensions such as big vs. little, high vs. low, but presumably also left vs. right, sweet vs. sour/bitter are all polar. Notice here how a polar organization can be applied to prothetic dimensions (big vs. little; spatial high vs. low), to metathetic dimensions (such as left vs. right; high vs. low pitch), and to pairs of stimuli that cannot be organized as part of either prothetic or metathetic dimensions, such as between pairs of basic tastes (though see Crisinel et al., 2012; and Watson & Gunter, 2017) or olfactory stimuli. Polar dimensions can presumably be aligned with other polar dimensions although which poles people choose to align may well sometimes turn out to be arbitrary. Intriguingly, certain of the basic tastes have recently been shown to correspond to the polar dimension of elevation (e.g., high vs. low; see Velasco et al., 2019).
While it is undoubtedly possible to establish an analogy between pitch and hue at more of a cognitive, or abstract, level, and by so doing relate, or figure out, the relative position of auditory and visual stimuli along their respective unisensory dimensions (cf. Cohen, 1934; Mellers & Birnbaum, 1982; Moul, 1930; Simpson et al., 1956, p. 100), 3 there is little empirical evidence supporting the existence of any kind of perceptual ‘resonance’ (cf. Muecke & Zach, 2007), or unification 4 that necessarily follows from the correct structural alignment of stimulus dimensions (e.g., Sebba, 1991). That is, the dimensional alignment of pitch and hue does not necessarily lead to any perceptually apparent correspondence based on the perceived similarity of the component stimuli. 5
Intriguingly, von Hornbostel (1931, 1950) was convinced that sensory brightness represented one of the universal dimensions of sensory experience. His participants were required to match sounds of different pitches to points along a greyscale. He also had them crossmodally match scents with grayscale values. He took his results, and, in particular, the transitivity between different crossmodal comparisons that they apparently revealed, to demonstrate that the concept or quality of ‘sensory brightness’ was one that was common to all of the senses. Were this to be the case, one could easily imagine how simply matching the brightness of auditory and visual stimuli would provide a means of objectively aligning the senses. However, subsequent researchers were not convinced (see Cohen, 1934), arguing instead for a relative/relational judgment rather than necessarily a crossmodal perceptual mapping (see also Hartshorne, 1934). As Cohen (1934, p. 119) put it, the stimuli were ‘analogous’ rather than ‘identical’. Cohen explains as follows: “It would not be unreasonable then to suppose that cross-modality comparison should be based (physiologically, if not introspectively) upon relative positions within different ‘absolute’ scales. According to this view equation with respect to brightness of two experiences of different modalities would involve nothing more than the identity of relative positions upon two wholly independent scales.”
Echoing von Hornbostel’s theoretical speculation, Smith (1987, p. 96) writes that: “we have a lot to gain if we can get beyond mapping specific dimensions one to another and instead delineate the amodal dimensions.” The possibility for amodal concept(s) to exist is apparently linked to the existence of absolute correspondences, as Smith (1987, pp. 97–98) observed: “This suggestion of a trend from dichotomous, categorical treatments of continua to more relativistic ones ought not to be confused with the issue of absolute versus relative correspondences across dimensions. The notion of absolute correspondences between dimensions is that particular values on one dimension map onto particular values on another—for example, higher is not like brighter; rather, a specific pitch matches a specific brightness. As Marks et al. point out, there is little evidence for such absolute correspondences.” (Italics in original). Interestingly, Mellers and Birnbaum (1982, p. 600) came to much the same conclusion, namely that: “In cross-modality judgments, the scale values are influenced by the stimulus distribution: It appears that subjects compare the relative position of a stimulus in its distribution with the relative position of a stimulus of another modality to its distribution, going on to suggest that their results were consistent with a psychological relativity theory of cross-modality judgment.”
To summarize, beyond shedding light on the way in which stimuli are organized within perceptual dimensions in different sensory modalities, the distinction between metathetic and prothetic stimulus dimensions leads to the related distinction between absolute vs. relative crossmodal correspondences. The latter distinction is, in turn, instrumental when it comes to assuming the existence of an amodal concept, conceived of as the same physical property (such as shape) being picked up via multiple senses (see Lewkowicz & Turkewitz, 1980). This, it should be noted, is subtly different from von Hornbostel’s (1931, 1950) notion of universal dimensions of perceptual experience. The emphasis in the latter case would appear to be on the perceptual experience itself (i.e., what it is like), whereas the emphasis for those who have proposed amodal dimensions is on the multiple routes to picking-up information about physical properties out there, regardless of the perceptual qualities that may be associated with that information.
Researchers have long known that people can scale sensations in different sensory modalities across a wide range of prothetic stimulus dimensions (Stevens, 1966; see also Bond & Stevens, 1969). And while some have wanted to argue from such matching behaviours that there may be shared attributes of experience, such as sensory brightness in von Hornbostel’s (1931) case, it turns out that it is hard to rule out alternative, more cognitive, accounts of the underlying process, as we have just seen (see Cohen, 1934). That said, Marks et al. (1987, p. 84) conclude that: “In some fundamental sense, the similarities between pitch and brightness and between loudness and brightness are personal, internal, and subjective; they reside in perception per se and probably depend on common processes of neural coding.” Note the strong claim here, albeit with multiple provisos, that similarity relations are perceptual in nature. Nevertheless, the ‘personal, internal, and subjective’ element did not stop Marks (1974a, b) from trying to establish a robust psychophysics based on the crossmodal matching of the colour lightness of grey surfaces with the pitch of pure tones. 6 However, the ability to crossmodally match stimuli is presumably possible between any pair of prothetic stimulus dimensions, only a few of which might be argued to pick-up on the same stimulus, or perceptual, property (cf. Cohen, 1934). 7 Elsewhere, Marks et al. (1987, p. 5) talk of the “perceptual, cross-modal equivalence with respect to intensity”. As such, it may be difficult to assert anything concrete about the perceptual similarity of pairs of stimuli based merely on the fact that participants can engage in robust crossmodal matching of stimuli relative to the dimensions to which they belong. In contrast to Marks and colleagues’ suggestion that the similarities are fundamentally perceptual in nature, we will argue for an emotional mediation account of colour-sound correspondences instead (cf. Wilms & Oberfeld, 2018).
Here it may also be worthwhile to contrast hue and pitch with other dimensions of unisensory experience, namely basic tastes and odour qualities which can neither be organized prothetically nor metathetically. Notice how the latter can both be distinguished from one another (i.e., sweet can be distinguished from salty and sour etc.) without there necessarily being any obvious organizational structure linking the various stimuli within each category (i.e., tastes and odours, respectively). 8 The organization of smell and taste qualities are, in other words, neither prothetic nor metathetic. Nevertheless, research on the crossmodal correspondences now clearly reveals the existence of robust associations between colour hue categories (such as red, white, etc.) and basic taste qualities, such as sweet, bitter, salty, sour, salty, and umami (Spence & Levitan, 2021; Spence et al., 2015; see also Ikeda, 2002), as well as with odour qualities (see Spence, 2020b, for a review). 9 Crossmodal correspondences have also been documented between auditory pitch and both basic taste qualities (e.g., Knöferle et al., 2015), and various olfactory stimuli (e.g., Belkin et al., 1997; Crisinel & Spence, 2012; see Di Stefano et al., 2022; Spence, 2021, for reviews).
Given the latter observation, it would appear that the nature, or structure, of the underlying stimulus/perceptual dimension (see Table 1) does not necessarily constrain the likelihood of observing, or establishing, crossmodal correspondences, nor does it automatically connect to the notion of perceptual similarity. Taken together, therefore, the research that has been reviewed in this section would appear to support the claim that the strength and quality of crossmodal correspondences, such as, for example, between pitch and hue, does not necessarily bear any meaningful relation to the structure, or organization, of the underlying perceptual continua. This conclusion obviously runs counter to the intuitions of those who first commented on the connection between colours and musical sounds. 10 Indeed, it should be noted that the exact correspondence between these two classes of continua and their underlying neural representations is by no means clear (Lewkowicz & Turkewitz, 1980; Stevens, 1971).
It is worth noting how the conclusion that stimulus dimensional structure has no implications for crossmodal perceptual similarity would appear to contradict the claims made by the psychologist Lawrence Marks in his influential book, The unity of the senses, when he asserts that: “Sensory correspondence is not a domain of inquiry restricted to scientists, a matter solely for experimental scrutiny and empirically based theory. The plain fact is that sensory analogies do exist; they are important to the ways that we sense, perceive, and cognize; they are significant properties of the bodies and minds of people” (Marks, 1978, p. 7). In contrast, according to the view espoused here, while ‘sensory analogies’ may bias (or be based on) the kinds of sensory connections that people find it easy to establish cognitively, we do not believe that they play any role in determining the nature, or strength, of the crossmodal correspondences that are based on perceived similarity. However, while apparently contradicting Marks’ view, our own conclusion would appear to be much more in tune with the art historian Gombrich’s suggestion that we should focus our attention on the structural relationships in the system rather than on the similarity of the elements: “The problem of synesthetic equivalences will cease to look embarrassingly arbitrary and subjective if . . . we fix our attention not on likeness of elements but on structural relationships within a scale or matrix” (Gombrich, 1960, p. 314).
On the Multiplicity/Hierarchy of Crossmodal Correspondences
One other important point to bear in mind here is that, outside of the psychophysics laboratory, perceptual stimuli typically vary along several dimensions simultaneously (e.g., two visual stimuli may well differ in terms of their hue, but also in terms of their size, shape, texture, etc.). There is presumably a hierarchy of crossmodal correspondences, such that certain perceptual dimensions (or attributes) may only obviously be aligned if the more natural, or perceptually salient, matching dimensions (or attributes) are not available (cf. Gardner, 1974; Parise, 2016). That is, the participants in laboratory studies will presumably simply go with the best of the available response options when, say, in an experimental setting, they are instructed to match a colour to a musical sound. In the audiovisual context, note only how auditory pitch, a correlate to visual brightness, is overall at least as strong as the crossmodal correspondence with loudness. In fact, according to Marks (1989, p. 598), it may even be stronger, or more dominant. In other words, one can ask both what is the spontaneously chosen dimension corresponding to a given perceptual experience, and whether it is consensually chosen across individuals. Separately, for a given specific pairing of attributes or dimensions one can ask how consistent it is across individuals (see Spence, submitted). It is interesting to note how the former question has rarely been posed by researchers, with the latter normally assuming what the comparison dimension should be.
As such, merely demonstrating a statistically significant crossmodal correspondence between stimuli in two arbitrary perceptual dimensions (such as pitch and hue) does not necessarily mean that the correspondence so documented reflects the best of all possible matches with either of the component attributes, merely that it is the best of the options that happened to be available to participants at the time they were asked (see Spence & Levitan, 2021; Spence et al., 2015, on this point). Thus, the mere fact that a robust crossmodal correspondence can be established between two stimuli, or stimulus dimensions, does not mean that this will necessarily be the most natural, or obvious, association that an observer will make, or be drawn to, spontaneously should there be a better-connected dimension available (cf. O’Mahony, 1983, on the multiple correspondences with basic tastes, of which hue is but one). Notice here also how the frequency, or consensuality, with which specific crossmodal associations occur is not necessarily linked to the strength, or vividness, of the association itself (e.g., Marks, 1975; Rader & Tellegen, 1987; Spence, submitted). Or, to give another example, consider only how people may draw an analogy between spatial patterns in vision and temporal patterns in audition if those happen to be the only dimensions in which the auditory and visual stimuli differ/vary (Julesz & Hirsh, 1972). However, as soon as a temporal pattern is presented visually then this may come to dominate as the more natural, or intuitive, crossmodal match for an auditory temporal pattern than a spatial pattern.
On the Popularity of Colour-Sound Correspondences
So why, then, is it that people have been so interested in colour-pitch correspondences for so long? The topic is undoubtedly of theoretical interest in terms of the discussion of crossmodal perceptual similarity, with early interest seemingly stemming from the various structural and physical similarities that have been highlighted between the dimensions of stimuli/experience, though as we will see later (see
Coloured Hearing Synaesthesia
While some commentators have wanted to align the visible colour spectrum with the auditory pitch scale (Newton, 1704; see also Arnheim, 1986), others have focused instead on specific matches between particular pairs of stimuli, often based on the idiosyncratic inducer-concurrent relations experienced by those synaesthetes who report experiencing coloured music (see Suarez de Mendoza, 1890). For example, just take the Bauhaus/Abstract artist Kandinsky’s (1977) suggestion that the sound of the trumpet is scarlet (see Ione & Tyler, 2003, 2004; Just, 2017; though see also O’Regan, 2011). At one point, Kandinsky (1977, p. 40) writes that: “Light warm red …In music, it is a sound of trumpets, strong, harsh, and ringing.” In this particular case, one can only wonder whether it is mere coincidence that a couple of centuries earlier both Locke (1690) and Leibniz (1704/1896) had written about the blind man who apparently understands what scarlet is because of its being compared to the sound of the trumpet. Such consistency might be taken to suggest the existence of a meaningful (i.e., fundamental) mapping between a specific timbre and a particular hue category (see also Lavignac, 1899; Wallmark, 2019). Other commentators, though, have wanted to put forward an account in terms of learned associations instead. Just take, for example, the eminent neurologist MacDonald Critchley (1900-1997), who once apparently suggested that “the familiar story of trumpet blasts provoking a photism of red, may stem from the fact that such a sound immediately calls up in some persons an imagery of soldiers on parade. Ordinarily they shall be in dress uniform. This evokes a mental picture of scarlet. Should the middle part of this notion eventually become submerged, there will remain a synaesthetic linkage of trumpet-calls with redness.” (as quoted in Harrison, 2001, p. 209; cf. Barilari et al., 2018, for the suggestion that crossmodal correspondences involving visual stimuli may be internalized by the early blind in terms of the structural regularities in language). It is perhaps also worth considering the effect of key colour on the recognition of absolute pitch on the piano (Marvin & Brinkman, 2000).
Kandinsky (1977) referred to a number of specific colour-sound mappings in his writing, though it is often unclear whether the examples he gives constitute examples of the artist’s own synaesthesia, or should be better considered as examples of emotionally-mediated crossmodal correspondences (and hence might perhaps be expected to be experienced by us all; Spence, 2020a). Kandinsky often refers to the latter in terms of ‘inner harmony’ (cf. Harrison, 2001). Something of a similar challenge has faced those interested in trying to understand more about the idiosyncratic crossmodal mappings that have been suggested by synaesthetic Russian artists, namely the composers Rimsky-Korsakov (who reported ‘seeing’ music in the key of A-major as yellow; Myers, 1911), and Scriabin (Galeyev & Vanechkina, 2001; Myers, 1914). Once again, though, it has long been the subject of debate as to what exactly the relationship, if any, was between Scriabin’s personal repertoire of idiosyncratic audiovisual inducer-concurrent mappings, and those that he chose to incorporate into his colour circle/score/luce (Galeyev & Vanechkina, 2001; Triarhou, 2016). 12
‘Coloured hearing’ turns out to be one of the most commonly-mentioned forms of synaesthesia, and often appeared in the scientific literature in the decades either side of 1900 (e.g., Dauriac, 1902; de Parville, 1883; English, 1923; Flournoy, 1893; Ginsberg, 1923; Jewanski et al., 2009; Jewanski et al., 2011; Jewanski, Simner, Day, Rothen, & Ward, 2020; Underwood, 1893). 13 Marks (1975) provides a particularly thorough review of the multitude of early studies of coloured music and coloured speech sounds. 14 Interestingly, both pitch and timbre appear to be salient features (i.e., sensory inducers) driving the coloured musical concurrents that have been reported to date. At the same time, however, it has also been acknowledged that there may be a strong visual mental imagery component to many coloured responses to music (Karwoski et al., 1942; Mudge, 1920; see also Spence & Deroy, 2013). And while it has been suggested that the inducer-concurrent mappings experienced by synaesthetes do typically tend to be appreciated by non-synaesthetes (e.g., see Ward et al., 2008), that presumably has no necessary implications regarding the question of whether the inducer is perceptually (or necessarily even affectively) similar to the concurrent in the case of synaesthesia.
Early Suggestions Concerning the Alignment of Colour and Pitch
As has been noted already, of all the crossmodal correspondences that could potentially have captured the interest of commentators (see Spence, 2011, for a partial listing), it is the connection between colour and pitch that would seem to have attracted by far the most widespread, and longstanding, interest amongst everyone from philosophers to writers, and from scientists to artists. But, once again, we return to the question of why that should be? Is this particular pairing of sensory modalities (or rather stimulus dimensions) in some way special? Otherwise, how are we to explain what has made it stand out from the many other crossmodal correspondences that we now know about? As we have just seen, the answer would not appear to reside in the structural similarity between these stimulus dimensions. Interest from the late nineteenth century is tied up with the colour music so often reported amongst artists/in the press (Zilczer, 1987). However, looking even further back in time, one finds Sir Isaac Newton (Newton, 1704, book III, part I, qu. 13-14), famously drawing an analogy between the seven notes of the diatonic scale and the seven putative primary colours of the spectrum (Jewanski, 2010; see Figure 1).

Newton’s suggested analogy between the seven musical notes and the seven spectral hues; The musical divisions of the prism, as proposed by Newton (1704). The seven colours: red, orange, yellow, green, blue, indigo and violet, filled the seven intervals between the eight notes, starting from the highest note on the right. Deep violet, on the left, is the most refracted light, and red on the right is the least refracted.
Inspired by Newton’s Opticks, first published in 1704, the English chemist Field (1835) conceived colour painting based on musical criteria, leading to the definition of the general mapping between sound and colour. The systematic use of musical notions for describing the harmony of colours likely makes Field’s system the most rigorous and musically informed attempt to achieve a general theory of harmony which applies equally to music and sounds (see Figure 2; see also Spence & Di Stefano, 2022). Johannes Wolfgang von Goethe (Goethe, 1840, c. 201-202, para. 748) once also famously suggested an association between colour and musical key expressed as a general tendency to match darker hues to musical keys with flats in their signatures, and brighter colours to those with sharps. Or, as Goethe himself put it: “It would not be unreasonable to compare a painting of powerful effect with a piece of music in a sharp key; a painting of a soft effect with a piece of music in a flat key” (Goethe, 1840, p. 342).” 15 Goethe also wrote: “That a certain relation exists between the two, has been always felt; . . . Colour and sound do not admit of being directly compared together in any way, but both are referable to a higher formula, both are derivable, although each for itself, from this higher law” (Goethe, 1840, p. 298). Notice how the stress here would appear to be on a structural analogy, or affective similarity, rather than on a direct crossmodal mapping driven by any perceived similarity between the component stimuli.

The correspondence between colours and musical sounds as theorized by Field (1835, p. 79). Each coloured triangle is divided into two equal triangles of slightly different hues that correspond to the chromatic intervals (represented also in circles on the left). Pitches are ordered from low to high, with darker and lighter hues, respectively.
Subsequently, Hermann Ludwig Ferdinand von Helmholtz (1867, p. 237) described the following analogies between the notes of the piano and the colours of the spectrum: G, Red; G#, Red; A, Red; A#, Orange-red; B, Orange; c, Yellow; c#, Green; d, Greenish-blue; d#, Cyanogen-blue; e, Indigo-blue; f, Violet; g, g#, a, a# Ultra-violet; b, end of the solar spectrum. According to Kubovy and Van Valkenburg (2001), this particular scale extends to about a Fourth beyond the Octave. 16 However, in the same multi-volume work, one also finds the famous German psychophysicist arguing that: “The distinctions among sensations which belong to different modalities, such as the differences among blue, warm, sweet, and high-pitched, are so fundamental as to exclude any possible transition from one modality to another and any relationship of greater or less similarity. For example, one cannot ask whether sweet is more like red or more like blue. Comparisons are possible only within each modality; we can cross over from blue through violet and carmine to scarlet, for example, and we can say that yellow is more like orange than like blue!” (Helmholtz, 1878/1971, p. 77). 17 While the two claims made by Helmholtz might, at first, appear to be contradictory, it is presumably entirely possible to recognize an analogy between two physical stimulus dimensions without presupposing that the corresponding stimuli on the respective unisensory scales will necessarily be perceived as any more (or less) perceptually similar than other pairs of mismatching stimuli.
However, it is important to note that not everyone has necessarily agreed with Helmholtz’s bleak assessment concerning the possibility of experiencing perceptual similarity between the senses. Just take the following quote from Lawrence Marks, writing in 2011 about “perceptual similarities between and among sensory experiences in different modalities. Much as the colour aqua is more similar phenomenologically to cerulean than to pink, the flavour of lime more similar to lemon than to banana, so too are low notes played on a bassoon or an organ more like dark colours such as brown or black than bright colours such as yellow or white, while the higher notes played on clavier or a flute resemble yellow or white more than brown or black.” (Marks, 2011, p. 52). Unfortunately, though, Marks makes no reference to Helmholtz’s prior work, and hence it is not possible for us to know how he would respond to what appear to be the latter’s diametrically-opposed position.
Where Newton, Goethe, and Helmholtz led in terms of laying out their analogical mappings between musical notes and spectral colours, various other artists, musicians, composers, and inventors have all come out with their own slightly different correspondence tables mapping colours to the pitch of musical notes (see Table 2 for a number of the proposed correlations/associations between colour and chromatic musical scales). And while early interest was seemingly driven more by theoretical considerations related to the existence of a natural structural alignment (or analogy; see Jewanski, 2010) between these stimulus dimensions of perceptual experience, the subsequent development of such crossmodal tables has often been motivated more by the practical interest in trying to create colour organs and/or ‘light symphonies’. The latter inspired/facilitated by technological developments and ideas circulating in the world of abstract art (Zilczer, 1987).
Correlation of colour and chromatic music scales. Table highlighting the various crossmodal correspondences that have been proposed since Newton. Newton’s correlation conforms to the seven-tone scale which he was probably familiar with. Castel’s correlations were made with the 12-tone chromatic scale, but as Wells (1980) notes, the hues fall in frequency as the tones rise in frequency. The alignment is reversed for the following correlations. (The scale, attributed to E. G. Lind, presents the pitch of tones (sound frequency, Hz) and the frequency of light (presented in parentheses in the Table as 10-8 Hz, for example red is 476 × 108 Hz).) [Reprinted with permission from Wells (1980, Table 1).].
Colour-Pitch Correspondences
During the middle decades of the twentieth century, a number of experimental psychologists, often without any reference to the earlier studies (such as those mentioned above), set themselves the task of trying to establish the nature (and consequences) of any crossmodal correspondences that might exist between colour and pitch. According to Kubovy and Van Valkenburg (2001, p. 114), the justification for this line of empirical research was simply that: “early psychologists concluded that colour gets mapped onto pitch, since they are non-spatial and non-temporal, and since they are both caused by waves.” Notice here how the justification is in terms of the physical nature of the stimuli themselves.
In terms of empirical research, one of the first studies to assess colour-tone associations systematically in a non-synaesthetic population was reported by Simpson et al. (1956). These researchers investigated the nature of any crossmodal associations between colours and pure tones in 995 elementary school children (between third and sixth grade). The children were presented with each of six pure tones (125, 250, 1,000, 4,000, 8,000, and 12,000 Hz, or cycles per second, c.p.s.) 18 at 40 and 50 dB and forced to choose which of six spectral colours (violet, blue, green, yellow, orange, and red) they ‘thought of’ immediately upon hearing each of the tones. Yellow and green were predominantly associated with high pitch. They suggested red and orange could be categorized as ‘middle-pitched’ colours, while violet and blue were predominantly ‘low-pitched’ colours. 19 Here, though, it is worth mentioning the point made earlier concerning the fact that the participants were only ever able to choose the best of the six available options. Hence, were it to be the case that pink were to have been the optimal match to one of the pure tones, say, then this would not be apparent from the data.
Simpson et al.’s (1956) study was phrased as a game with the children themselves told to guess if they were unsure. It is therefore unclear whether the mappings that were captured were cognitive in origin or anything more (i.e., perceptual crossmodal correspondences). And while Simpson and colleagues claimed to have demonstrated crossmodal pitch-hue correspondences (or ‘synaesthesia’ in their misleading terminology), 20 several researchers have subsequently questioned whether these researchers actually controlled the brightness and/or saturation of their stimuli (cf. Wicker, 1968). In the absence of certainty on this latter point, the possibility must remain that the crossmodal correspondences reported in this early study might reflect nothing more than a pitch-brightness correspondence (see Bleuler & Lehmann, 1881; Flournoy, 1893; Marks, 1982; and Riggs & Karwoski, 1934) rather than necessarily having anything to do with hue categories per se (Spence, 2011). Certainly, when other experimenters have subsequently looked for correspondences between hue and pitch, it is striking how many of them have failed in their efforts (e.g., Bernstein et al., 1971; Wicker, 1968) (see Table 3 for a summary).
Summary of documented crossmodal correspondences between a selection of perceptual qualities of sounds and visual colour (and shape). Null results preceded by a '-ve:'.
Summary of documented crossmodal correspondences between a selection of perceptual qualities of sounds and visual colour (and shape). Null results preceded by a '-ve:'.
A couple of the studies that have been reported by Marks and his colleagues are interesting inasmuch as they appear to highlight the existence of a crossmodal correspondence between colour words and the associated, or corresponding, pitch in adults (Marks, 1982) (see Figure 3). A few years later, Marks et al. (1987) went on to demonstrate much the same crossmodal associations in children aged between 9 and 13 years (see Marks et al., 1987, Figure 47). Such results presumably suggest that people may associate different (prototypical) or imagined colours as having a specific brightness level, and moreover that they do so from early in development. One final point to note here concerns Simpson et al.’s (1956) suggestion that: “A third interpretation, most consistent with our finding of selective pitch-colour associations in children, is that particular hues and pure tone frequencies are associated with each other because of a certain inherent “belongingness” between each member of a given pair and a particular mood.” (Simpson et al., 1956, p. 102). As we will see later, this mood, affect, or emotion-based account of crossmodal mappings is one that many other scientists, as well as a number of artists have seemingly also stumbled across (cf. Kandinsky’s, 1977, ideas around the notion of ‘inner harmony’).

Mean ratings of pitch versus mean ratings of brightness to colour words. (Open symbols: results of Marks (1982), Experiment 2; visual scale, dark-bright. Filled symbols: results of Marks (1982), Experiment 4; visual scale, dim-bright.) [Reprinted with permission from Marks (1982, Figure 6).].
Wicker (1968) conducted a pair of studies investigating what he referred to as the “intersensory dimensions in perceptual or cognitive space, i.e. of dimensions which are significantly descriptive of sensory inputs from more than one modality.” [italics in original] (Wicker, 1968, p. 178). Wicker was particularly interested in “behaviour which can be conceptualized as reflecting perceptual or associative alignment between sensory attributes from different sense-modalities” (Wicker, 1968, p. 178). In a first experiment, the participants were presented with a range of 13 pure tones (300, 400, 500, 600, or 700 c.p.s.; this, note, a more sensible choice of tones than used in Simpson et al.’s, 1956, study, given that the researchers were potentially able to investigate also another variable/dimension (such as different notes) beyond higher-lower frequency…) of varying loudness (53-84 dB); they were also presented with 13 coloured Munsell colour squares (green, red, blue, and yellow) of varying brightness and saturation. The participants were instructed to rate the similarity of all pairs of tones and, in a separate experimental session, to rate the similarity of all possible pairs of colour patches. They also had to rate every individual tone and colour patch in terms of 25 semantic differential adjective scales (cf. Moller et al., 2009). They were further encouraged to rate the similarity of the auditory and visual stimuli in another part of the study. In particular, according to Wicker (1968, p. 180): “They were asked to rate the similarity of every tone to every colour on a nine-point scale.”
Multivariate scaling revealed two orthogonal alignments underlying the intersensory and cognitive space: pitch-brightness and loudness-contrast. These were established using multidimensional-scaling, semantic-differential scaling, and an intersensory transfer of training paradigm. At the same time, however, Wicker (1968) also noted how several other ‘alignments’, what in today’s parlance might well be called crossmodal correspondences (Spence, 2011), were not evidenced by his analysis of the data, namely pitch-saturation, loudness-brightness, and loudness-darkness. The failure to find the systematic relationship between pitch and hue was especially surprising given the results of Simpson et al.’s (1956) previous developmental study. Taken together, the studies reported by Simpson et al. and Wicker would therefore seem to suggest that colour-sound mapping is likely based on frequency (in terms of low-high continuum) but ignores octave repetition. In fact, if the mappings were to be based on octave repetition, then most experimental stimuli would have been matched to the same colour (as they are all octaves apart).
A few years later, Bernstein et al. (1971) conducted a small study in which four participants made speeded discrimination responses to the colour (red vs. blue) of a light which was either presented by itself, or else was accompanied by a task-irrelevant tone (of either 100 or 1,000 Hz). However, no significant effect of crossmodal mapping was observed as a function of the colour-pitch mapping. 21 As outlined by Bernstein and colleagues, the particular motivation for undertaking their research was to test the hypothesis that: “a high frequency tone would have greater facilitatory effect for a stimulus from the high (short wavelength) end of the spectrum and a low tone would have greater facilitatory effect for a stimulus from the low (long wavelength) end of the spectrum.” (Bernstein et al., 1971, p. 1327). However, the tiny sample size means that it is unclear whether Bernstein et al.’s study was adequately powered to find an effect in the first place.
Finally in this section, Melara’s (1989) studies on dimensional interactions between colour and pitch should also be mentioned. As a cognitive psychologist, Melara was particularly interested in studying dimensional interactions between colour and pitch using the well-established Garner interference paradigm. Indeed, there had been much research interest in the preceding decades concerning Garner’s notion that there was a meaningful distinction between separable and integral stimulus dimensions. The participants in Melara’s studies had to make speeded classification responses to either black and white stimuli or to triangle waveform tones having a fundamental frequency of either 1046.5 Hz (high) or 174.6 Hz (low). That is, the participants made speeded classification responses to a sequence of stimuli presented in one sensory modality while attempting to ignore the task-irrelevant stimuli that were sometimes presented in the other modality. The results of a series of experiments suggested that the crossmodal connection between these achromatic colours (what might be described as brightness) and pitch was partly strategic and partly mandatory (cf. Sun et al., 2018). It is, though, important to bear in mind here that evidencing congruency effects in speeded classification tasks such as these does not, in-and-of-itself, necessarily tell us anything about the perceptual similarity of the stimuli that were used. That is, congruency effects in behavioural studies have multiple causes, only a minority of which are likely to be perceptual in origin.
One other point to note here, in passing, is the very different pitch ranges that have been used in the studies reported above. That is, while the label ‘high-pitched’ might be associated with a pure tone of 1,000Hz in one study (e.g., Bernstein et al., 1971) the same label has been associated with a 12,000Hz tone in another (Simpson et al., 1956). The assumption amongst many of the experimental psychologists would therefore implicitly seem to have been it is the relative position of stimuli along their respective continua that matters (cf. Spence, 2019, for a review). But, one might ask, is that necessarily the case? 22 What is also worth noting is that most (if not all) of the above-mentioned studies used pure tones, which are not properly musical sounds, as they lack overtones (and therefore timbre). Since fundamental aspects of frequency perception (e.g., discrimination) are shown to be facilitated (in healthy participants) with complex or harmonic tones rather than pure tones (e.g., Novitski et al., 2004; Zeitlin, 1964), this would make the choice of pure tones even more problematic, for those wanting to focus on audiovisual correspondences based specifically on frequency. It is relevant here to note that with his ‘Gamut of Odors’ (see Figure 4), Septimus Piesse has also been taken to have highlighted a direct one-to-one correspondence between musical notes and scents. Relative correspondences presumably suggest a belief in some kind of structural analogy rather than a direct perceptual mapping based on the phenomenological similarity of the component stimuli.

Scale of crossmodal correspondences between sound and odours reproduced from Piesse (1867, pp. 42–43).
Taken together, the experimental research that has been conducted to date to test the psychological correspondence between colour (hue) and pitch has failed to reveal any convincing evidence of such a mapping (or alignment of dimensions), at least not when brightness and saturation have been carefully controlled (cf. Dailey et al., 1997; Hubbard, 1996). Given these null results, researchers would subsequently appear to have been dissuaded from further searching for hue-pitch correspondences. Indeed, as Kubovy and Van Valkenburg (2001, p. 115) put it: “As profound differences between light and sound became clear in the twentieth century, psychologists abandoned the explorations of parallels between pitch and colour.” Instead, it would appear from the psychophysical research reviewed here (see Wicker, 1968) that the more natural correspondence may instead actually be between pitch and brightness (cf. Bleuler & Lehmann, 1881; Flournoy, 1893; Marks, 1982) and between loudness and contrast. 23
Pitch-Brightness Correspondences
Crossmodal correspondences between auditory pitch and visual brightness/lightness have often been reported in the psychology literature (Marks, 1987; see also Sabaneev & Pring, 1929; see Table 4). What is more, according to Marks (1978, 2011; Marks & Bornstein, 1987), a neurophysiological explanation for this particular mapping may be based on the existence of common, underlying sensorineural codes (of intensity, brightness, duration, etc.). So, for example, Marks points to the fact that the auditory system appears to use temporal patterning (neural response frequency) in the coding of both pitch and loudness, 24 and believes that this may help to explain the commonalities between both of these dimensions and visual brightness (e.g., see Marks, 1991, p. 193). This suggestion, which echoes an earlier one made by Stevens (1957) regarding the common coding of increased intensity in different sensory modalities in terms of increased neural firing, can be seen as an account of at least certain correspondences motivated on the basis of a similarity in the neural processing/encoding principles (cf. Ellermeier et al., 2021; Walsh, 2003). Spence (2011) labelled those correspondences that are putatively based on such neural similarities as ‘structural correspondences’. 25 Nevertheless, despite the absence of robust empirical evidence supporting the existence of pitch-hue correspondences, a number of other crossmodal correspondences between dimensions of colour and sound have been documented over the last decade or so, including between timbre and hue (e.g., Adeli et al., 2014). For instance, Hamilton-Fletcher et al. (2017) examined how sound influences chroma or hue when properly controlling for lightness. To address this question, they had their participants physically adjust equiluminant colours until they matched with certain sounds. They found that, for pure tones, increases in frequency were associated with increases in chroma. Increasing the loudness of pure tones also increased chroma. Intriguingly, for complex sounds that share the same bandwidth of frequencies (100-3,200 Hz), but that vary in terms of which frequencies have the most power, all of the stimuli were associated with yellow hues. Based on their results, the authors suggest that the presence of frequencies higher than a certain threshold (i.e., above 800 Hz) consistently yields yellow hues.
Summary of the various different types of crossmodal correspondence that have been proposed that have to connect auditory and visual stimuli and selected literature sources suggested and/or supported them.
Summary of the various different types of crossmodal correspondence that have been proposed that have to connect auditory and visual stimuli and selected literature sources suggested and/or supported them.
Note that Spence (2011) originally labelled this category ‘Structural'. However, given that this label had already been introduced 20 years earlier by Sebba (1991) to describe dimensional alignment of perceptual space, it would seem more appropriate, in hindsight, to call this category 'Physiological', instead given that that is the putative cause/source of the correspondence.
Anikin and Johansson (2019) reported on a series of 22 experiments in which they investigated crossmodal correspondences between visual (luminance, hue [R-G, B-Y], saturation) and acoustic dimensions (loudness, pitch, amongst various others). Their results revealed that loudness is associated with saturation, while pitch is associated with both luminance and saturation.
As well as investigating any pitch-hue correspondences, experimental psychologists have, over the years, also assessed the nature and consistency of any timbre-colour (hue) crossmodal correspondences. In the latter case, though, the motivation behind researchers’ attempts to demonstrate such correspondences would appear to have been driven more by, or at least related to, the particularities of the idiosyncratic reports of coloured music synaesthetes rather than anything else (see Donnell-Kotrozo, 1978). For example, the composer Raff reported that he perceived the colour of the sound of the trumpet to be scarlet (other people apparently report it to be bright red; Ortmann, 1933); for Kandinsky, meanwhile, the sound of the tuba was also red. Or take Ginsberg’s (1923, p. 589) suggestion that: “Most, if not all of us, seem to agree with the following descriptive phrases; the silvery tone of the violin, the red blare of the cornet, the golden voice of the tenor, etc.” In 1899, the French musicologist, Albert Lavignac suggested a number of instrument-colour relationships such as Flute—azure blue; Oboe—green; Clarinet—red-brown; Horn—yellow; English horn—violet; Trumpet—crimson with orange; Trombone—crimson with orange; Cornet—red; Bassoon—dark brown; Timpani—black; Side drum—grey; Triangle—silver; and Violin—blue (see Donnell-Kotrozo, 1978; Lavignac, 1899). 26
In an early study, entitled ‘The common synaesthesia of music’, Mudge (1920) assessed visual responses to music in a group of 50 mature students. The latter were instructed to report the colours or brightnesses that they associated with certain tones, keys, instruments and familiar musical compositions. While Mudge notes that there was little uniformity as to the particular colour associations (and that eight of his participants reported a total lack of such colour associations), Mudge does admit that some tentative generalizations appear to be warranted. In particular, 34 of the students thought that low tones yielded dark colours, 26 that medium tones yield medium bright colours, and 36 that high tones yielded bright colours. In terms of timbre-colour correspondences, the sound of the violin was typically reported as blue or violet (or related colours), the trombone was reported to be dark, often brown (though for some it was yellow), and the clarinet and flute were reported to be bright. Mudge concludes with the suggestion that there appears to be an association between the complexity of tones, or timbre, and the ‘richness’ of the associated colour.
In recent years, a number of other studies have been published in which significantly non-random associations have been documented between timbre and colour/hue in non-synaesthetic individuals (cf. Adeli et al., 2014) (see Table 4). For instance, Reuter et al. (2018) claim (albeit only in a conference paper thus far) to have demonstrated consistent colour-timbre mappings in non-synesthetic individuals. Meanwhile, in terms of auditory timbre-colour/shape crossmodal correspondences, Adeli and colleagues concluded that their participants: “strongly associated soft timbres with blue, green or light grey rounded shapes, harsh timbres with red, yellow or dark grey sharp angular shapes and timbres having elements of softness and harshness together with a mixture of the two previous shapes.” (see also Arai et al., 2021; Gurman, McCormick, & Klein, in press, on timbre-visual shape correspondences). It should, though, be noted that in much of the modern research, colour and visual brightness are conceptualized as a couple amongst many of the crossmodal correspondences that people may hold with specific timbres, including with basic tastes, olfactory stimuli, textures, volumes etc. (Crisinel & Spence, 2010, 2012; see also Giannos et al., 2021; Saitis et al., 2020). 27 That is, hue-brightness to timbre crossmodal correspondences would not seem to be granted any kind of preferential status relative to the others. This state of affairs obviously contrasts with the disproportionate amount of theorizing that has seemingly been given over to the putative crossmodal correspondences between pitch and hue.
On the Artistic Exploitation of the Crossmodal Correspondence Between Colour and Sound: Colour Music and Colour Organs
While one important element, or strand, of artistic interest in the relation between colour and music relates to analogies between colour and sound, some artists and inventors have gone further in wanting to postulate an exact physical correspondence between light and sound. 28 Indeed, the promise of there being a robust code that would help to translate visual experience into sound, or vice versa, has long been the dream of many of those artists interested in trying to create ‘colour music’ (Moritz, 1997). According to Moritz, the idea that colour and music were somehow connected fascinated Renaissance artists such as Leonardo da Vinci (who produced elaborate spectacles for court festivals), Athanasius Kircher, the popularizer of the “Laterna Magica” projection apparatus (cf. Castel, 1725, 1735, 1740; Jewanski, 2010), and Arcimboldo, who produced entertainments for the Holy Roman Emperors in Prague. In the Sixteenth Century, the latter artist also conceived of a colour music (Eastlake, 1840; cf. Spence & Di Stefano, 2022). Meanwhile, Erasmus Darwin (1790) proposed a luminous music, to be played with coloured lights that were to be synchronized to the sounds of a harpsicord. Intriguingly, both Castel and Darwin believed that there was a certain ‘natural’ relation between colours and sounds. For Castel, whose colour organ was first exhibited in 1735, each note of the scale corresponded to a specific colour. In particular, the Jesuit priest assigned blue to do, green to re, yellow to mi, and red to sol, not arbitrarily or on whim, but because of some believed intrinsic appropriateness (see Marks, 1975, p. 313; see Table 2).
Colour music can be seen as just an extreme manifestation of the concept of musical analogy in the visual arts (Zilczer, 1987, p. 101), and while a number of artists were happy to merely title their paintings with musical terms, others have wanted to go further (cf. Vergo, 2012; Zilczer, 2016). As Zilczer (1987, p. 102) notes: “Since the late nineteenth century, musical analogy in the fine arts has taken a variety of forms which range from simple parallels to more complex systems of correspondence between the visual and musical arts.” Zilczer (1987, p. 101) suggests that the term ‘colour music’ was coined in the closing decades of the 19th Century to describe a visionary new art form, created by means of coloured lights and independent of easel painting. As the colour theorist Maud Miles put it: “Perhaps some genius will invent a pipe organ behind a screen of coloured lights. If these same lights could be operated by the same keys that play the organ, and if they could be reduced in brilliancy as the music grows softer, then a nearly perfect music and colour parallel would be produced.” (Miles, 1914, p. 97).
The synaesthetic Russian composers Rimsky-Korsakov and Scriabin both came out with their own distinctive colour-tone mappings at around the same time (see Table 5, for a comparison). That said, according to a survey by Vanechkina (1968; cited in Galeyev & Vanechkina, 2001), most Russian practitioners tended to follow Rimsky-Korsakov’s mappings rather than those of Scriabin. Galeyev and Vanechkina suggest that Scriabin built up his particular system of colour-tonal analogies, deriving many of the correspondences theoretically, rather than based on his own well-documented synaesthesia (Myers, 1914; though see Harrison, 2001). What is more, it should also be remembered here that Scriabin’s composition apparently did not match neatly with his synaesthesia. Note, once again, the theoretical rather than perceptual motivation underpinning the mapping. According to Galeyev and Vanechkina (2001, pp. 359–360): “Scriabin’s analogies are not so mechanistic in their motivation: his correlations are based on the equivalence of “complexity” of tonalities and colour (Schumann has defined tonality “complexity” as the number of alteration signs in the designation of tonalities; in its turn, colour “complexity” can be defined according to its place in a spectrum: colours at the red end are “simpler” than colours at the blue end)!”
Colour-sound correspondences proposed by the Russian composers Rimsky-Korsakov and shortly thereafter by Alexander Scriabin. Note that absence of ‘major’ designation in Scriabin’s list is common in 20th century music. Here it might be wondered whether Rimsky-Korsakov is referring to keys (major) whereas Scriabin is referring to isolated tones. [Adapted from the text of Galeyev and Vanechkina (2001).].
Colour-sound correspondences proposed by the Russian composers Rimsky-Korsakov and shortly thereafter by Alexander Scriabin. Note that absence of ‘major’ designation in Scriabin’s list is common in 20th century music. Here it might be wondered whether Rimsky-Korsakov is referring to keys (major) whereas Scriabin is referring to isolated tones. [Adapted from the text of Galeyev and Vanechkina (2001).].
Complicating matters somewhat, while the experimental psychologists have tended to present pure tones (i.e., from a tone generator), in a musical context, one is much more likely to experience chords instead. For instance, just takes the colours proposed by Scriabin to accompany his Prometheus: Poem of Fire (‘light symphony’; Galeyev & Vanechkina, 2001). In this case, the colours were not correlated with individual tones in a chord structure but rather with tonalities and chordal complexes. Scriabin’s harmony in Prometheus “was practically outside the framework of the traditional major-minor system.” (Galeyev & Vanechkina, 2001, p. 359), and, as such, it would therefore be a mistake to associate the colours in the separate tones of the “octave spectrum” (see Figure 5).

The scheme of “colour hearing” correspondences, by A. N. Scriabin. [Reprinted with permission from Galeyev and Vanechkina (2001, Figure 1).].
Cutietta and Haggerty (1987) conducted an intriguing study on colour-music association in several hundred participants aged between 3 and 78 years. The participants were asked to listen to short excerpts (30 s) from the following three music compositions: Gustav Holst’s Suite No. 1 in Eb, third movement, “March”, Modest Mussorgsky’s Pictures at an Exhibition, fourth movement, “Bydlo”, and George Friedrich Handel’s Music for the Royal Fireworks, “Bourree”. While listening to each music example, the participants were asked to indicate the colour which the music reminded them or made them think of, or that they associated with the music (orange, yellow, green, blue, purple, red). Interestingly, the authors pointed out that no more than 5 of the 350 subjects asked for any clarification as to what was meant by a “colour association” with music. Overall, the results tend to show that when asked to associate colours with music, a large percentage of participants were homogeneous in their colour choices (e.g., barely any participant associated the Music for the Royal Firework to blue, while more than 50% aged in their 20s, 30s, 60s, 70s associated it with yellow). Consistency of colour associations seems to emerge suddenly at around 9 years of age.
Cutietta and Haggerty (1987) interpreted their findings as contrasting with the idea that colour-music associations result from the experiential conditioning. Instead, they suggest that “results can best be explained by a theory that hypothesizes that colour associations to music are the result of some sort of sensory processing of music that appears to be widespread and consistent across a wide age spectrum” (Cutietta & Haggerty, 1987, p. 89). They also suggest associations might emerge from a processing “related to emotional responses” (pp. 89-90), that might partially explain fluctuation of associations to colours during development. It is important to note the mechanisms that might drive the associations between music and colour in this study need not necessarily be similar to those that researchers have focused on when considering correspondences involving pure tones. In fact, when using extended musical excerpts, it can be hard to determine which feature(s) of the stimulus may have prompted any consensual associations that are reported (e.g., average pitch, melodic contour, timbral or rhythmic aspect). As such, any structural organization of those auditory stimuli could hardly be meaningfully applied to musical excerpts such as the ones considered by Cutietta and Haggarty (and see Spence, 2020a, for a review of the literature on colour-music correspondences).
Alexander Wallace Rimington (1854–1918), a British inventor and professor of fine arts, based in London, built one of the first colour organs, patented in 1893. The first public performance of his colour organ took place in June, 1895, at St. James Hall, London (Rimington, 1895). 29 Rimington believed in the physical equivalence of light and sound, and in his compositions, attempted to apply three musical functions, namely time (possibly referring to tempo), rhythm, and instantaneous combination (slow or rapid and varied) to colour. Zilczer (1987, pp. 118–119) notes that: “Rimington equated the seven spectrum bands of natural light with diatonic intervals which composed the musical octave.” (see Figure 6). 30

Illustration of the colour-musical note mapping that was adopted by Rimington (1895) for the first performance of his colour organ in London.
Interestingly, however, a closer inspection of Rimington’s original address, given at the start of his first public performance in London, reveals that while the inventor stresses the physical analogies between vision and hearing, he leaves open the question of what he calls the psychic consequences. As Rimington puts it: “It will be a question of opinion, and of further experiment, whether the close physical analogy between the octaves of colour and sound has its physiological and psychical counter-part.” In fact, Rimington’s (1895) address is worth quoting at some length here, given the clarity with which he pursues the physical similarity and structural analogies between colour and music: “Taking the spectrum band as the basis of all colour, there are two remarkable points of resemblance between it and the musical octave, which have long been commented upon and discussed. The first of them is that the different colours of the one, and the different notes of the other are both due to various rates of vibration, acting on the eye or the ear. This is very simply and clearly put by Professor Schellen in his great work upon spectrum analysis. ‘Different colours,’ he says, ‘are produced by the different degrees of rapidity with which the ether vibrations recur, just as the various notes in music depend upon the rapidity of the succession of vibrations of air.’ In a word, ‘colours are to the eye what musical tones are to the ear….
I will therefore pass to the second and equally remarkable analogy between the octave of colour and the octave of sound. If we measure the rate of vibration at the first visible point at the red end of the spectrum, we shall find it is approximately one-half what it is at the extreme violet end. Now in music, as we all know, this relationship is the same. If we take the first and last notes of an octave (by which I mean the twelfth) the latter has nearly double the number of air vibrations – and the first note of the new octave has exactly double. This, as we have seen, is the case also with the spectrum band so far as the one octave is concerned; the lowest red stands for the first note of the octave, and the highest violet for the twelfth or last note. Further than this, the blue end of the spectrum shows a tendency to a return to red in the violet, and the red end of the spectrum shows a similar tendency towards a reappearance of blue, in the fact that it passes from scarlet to carmine before it fades away, so that Sir John Herschel and others may have been right when they surmised that, if our eyes could see them, the colours of the visible spectrum would probably repeat themselves in successive octaves, in the great invisible portions beyond the red and the violet.
Starting from these remarkable physical analogies, I have divided the spectrum band into diatonic intervals or notes, on the same plan as that of the musical scale.’
It is pertinent to note that the only claim Rimington makes in support of his suggestion that his colour organ successfully translated music into visual form was the fact that apparently great pieces of music played on the colour organ give rise to pleasing effects. 31 Here, one is reminded of similar claims concerning the scent organ (see Spence, 2021, for a review). Rimington wrote: “That colour, like sound, is capable of expressing artistic emotion there can, I think, be no question, but whether it expresses it in the same way as music is doubtful. It is, however, a somewhat strong argument in favour of the existence of the physiological and psychical analogy, that when we avail ourselves of the works of great musical composers for the interpretation of the new art, the results are vastly superior in variety, delicacy, and beauty of colour to those hitherto obtainable by other methods.” One can also find a similar appeal to the colourfulness of the transposition in Pridmore (1992, p. 60) when discussing what the Beatles’ music would look like when transduced by one of his sound-to-light convertors: “Interestingly, all-time famous pop bands, e.g., The Beatles, play very “colourful” music.” (see also Garner, 1978, for speculation on what the opening notes of “God save the Queen” would look like if rendered sequentially in colour). Again, though, one is left wondering how pleasant a control group of random sounds/music would be?
The artist Max Weber (Weber, 1916), just like Kandinsky (1977), was also inspired by synaesthesia, 32 and by the emotional correspondences that he though existed between painting and music (note here Kandinsky’s notion of ‘inner harmony’). A similar appeal to emotion can also be found in Sabaneyev (1911, p. 200; trans. in Galeyev & Vanechkina, 2001, p. 358) who writes that: “Colours, on the one hand, and sounds, on the other hand, engender various moods, often similar to one another, therefore—the associations of colours and sounds arises.”
According to Zilczer (1987), the North American architect Claude Bragdon, who was also a fan of colour music, put on a couple of colour music shows in New York (one in 1915 and the other in 1916). Bragdon based his notion of colour music on the psychological, rather than the purely physical analogy between light and sound. In his 1918 book, Architecture and democracy, Bragdon writes: “If we are to have colour symphonies, the best are not likely to be those based on a literal translation of some musical masterpiece into colour according to this or any theory, but those created by persons who are emotionally reactive to this medium, able to imagine in colour, and to treat it imaginatively.” (Bragdon, 1918, p. 139). In his work, Bragdon was therefore keen to enlarge the concept of colour music beyond a ‘simple’ physical correspondence between colour and music (see also Bragdon, 1916). Meanwhile, according to Zilczer (1987, p. 122), the Philadelphia pianist, May Hallock Greenewalt (active at around the same time) also promoted colour music. She apparently agreed with Brandon on the importance of the psychological basis for musical analogy, by which we might infer the common mood associations of colour and music.
Artist Stanton Macdonald-Wright, who together with Morgan Russell invented the style ‘Synchromism’ (see South, 2001; Zilczer, 1987), believed that an exact correspondence between the colour spectrum and the musical scale would provide a key to their new art. He later wrote: “For many years there has been growing a conviction that there is some deeply rooted, recondite analogy between colour and sound. Both are demonstrably vibratory; both have a varied and defined emotional stimulus for us, and each is used as a medium for an art.” (Macdonald-Wright, 1924, p. 14). This suggestion is curious inasmuch as it appeals both to an exact correspondence between colour and music while also invoking the importance of emotion.
Meanwhile, in the Soviet Union (as was), the Group 'Prometei' carried forward Scriabin’s dream of synthesizing electric light and music in the intervening decades since the composer’s death. According to one commentator: “They explored correlations between the two fields in a number of different ways: (a) correlations with individual qualities of music (pitch, key, timbre, and harmony); (b) correlations with musical themes; (c) correlations with different qualities and themes of music; (d) the polyphonic (contrapuntal) audio-visual integration approach.” (see Galeyev, 1976; Pridmore, 1992).
According to Zilczer’s (1987) excellent summary of the literature on colour music, those working in the field of contemporary art would largely seem to have lost their former fascination with the synaesthetic coloured hearing. As Moritz’s (1997) review makes clear, a number of other artists/film-makers worked with developing abstract colour displays to pair with music. One might think of Sergei Eisenstein’s (1898-1948) interest in sound-colour montage (see Harrison, 2001, p. 133). Eisenstein also wrote at one point that there is no “pervading law of absolute meanings and correspondences between colours and sound.” (quoted in Harrison, 2001, pp. 133–134). Disney’s Fantasia, with music conducted by Leopold Stokowski, can perhaps also be seen as a natural extension of the colour music movement (Pridmore, 1992), though a key strand of the emerging interest was linked to the notion of audiovisual Gestalten, which relied as much on synchrony as correspondence (see also Alves, 2005; Battey & Fischman, 2016; Corra, 1973; HaCohen, 2016; Haverkamp, 2020; Kaduri, 2016; Russet & Starr, 1988; Whitney, 1980). At the same time, however, it has also become increasingly clear that there is no straightforward structural or psychological mapping between this particular pair of senses. Indeed, Vanechkina (1973) noted almost half a century ago how almost all of those he spoke to specially emphasized the associative, metaphoric nature of “colour hearing” in music and excluded from the artistic sphere any clinical cases of colour hearing.
In the 18th century, Castel confidently predicted that every home in Paris would one day have an Ocular Harpsichord for recreation purposes (Moritz, 1997). Meanwhile, according to Marks (1975, p. 313): “By the late nineteenth century and early twentieth century, interest in colour organs had blossomed into what might almost be called an epidemic (e.g., Sullivan, 1914).” Nevertheless, the great enthusiasm that colour music once aroused rapidly faded, presumably because no agreed perceptual match exists – just consider the wide variety of potential mappings that have been put forward over the years.
Synaesthesia was once a popular source of inspiration for those interested in colour music. However, other accounts based on natural, physical relations (the latter based on the physical properties of the stimuli themselves and the idiosyncratic nature of the perceptual continua), or analogous mapping in Rimington’s matching on the basis of complexity have also, on occasion been proposed. And yet it is noticeable that the common explanation, mentioned, or supported, by so much of the research, both scientific and artistic in origin, is the affective correspondences resulting from shared affect, mood, or emotion of the component stimuli. That said, there would appear to be a growing realization of the potential relevance of crossmodal correspondences to the study of music (Eitan, 2017; Timmers, 2022; Walker, 2016), or as an additional means of illustrating the ‘meaning’ of music for music students (Donnell-Kotrozo, 1978; Wells, 1980; see Bresin, 2005, for an attempt to illustrate the expressivity in music performance by means of the use of colour to link to specific emotions; cf. Clarke & Costall, 2008; Kandinsky, 1977).
Theoretical Accounts of the Structural Similarity of Musical Tones and the Colour spectrum
A complex and convoluted debate has been raging in journals such as Leonardo and Colour Research and Application (Caivano, 1994; Davis, 1979; Garner, 1978; Pridmore, 1992; Wells, 1980) concerning theoretical justifications for the appropriateness of particular structural mappings between hue and pitch. It is noticeable how those commentators contributing to this debate have come from a wide range of disciplinary backgrounds including experimental psychology (Garner, 1978), architecture (Sebba, 1991), art (Davis, 1979), and music education (Wells, 1980; see also Kandinsky, 1977, p. 25; and O'Callaghan, 2008, for philosophical interest in the relation between sight and sound). For instance, in a brief piece published in Leonardo, the psychologist Wendel Garner provided a particular logical foundation for correlating sound and light frequencies according to a common octave principle. However, in response, other commentators (such as Davis, 1979, a visual artist by training) subsequently pointed to the fact that there are, in fact, numerous possible correspondences between colour and music, and hence that there is no need to limit oneself to a literal structural translation of one medium to another.
As Wells (1980, p. 106) notes: “In them one would not deal with individual tones in a chord structure but would consider only the root tone as providing the chord’s colour. There is an analogy here with the harmonics of a fundamental that are not heard individually but blend together to reinforce and give character to the sound of the fundamental.” According to Wells (1980, p. 106): “I sought neither to expand nor to limit what appeared evident to me—that there was a correlation between colour and music based on the principle of complementarity.” In particular, Wells (1980, p. 101): “points out that an equal division of the musical octave into 12 half-steps permits one to recognize chords built on tones occurring at the interval of half octave or the tritone interval as being complementary to each other. This corresponds to the equal spacing of 12 hues on a colour circle in which complementary hues are located diametrically opposite each other. A circular form of the musical octave divided according to the chromatic scale of 12 half-steps places tones serving as roots for complementary chords diametrically opposite each other also.” (see Figure 7). Such a neat arrangement would not be possible with the 7-step diatonic scale that Newton was probably more familiar with.

Top: colour circle. Bottom: Chromatic musical scale circle, according to Wells’ (1980) complementary account. [Reprinted from Wells (1980, Figure 1), with permission.].
Pridmore (1992) drew attention to the limited range of the crossmodal mappings proposed by Sebba (1991). While not wishing to criticize the appropriateness of the 13 mappings established by means of analysis of art students’ paintings made in response to specific pieces of music, Pridmore highlights the fact that there are something like 100 semitone intervals from around 30 to 15,000 Hz, what he calls the gamut of semitones (imitating Piesse’s, 1867, ‘Gamut of Odours’), while Sebba’s mappings only covered 10-13 semitones. “Tone comprises some nine cycles (octaves), each repeating the same 12 semitones, whereas hue is limited to only one cycle of hues.” (Pridmore, 1992, p. 57). Pridmore (1992, p. 57) also points to the observation that: “In psychophysical theory, any psychological correlation between music and colour must derive primarily from the physical stimuli, which, as sonic or radiant energy, have only two variables: (a) amplitude, causing loudness or brightness/lightness: and (b) wavelength, causing musical tone or hue.”
Pridmore’s (1992) interest here is more technical, given his desire to develop a transducer capable of converting music into colour for the deaf. Indeed, he states that he intends “to support the technical or scientific approach as the only possible way of finding a constant and objective correlation (rather than arbitrary and subjective).” (Pridmore, 1992, p. 57). Pridmore (1992) describes three electronic sound-to-light transducers in order to better convey colour to give visual impression of music to students and deaf people. In the Introduction to his piece, Pridmore also writes that: “Correlation of tone and hue is also indicated by the cyclic nature of each, as octave cycle and hue cycle (or colour wheel). No other psychological dimension of sound or colour is cyclic, so no other correlation than tone and hue is possible, at least technically.” (Pridmore, 1992, p. 57). Colour mappings based on the cyclic repetition of the octave seem to reflect more adequately the fundamental acoustic property for which the octave of sounds with fundamental frequency at f can be expressed as 2f. Since octaves are labelled in the same way (e.g., C3, C4), the same tone is always associated with the same hue, e.g., C with Cyan, independent of its absolute frequency (e.g., C3, C4…; see e.g., Pridmore, 1992). Such an assumption might be the basis for experimental designs such as Simpson et al.’s (1956), mentioned earlier, in which the majority of the auditory stimuli were multiples of the same frequency. Notice here how in reference to our earlier discussion of the putative constraints on crossmodal mapping attributed to the nature of the underlying perceptual continua, we have an argument based on the nature of the underlying physical stimuli instead (see Figure 8). However, such mappings apparently do not differentiate between the same hue when it is paired to the same note at different frequencies (e.g., Cyan paired to C3 or C4).

Layout of Pridmore’s (1984) final display panel. A given tone (e.g., C#) in all octaves is represented by a constant hue (e.g., cyan). Each octave is represented by a cycle, and each semitone (and its hue) by a sector (as presented in Pridmore, 1992).
Hereinafter, we provide a list of issues regarding colour-sound correspondences that apparently limits the possibility of explaining such correspondences (or analogies; see Jewanski, 2010) by assuming any physical similarities between the two sensory stimuli.
The visible colour spectrum ranges between 380 nm and 740 nm (violet and red, respectively) and from 405 THz and 790 THz (red and violet, respectively). Musical sounds range from about 20 Hz to 20 kHz. Besides the huge difference in terms of the amplitude (or wavelength) of the two spectra, a relevant structural difference might rely in the non-linear distribution of tones along the frequency spectrum, in which the distance between two successive steps varies with frequency. For example, the interval C-D is about 4 Hz at 30 Hz increasing (more than 100 times) up to 500 Hz at about 4000 Hz. By contrast, in the colour spectrum, the distance (expressed in THz) between two successive hues remains constant between 30 and 80 THz (e.g., red-orange vs orange-yellow, about 30 THz; yellow-green vs green-blue, about 60 THz) (see Caivano, 1994, Figure 2).
Once a tuning system is chosen (e.g., Pythagorean, equal temperament…), musical tones correspond exactly to only one specific frequency in Hz. By contrast, in the case of colours, for example, 405 THz is (perceived as) red, but so too are 410 THz and 420 THz. And if one is exposed to the above colour’s patches at the same time, s/he will likely recognize three different hues of the same colour, namely red. Therefore, the sound continuum is fragmented into discrete points (i.e., tones), while the continuum of colour is fragmented into regions that might also overlap with one another (i.e., hues). One might object that a tone at 445 Hz is perceived as close to A (440 Hz), and this is indeed likely to be the case. Nevertheless, the point here is that the note is no longer A in terms of its frequency, while the hue seen at 405, 410, 420 THz still counts as red.
Octave periodicity, i.e., the fact that pitches with f, 2f, 3f, 4f…, are perceived as the same tone at different frequencies, is a fundamental property of auditory perception. No similar property seems to characterize colour perception. Thus, systems such as those proposed by Pridmore (1992) that conceived the colour wheel as simply repeating itself at every “octave” (this, also a feature of various online tools, e.g., see: https://www.flutopedia.com/sound_color.htm) ignore octave cyclicity (that is not mere repetition) and therefore would not be based on the structural/physical properties of the two spectra.
As discussed earlier, the mapping based on the correspondence between pitch and hue could not reflect structural similarities concerning the way in which stimuli are perceived in the two modalities. Other mappings based on pitch, such as pitch-saturation, as well as mappings based on other properties of sounds, such as loudness and timbre have been less investigated in the empirical literature (e.g., see Anikin & Johansson, 2019; Hamilton-Fletcher et al., 2017; and see Table 3, for a summary). However, when pitch is kept constant, differences in loudness are consistently associated to differences in luminosity (see Caivano, 1994). For example, Stevens and Guirao (1963) had their participants to use the length of a line as a variable to represent loudness and luminosity. Their results revealed that greater line length was used to indicate both greater loudness and luminosity. Other associations might be based on size of the colour patch and duration of sounds, but these seem to reflect more the similarities with respect to the temporal and spatial magnitude of sounds and colours, respectively (what some might be tempted to call amodal).
Based on the above considerations, it seems hard to explain colour-sound correspondences based on structural similarities between the component stimuli (cf. Helmholtz, 1867). Thus, it seems that two further possibilities remain. One is that stimuli are matched based on their relative position along the continuum in each sensory modality. Such an explanation (would) fit(s) with results from studies that used isolated tones, which listeners can perceive as higher or lower (such dimension might parallel, as some suggested, brightness for colours). However, when considering complex musical stimuli (e.g., Cutietta & Haggerty, 1987; see Spence, 2020a, for a review), pitch is a much more blurred perceptual correlate of sound. In such cases, probably, one needs to put forward an alternative explanation, according to which the mapping can be conceived of in terms of emotional similarities (or mediation), or even ‘inner harmony’, to use Kandinsky’s (1977) preferred terminology. Although such explanation leaves open the question about the nature of such emotional/inner harmonies, it suggests avoiding simplistic conception of mapping based on elementary properties of visual and auditory stimuli, such as brightness and pitch that seem unable to explain consistent mapping, if any.
While the popularity of colour music has undoubtedly declined, since its peak in the years around 1900, the last decade or so has seen something of a resurgence of interest in the topic of colour-sound correspondences amongst those researchers interested in the development of more intuitive sensory substitution systems (cf. Hamilton-Fletcher et al., 2016; Marks, 1983; Pridmore, 1992). In one recent study, for instance, Cho et al. (2020) investigated whether sound can be used to code colour in order to improve artwork appreciation by those individuals with visual impairments (see also Cavazos Quero et al., 2021). The question immediately crops up though as to whether certain dimension(s) of sound should be used to convey the different specific salient properties of visual images, such as colours. Is the choice an arbitrary one? Or can such systems be made more intuitive? Interestingly, in their recent research, Cavazos Quero and colleagues chose to combine sound with scent, thus presumably adding to the complexity of the process of sensory translation.
Conclusions
As hinted at by the findings of experimental psychologists, it is by no means clear that any structural similarity between the organization of colour and pitch is relevant to promoting either perceptual or affective similarity. At best, it might offer the basis for a more cognitive alignment of analogous dimensions (Kandinsky, 1977). Think of this as a reflection on one of the directions in which research moved, that is to the mood associated with more complex (albeit short) musical expressions (Bresin, 2005; Karwoski & Odbert, 1938), complex correspondences between music and painting (see Spence, 2020a, for a review).
At the same time, however, it should also be acknowledged that there may be crossmodal correspondences between visual and auditory arts based on cross-media artistic (e.g., historical) styles (Actis-Grasso et al., 2017; Duthie, 2013; Duthie & Duthie, 2015; Hasenfus et al., 1983). Ultimately, however, it would appear clear that no matter whether one is considering the basic correspondence between colour and pitch, the complex crossmodal correspondence between music selections and visual art works (see also Adams, 1995; Albertazzi et al., 2015; Albertazzi et al., 2020), or some combination of the two (e.g., colours to match music compositions, or less frequently-mentioned, musical note to match a painting) the emotion associated with the auditory and visual stimuli would appear to be the unifying link. Acknowledging the importance of such affective correspondences leaves open the question of whether there is any directly perceptible similarity, as Marks (2011) has claimed.
The Emotional Mediation Account(s)
In conclusion, what started out as intuition based on analogous mappings led to a search for exact natural, physical mappings between colour and musical notes. However, the evidence revealed a visual brightness-pitch mapping and, more recently, seemingly robust hue-timbre associations too. At the same time, however, while some have been convinced of the existence of a natural, physical, mapping perhaps based on perceived similarity between matching stimuli in the respective perceptual continua, it is striking how many of those working in this area over recent centuries have ended-up arguing for emotional mediation, sometimes referred to in terms of affect, mood, or what Kandinsky (1977) referred to as ‘inner harmony’, instead. In this regard, the crossmodal correspondence between simple stimuli bears close relation to the search for crossmodal correspondences amongst more complex visual stimuli, such as paintings and pieces of music, documented elsewhere (see Spence, 2020a, for a review).
Indeed, in recent years, there has undoubtedly been growing interest amongst psychologists and practitioners in emotional correspondences between music and colour (or paintings), or emotionally-mediated (or affective) correspondences, as they are sometimes known (Spence, 2020a). The interest in the emotional mediation of colour-to-music mappings is undoubtedly one that provides one additional means of translating between sensory impressions (though perhaps is not as rigorous and objective as some commentators would have liked). Looking back, it is interesting to note how Simpson et al. (1956, p. 100) already hinted at the emotional mediation account as an “indirect” association: “And lastly, before direct associations between hue and pitch could be postulated, it would first be necessary to eliminate the possibility of indirect associations based upon parallel relationships to a common mediating variable such as mood.” The primacy of the emotional mediation account also fits with Werner’s (1934) view of perceptual experiences in infancy and early childhood as largely syncretic – that is, functionally undifferentiated and as physiognomic – that is, imbued with expression and affective properties.
And, in terms of the question with which we started, why should it be that the musical sound pitch/timbre-hue/brightness mappings have attracted so much interest over the centuries, given the many other crossmodal correspondences that are now known to exist (see Spence, 2011), it would seem that there have been multiple different drivers helping to sustain the interest in the topic at different points in time. This interest can be seen as originating from the possibility of structural analogies, from the possibility of perceptual analogies, as well as with artistic interest, technological developments (colour organs previously, and sensory substitution devices more recently), and the florid concurrents reported by coloured-music synaesthetes. The fact that we are all such visually-dominant creatures (Hutmacher, 2019), and that colour appears to play a particularly important part in driving human behaviour (Elliott et al., 2015), may also be relevant here. Ultimately, though, the fact that it is affect, mood, or emotion that appears to explain the particular affinity people experience between musical sounds and colours, no matter whether the stimuli are simple or complex (e.g., paintings and musical composition; see Spence, 2020a), means that pitch/timbre-hue correspondences are, in no way, intrinsically special.
To conclude, it is perhaps worth returning to the opening sentences of the article written by Sabaneev and Pring (1929, p. 266) almost a century ago to consider how far (or not) we have come: “THERE was a time when it would have been regarded as absurd, or at all events as a symptom of decadence, to raise the question of a correspondence between sound and colour. And yet amongst musicians this is by no means a new problem. We know that a very long time ago there were persons, very often musicians, to whom sounds presented themselves as coloured, so to speak. Resonance evoked a colour association, and this was not a fortuitous happening but was repeated with the invariability of a law. Later on, of course, this phenomenon or, if you prefer it, this faculty, attracted a certain amount of attention amongst scholars, and the name of synopsy or colour-ear was bestowed upon it. So far it appears to be an open question as to whether this connection is organic, conditioned by certain causes of a physiological character, certain proximities or contacts of the optic and auditory nervous ramifications; or whether it is merely associative, a kind of conditional reflex, and therefore may vary with different persons, maintaining nevertheless within the given individual a certain stability.”
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Arts and Humanities Research Council, (grant number AH/L007053/1).
