Abstract
Historically, debates over relationships between spoken lexical form and meaning have been dominated by views of arbitrariness. However, more recent research has revealed a different perspective, in which non-arbitrary mappings play an important role in the makeup of a lexicon. It is now clear that phoneme-sound symbolism - along with other types of form-to-meaning mappings - contributes to non-arbitrariness (iconicity) of spoken words, which is present in many forms and degrees in different languages. Attempts have been made to provide a mechanistic explanation of the phenomenon, and these theories largely centre around cross-modal correspondences. We build on these views to explore iconicity within the evolutionary context and the neurobiological framework for human language processing. We argue that the multimodal bihemispheric communicative system, to which iconicity is integral, has important phylogenetic and ontogenetic advantages, facilitating language learning, comprehension, and processing. Despite its numerous advantages, however, iconicity must compete with arbitrariness, forcing language systems to balance the competing needs of perceptual grounding of the linguistic form and ensuring an effective signal. We conclude that, on balance, iconicity should be viewed as integral to language, and not merely a marginal phenomenon.
Introduction
The nature of the relationship between word forms and their meanings has long been a prominent question. Do words sound like what they describe, or is this relationship arbitrary? Modern linguistic theory has largely adopted the view that the phonological form of the lexical sign bears no resemblance to its meaning; the connection is arbitrary. This view is often attributed to Saussure (1959), who argued “Because the sign is arbitrary, it follows no law other than that of tradition, and because it is based on tradition, it is arbitrary”. However, the idea of arbitrariness was established much earlier by Locke (1690), who proposed that the existence of multiple languages is sufficient evidence to disprove a so-called ‘natural’ connection: if the properties of the concept or object that the word refers to were sufficient to determine a phonological form, then there should only be one language. The notion of arbitrariness provides a robust explanation for the origin of most words; for example, there is nothing about the phonological form of the word ‘pen’ to suggest that it should denote ‘an instrument for writing or drawing with ink’. It has simply been established that there is a conventionalised system of symbols shared by communities of users and passed from generation to generation - “there is no reason for you to call a dog ‘dog’ rather than ‘cat’ except for the fact that everyone else is doing it” (Pinker & Bloom, 1990, p. 728). Notable exceptions to the conventional approach, such as onomatopoeia, have been regarded as marginal phenomena within an arbitrary linguistic system, with Saussure (1959) arguing that “such words are never organic elements” of language. In addition, although onomatopoeic words seem to represent the sounds they symbolise, they too show evidence of arbitrariness. For example, the English expression for the sound of a rooster (
However, in looking for an accurate account of non-arbitrariness in language, a broad cross-linguistic perspective is required, and more recent work has looked beyond Indo-European languages as well as at both signed and spoken languages (Perlman et al., 2018). This research has suggested a more textured view, in which non-arbitrariness may have a more important role in language than previously thought, and may in fact complement arbitrariness by conferring different linguistic advantages.
Defining Forms of Non-Arbitrariness
When considering forms of non-arbitrariness in language, there has been much confusion over terminology despite attempts to clarify these uncertainties (e.g., Lockwood & Dingemanse, 2015; Sidhu, 2019). The term
It has been argued that iconicity is not a binary property and may be present in different degrees, with a distinction between ‘absolute’ (imagic) and ‘relative’ (diagrammatic) iconicity (Dingemanse et al., 2015; Peirce, 1974); and ‘direct’ and ‘indirect’ iconicity (Masuda, 2007). Absolute iconicity is where there is a one-to-one mapping between phonological form and meaning. This may be ‘direct’ as in onomatopoeia, for example the sound of water leaking ‘
In relative iconicity, (what Cuskley and Kirby (2013) refer to as
This paper will first briefly review evidence for some of the different forms of sound symbolism in human language, and build on existing work (Dingemanse et al., 2015; Sidhu, 2019; Sidhu & Pexman, 2018; Svantesson, 2017) to explore how these form-meaning relationships might arise. We will then turn our attention to the possible role of sound symbolism from an ontogenetic and evolutionary perspective. Finally, we incorporate evidence for iconicity in non-human communication within a neurobiological framework for human language processing to suggest that sound symbolism, in addition to arbitrariness, should be viewed as integral to language, rather than a marginal linguistic phenomenon.
Sound Symbolism
Spoken languages consist of an inventory of different sounds, or phonemes, which can be combined in different ways to form words. These words are then attributed to meanings, leading to the fundamental questions about the nature of this form-meaning relationship raised above. Primarily, as a foundation for spoken language, do phonemes themselves convey inherent qualities which lead to association with certain meanings, or are they semantically neutral in nature? The notion that phonemes may carry such associations underlies the phenomenon of sound symbolism. There are different ways in which phonemes may evoke semantic properties, examples of which are presented below.
Sound-Size Symbolism
One of the best-known examples of sound symbolism is the idea that different vowels may communicate the semantic property of either small or large size. Almost a century ago it was first reported that the close front vowel /i/ may more naturally denote small size than the open front vowel /a/ (Sapir, 1929). Sapir presented participants with short nonwords such as
Figure: (a) The use of vowel variation to indicate size differences in Khmu ideophones (Svantesson, 2017). (b) The takete/maluma or bouba/kiki effect - when prompted most subjects call the angular shape (left) 
There is also evidence to suggest that it is the number of ‘small’ or ‘large’ vowel sounds in a word, and not vowel alone, that determines the conveyed magnitude (Thompson & Estes, 2011). In this study, English-speaking participants were presented with novel figures of varying size and asked to match them to nonwords containing either ‘large’ (
Sound-Shape Symbolism
Another well-documented example of sound symbolism is the
While the
Debate also surrounds the question of which phonemes drive the effect. Some authors emphasise the role of vowels, particularly the close back rounded vowel /u/ associated with rounded shapes (Maurer et al., 2006; Ramachandran & Hubbard, 2001), while others highlight the contribution of consonants, noting that the harsh stop consonant /k/ contrasts with the softer bilabial /b/ (Nielsen & Rendall, 2011; Westbury, 2005). Evidence suggests that both play a role. For example, Nielsen and Rendall (2013) demonstrated that sound-shape symbolism is influenced by multiple phoneme categories, and Westbury et al. (2018) linked specific phonemes (e.g., /oʊ/ (as in
Other Phoneme-Feature Associations
Although much research on sensory sound symbolism has focused on the
Sound symbolic relationships have also been noted for features such as taste (Gallace et al., 2011; Simner et al., 2010), colour (Johansson et al., 2020), and perception of dominant or submissive body postures (Auracher, 2017), suggesting that sound symbolism in language extends far beyond the
Mechanisms of Sound Symbolism
The literature on the mechanisms of sound symbolism largely centres around cross-modal correspondences, defined as ‘a compatibility between attributes or dimensions of a stimulus (i.e., an object or event) in different sensory modalities’ (Spence, 2011, p. 3). The mechanisms of sound symbolism have been comprehensively reviewed by Sidhu and Pexman (2018), who evaluated five proposals, along with their supporting evidence. We summarise these briefly below and discuss additional evidence for each mechanism where appropriate.
According to Sidhu and Pexman (2018) one way that sound-symbolic association may be explained is through
As noted by Sidhu and Pexman (2018), the statistical co-occurrences explanation relies upon experience, and therefore assumes that there is no innate human sensitivity to sound symbolism. According to this explanation alone, it might be expected that sound-symbolic effects are not present at birth or in congenitally deaf individuals. Yet sound symbolism effects have been observed in 4-month-old infants (Ozturk et al., 2013; Peña et al., 2011), and although the effect is diminished in individuals with prelingual auditory deprivation, it is still present at above-chance levels (Gold & Segal, 2020). It is quite possible that statistical learning begins from birth; however, it seems plausible that humans may also have an innate sensitivity to sound symbolism. A related view therefore argues that statistical co-occurrences underpin sound symbolism, but that these associations are
Another potential mechanism that also relies on statistical co-occurrences - though this time occurring within language itself - is what Sidhu and Pexman (2018) label
Sound-symbolic associations have also been explained via the broad mechanism of
Sound Symbolism in Language Acquisition and Language Evolution
Having provided an overview of the key forms of sound symbolism, and the proposals of their mechanisms discussed in the literature, we now turn to exploring the possible roles of sound symbolism and iconicity in both language acquisition and in the context of language evolution. We also use this evidence to introduce a novel framework for understanding how iconicity might fit into a neurobiological model of human language processing.
Sound Symbolism in Language Acquisition
The advantages of sound symbolism in language acquisition have been well documented. It has been found that words learned earlier tend to be more iconic, suggesting that sound symbolism may support word learning (Perry et al., 2015). A study by Imai et al. (2008) suggested that iconicity facilitates early verb learning, and experiments in adults have shown that sound symbolism in natural language may aid word learning too (Lockwood et al., 2016). Sound symbolism may facilitate word learning via ‘grounding’, whereby resemblance-based form-meaning relationships allow for shared understanding and the establishment of communication (Cuskley & Sommer, forthcoming). For example, if iconic mappings are shared (e.g., /i/ and smallness), then combining them with other multimodal inputs (e.g., pointing) may facilitate shared attention and the generation of a form-meaning relationship. A shared understanding of intuitive form-meaning associations may thus facilitate the formation of a lexicon – in other words, sound symbolism may act as a scaffold for mapping semantic information, thereby bootstrapping word learning (Imai & Kita, 2014; Monaghan et al., 2014). It may do so by establishing ‘referentiality’ (an ability to map linguistic form to meaning) – a process fundamental to language learning, in combination with Hebbian learning and joint attentional processes (Perniss & Vigliocco, 2014). In a recent review, Nielsen and Dingemanse (2021) found strong evidence for the role of sound symbolism in ‘local learning enhancement’ - where resemblance-based associations of certain lexical items influence the learning of those items - but not for ‘general learning enhancement’, where they influence the later learning of arbitrary items.
In addition, it has been suggested that iconicity aids the comprehension of communicative signs (Perniss & Vigliocco, 2014) and that the imitative, performative nature of iconic words makes communication more vivid (Lockwood & Dingemanse, 2015). It has, however, also been noted that more research is required on these effects of sound symbolism, as there may be cross-linguistic variation in learning benefits.
Iconicity in Language Evolution
Whilst the role of sound symbolism in language acquisition has been well established, its role in the evolution of language is a matter of continuing debate. The emergence of words as a means of communication was arguably a pivotal point in human evolution, with small articulations permitting rapid and efficient encoding of information, as well as wide and distant broadcast and communication without sight. However, given the relatively short period of recorded human history, a cross-species comparison is necessary to answer questions relating to language evolution, and the role of iconicity in this context.
The core platform for language is face-to-face communication, as this is how languages are learnt and most used. Manual gesture is seen as a likely evolutionary precursor to vocal communication (Rizzolatti & Arbib, 1998), and iconicity has been considered as a logical entry into the language system (Armstrong, 1983), perhaps acting as a bridge between manual (gesture) communicative systems and the verbal language we use today. For example, Levinson and Holler (2014) propose an evolutionary stratification of human communicative systems, whereby the different layers (e.g., joint attention, iconic gesture, turn-taking, and vocalisations) vary in antiquity. Declarative pointing is a form of signalling which facilitates mutual gaze to objects and thus allows for joint attention, while iconic gestures allow the effective depiction of motion, size and spatial relations between objects, such as an offering of something presented with the hand or other body parts (Liebal et al., 2006) or a request made with an open hand (Pollick & De Waal, 2007). The repeated use of iconic gestures would have arguably facilitated the grounding and memorisation of representations, leading to conventionalisation and hence increasing levels of abstraction (Garrod et al., 2007). In this way, iconicity may have been important for achieving
While the hypothesis above argues for gesture as a precursor to language in the vocal domain (e.g., Rizzolatti & Arbib, 1998), Perniss et al. (2010) reject this claim, proposing instead that language in manual and vocal modalities must have co-evolved, as linguistic and imagistic components are tightly integrated. This embodied link between linguistic form and sensorimotor experience is argued to reduce the cognitive resources needed to unite signs and referents, with linguistic form activating the same systems used in perception and action. The finding that ideophones are more common in narrative contexts and occur alongside iconic gesture (Dingemanse, 2013) is taken as evidence for embodiment in language, with iconicity and gesture contributing to a multimodal act of depiction. According to this account, the innateness of embodiment is shown by close connections between the hand and mouth in the somatotopic organisation of the human motor cortex (Meier et al., 2008) and by the finding that congenitally blind individuals gesture while speaking despite never having observed this (Iverson & Goldin-Meadow, 2001).
There is also some evidence that non-human primates may utilise the processing advantages of embodiment through iconic gestural communication. Studies have demonstrated that orangutans and chimpanzees can use iconic gestures (pantomime) to represent objects and mimic actions related to their use. Notably, these apes often elaborate on gestures that fail to elicit the desired response from their recipient, showcasing flexibility in their communication. This behaviour has been observed both in great apes raised in captivity (e.g., Miles et al., 1996; Tanner et al., 2006) and in forest-living rehabilitant orangutans (Russon & Andrews, 2011). However, the evidence for the use of iconic gestures in apes remains contested. Some researchers argue that non-human primates do not produce truly iconic gestures, as there is no clear requirement for the recipient to infer a resemblance between the gesture and its intended meaning (see Byrne et al., 2017; Tomasello & Call, 2019). Adding nuance to this debate, Perlman et al. (2012) point to a continuity between gesture and instrumental action in apes, suggesting that gestures are influenced by immediate physical and social contexts and can be adapted into spontaneous iconic gestures on the spot.
Of course, the great difficulty when examining the relationship between form and meaning from an evolutionary perspective is that it is highly contested whether animal signals can be said to have meaning (i.e., whether non-humans cognitively represent referents; Moore, 2014; Rendall et al., 2009; Scott-Phillips, 2015). This has led some authors to develop alternative cross-species approaches to examining arbitrariness and iconicity, such as Watson et al.’s (2022) ‘optionality’ framework, which highlights the presence of linguistic features of arbitrariness in non-human communication. Others (Fischer & Price, 2017) argue that non-human primates do not express communicative or informative intent, and so non-human communication (vocalisations and manual gesture) should be conceived as goal-directed behaviour only, with nothing more than a probabilistic causal link between the signifier and signified. For example, a chimpanzee may wave an arm in a ‘beckoning’ gesture to another simply because this produces the desired response in the recipient. Warren and Call (2022), however, argue that non-human primates may be capable of applying social inferences to a communicative act, and that there may be mentalistic processes which underlie outcomes. Their model of ‘inferential communication’ argues that visual perspective taking and knowledge attribution are required for communicative exchange, helping to bridge the gap between animal and human communication.
Yet another useful approach to analysing the role of iconicity in language evolution is to consider behavioural imitation of others. The human ability to imitate the actions of others is fundamental to turn-taking, which is one of the critical foundations of language evolution (Levinson & Holler, 2014). Imitation is supported by mirror neurons, which code for manual goal-directed movement and fire both when an individual performs a manual task and when that individual watches another perform the same task. They allow recognition of another’s action, as the same neural activation necessary to produce an action is generated via observation. This is important for the development of mutual understanding and an ability to share meaning. Iconicity and the mirror neuron system may therefore contribute to the emergence of expressions, as is seen in signed languages where new signs emerge from iconic gesture (Ahlner & Zlatev, 2010). In non-human primates, mirror neurons have been found in the ventral premotor cortex – comparable to the location of the mirror neuron system in humans - and are argued to have provided a bridge between iconic gesture and the imitation of behaviour (Rizzolatti & Arbib, 1998; see also Ramachandran & Hubbard, 2001). There is also evidence that chimpanzees and other great apes show imitative abilities beyond those of other animals (Bates & Byrne, 2010), which challenges the long-held view that humans are ‘imitators’ but non-human primates are ‘emulators’ (e.g., Tennie et al., 2006; Tomasello, 1996; Tomasello et al., 1987). It is argued that similarities in cultural transmission across species indicate our common ancestor was capable of imitating actions with sufficient fidelity to transmit culturally variant behaviours within and between communities (Whiten et al., 2009).
Taken together, this evidence suggests that speech may have evolved from proto-dialogue between individuals based on iconic hand gestures and imitation, supported by mutual action recognition facilitated by the mirror neuron system (Rizzolatti & Arbib, 1998). While sound symbolism in contemporary language may be perceived as a vestige of the iconic protolanguage (Kita et al., 2010), its manifold roles and advantages in the context of language evolution arguably imply a more integral role within the human language system.
Iconicity Within the Neurobiological Framework for Human Language
According to one prominent model, the Dual Neurobiological Systems Hypothesis (DNS; Marslen-Wilson & Tyler, 2007), modern human communicative capabilities are controlled by joint activation of bihemispheric (BH) and left-lateralised neural systems, which interact but are functionally and evolutionarily distinguishable (Marslen-Wilson & Bozic, 2018). Broadly speaking, the left-lateralised system encompasses a network of left-hemisphere frontal and temporal regions surrounding the Sylvian fissure; it is unique to humans and responsible for supporting complex syntactic functions. The bihemispheric system involves a broad network of fronto-temporal regions in both hemispheres. It underpins social communication, through the processing of sound-to-meaning mapping, pragmatics, linear adjacency and multimodal interpretation (Bozic et al., 2010, 2015; Marslen-Wilson & Bozic, 2018) – with its capacity for interpretation of multimodal social communication cues particularly relevant for the current context. The bihemispheric system is argued to be evolutionarily primary, with the evidence showing that this system and its functions are highly conserved in non-human primates (Ghazanfar et al., 2008; Seyfarth & Cheney, 2017; Wilson et al., 2015). As such, the bihemispheric system of non-human primates can be considered a promising analogue of the one present in humans at the early stages of language evolution. Most forms of iconicity are underpinned by ‘linguistic cross-modal correspondence’ (Cuskley & Sommer, forthcoming; Sidhu & Pexman, 2018), excluding only the most direct form-meaning associations such as onomatopoeia that remain within one sensory modality, and cross-modal associations have also been noted in non-human primates (Ludwig et al., 2011). It is therefore possible that iconicity played a role in bridging the gap between the gestural or imitative visual signs used in early communication and the lexicalised concepts that emerged later.
According to this view, cross-modal transfer may have developed into a more sophisticated cross-modal cognitive suite that provided humans with the neurological and behavioural architecture capable of storing and increasing cross-modal representations, thus affording us the ability to learn arbitrary symbols necessary for the modern linguistic capacity (Cuskley & Sommer, forthcoming). Therefore, iconicity may in part explain the evolutionary gap between the primate and the modern bihemispheric systems, ultimately setting a precedent for advancing communicative abilities that led to the evolution of the human linguistic capability – with the bihemispheric system likely heavily involved in the processing of iconic sounds, in line with evidence for activation in the bilateral superior temporal sulci (STS) in response to sound symbolic words (Kanero et al., 2014). Taking this evidence into account, we argue that placing iconicity within the neurobiological framework of the DNS hypothesis, and the bihemispheric system in particular, could help strengthen the argument that iconicity played a prominent role in early human language evolution.
Factors Limiting the Prevalence of Iconicity
If iconicity provides so many advantages, and is believed to be a stepping stone in the evolution of language, why is language still predominantly arbitrary? From an evolutionary perspective, features of language that improve processing and learnability should survive and become more common. Indeed, Jespersen (1922) found evidence of /i/ replacing other vowels in the historical development of words with meanings related to small size. This may have occurred because the sound-symbolic associations of /i/ made words containing this phoneme and describing small objects more favourable, and more likely to survive in language evolution – leading to the claim that languages should become ‘richer and richer in symbolic words’.
There are, however, factors that limit the type of meanings that can be expressed iconically. Dingemanse et al. (2015) and Sidhu (2019) review several such factors. Firstly, iconic words must have a fairly distinct meaning, as similar meanings call for similar forms, and so iconicity may lead to ambiguity unless the referent is very distinct. A wholly iconic language would be dominated by words with similar forms and meanings, leading to uncertainties and deficiencies in processing and learnability. Secondly, the amount of sensory information contained within a referent may limit whether or not it can be described via an iconic relationship. Iconic mappings require sensory features, which restricts their use in language systems. Lupyan and Winter (2018) expand on this further, arguing that iconicity is limited in the expression of abstract concepts in particular, as iconic expressions are too strongly linked to sensory features or certain contexts. For example, certain metaphorical extensions found in English and Hebrew are not possible in Israeli Sign Language (ISL) and American Sign Language as the signs are too iconic (Meir, 2010). One example given is that while English allows for the verb “eat” to extend metaphorically (e.g., “The acid ate the iron key”), ISL’s sign for “eat” is too specific, depicting a human eating action at the mouth, which restricts such extensions. Meir discusses how this limitation is also seen in spoken languages, where iconic sound-related words are less likely to be used metaphorically across sensory dimensions in ‘synaesthetic metaphors’. These examples demonstrate how iconicity may limit the meaning of an expression to a particular context, and thus its potential to become more abstract.
Iconicity in language may also be influenced by linguistic laws such as Zipf’s law of abbreviation, which predicts a negative relationship between word length and frequency of use (i.e., words used more frequently tend to be shorter and vice versa). Zipf’s law, alongside Menzerath’s law (which states that longer communicative constructs are composed of shorter parts), captures the information-theoretic principle of minimising code length (compression). Both laws are ubiquitous in communicative systems, having been observed in both manual gesture and vocal communications of non-human primates as well as humans (Heesen et al., 2019; Huang et al., 2020). These laws may decrease iconicity in language - as frequency of use increases, the need for efficiency prevails over iconicity, resulting in ‘lexical elaboration’ with increasing levels of arbitrariness (Haiman, 1985).
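The negative relationship between word length and frequency predicted by the law of abbreviation can be checked with a rank correlation. The following is a minimal sketch only: it assumes orthographic length as a stand-in for word length and uses a hypothetical toy token list (`TOY_TOKENS`) in place of a real corpus. The function names are illustrative and are not taken from the studies cited above.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def length_frequency_correlation(tokens):
    """Spearman correlation between word length and token frequency."""
    counts = Counter(tokens)
    words = list(counts)
    lengths = [len(w) for w in words]   # assumption: length in letters
    freqs = [counts[w] for w in words]
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson(average_ranks(lengths), average_ranks(freqs))


# Hypothetical toy data, constructed so that frequent words are short.
TOY_TOKENS = ("the " * 6 + "a " * 5 + "dog " * 3 + "runs " * 2
              + "quickly elephant").split()

rho = length_frequency_correlation(TOY_TOKENS)
print(f"Spearman rho (length vs frequency): {rho:.2f}")
```

A negative coefficient is the pattern the law of abbreviation predicts. With a real corpus, length would more appropriately be measured in phonemes or syllables, and the toy data here say nothing about any actual language.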
Iconic words, such as ideophones, also seem to resist deep integration into the grammatical apparatus of a language. According to Dingemanse (2017), this reflects conflict between grammatical integration and the strength of iconic form-meaning mappings – if fully integrated, iconic expressions are subjected to processes of reduction and regularisation, so iconicity is reduced. This syntactic conventionalisation diminishes the role of iconicity and could perhaps be further evidence for the processing of iconic signals within the bihemispheric system, as it is the specialised left-hemisphere frontotemporal system that provides the principal neurobiological substrate of core, distinctly human-specific, grammatical capacity (Bozic et al., 2010, 2015), and iconicity seems to resist grammatical integration.
The prevalence of iconicity in language is also limited by the advantages of arbitrariness. According to Hockett (1963), the flexibility of signalling afforded by arbitrariness is a fundamental property of language. Arbitrariness allows reference to any possible concept, and unlike iconicity it can allow for easy discrimination between similar entries in a lexicon. This allows for larger lexica to develop (Gasser, 2004) and allows communication about concepts for which direct perceptual grounding is not possible. A highly iconic lexicon with similar phonological forms for similar meanings would lead to high confusability, and decreased communicative effectiveness (Perniss et al., 2010).
Conclusions
In summary, the ancient debate over whether the linguistic sign is arbitrary has been clouded by unnecessary polarisation. A resemblance between form and meaning cannot be sufficient for understanding meaning, and conventionalisation of language throughout evolution is inevitable and necessary to maintain the efficiency and versatility of language. Hence, while much of the historic debate until recent years has been binary, words cannot be entirely arbitrary or entirely iconic – they fall on a spectrum instead. Iconicity and arbitrariness each confer their own advantages in a linguistic system, representing the adaptation of languages to the constraints of needing to link linguistic form with human experience whilst ensuring an effective signal. It is these constraints that drive the evolution and development of linguistic systems, adding further weight to the hypothesis that theories of language (both phylogenetic and ontogenetic) must value iconicity in addition to arbitrariness, and that sound symbolism should be regarded as a core principle of language, and not merely a peripheral phenomenon.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Data Availability Statement
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
