Abstract
Iconicity (the extent to which word forms resemble their meanings) is proposed to be based on universally accessible form mappings that depict/express sensory imagery. In the present study, we explored phonological structural features proposed to be characteristic of iconicity and subjective iconicity ratings in two large English and Spanish datasets. Restricting analyses to words with good rating agreement across participants, we show that the distributions of iconicity ratings differ considerably between the two languages, with far fewer Spanish words rated as iconic. Multiple regression analyses showed that structural markedness significantly predicted iconicity ratings in both languages, although the relationship was weaker in Spanish. Highly rated English forms included many phonaesthemes, that is, words with systematic sound-meaning mappings that can be iconic or non-iconic. Surprisingly, English and Spanish words rated higher in iconicity had larger phonological neighbourhoods despite comprising less frequently occurring phoneme sequences. In English, words rated as more iconic were also more likely to be polysemes (i.e. convey multiple, metaphorically-related meanings) than linked to a specific sensory meaning. Regression models revealed phonological/phonetic features, syllable structures and reduplications predicted significant proportions of variance in both English (33.3%) and Spanish iconicity ratings (50.8%), demonstrating both common and language-specific mappings. While our findings support the qualified use of subjective ratings for cross-linguistic comparisons of iconicity, we recommend researchers control for systematicity and polysemy and consider using additional/alternative measures to exclude non-iconic forms.
Introduction
Across spoken languages, there are some words whose forms sound like the meaning they convey, that is, they are iconic. Most are ideophones, a structurally marked, open class of words that depict imagery across sensory and motor domains (Thompson & Do, 2019). Some, such as onomatopoetic forms, directly imitate an auditory percept via articulatory gestures (e.g. in English the words splash and thump mimic the sounds of objects hitting water or a solid surface, respectively) while others indirectly depict motion (e.g. zigzag references alternating left and right turns in English and Spanish). 1 A different category of words references sensorimotor imagery via a systematic pairing of sub-morphemic form features, as in phonaesthemes (e.g. in English, the consonant cluster gl- is linked to luminance/vision as in glare, glimmer, gloom, etc.; while fl- is linked to movement as in fling, flip, flutter, etc.; Zingler, 2017). Iconicity is proposed to manifest in only small sets of words cross-linguistically, being based on universal form-resemblance mappings, while systematic pairings are represented more extensively within languages reflecting their basis in statistical regularities (Blasi et al., 2016; Dingemanse et al., 2015; Haslett & Cai, 2023; Perniss et al., 2010; Thompson et al., 2021).
Recent attempts to operationalise iconicity have used subjective ratings based on lay definitions supported by examples (Hinojosa et al., 2021; Perry et al., 2015; Thompson et al., 2020; Winter et al., 2024). An advantage of this approach is that it allows data to be collected from large portions of the vocabulary, enabling direct comparisons between various spoken languages (Motamedi et al., 2019; Winter & Perlman, 2021). However, the basis for participants’ intuiting that a word “sounds like what it means” is not entirely clear. As Winter et al. (2024) acknowledged, iconicity ratings “underspecify the particular form-meaning links that lead to a rater’s intuitions . . . they do not give clues to the nature of this correspondence” (p. 11). It has been suggested that participants might simply rate the iconicity of words based on their sensory meanings rather than on their form-meaning resemblances (Thompson et al., 2020). Given that iconicity ratings are increasingly employed in psycholinguistic studies to support inferences about form-meaning mappings and theories of language embodiment (Dove, 2022; Lupyan & Winter, 2018; Perniss & Vigliocco, 2014; Perry et al., 2015; Sidhu & Pexman, 2018b; Sidhu et al., 2020), it is essential to clarify the nature of this correspondence.
Several lines of evidence have been cited in support of iconicity ratings referencing form-meaning resemblances using the two largest normative datasets in English (Winter et al., 2024; 14,776 words) and Spanish (Hinojosa et al., 2021; 10,995 words). These studies have primarily focused on semantic relationships and have shown that words rated higher in iconicity are also rated higher in terms of sensory experience in both languages (sensory experience ratings (SER); Díez-Álamo et al., 2019; Juhasz & Yap, 2013). While significant, this relationship is also weak (e.g. r = .20 and r = .24 in English and Spanish, respectively; de Zubicaray et al., 2024; Hinojosa et al., 2021), suggesting iconicity ratings do not merely recapitulate sensory experience (cf. Thompson et al., 2020). English words rated high in iconicity also have sparser semantic neighbourhoods involving fewer concepts with similar meanings, as might be expected for non-arbitrary form-meaning mappings (Sidhu & Pexman, 2018a; Winter et al., 2024).
Relatively little research has explored the form mappings that contribute to raters’ intuitions of iconicity across languages, which is the focus of the present study. Here, we aimed to investigate how well iconicity ratings align with proposals concerning the phonological structure of iconic words. For example, as iconicity is proposed to be based on the use of universally accessible acoustic-phonetic features to depict/express sensory imagery (Blasi et al., 2016; Thompson & Do, 2019), it seems reasonable to assume these features should be shared among words with high iconicity ratings across languages. Most of this research has involved single phonemes. For example, Blasi et al. (2016) reported strong associations between the concepts of roundness and smallness and /r/ and /i/ sounds, respectively, across multiple languages. Ćwiek et al. (2024) reported that trilled /r/ and /l/ sounds are associated with roughness and smoothness, respectively, across 28 languages (see also Winter et al., 2022). Thompson et al. (2021) found that seven articulatory features were common to ideophones across 13 typologically diverse, non-Indo-European languages, a finding they considered consistent with the cross-linguistic use of sensorimotor analogies. Furthermore, these features should primarily be shared by root forms. This distinction is important because morphological derivation in most languages involves the systematic addition of redundant form-meaning mappings in affixes. For example, in English and Spanish, negation is typically prefixal (e.g. un- in “unhappy,” in- in “infeliz”; de Zubicaray & Hinojosa, 2024). In English, suffixes convey abstractness (e.g. the concrete word “friend” becomes the abstract word “friendliness” with the addition of -iness; Kearney et al., 2024; Reilly et al., 2017). In Spanish, evaluative suffixes for diminution (e.g. the word “pájaro,” bird, is perceived as more positive in the word “pajarito,” little bird, with the addition of -ito) and augmentation (e.g. the word “cabeza,” head, becomes negatively valenced in the word “cabezón,” big head with the meaning of stubborn, with the addition of -ón) not only express quantification but also melioration and pejoration (Hinojosa et al., 2022).
In order to facilitate cross-linguistic comparisons, Dingemanse (2019) proposed that ideophones are marked words, in that they have structural features that “make them stand out from other words” (p. 15) in their respective languages. Structural markedness is a linguistically relative concept (Waugh & Lafford, 2006). For example, iconic words in English have been proposed to be more structurally marked in terms of having phonologically complex consonant clusters/blends in their onsets and codas, and Dingemanse and Thompson (2020) observed a positive correlation between iconicity ratings and structural markedness in a sample of English words also rated for humour/funniness. However, phonaesthemes are also characterised by complex onsets and codas, and some researchers distinguish them from ideophones noting that the latter are not a traditional part of speech like nouns or verbs (Dingemanse, 2019; Kwon & Round, 2015; Zingler, 2017). Iconic forms also show a high rate of reduplication (the root is repeated exactly or with only slight modification, e.g. zigzag, frufrú), which is considered an example of markedness (Dingemanse, 2015; Punselie et al., 2024; Zingler, 2017).
Distinguishing between iconicity and systematicity in form-meaning mappings is not always straightforward. For example, while there is nothing about the phonoaesthetic gl- onset cluster that links it sensorily to luminance or vision, the sn- onset cluster associated with nose/oral functions in English (e.g. sneeze, sniffle, snore, snack, snarl, etc.) could be considered both phonoaesthetic and iconic, because the initial fricative /s/ is systematic while the nasal /n/ appears imitative (Kwon & Round, 2015; Zingler, 2017). The former phonaesthemes are therefore often distinguished as learned or conventional rather than iconic forms (Kwon & Round, 2015; Perry et al., 2015; Zingler, 2017). English has far more phonaesthemes than other languages (Mompeán et al., 2020). Spanish has fewer onset consonant clusters than English and none in its codas (Carlo et al., 2020), making them a less prominent structural marker in that language, although there are some cross-linguistic examples of phonaesthemes with similar meanings (e.g. the /fl/ onset is linked to motion in fluid in both English and Spanish, as in float and flow, and flotar and fluir, respectively; see Mompeán et al., 2020). Using an inductive approach, Dingemanse and Thompson (2020) proposed that “phonological improbability” can be considered a proxy for structural markedness. In Spanish, plosives and fricatives other than /s/ occur infrequently in coda position, meeting this criterion (Lloyd & Schnitzer, 1967; Rodríguez, 2016). Voiced plosives (/b/, /d/, /g/) that follow vowels also undergo a systematic weakening (lenition) process known as spirantisation in Spanish, further differentiating them from their regular voicing in other positions (González, 2006; Piñeros, 2002).
Phonotactic probability can also be viewed as a cross-linguistic proxy for structural markedness (Dingemanse, 2019). For example, it has been proposed that ideophones might be less likely to follow the general phonotactic rules of their language and so comprise less probable sequences of phonemes than non-iconic forms due to their imitation of sensory percepts (Thompson & Do, 2019). Dingemanse (2019) also considers this to be a form of structural marking that is shared with phonaesthemes. Dingemanse and Thompson (2020) failed to observe a significant relationship between phonotactic probability and iconicity ratings in a set of 1,419 English words also rated for humour/funniness, although they did find a significant negative covariance (−16.3%) with a measure of log letter probability, which they suggested might be due to the ratings having been collected using written words. Winter et al. (2024) also observed a negative correlation with log letter frequency (r = −.15) in their larger set of English iconicity ratings, although they did not investigate phonotactic probability. However, this relationship might differ for Spanish as it is more phonotactically regular and constrained in terms of onset and coda clusters and has spelling-to-sound mappings that are more transparent (Carlo et al., 2020; Rodríguez-Ferreiro & Davies, 2019). Words with high probability segments also tend to occur in denser neighbourhoods involving more similar sounding words (Vitevitch & Luce, 2016). Hence, if words rated higher in iconicity are less phonotactically probable (Thompson & Do, 2019), then they should also have sparser phonological neighbourhoods.
The imitative aspects of iconic form-meaning mappings have been interpreted as supporting grounded or embodied cognition accounts of language that propose conceptual processing is situated in sensorimotor systems/experience (Murgiano et al., 2021; Perniss & Vigliocco, 2014; Sidhu & Pexman, 2021). Indeed, some authors have argued iconic forms “are too linked to specific referents and contexts, and so are less well suited for expressing abstractions” (Lupyan & Winter, 2018, p. 1). However, this focus ignores the reality of colexification across languages in which single word forms are frequently used to express multiple inter-related meanings, especially abstract ones (François, 2008; Rzymski et al., 2020). Lexical ambiguity significantly impacts how words are processed (for a review, see Eddington & Tokowicz, 2015). In English, most onomatopoetic forms are polysemous in that they convey multiple, metaphorically-related senses (Sasamoto, 2019). This is also the case for Spanish (Ibarretxe-Antuñano, 2019). For example, consider the following senses of the word “crunch”: We are currently experiencing a credit crunch. You have to crunch the numbers now. It’s crunch time for the team. This suggests iconicity ratings should show a positive rather than negative correlation with number of senses. However, the strength of this relationship might also differ between the two languages. Using sense annotations derived from Wikipedia corpora, Dandala et al. (2013) reported that the average number of senses per word in English and Spanish was 9.6 and 4.2, respectively.
In this paper, we use the large datasets of English and Spanish iconicity ratings collected by Winter et al. (2024) and Hinojosa et al. (2021), respectively, to identify form structural features that contribute to participants’ intuitions that a word “sounds like what it means” across the two languages. It should be noted that while both studies included words from prior normative studies of various semantic ratings (e.g. concreteness, emotional valence), including lists of onomatopoeias in their respective languages, Winter et al. (2024) also included a list of phonaesthemes taken from Hutchins (1998) but did not distinguish learned and iconic forms in their norms. Given relationships between form and meaning can be systematic and/or iconic, it is essential to distinguish them lest they be misattributed by researchers (Thompson & Do, 2019). Specifically, we tested the hypotheses that unaffixed word forms with high iconicity ratings in both languages would: (1) exhibit structural markedness; (2) comprise less phonotactically probable phoneme sequences; (3) have sparser phonological neighbourhoods; (4) be more likely to be polysemous; and (5) share “universal” form features.
Methods
Materials
Iconicity ratings for 14,776 English and 10,995 Spanish words were sourced from Winter et al. (2024) and Hinojosa et al. (2021), respectively. In both studies, participants rated individually presented written words on a scale of 1 to 7, according to whether they considered the sound of the word to be unrelated to its meaning (not iconic at all) to it being closely related to its meaning (very iconic). Information about affixation and lists of compound words in both languages were sourced from the Multilingual Database of Derivational and Inflectional Morphology (MorphyNet; Batsuren et al., 2021) and normative databases (Desrochers et al., 2010; Juhasz et al., 2015), respectively. Phonological neighbourhood sizes and mean phonotactic (biphone) probabilities for English were sourced from the Cross-Linguistic Easy-Access Resource for Phonological and Orthographic Neighbourhood Densities (CLEARPOND; Marian et al., 2012) and Irvine Phonotactic online Dictionary (IPHOD v2; Vaden et al., 2009), respectively, both of which are based on the SUBTLEXus database (Brysbaert & New, 2009). For Spanish, these values were sourced from the EsPal database (Duchon et al., 2013), which is also based on a movie subtitle corpus. 2 Information about colexification in terms of the number of inter-related senses for English and Spanish words was sourced from WordNet (3.0; Gao et al., 2022; Miller, 1995) and the Multilingual Central Repository (MCR 3.0; Gonzalez-Agirre et al., 2012), which is based on WordNet, respectively. We selected the above databases because they each used similar methods to derive their measures from comparable English and Spanish corpora.
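To make the notion of phonological neighbourhood size concrete, the following minimal sketch counts the words in a lexicon that differ from a target by a single phoneme substitution, addition or deletion. This one-phoneme definition is an assumption made for illustration and is not necessarily the exact procedure used by CLEARPOND or EsPal. The function names and the toy lexicon are hypothetical.

```python
def is_neighbour(a, b):
    """True if phoneme tuples a and b differ by exactly one
    substitution, addition or deletion (assumed definition)."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # Different length: deleting one phoneme from the longer form
    # must yield the shorter form (addition/deletion).
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def neighbourhood_size(target, lexicon):
    """Number of lexicon entries that are neighbours of target."""
    return sum(is_neighbour(target, other) for other in lexicon)


# Hypothetical toy lexicon in ARPAbet-like notation (not from the datasets).
TOY_LEXICON = [
    ("K", "AE", "T"),       # cat
    ("B", "AE", "T"),       # bat
    ("K", "AE", "P"),       # cap
    ("AE", "T"),            # at
    ("K", "AE", "S", "T"),  # cast
    ("D", "AO", "G"),       # dog
]

cat_density = neighbourhood_size(("K", "AE", "T"), TOY_LEXICON)
```

In this toy lexicon, "cat" has four neighbours (bat, cap, at, cast), whereas "dog" has none.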
We followed Dingemanse and Thompson’s (2020) inductive approach of cataloguing phonological complexity to identify instances of structural markedness in iconic words. In English, this involved identifying words with complex two- and three-letter consonant clusters/blends in their onsets (bl/cl/fl/gl/pl/br/cr/dr/fr/gr/pr/tr/sk/sl/sp/st/sw; spr/scr/str) and codas (mp/nk/rt/rr/sh/wk) in addition to the diminutive suffix -le. However, unlike Dingemanse and Thompson (2020), we excluded instances where the suffix was used productively according to MorphyNet to avoid morphophonological redundancy influencing the results (so “sniffle” was excluded as it is a diminutive of the root “sniff,” while “drizzle” was included as “drizz” is not an English root word). We also identified instances of reduplication (repetition or near-repetition of roots, e.g. goo-goo, zigzag) and vowel lengthening/multiple consecutive vowels (e.g. squeal, squeak; see Dingemanse, 2015; Perniss & Vigliocco, 2014; Punselie et al., 2024). Each instance was coded as “1” and a cumulative measure of structural markedness was derived via summation (e.g. “plow” was coded as 1, while “drizzle” and “sport” were each coded as 2).
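The cumulative coding scheme described above can be sketched as follows. This is an illustrative, letter-based approximation under stated assumptions, not the coding procedure actually used. The reduplication heuristic (two halves differing in at most one letter) and the vowel test (two or more consecutive vowel letters) are simplifications introduced here. The `productive_le` argument is a hypothetical stand-in for the MorphyNet lookup.

```python
import re

# Cluster inventories as listed in the text.
ONSETS = ("spr", "scr", "str",
          "bl", "cl", "fl", "gl", "pl", "br", "cr", "dr", "fr", "gr",
          "pr", "tr", "sk", "sl", "sp", "st", "sw")
CODAS = ("mp", "nk", "rt", "rr", "sh", "wk")


def is_reduplicated(word):
    """Assumed heuristic: even-length form whose two halves differ
    in at most one letter (e.g. goo-goo, zigzag)."""
    w = word.replace("-", "")
    if len(w) < 4 or len(w) % 2:
        return False
    half = len(w) // 2
    return sum(x != y for x, y in zip(w[:half], w[half:])) <= 1


def markedness(word, productive_le=frozenset()):
    """Sum of structural markedness features, each coded as 1."""
    w = word.lower()
    score = 0
    score += w.startswith(ONSETS)            # complex onset (counted once)
    score += w.endswith(CODAS)               # complex coda
    # Diminutive -le, skipping forms where the suffix is productive.
    score += w.endswith("le") and w not in productive_le
    score += is_reduplicated(w)              # reduplication
    # Two or more consecutive vowel letters (assumed proxy for lengthening).
    score += bool(re.search(r"[aeiou]{2,}", w))
    return int(score)


scores = {w: markedness(w, productive_le={"sniffle"})
          for w in ("plow", "drizzle", "sport", "sniffle", "zigzag")}
```

With these rules, "plow" is coded as 1, and "drizzle" and "sport" are each coded as 2, matching the worked examples in the text. A letter-based test will over-count in some cases (e.g. any word ending in -le that is not listed as productive), so manual checking would still be needed.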
We adopted a similar approach for Spanish words, including only complex onset consonant clusters (pl/pɾ/bl/bɾ/tr/dr/cl/cr/gl/gr/fl/fr) as none may exist in codas (Bradley, 2006), as well as instances of reduplications (e.g. frufrú; Urbaniak, 2019) and multiple consecutive vowels (e.g. buaaa, muuu). In addition, we coded instances of fricatives other than /s/ as well as plosives in coda position, given these features occur infrequently in Spanish (Lloyd & Schnitzer, 1967; Rodríguez, 2016), following Dingemanse and Thompson’s (2020) proposal that phonological improbability is a proxy for structural markedness.
To determine how many English phonaesthemes were rated as iconic, we adopted two approaches: First, we compiled a non-exhaustive list of words with semantic gloss annotations from several available sources (Bergen, 2004; Hutchins, 1998; Kwon & Round, 2015; Zingler, 2017). Second, we catalogued words that included the 47 candidate phonaesthemic clusters from Otis and Sagi’s (2008) corpus study. While the first approach is conservative, the second is more lenient in that it will possibly include words that do not share the phonaesthemic meaning associated with a given cluster. 3
Phonemic transcriptions and stress category assignments for English and Spanish were retrieved from the Carnegie Mellon University (CMU) pronouncing dictionary (http://www.Speech.cs.cmu.edu/cgi-bin/cmudict; 39 phonemes) and EsPal database (Duchon et al., 2013; 31 phonemes), respectively. Each word was coded according to its whole-word properties (number of letters, syllables and phonemes), its initial and final phonemes (a number was assigned to each of the phonemes), its number of phonetic features and their initial and final positions (i.e. place and height for vowels; voicing, place and manner of articulation for consonants), and its syllabic stress position (initial, medial, final). Orthographic length (i.e. number of letters) served as a proxy for auditory duration.
Syllable structures were retrieved from the CMU and EsPal. Note that cross-linguistic coding of sub-syllabic onset, nucleus, and coda slots is not straightforward due to the differences in syllabic structure across Spanish and English and difficulties determining syllable boundaries in the latter. For example, Spanish syllable structure is more clearly defined than that of English, due to a predominant open consonant-vowel (CV) syllable, and Spanish is a syllable-timed language (all syllables have approximately equal auditory duration; Bertrán, 1999; Gorman & Gillam, 2003). Conversely, the predominant syllable in English is the closed CVC combination, making the assignment of consonants to syllables relatively subjective (Duanmu, 2009; Gorman & Gillam, 2003), with syllable durations longer or shorter according to whether they are stressed or unstressed (Bertrán, 1999; Gorman & Gillam, 2003). Here, we adopted Bartlett et al.’s (2009) syllabification of the CMU dataset based on the sonority sequencing principle, in which sonority (relative loudness) rises in onsets, peaks at the nucleus, then falls in coda position. 4
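Once a word has been syllabified, each syllable can be reduced to a consonant-vowel skeleton such as cvc or ccvc. The sketch below illustrates this step for ARPAbet transcriptions of the kind used in the CMU dictionary. It assumes syllable boundaries are already given and does not implement the sonority-based syllabification itself. The function names and example transcriptions are illustrative assumptions.

```python
import re

# The 15 ARPAbet vowel symbols; all other symbols are treated as consonants.
ARPABET_VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
                  "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def cv_skeleton(syllable):
    """Map one syllable (a list of ARPAbet phonemes) to a c/v string."""
    out = []
    for phoneme in syllable:
        base = re.sub(r"\d", "", phoneme)  # drop stress digits, e.g. AE1 -> AE
        out.append("v" if base in ARPABET_VOWELS else "c")
    return "".join(out)


def word_skeletons(syllables):
    """Skeletons for every syllable of an already-syllabified word."""
    return [cv_skeleton(s) for s in syllables]


# Illustrative transcriptions with assumed syllable boundaries.
splash = word_skeletons([["S", "P", "L", "AE1", "SH"]])
zigzag = word_skeletons([["Z", "IH1", "G"], ["Z", "AE2", "G"]])
```

Here "splash" yields a single cccvc syllable and "zigzag" yields two cvc syllables. The initial-syllable or final-syllable structure of a word can then be read off the first or last element of the list.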
Analyses
All analyses were performed using R (version 4.4.1; R Core Team, 2024). Averaging disparate ratings typically results in an unrepresentative value in the middle range of the scale with a large standard deviation, particularly for sensorimotor experience variables (Pollock, 2018). Winter et al. (2024) noted this issue affected their ratings of iconicity in English. We therefore applied a cut-off of 1.5 standard deviations to identify words with reasonable rating agreement in both datasets based on their use of 7-point scales. 5 Next, we removed affixed forms (Batsuren et al., 2021) and compound nouns with iconicity ratings >= 5 from both datasets (Desrochers et al., 2010; Juhasz et al., 2015) as Dingemanse and Thompson (2020) observed that English compounds with transparent yet non-iconic structure (e.g. heartbeat) tend to be rated as highly iconic.
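The selection steps above can be sketched as a simple filter. The analyses were run in R, so this Python version is only an illustration of the logic. It assumes that words are retained when their rating standard deviation does not exceed 1.5, that affixed forms are removed regardless of rating, and that compounds are removed only when rated 5 or above. The field names and all rating values below are invented for the example.

```python
def select_words(norms, affixed, compounds, sd_cutoff=1.5, iconic_threshold=5.0):
    """Keep words with reasonable rating agreement, dropping affixed
    forms and highly rated compounds (assumed reading of the procedure)."""
    kept = []
    for entry in norms:
        if entry["sd"] > sd_cutoff:
            continue  # raters disagreed too much
        if entry["word"] in affixed:
            continue  # affixed form
        if entry["word"] in compounds and entry["mean"] >= iconic_threshold:
            continue  # transparent compound rated as iconic
        kept.append(entry)
    return kept


# Invented ratings for illustration only (not values from either dataset).
toy_norms = [
    {"word": "splash", "mean": 6.1, "sd": 0.9},
    {"word": "table", "mean": 3.9, "sd": 1.9},
    {"word": "unhappy", "mean": 3.0, "sd": 1.2},
    {"word": "heartbeat", "mean": 5.6, "sd": 1.1},
]

retained = [e["word"] for e in select_words(
    toy_norms, affixed={"unhappy"}, compounds={"heartbeat"})]
```

In this toy example only "splash" survives all three filters.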
Distributions were plotted using ggplot2 and cowplot packages (Wickham, 2016; Wilke, 2024). Spearman correlations between the iconicity ratings and other variables were calculated using the Hmisc and corrplot packages (Harrell, 2024; Wei & Simko, 2024). Multiple regressions with robust standard errors were performed with iconicity ratings as dependent variable using the package estimatr (Blair et al., 2022), ensuring valid coefficient estimates even in the presence of skewness, outliers, multicollinearity and/or heteroscedasticity among predictor variables (Wilcox, 2019).
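For readers unfamiliar with rank-based correlation, the Spearman coefficient is the Pearson correlation computed on ranks, with tied values given their average rank. The minimal sketch below shows this computation using only the standard library. It illustrates the statistic itself and is not the Hmisc implementation used in the analyses. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, any monotonic relationship (e.g. between x and its square for positive x) yields a coefficient of 1, which makes the statistic suitable for skewed rating distributions.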
To determine the best subsets of phoneme, syllable and structural markedness (reduplications, multiple consecutive vowels) variables for predicting iconicity ratings across languages, we used the leaps package (Lumley, 2022) after first excluding those with linear dependencies (caret package – findLinearCombos; Kuhn, 2008). Next, we determined the best-fitting model in terms of predictive accuracy via a 10-fold cross-validation procedure (repeated 200 times with different randomised folds), selecting the model that minimised root mean square error to avoid overfitting (de Rooij & Weeda, 2020; Yarkoni & Westfall, 2017). The best-fit model was then entered into a linear regression with robust standard errors (Wilcox, 2019).
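The logic of selecting a predictor subset by repeated k-fold cross-validation can be sketched as follows. This is a simplified illustration and not the leaps-based pipeline used in the analyses. It fits ordinary least squares by the normal equations, evaluates every candidate subset exhaustively, and assumes linearly dependent predictors have already been removed. All function names are hypothetical.

```python
import itertools
import math
import random


def ols_fit(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(beta, row):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def repeated_cv_rmse(X, y, cols, k=10, repeats=200, seed=0):
    """Mean held-out RMSE over `repeats` random k-fold partitions."""
    rng = random.Random(seed)
    Xs = [[row[c] for c in cols] for row in X]
    rmses = []
    for _ in range(repeats):
        idx = list(range(len(y)))
        rng.shuffle(idx)  # new randomised folds on every repeat
        for fold in (idx[i::k] for i in range(k)):
            held = set(fold)
            train = [i for i in idx if i not in held]
            beta = ols_fit([Xs[i] for i in train], [y[i] for i in train])
            sq = [(y[i] - predict(beta, Xs[i])) ** 2 for i in fold]
            rmses.append(math.sqrt(sum(sq) / len(sq)))
    return sum(rmses) / len(rmses)


def best_subset(X, y, max_size, **cv_args):
    """Predictor subset (tuple of column indices) with the lowest CV RMSE."""
    n_pred = len(X[0])
    candidates = [c for size in range(1, max_size + 1)
                  for c in itertools.combinations(range(n_pred), size)]
    return min(candidates, key=lambda c: repeated_cv_rmse(X, y, c, **cv_args))
```

Exhaustive evaluation is only feasible for a handful of predictors. With dozens of candidate variables, a branch-and-bound search of the kind provided by leaps would be needed to propose the best subset of each size before cross-validating across sizes.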
Results
The mean iconicity values and corresponding standard deviations for every English and Spanish word in the Winter et al. (2024) and Hinojosa et al. (2021) norms are plotted in Figure 1. Following application of the 1.5 standard deviation cut-off, 2,684 English and 1,900 Spanish words showed reasonable rating agreement, of which 629 and 99 were rated as iconic based on a rating of 5 or greater (as 4 is the neutral mid-point of a 7-point scale), respectively. Removal of affixed forms and etymologically transparent, non-iconic compounds resulted in sets of 1,523 and 1,121 English and Spanish words, respectively. Figure 2 shows the distributions of the latter words’ ratings in each language. They differed considerably in terms of the number of words rated as iconic, with 362 (23.7%) English words meeting this criterion compared to only 87 (7.8%) in Spanish. Of the English words rated as iconic, 76 (21%) were phonaesthemes according to the compiled list of semantic glosses, and 186 (51.4%) matched Otis and Sagi’s (2008) list of phonaesthemic clusters.

Figure 1. Iconicity rating variability for English (n = 14,776) and Spanish words (n = 10,995).

Figure 2. Iconicity rating distributions of unaffixed English (n = 1,523) and Spanish words (n = 1,121) with good agreement.
Relationships Between English Iconicity Ratings and Phonological Variables
Approximately 28% (n = 426) of the unaffixed English words were structurally marked. Biphone probabilities, phonological neighbourhood sizes, and WordNet number of senses were available for 1,190 of these words. The zero-order correlations among the variables are shown in Figure 3. Iconicity ratings showed significant weak to moderate positive correlations with all variables except phonotactic probability, for which the correlation was negative. The results of the regression are summarised in Table 1. Overall, the variables explained 18.1% of the variance in iconicity ratings, with each being a significant predictor. All except phonotactic probability showed significant positive relationships with iconicity ratings (lower biphone probabilities predicted higher ratings).

Figure 3. Correlations between variables for English words (n = 1,190).
Table 1. Regression Analysis Results for Predicting Iconicity of English Words (n = 1,190).
*p < .05. ***p < .001.
Relationships Between Spanish Iconicity Ratings and Phonological Variables
Of the unaffixed Spanish words, 135 (12.5%) showed evidence of structural markedness. Biphone probabilities and phonological neighbourhood sizes were available for 1,078 words. As number of MCR senses were available for only 638 of these words, we conducted separate analyses with this variable. The zero-order correlations among the phonological variables are shown in Figure 4. Iconicity ratings were significantly and weakly correlated with all variables, although the relationship was negative for mean biphone probability. In the smaller sample, iconicity ratings were significantly correlated with number of senses (r = .084, p = .036). However, this weak result is likely due to a restriction of range issue (floor effect) given only 14 words in this smaller sample had iconicity ratings >= 5. The results of the regression with the phonological variables excluding number of senses are summarised in Table 2. Overall, the variables explained 11.92% of the variance in iconicity ratings, with all predictors contributing significantly to the model. All except phonotactic probability showed significant positive relationships with iconicity ratings (lower biphone probabilities predicted higher ratings). In the smaller sample of 639 words with sense information, the combined variables predicted less variance (6.5%) in the iconicity ratings and number of senses was not a significant predictor (estimate = 0.015, SE = 0.011, t = 1.405, p = .16; see OSF repository). Again, this is likely to be due to a floor effect in the iconicity ratings for this subsample.

Figure 4. Correlations between variables for Spanish words (n = 1,078).
Table 2. Regression Analysis Results for Predicting Iconicity of Spanish Words (n = 1,078).
*p < .05. **p < .01. ***p < .001.
Surface Form Variables Predicting Iconicity Ratings of English Words
Phonemic transcriptions, syllable and stress assignments, and structural markedness predictor variables were available for 1,313 unaffixed English words, for which the best-fit model comprised 29 variables and explained 33.6% of the variance (see Table 3).
Table 3. Best Fit Model for Predicting Iconicity Ratings of Unaffixed English Words According to 10-Fold Cross Validation Repeated 200 Times (n = 1,313).
p ≤ .1. *p < .05. **p < .01. ***p < .001.
Surface Form Variables Predicting Iconicity Ratings of Spanish Words
Phonemic transcriptions, syllable and stress assignments, and structural markedness (multiple consecutive vowels, reduplications) predictor variables were available for 1,075 unaffixed Spanish words (55 with iconicity ratings > 5), for which the best-fit model comprised 14 variables and explained 50.85% of the variance (see Table 4).
Table 4. Best Fit Model for Predicting Iconicity Ratings of Unaffixed Spanish Words With Surface Form Variables According to 10-Fold Cross Validation Repeated 200 Times (n = 1,075).
p ≤ .1. *p < .05. **p < .01. ***p < .001.
Several features significantly predicted iconicity ratings across both languages: bilabial and velar sounds in the final phoneme position, penultimate stress, reduplications and an initial syllable ccvc structure. However, the relationship with penultimate stress differed across languages, being positive in English and negative in Spanish. Other features also showed similar but not identical relationships across languages: Close vowel sounds were associated with higher iconicity ratings in both languages, although this was only in the word final position in Spanish. Words with ccvcc in their initial versus final syllable positions in English and Spanish, respectively, were also associated with higher iconicity ratings.
Discussion
Recent investigations of iconic form-meaning mappings in various languages have employed subjective ratings. The present study investigated which phonological/phonetic variables contribute to participants’ intuitions that a word “sounds like what it means” across two large datasets of iconicity ratings in English and Spanish. Overall, many more words were rated as iconic in English compared to Spanish. Across languages, highly iconic words shared several phonological/phonetic features, syllable structures and reduplications while also demonstrating language-specific mappings.
At first glance, the finding that Spanish and English differ substantially in terms of their number of words rated as iconic seems at odds with the research indicating that most Indo-European languages comprise similar small-sized inventories of iconic forms (Perniss et al., 2010). Before interpreting these differences in rating distributions as evidence for linguistic diversity in iconicity, it is worth considering an alternative explanation: While both Hinojosa et al. (2021) and Winter et al. (2024) utilised identical 7-point rating scales, the instructions to their respective participants differed in terms of the examples used. Specifically, Hinojosa et al. provided examples of iconic and non-iconic words, whereas Winter et al. provided examples of words that were rated as low, moderate or high in iconicity. This might explain the preponderance of words rated in the mid-point of the scale for the English words. Other studies have used different scales ranging from −5 (anti-iconic) to 5 (iconic) with the zero mid-point reflecting arbitrariness or asked participants to rate how accurately a “space alien” could guess the meaning of a word based only on its sound using a 100-point scale (Perry et al., 2015). As Motamedi et al. (2019) advise, variations in instructions and rating scales potentially index different aspects of sound-meaning mappings, and research using subjective iconicity ratings should be sensitive to this. In our view, the clear differences observed in the rating distributions between English and Spanish words preclude applying the set of English ratings cross-linguistically to translated words in other languages (e.g., Blasi et al., 2022).
In Spanish, the majority of unaffixed forms were onomatopoeias and interjections, consistent with Hinojosa et al.’s (2021) findings for their full dataset. In English, between 21% and 51% of the forms rated high in iconicity were phonaesthemes that have systematic sound-meaning mappings, depending on the method used to identify them. For example, the words “glare” and “gloom” had mean iconicity ratings of 5.73 and 6.2, respectively. This suggests that when English speakers rated whether a word “sounds like what it means,” part of this intuition involved accessing knowledge about systematic sound-meaning mappings that might be learned/conventional rather than directly linked to sensory experiences. As we noted in the Introduction, Winter et al. (2024) included Hutchins’s (1998) list of phonaesthemes because they can also be iconic. However, it is important to distinguish between iconic and non-iconic (learned) phonaesthemes (Thompson & Do, 2019), particularly if researchers are interested in direct grounding of meaning in sensory experience (Murgiano et al., 2021). Bergen (2004) reported that non-iconic English phonaesthemes contribute to lexical priming in a manner similar to morphemes, concluding they possess a “psychological reality” for speakers (see also Hutchins, 1998; Kwon & Round, 2015; Zingler, 2017). Interestingly, Spanish phonaesthemes tended not to be rated as iconic and had large rating standard deviations (Pollock, 2018). For example, the mean ratings of the phonaesthemes with fl- and tr- onsets such as “flotar” (3.15), “fluir” (4), “traca” (2.96), and “trueno” (3.19) were all in the middle range of the iconicity scale. In addition, while a number of etymologically transparent compound words were rated as highly iconic in English (e.g. “hardcover,” 6.2), confirming Dingemanse and Thompson’s (2020) observation, this was not the case for Spanish (none > 5). Again, it is unclear whether this tendency for English participants to rate non-iconic forms as iconic is due to linguistic diversity or different rating instructions.
We also confirmed and extended the observation that English words with higher iconicity ratings are more likely to be structurally marked (Dingemanse & Thompson, 2020). As we noted in the Introduction, structural markedness is a linguistically relative construct, and Spanish has far fewer words with complex onsets and none in its codas (Bradley, 2006; Carlo et al., 2020). We therefore investigated Spanish words’ markedness in terms of complex onsets, reduplications, long/multiple vowel sequences and infrequent phonemes in coda position (Lloyd & Schnitzer, 1967; Rodríguez, 2016), following Dingemanse and Thompson’s (2020) proposal that phonological improbability is a proxy for structural markedness. Here, the relationship was significant albeit much weaker than that in English (r = .07, p < .05 vs. r = .32, p < .001). It is possible that other examples of common markedness might be identified for Spanish and English using an inductive approach (Punselie et al., 2024). It is also important to reiterate that the relationship between markedness and English ratings is confounded to some extent with the systematic use of consonant clusters in phonaesthemes that can be iconic or non-iconic (Thompson & Do, 2019).
In both languages, words with less probable biphone sequences tended to be rated higher in iconicity, which might be due to their use of speech sounds to imitate sensory percepts (Thompson & Do, 2019), although it should be acknowledged that this relationship was quite weak in Spanish (r = −.08). This finding contrasts with that of Dingemanse and Thompson (2020), who failed to observe a similar relationship in a comparably sized set of English words rated for humour/funniness. This discrepancy is likely due to our inclusion of only unaffixed words with reasonable agreement in iconicity ratings across both languages. We also hypothesised that if forms rated high in iconicity were less phonotactically probable, then they would also have fewer lexical neighbours comprising similar sounds (Vitevitch & Luce, 2016). However, in both languages, words rated as iconic instead tended to have more phonological neighbours. Hence, they are less likely to “stand out” from other similar sounding words (cf. Dingemanse & Thompson, 2020), although again this relationship was much weaker in Spanish (r = .06) than English (r = .36). A possible explanation for this opposite relationship in English is suggested by the positive correlation we observed between phonological neighbourhood size and markedness, which differed from Spanish, where marked words tended to have fewer neighbours. This would also suggest that the relationship between iconicity rating and neighbourhood size in English is mediated to some extent by the systematic pairings of consonant clusters in phonaesthemes. In English, words with larger phonological neighbourhoods show an advantage in lexical decision (Vitevitch & Luce, 2016). In Spanish, they show a disadvantage (Vitevitch & Rodríguez, 2004).
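To make the neighbourhood measure concrete, the following minimal sketch counts phonological neighbours under the common definition of words that differ by a single phoneme substitution, deletion or addition. It is an illustration only: the toy lexicon, the phoneme transcriptions and the function names are assumptions, and the neighbourhood counts analysed here may have been derived from a different resource or definition.

```python
from typing import Dict, Tuple

Phonemes = Tuple[str, ...]

# Hypothetical toy lexicon: orthographic form -> phoneme sequence.
LEXICON: Dict[str, Phonemes] = {
    "cat": ("k", "æ", "t"),
    "bat": ("b", "æ", "t"),
    "cap": ("k", "æ", "p"),
    "at": ("æ", "t"),
    "cast": ("k", "æ", "s", "t"),
    "dog": ("d", "ɒ", "g"),
}


def is_neighbour(a: Phonemes, b: Phonemes) -> bool:
    """True if a and b differ by exactly one substitution, deletion or addition."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: removing a single phoneme from the longer
    # sequence must yield the shorter one (deletion/addition).
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def neighbourhood_size(word: str, lexicon: Dict[str, Phonemes]) -> int:
    """Number of other lexicon entries that are phonological neighbours of word."""
    target = lexicon[word]
    return sum(
        is_neighbour(target, phonemes)
        for other, phonemes in lexicon.items()
        if other != word
    )


if __name__ == "__main__":
    for w in ("cat", "dog"):
        print(w, neighbourhood_size(w, LEXICON))
```

In this toy lexicon, "cat" has four neighbours ("bat", "cap", "at", "cast") and "dog" has none. With a full phonemic lexicon, the pairwise comparison would be slow, so an index keyed on one-phoneme-deleted forms would be a more practical design.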
In English, we also found that iconicity ratings were predicted by the number of senses expressed by a word. This is consistent with the use of most onomatopoetic/ideophonic forms to express multiple, metaphorical senses, that is, polysemy (Sasamoto, 2019), although some researchers have proposed iconic forms are linked to specific referents and contexts (Lupyan & Winter, 2018). We did not have a sufficient number of forms with sense annotations to confirm whether this was also the case for Spanish, although a similar relationship is likely (Ibarretxe-Antuñano, 2019). For both languages, we used WordNet sense annotations (Gonzalez-Agirre et al., 2012; Miller, 1995), for which fine-grained annotations show lower inter-annotator reliability than coarser-grained ones (Navigli, 2009). Future work might consider using sense annotations derived from corpus distributional measures (Beekhuizen et al., 2021; Dandala et al., 2013).
One final question we investigated was whether iconicity ratings in both languages could be predicted by common form features, given that iconicity is proposed to be based on the cross-linguistic use of universally accessible acoustic-phonetic and structural features to depict sensory imagery (Blasi et al., 2016; Ćwiek et al., 2024; Dingemanse & Thompson, 2020; Punselie et al., 2024; Thompson & Do, 2019; Winter et al., 2022). Furthermore, we proposed that these features should be primarily shared with the root word, as iconicity is based in sensory depiction while affixation involves the systematic addition of redundant features (de Zubicaray & Hinojosa, 2024). Our analyses revealed phonological/phonetic feature and higher level structure mappings were able to significantly predict iconicity ratings in English and Spanish explaining 33% and 51% of variance, respectively.
In both languages, words rated higher in iconicity tended to be characterised by bilabial and velar consonant sounds in final position (Blasi et al., 2016), reduplications (Punselie et al., 2024) and an initial syllable structure with complex consonant onsets (ccvc; Dingemanse & Thompson, 2020). Penultimate stress also played a significant role across languages, although in opposite directions, being associated with higher versus lower iconicity in English and Spanish, respectively. Close vowel sounds were also associated with higher iconicity ratings in both languages, although this was only in the word final position in Spanish, consistent with prior work linking them to sound symbolism (e.g. small and /i/; Blasi et al., 2016; Dingemanse, 2019; Thompson & Do, 2019). Words comprising syllables with complex consonant onsets and codas (ccvcc) in initial versus final positions in English and Spanish, respectively, were also associated with higher iconicity ratings, which is likely attributable to the inclusion of phonaesthemes. Interestingly, both languages showed a tendency for iconic words to have closed syllables, despite their relative frequencies differing across English and Spanish. Our analysis also revealed examples of features associated with iconicity specific to each language, e.g. /i/, /r/ and /l/ sounds in English (Blasi et al., 2016; Ćwiek et al., 2024) and the trilled /r/ in Spanish (Ćwiek et al., 2024; Winter et al., 2022). To determine the relative contributions of phonological/phonetic versus higher level structural features (syllables, reduplications) to predicting iconicity ratings, we reran the regression models excluding the latter. Phonological/phonetic features alone were able to predict 29% and 44% of the variance in English and Spanish ratings, respectively, demonstrating their importance.
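The logic of rerunning a model without one block of predictors can be sketched as a comparison of nested regressions: the drop in R² when the structural block is removed is the variance that block adds over the phonological/phonetic block (roughly 4 and 7 percentage points for English and Spanish, given the figures above). The sketch below is a hedged illustration on synthetic data, not the analysis pipeline used here; the predictors, coefficients, sample size and function names are all assumptions.

```python
import random
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(predictors: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """R^2 of an ordinary least-squares fit of y on the predictors plus an intercept."""
    rows = [[1.0, *p] for p in predictors]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins (assumed): two phonological/phonetic predictors and one
# structural predictor per word; real features would typically be indicators
# such as "final bilabial consonant" or "contains a reduplication".
random.seed(1)
N_WORDS = 500
phon = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(N_WORDS)]
struct = [[random.gauss(0, 1)] for _ in range(N_WORDS)]
ratings = [
    0.8 * p[0] + 0.5 * p[1] + 0.4 * s[0] + random.gauss(0, 1)
    for p, s in zip(phon, struct)
]

r2_full = r_squared([p + s for p, s in zip(phon, struct)], ratings)
r2_reduced = r_squared(phon, ratings)  # structural block excluded
print(f"full R^2 = {r2_full:.3f}, reduced R^2 = {r2_reduced:.3f}, "
      f"increment = {r2_full - r2_reduced:.3f}")
```

Because the reduced model is nested in the full one, its R² can never exceed the full model's; the increment is only interpretable as a unique contribution to the extent that the two blocks of predictors are not strongly correlated.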
Our novel findings have several implications for research on iconicity and theories of language embodiment. The fact that so many English phonaesthemes were rated as iconic indicates that systematic learned/conventional form-meaning mappings are likely to have contributed to past reports of subjective iconicity influencing lexical processing (Lupyan & Winter, 2018; Sidhu et al., 2020; de Zubicaray et al., 2024). This is consistent with Thompson and Do’s (2019) observation that systematicity has “sometimes been misappropriated as a form of iconicity” by researchers, a perspective that we agree with (p. 32). These findings also challenge embodied cognition accounts of language that use subjective iconicity ratings to support proposals that conceptual representations are grounded in sensory experience, as they assume that systematic relationships are a qualitatively different property of the language system (Dove, 2022; Murgiano et al., 2021; Perniss & Vigliocco, 2014). However, it is worth reiterating that Spanish words rated high in iconicity were less likely to be phonaesthemes than their English counterparts.
The fact that English words rated as iconic were also more likely to express multiple, metaphorical meanings indicates that past reports of a processing advantage for iconic words (Sidhu et al., 2020) might to some extent reflect the well-established advantage for polysemes (for a review, see Eddington & Tokowicz, 2015). The representations of polysemes have been proposed to occupy a complex, high-dimensional lexical-semantic space rather than simple grounding in sensory experience (Rodd, 2020). Embodied accounts have struggled to provide evidence for grounded representations of metaphors (Casasanto & Gijssels, 2015), while other researchers have considered iconicity inimical to abstraction (Lupyan & Winter, 2018). The positive correlation between iconicity ratings and polysemy might also explain both Hinojosa et al.’s (2021) and Winter et al.’s (2024) counterintuitive observations of negative correlations with concreteness ratings, indicating more iconic words were rated as more abstract, contradicting Lupyan and Winter’s (2018) proposal. We recommend that researchers interested in using iconicity ratings to investigate lexical processing ensure that they adequately control for variables such as systematicity and polysemy and consider using additional/alternative measures to exclude non-iconic forms (de Zubicaray, 2025; Dingemanse & Thompson, 2020; Motamedi et al., 2019; Punselie et al., 2024). Systematic form-meaning mappings in English sensory words have been shown to influence lexical processing to a greater extent than iconicity ratings (de Zubicaray et al., 2024).
More generally, the present research highlights the need for more research to investigate why people rate things the way they do in their respective languages. The finding that phonological variables significantly influence subjective iconicity ratings in both languages has implications for research that attempts to generate or extend human ratings using large language models (LLMs) such as ChatGPT (OpenAI, 2024). For example, Trott (2024) reported a correlation of r = .59 between Winter et al.’s (2024) subjective iconicity ratings and those generated by GPT-4, indicating the two cannot be considered equivalent. As current LLMs are trained solely on written text corpora, they are not “phonologically aware” in the way that humans are (de Zubicaray, 2025). Hence, using LLM-generated ratings as an extension or substitute for human judgements risks obscuring phonological relationships like the ones observed here. Finally, as Motamedi et al. (2019; see also Punselie et al., 2024) noted, iconicity is not a monolithic construct, so it is also important to understand the roles that variables such as task-type and instructional set play in influencing subjective ratings.
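As a worked illustration of why a correlation of this size falls short of equivalence, squaring the coefficient reported by Trott (2024) gives the proportion of variance shared between the human and GPT-4 ratings:

```latex
r^2 = (0.59)^2 \approx 0.35
```

That is, roughly 35% of the variance is shared, leaving about 65% of the variance in the human ratings unaccounted for by the model-generated ones.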
Conclusion
We investigated relationships between subjective ratings of iconicity and phonological/phonetic variables in unaffixed English and Spanish words with good rating agreement. Our findings showed that many more English words were rated as iconic, although these included phonaesthemes, whose systematic form-meaning mappings can be either learned/conventional or iconic. Iconic words showed evidence of structural markedness in both languages, although the relationship was weaker in Spanish. Both English and Spanish words rated higher in iconicity had larger phonological neighbourhoods despite comprising less frequently occurring phoneme sequences. In English, words rated as more iconic were also more likely to be polysemous forms. Finally, iconicity ratings in both languages were able to be predicted by common phonological/phonetic features, syllable structures and reduplications, consistent with previous cross-linguistic research.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by an Australian Research Council Discovery Project Grant DP220101853 and by grant HORIZON-MSCA-2023-SE-01. Ref.101182959 from the Horizon Europe Framework Programme.
Ethical Considerations
This study was granted exemption from requiring ethics approval due to its use of publicly available datasets.
