Abstract
Aims/objectives/research questions:
Previous studies indicate differences in the way children who grow up with two languages use socio-pragmatic cues to help them identify referents and learn new words, yet the nature of these differences (executive control, better attention to social cues, or pragmatic reasoning) has not been investigated.
Design/methodology/approach:
This study examined 270 monolingual and bilingually exposed 4–6-year-old children’s performance in 2 tasks using different prosodic cues (contrastive stress and emotional affect) for fast mapping. It avoids a design where children have to inhibit an irrelevant cue, which would enhance the role of differences in executive control.
Data/analysis:
We performed statistical analyses using a logistic regression mixed model.
Findings/conclusions:
The bilingually exposed group performed lower than monolinguals in a control condition involving structural language (0.83 vs. 0.92). However, they performed on par with monolinguals in a pragmatic condition when considering only semantically correct answers in both groups (0.55 vs. 0.58), and even displayed significant comparative strength in the task once control performance and demographic variables were taken into account. This effect appeared when the task required reasoning about the speaker’s communicative intentions (contrastive stress) but not when children merely had to recognise a communicative cue (emotional affect).
Originality:
No study has so far investigated the socio-pragmatic abilities of bilingual children using a task that did not require inhibiting an irrelevant cue.
Implications:
These findings have implications for bilingual education and a better understanding of the impact of being educated in two languages. We also draw attention to implications regarding the existence of different types of pragmatic skills which may have differing developmental timelines and rely on different sets of abilities.
The bilingual linguistic environment
Children growing up in multilingual environments are by definition exposed to only a subset of the input that their monolingual peers receive for the same language. Unsurprisingly then, receptive and productive vocabularies of bilinguals do seem to lag behind those of their monolingual counterparts in the early years (Bialystok et al., 2010). In terms of communicative interactions in bilingual environments, additional challenges arise for these children, from having to identify the linguistic profile of any given interlocutor (monolingual in language A, monolingual in language B, bilingual without code-switching, bilingual who engages in code-switching, etc.), which result in a higher risk of communication failure. These challenges, however, may in fact serve as learning and training experiences for bilingual children. Bilingual toddlers appear to be better at detecting and attempting to repair communication misunderstandings (Wermelinger et al., 2017). Moreover, exposure to code-switching (which introduces more linguistic complexity and uncertainty in the input) seems to increase attention to social cues and speakers in both monolingual and bilingual children (Yow & Markman, 2016). Similarly, bilinguals have consistently displayed enhanced theory of mind (Goetz, 2003) and perspective-taking abilities (Greenberg et al., 2013; Rubio-Fernandez & Glucksberg, 2012).
The added complexity and uncertainty in their learning environment appear to make bilinguals more socially and communicatively aware. Could this situation then result in enhanced pragmatic abilities and could these abilities (as has been suggested by Byers-Heinlein, 2017) help them compensate for the obstacles they face to achieve efficient vocabulary learning?
Pragmatic abilities in bilingual children
A few studies have investigated potentially enhanced pragmatic abilities in bilinguals and have reported positive findings. In 3 studies, Siegal and colleagues (2007, 2009, 2010) conducted acceptability judgements on scalar implicatures and presented children aged 3–6 years old with statements from a Conversational Violations Test uttered by 2 puppets (e.g., Question: ‘What did you get for your birthday?’ Answer: ‘A puppy’ or ‘A present’) asking them to indicate which of the dolls had said something ‘silly or rude’. Correct performance of the task required sensitivity to various pragmatic maxims (such as the maxim of informativeness, in the example given here) and world knowledge. Antoniou and Katsos (2017; Antoniou et al., 2020) used a combination of acceptability judgements and picture choices. For example, in the paradigm that they adapted from Kronmuller et al. (2014), a character has a hidden and visible card and refers to the visible item as the ‘open window’. The participant is then asked ‘what do you think was on Martijn’s second card?’. This question invites the participant to reason (possibly quite metalinguistically) about the use of the modifier in ‘open window’, which provides a ‘cue’ for inferring that the hidden image is that of a closed window. Both of these studies as well as the studies by Siegal and colleagues reported that bilingual children were performing at higher levels with pragmatics than their monolingual peers, or that they performed at the same levels as their monolingual peers with pragmatics, even though the bilinguals had substantially lower competence in the vocabulary and grammar of the language of testing compared to monolinguals (see also Syrett et al., 2017a, 2017b who tested children aged 3–6 years old on scalar implicatures).
In all of these cases, it was suggested that bilingual children are exceptionally competent with pragmatics, either in absolute terms (i.e., because they outperformed the monolinguals) or in relative terms (i.e., because they were better with pragmatics than what would have been expected from their competence with vocabulary and grammar which are reliably key predictors of pragmatic competence in monolingual populations; see Katsos & Bishop, 2011).
While these works examined pragmatic competence in a purely linguistic context (i.e., using only language-related cues), other studies have focused on the use of ‘socio-pragmatic cues’ for word learning and referential resolution in young bilingual children. We define word-learning cues as signals used to help determine the meaning of a novel word. Social cues are non-linguistic cues provided by a speaker for the purpose of communication. Perceptual cues are cues linked to the perceptual characteristics (e.g., shape, colour, etc.) of an object considered a potential candidate for a novel word. Prosodic cues are cues linked to the speaker’s voice inflexions, such as variations in pitch (i.e., fundamental frequency), emotional affect, or other acoustic features (i.e., volume, rhythm, etc.). Lexical or prosodic stress is the emphasised accent placed on a word or part of a word. Emotional affect is the tone of voice typically indicative of a speaker’s emotional state (i.e., ‘sad’, ‘happy’, or ‘fearful’ tone).
For example, a study by Yow and Markman (2015) suggests that simultaneous bilinguals (aged 3, English-L2, min 30% exposure in non-dominant language) are better than monolinguals at combining speaker’s cues (semantic and social) to identify a novel word’s referent. Bilinguals also seemed to attend more to the speaker’s cues compared with monolinguals when confronted with incongruent information for word learning. Brojde et al. (2012) tested simultaneous balanced bilinguals (aged 24–36 months, English-L2) in a word-learning paradigm pitting object property cues against the speaker’s cue (eye gaze). They found that monolinguals were more likely to rely on the shape of the objects to build a category for a novel word, while bilinguals were more likely to use a speaker cue such as eye gaze. Another recent study by Groba et al. (2018) with 3–5-year-old simultaneous bilinguals (Spanish-German) using a similar paradigm to investigate reliance on object property and socio-pragmatic (gesture) cues in adjective learning yielded no behavioural differences but found higher neural activation in bilinguals of an area related to gesture interpretation. Additional studies by Yow and colleagues also found bilinguals to rely more on speaker-related than other types of cues for interpretation. These included emotional affect rather than semantic content (Yow & Markman, 2011b) and eye gaze rather than object salience (Yow et al., 2017).
Why might bilingual children differ from their monolingual peers in pragmatics? Three proposals
Given the growing evidence that bilinguals are more competent with pragmatics than their monolingual peers, it is natural to ask at this point what underlies the differences. We discern three possibilities. First, because most of these paradigms used with younger children involved inhibiting an irrelevant cue in order to focus on the socio-pragmatic cue, the results in these studies might have been driven by enhanced general cognitive abilities such as inhibitory control. There have been suggestions that the constant switching and inhibition needed to monitor multiple languages could lead to enhanced executive abilities in bilinguals (Green, 1998). An extensive body of experimental literature has shown early bilingual children outperforming monolinguals on tasks related to executive functioning skills, such as working memory, inhibition, and cognitive flexibility (e.g., Barac & Bialystok, 2011; Bialystok & Craik, 2010). Improved skills in executive control could enhance the basic associative mechanisms underlying instances of word learning by allowing bilinguals to retain a greater number of possible referents for a novel word and switching more easily between hypothesised referents. This advantage could be mediated by verbal short-term memory (Papagno & Vallar, 1995) or inhibitory control (Bartolotti et al., 2011). The cognitive advantages experienced by bilinguals could also potentially have an impact on word learning beyond simple associative mechanisms, by improving their ability to combine cues to word meanings from different sources and ignore irrelevant or contradictory information (Kaushanskaya & Marian, 2007; Yow & Markman, 2015). Yow and Markman’s (2015) study suggests that bilinguals could indeed have an advantage for ignoring irrelevant cues in general, not only in favour of a social cue. They presented 3-year-old bilingual and monolingual children with a word-learning task where they saw 2 objects while the experimenter could only see 1. 
In one condition, the experimenter would ask ‘Where’s the [novel word]’, whereas, in the other, she would say ‘There’s the [novel word]’, and then ask the child ‘Can I have the [novel word]?’. In both conditions, the experimenter’s gaze was fixed on the visible object. While both monolinguals and bilinguals selected the visible object above chance in the ‘there’ condition, only bilinguals selected the hidden object above chance in the ‘where’ condition, using the speaker’s semantic information in combination with (or despite) the gaze cue.
However, while an advantage in pragmatics could in principle be related to enhanced executive functions, there are two main reasons to doubt that this is actually what is at play. At a general level, many studies have failed to replicate the bilingual advantages in cognition and the very existence of this alleged advantage has recently been questioned (Paap & Greenberg, 2013). The inconsistency in the results could be due to different types of bilingualism having differential influences: earlier acquisition, higher proficiency, and more frequent switching between languages are, for example, likely to result in a higher need for control abilities (e.g., Luk et al., 2011). However, more pertinent to the literature that we are reviewing here, those studies that measured executive control failed to find a difference between bilinguals and monolinguals which would have explained the former’s higher performance in pragmatics (Champoux-Larsson & Dylman, 2019; Yow et al., 2017; Yow & Markman, 2015) or failed to find such a difference for bilingually exposed children despite both bilingually exposed and bilinguals outperforming monolinguals (Fan et al., 2015). Therefore, in line with Fan et al. (2015), we suggest that the bilingual advantage in pragmatics is not likely to be due to enhanced executive functions.
The second possibility we consider is that bilingualism increases attention to speaker or social cues (or preferential weighting of them compared with other types of cues). Young children have time and again demonstrated efficient and early capacity to use social cues for word learning: eye gaze (Baldwin, 1993), gestures (Horst & Samuelson, 2008), facial expressions (Tomasello et al., 1996), adult intentional/accidental (i.e., clumsy) behaviour (Carpenter et al., 1998), the context in which the referent is presented (i.e., other objects/actions that accompany it; Tomasello & Akhtar, 1995; Waxman & Booth, 2001), novelty/givenness status (Moll et al., 2006), and prosody (Grassmann & Tomasello, 2007).
According to standard pragmatic theory (e.g., Grice, 1975), communication proceeds successfully by assuming the speaker is cooperative, i.e., behaves according to a set of maxims enjoining them to be truthful, relevant, clear, and informative. Building on these assumptions, listeners can try and retrieve the intended meaning from the speaker’s utterance or communicative behaviour. In the case of a word, the purpose of the learner is to find which referent is attached to the speaker’s label. Using the cues mentioned above, they could possibly be doing so by computing a pragmatic inference (i.e., an inference that uses hypotheses about the speaker’s communicative intentions to understand the meaning of their words or actions): if the speaker is gazing or pointing at this object, it is probably because they intend to refer to it; if they are using a novel label, it is probably because they have a new referent in mind; if they are pairing this pink elephant with other animals, they probably mean the label to refer to a category; if they are pairing a pink elephant with other pink objects, they probably intend to refer to the property, etc.
Alternatively, in many of these cases, it could be argued that they reach the same conclusion by using these cues in a more direct manner, i.e., because they contribute to raising the salience of one referent over the others, without any intention-reading actually having taken place (Ambridge & Lieven, 2011). This is what is suggested by Frank’s model (2014), which separates two dimensions for word-learning processes: the source of the cue (cross-situational or social) and the way the cue is used (in a purely associative manner or through the computation of intentions). Thus, the social cues could contribute to contrasting the target referent with other potential referents directly by highlighting it, or the contrast could be the result of a pragmatic inference about what the speaker intended to highlight. Therefore, the second possibility that we put forward is that bilingual children, because of the nature of their interactions with their social environment, are better at paying attention to social cues.
Finally, the third possibility is that bilingual experience could contribute to enhancing pragmatic competence per se, i.e., the ability to reason about communicative intentions and informativeness. This is a distinct possibility from the one mentioned above. Pragmatic inference distinguishes itself from social cue-following in that it requires not only recognition of the cue provided but also reasoning about the speaker’s intentions in providing the cue. Take for example contrastive inference which is a type of implicature using Grice’s second Maxim of Quantity: ‘do not say more than you need’. In this type of pragmatic inference, the reason for providing a certain cue (e.g., an adjectival modifier, or prosodic stress) is assumed to be for informative purposes (i.e., to distinguish one potential referent from another). For example, hearers presented with an array of four objects and prompted with an utterance beginning with ‘Pick up the tall. . .’ will start shifting their eye gaze towards the tall glass (which has a small glass counterpart) rather than the tall jug (which does not have another jug as a contrast) before hearing the noun, inferring that the modifier is more specific, being required, for this referent (Nadig & Sedivy, 2002). Similarly, ‘the red dax’ should be taken to refer to the leftmost rather than the rightmost object in Figure 1.

Figure 1. Contrastive inference.
Emotional affect (expressed via positive or negative intonation), on the contrary, is a type of socio-pragmatic cue that, like eye gaze and pointing, can potentially be used directly to identify a particular referent with a positive or negative feature without the need to reason about why the speaker provided the cue (or this specific amount of information). In this scenario, the affect (or point, or gaze, or other social cues) would shift the child’s attention towards the target object, raising its probability as a candidate for the novel word without any processes involving pragmatic inferencing (or accessing people’s mental states, i.e., theory of mind) taking place (see also Frank, 2014). In this second case, the inference is not about the amount of information provided by the speaker but more directly about the conceptual links between a particular affect and situation or object. This is a difference between the mere recognition of a cue and the actual use of a cue in a chain of pragmatic inferential reasoning. Therefore, the third possibility for bilingual children’s better performance with pragmatics could be that the nature of their linguistic experiences has made them more adept at pragmatic reasoning per se.
To take stock, better performance by bilingual children in the social cues word-learning studies previously mentioned could be the result of a bilingual advantage in general cognitive abilities (and inhibition of irrelevant information) or increased attention towards speaker cues, or it could result from increased ability to reason about the speaker’s intentions in providing the cue. While there is growing evidence which suggests the first possibility is not promising, the other two are very much up for empirical scrutiny. This study sought to investigate these last two hypotheses by testing monolingual and bilingually exposed children’s performance on two different word-learning/fast-mapping paradigms in the absence of conflicting information (i.e., where success could not be obtained by ignoring an irrelevant cue). These are a contrastive inference task (which required reasoning about informativeness) and an emotional affect task (which could be completed by simply attending to speaker cues). We ask two interrelated questions. First, whether bilinguals perform better than monolinguals in these two tasks where attending to speaker cues does not require ignoring another type of cue (which is unlike the tasks used in the bilingual word-learning literature to date). Second, whether there is a difference in performance for these two groups when only one pragmatic cue (contrast) is included as opposed to two pragmatic cues (contrast and stress). Critically, we ask whether bilinguals perform better in a word-learning task where they are required not only to attend to a speaker cue but also to reason about it (contrastive stress) compared with a task that requires mere attention to a speaker cue (emotional affect).
Method
Participants
The study was approved by the Cambridge Psychology Research Ethics Committee, and parents in participating schools were sent information about the study along with a form allowing them to opt-in or opt-out depending on the school’s policy.
In terms of selecting participants for the bilingual group, the high variability in input and increased risk of miscommunication applies to any bilingually exposed group regardless of the level of fluency. The few studies that did use bilingually exposed children (i.e., who might not have fluent productive ability) indeed found results similar to those obtained with productive bilinguals for perspective-taking (Fan et al., 2015; Liberman et al., 2017) and for word learning (Menjivar & Akhtar, 2017). In addition, both bilingual adults and L2 learners have been found to adapt their speech to a greater extent to avoid communication failures when talking to a child or non-native speaker (Lorge & Katsos, 2018). In light of this, we decided to compare two groups: a group of monolingual speakers of English and a bilingually exposed group who received regular exposure to another language.
In total, 270 children aged 4–6 years old were recruited through schools in Cambridge and London. Demographic and language information was obtained through parental forms for 74 children, while the information for the remaining participants was obtained through school staff and proved to be highly reliable (over 98% match with the 74 questionnaires) when compared with available parental information. Of the 270 children, 138 (66 females, 72 males, mean age = 5 years 2 months, SD = 6.9 months) had been exposed daily to a second language for at least a year. These included children in a French immersion programme (n = 26), children identified through parental forms as sequential bilinguals (n = 18) or as simultaneous bilinguals (n = 26), and children identified as bilingual by their teachers (n = 68). For this last group, the school described the child as having English as an additional language (EAL), a technical term used in the UK educational setting to signify children whose dominant language is not English. The remaining 132 children (62 females, 70 males, mean age = 5 years 5 months, SD = 8.2 months) had no regular exposure to a language other than English. The average percentage of free school meals (FSM) in each of the participating schools was used as a proxy for socioeconomic status (SES) (see Hobbs & Vignoles, 2007, for a discussion on the use of FSM for this purpose). Languages other than French (immersion and home, n = 34) were Hindi (n = 15), Gujarati (n = 13), Tamil (n = 10), Romanian (n = 10), Urdu (n = 6), Polish (n = 6), Arabic (n = 5), Pakhto (n = 3), Portuguese (n = 3), German (n = 3), Cantonese (n = 2), Czech (n = 2), Somali (n = 2), Slovak (n = 2), ESL (n = 2), Albanian, Bahasa, Farsi, Greek, Hungarian, Italian, Lithuanian, Malayalam, Mandarin, Punjabi, Persian, Russian, Serbian, Sindhi, Sinhala, Swahili, Swedish, Thai, Turkish, and Vietnamese (all n = 1).
Contrastive inference task
The purpose of this experiment was to investigate whether bilingual children would outperform monolingual children in a fast mapping task where they had to pick the referent of a novel word on the basis of reasoning about the communicative intention behind a cue given by a speaker. While structural constructions such as focus or modifier are one way of creating a contrastive inference (as in Gelman & Markman, 1985), another way involves the use of prosody and, in particular, lexical stress (also referred to as emphasis). For example, while a hearer might not derive any particular implicit meaning from the utterance ‘It’s a very nice garden’, they could infer a contrast if prosodic stress is used to emphasise a particular word (‘It’s a very nice GARDEN’ e.g., [as opposed to the house next to it]). The interpretation and realisation of this prosodic stress appear highly dependent on context: the same noun-final stress (‘Give me the yellow ROSES’) can take either a neutral or contrastive reading and contrastive stress could be realised with either a simple H* target high or fall-rise L + H* (according to autosegmental theories, cf. Pierrehumbert & Hirschberg, 1990). Despite young children’s well-known general sensitivity to prosodic variation and their preference for infant-directed speech (Fernald, 1985), Cruttenden (1985) showed that prosody alone is not a reliable or salient enough cue for children to derive a contrastive inference but could facilitate such inferences if presented with other converging or supporting cues. This is supported by other studies which found that preschoolers manage to perform contrastive inferences that they previously failed at when given additional discourse support for a contrastive reading (Horowitz & Frank, 2015, Experiment 1) or explicit access to alternatives in the previous discourse (Horowitz & Frank, 2015, Experiment 3; Kurumada & Clark, 2017).
In view of this, we included a ‘stressed’ condition in our experiment. In this first experiment, children heard a prompt containing a novel word that had been modified either with a non-stressed or a stressed adjective (e.g., ‘Touch the WET/wet gorp’). They were shown a display with four novel aliens, two of which were semantically compatible with the prompt (i.e., two of the aliens were wet). However, only one of the compatible aliens had a dry counterpart which made the use of a contrastive modifier more likely. Given our hypothesis that the previously evidenced socio-pragmatic advantage in bilingual children is not uniquely the result of better inhibitory control or increased attention to speaker cues, we predicted that bilingual children would show a better ability to use the contrastive inference to identify the referent of the novel word. This prediction holds despite the fact that the modifier itself does not raise the salience of either compatible referent above the other, so that inhibitory control or better attention to the modifier would not lead to better performance.
Stimuli
Stimuli were 12 pictures each with 4 unknown aliens, 2 of which were of the same kind (e.g., 2 type A aliens, 1 type B alien, and 1 type C alien).
Pictures for the critical conditions contained a target and distractor featuring the target property (e.g., a wet type A alien and a wet type B alien) and a counterpart that did not feature the property (e.g., a dry type A alien) (see Figure 2). Pictures for the control condition contained only one alien featuring the target property. The properties were chosen so that they could be described by common adjectives, would be accidental/non-intrinsic (so that one alien could feature the property, while its counterpart did not), and could be used to describe modified or unmodified creatures (e.g., a wet alien or dry alien) to account for the potentially increased salience of modified aliens. Colours were avoided because of the confusion often displayed by children around that age in reliably recognising and naming them (Bornstein, 1985).

Figure 2. Example stimulus for the contrastive inference task (non-stressed and stressed conditions): ‘Touch the wet gorp/Touch the WET gorp’.
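The logic of a critical display can be made concrete with a minimal sketch. It is only an illustration under stated assumptions: the `Alien` class, the `contrastive_target` helper, and the property state of the fourth (type C) alien are hypothetical and not taken from the study materials.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alien:
    kind: str   # alien type, e.g. "A", "B", "C"
    wet: bool   # True if the alien shows the target property (here: wet)


def contrastive_target(display, has_property=True):
    """Return the aliens that match the adjective AND have a same-kind
    counterpart lacking the property (the contrast that licenses the modifier)."""
    matches = [a for a in display if a.wet == has_property]
    return [
        a for a in matches
        if any(b.kind == a.kind and b.wet != has_property for b in display)
    ]


# Hypothetical critical display for 'Touch the wet gorp':
# wet A (target), dry A (counterpart), wet B (distractor), and a type C alien
# whose state (dry) is an assumption made for this example.
display = [Alien("A", True), Alien("A", False), Alien("B", True), Alien("C", False)]

semantic_candidates = [a for a in display if a.wet]   # both wet aliens fit the words
target = contrastive_target(display)                  # only wet A has a dry counterpart
print(semantic_candidates, target)
```

Both wet aliens satisfy the literal meaning of the prompt; only the contrast with the dry counterpart separates target from distractor, which is why the modifier alone cannot solve the task.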
Procedure
Stimuli were implemented and presented on a touchscreen laptop using Superlab version 5.0 (Cedrus, San Pedro, CA). Clicks were recorded as raw X and Y pixel coordinates of the point where the screen had been touched and answers were subsequently coded by matching the coordinates to the corresponding answer area among the four possible choices on the picture. Children were tested in a quiet room at their school. They were introduced through the computer to Mr. Puppet, who had recently made some alien friends and lent them some toys that he now needed to get back. They were then asked if they would like to help Mr. Puppet find the aliens, and given a test trial where they had to find the ‘wet gloop’ and ‘dirty gloop’ (feedback provided) before proceeding to the task. There were 4 trials of each type (12 total): non-stressed condition (e.g., ‘Touch the wet gorp’), stressed condition (‘Touch the WET gorp’), and control (similar instructions to the non-stressed condition, but only one alien had the target feature). Instructions were recorded by a female native English speaker using a Sennheiser ME 64 cardioid microphone connected to a Tascam HD-P2 Compact Flash Audio Recorder. Recordings were made in 24-bit mono with a sample rate of 44.1 kHz. Experimental design was within-subject, with 4 target features/adjectives (‘wet’, ‘dry’, ‘clean’, and ‘dirty’) and 12 novel words (gorp, pitack, rapook, lep, plonk, yubba, moozie, ral, flurg, dinkoo, patam, and tweep). There were two lists of items counterbalanced for word/alien pairings and target position. Trials were randomised.
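The coding of raw touch coordinates into answers can be sketched as follows. This is a hedged illustration, not the authors' script: the screen resolution, the assumption that each answer area is exactly one screen quadrant, and the names `code_touch`, `code_answer`, and `layout` are all hypothetical.

```python
def code_touch(x, y, width=1920, height=1080):
    """Map raw pixel coordinates to a quadrant label.
    Assumes (0, 0) is the top-left corner and answer areas are exact quadrants."""
    col = "left" if x < width / 2 else "right"
    row = "top" if y < height / 2 else "bottom"
    return f"{row}-{col}"


def code_answer(x, y, layout, width=1920, height=1080):
    """Translate a touch into an answer type using the trial's layout."""
    return layout[code_touch(x, y, width, height)]


# Hypothetical layout for one trial; positions were counterbalanced in the study.
layout = {
    "top-left": "target",
    "top-right": "distractor",
    "bottom-left": "counterpart",
    "bottom-right": "other",
}

print(code_answer(100, 100, layout))    # touch in the top-left quadrant
print(code_answer(1500, 900, layout))   # touch in the bottom-right quadrant
```

Keeping the layout per trial separate from the geometry makes it straightforward to handle counterbalanced target positions.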
Receptive vocabulary task
Children also completed a computerised version of the BPVS-3 (Dunn & Dunn, 2009) implemented on the touchscreen laptop to test receptive vocabulary. This is a picture-matching task in which children are asked to point to one of four pictures that match the word uttered. The items are arranged in blocks of 12, and the test continues until children have made 8 or more mistakes in a block. Children received two warm-up trials. Raw scores rather than standardised were used in the analyses as they indicated both vocabulary and developmental levels. Receptive vocabulary scores are also a more direct measure of the impact of socioeconomic factors on language development (the issue of interest here) than SES indices such as maternal education or household income. Instructions were recorded by a female native English speaker.
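The stopping rule described above (blocks of 12, ending once a block contains 8 or more mistakes) can be sketched as below. This is an illustration only: it assumes each block is completed before the rule is checked, the function name `administer` is hypothetical, and the count of correct answers it returns is not claimed to be the official BPVS raw-score calculation.

```python
def administer(responses, block_size=12, max_errors=8):
    """Apply the block-wise stopping rule to a sequence of responses.

    responses: booleans in presentation order (True = correct).
    Returns (items_administered, items_correct).
    """
    responses = list(responses)
    administered = 0
    correct = 0
    for start in range(0, len(responses), block_size):
        block = responses[start:start + block_size]
        administered += len(block)
        correct += sum(block)
        errors = len(block) - sum(block)
        if errors >= max_errors:   # 8 or more mistakes in this block: stop testing
            break
    return administered, correct


# Hypothetical child: perfect first block, 8 errors in the second block,
# so the third block is never presented.
responses = [True] * 12 + [True] * 4 + [False] * 8 + [True] * 12
print(administer(responses))
```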
Results
The final results can be seen in Table 4. Data from 11 monolingual children were excluded from final analyses because of experimenter error, child fussiness, or failure to complete the task.
We were surprised to find a rather high proportion of semantic errors (e.g., children chose a dry character in response to a prompt for a ‘wet gorp’). For bilinguals, the proportion of non-semantic answers was 0.14 in the non-stressed condition, 0.13 in the stressed condition, and 0.17 in the control condition. For monolinguals, it was 0.05 in the non-stressed condition, 0.04 in the stressed condition, and 0.07 in the control condition. In order to see whether target choices were significantly more frequent than distractor choices (in the target conditions only), we carried out a t-test excluding non-semantic answers so that chance could be set at 50% instead of 25%. The reason we originally set a fourth choice (which was not necessary for the task design) was to have a symmetric four-quadrant setup for the touchscreen. Ideally, only the target and distractor would have been accessible to touch for the children but this was not possible if we wanted to have the dry counterpart visible on the screen while they made their choice as well as a symmetrical setup.
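The effect of excluding non-semantic answers on the chance level can be illustrated with a short sketch. The answer labels and the toy data are hypothetical; only the 25% and 50% chance levels come from the design described above.

```python
def summarise(answers):
    """Summarise one condition.

    answers: list of 'target', 'distractor', or 'other'
             ('other' = semantically incorrect choice).
    """
    n_all = len(answers)
    semantic = [a for a in answers if a in ("target", "distractor")]
    return {
        # share of answers that do not match the adjective at all
        "non_semantic_rate": (n_all - len(semantic)) / n_all,
        # four answer areas available: chance level is 1/4
        "target_rate_all": answers.count("target") / n_all,
        # only target and distractor remain: chance level is 1/2
        "target_rate_semantic": semantic.count("target") / len(semantic),
    }


# Hypothetical responses from 20 trials (not data from the study).
answers = ["target"] * 10 + ["distractor"] * 8 + ["other"] * 2
summary = summarise(answers)
print(summary)
```

The same set of responses is compared against a different baseline depending on which denominator is used, which is why the two analyses in this section use different chance levels.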
However, because of the large difference in control performance between bilinguals and monolinguals (probably reflecting differences in attention ability due to the significant differences in age, SES, and vocabulary scores, although the latter is unlikely as we tested the knowledge of the adjectives used in a test trial), we wanted to include the control condition as a baseline and reference level in the regression model. Because of this, we include all answers in all conditions (including semantically incorrect ones) for the mixed-regression model so that the chance is at 25% for all conditions and the comparison with the control reference level is fairer. Unfortunately, only 60% of children made no semantic errors in the control condition, which meant that we would have lost almost half the participants if we excluded them.
Children were significantly above chance in the stressed condition (m = 0.56, SD = 0.49, t = 4.06, df = 947, p < .0001) but not in the non-stressed condition (m = 0.51, SD = 0.50, t = 0.59, df = 927, p = .55). This pattern applied to both monolinguals (non-stressed = 0.50, p = ns; stressed = 0.58, t = 3.66, df = 489, p = .0003) and bilinguals (non-stressed = 0.52, p = ns; stressed = 0.55, t = 2.06, df = 457, p = .04). However, numerically, the bilingual group was not as far above chance as the monolingual group. This may be due to cross-linguistic variations in contrastive stress which might render this condition more challenging for bilinguals. Preliminary analyses revealed that, compared with the monolinguals, the bilingual group was on average significantly younger (5 years 1 month vs. 5 years 5 months, t = 2.63, df = 364.9, p = .009), had significantly lower English vocabulary (British Picture Vocabulary Scale raw scores, 68 vs. 76, t = 6.3, df = 377.75, p < .0001), and had significantly lower SES (as calculated through averaging free school meal percentages previously normalised using national average and standard deviation, −3.00 vs. −2.51, t = −3.95, df = 372.32, p < .0001).
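As a rough check on the comparisons against chance, the one-sample t statistic can be recomputed from the summary figures. This is a sketch under assumptions: it uses the rounded means and SDs reported above (so it will not reproduce the reported t values exactly), takes n = df + 1, and approximates the two-sided p value with the normal distribution, which is reasonable only because df is large. The function name `one_sample_t` is hypothetical.

```python
import math


def one_sample_t(mean, sd, n, mu0=0.5):
    """One-sample t statistic against chance level mu0, from summary statistics.
    Returns (t, df, approximate two-sided p using the normal distribution)."""
    se = sd / math.sqrt(n)
    t = (mean - mu0) / se
    p_approx = math.erfc(abs(t) / math.sqrt(2))   # large-df normal approximation
    return t, n - 1, p_approx


# Stressed condition: m = 0.56, SD = 0.49, df = 947 (so n = 948).
t_stressed, df_stressed, p_stressed = one_sample_t(0.56, 0.49, 948)

# Non-stressed condition: m = 0.51, SD = 0.50, df = 927 (so n = 928).
t_plain, df_plain, p_plain = one_sample_t(0.51, 0.50, 928)

print(round(t_stressed, 2), df_stressed, p_stressed)
print(round(t_plain, 2), df_plain, p_plain)
```

With the rounded inputs the stressed condition yields a t somewhat below the reported 4.06, and the non-stressed condition a t close to the reported 0.59; the gap is consistent with rounding of the mean to two decimals.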
We use R and the lme4 package for our analyses. We start with a baseline model with item and participant intercepts (adding slopes results in non-convergence). Adding condition significantly improves the model, as does adding bilingual group, the interaction between condition and bilingual group, and BPVS raw score. Adding age or FSM does not significantly improve the model (as verified by an analysis of variance); thus, the final model is a mixed-model regression with condition, bilingual group, and their interaction, as well as BPVS score, as can be seen in Table 1 (crit represents the non-stressed condition and crita the stressed condition). Fitted values for fixed effects can be seen in Figure 3. We additionally perform pairwise comparisons, which can be found in Table 2.
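The stepwise model building described above compares nested models. A minimal sketch of one such comparison follows, assuming it takes the form of a likelihood-ratio chi-square test, which is an assumption about the procedure rather than a detail given in the text. The log-likelihood values are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def lrt_p_value(loglik_reduced, loglik_full, df_diff):
    """Likelihood-ratio statistic and p value for two nested models.

    Closed-form chi-square tail probabilities are used for 1 or 2 extra
    parameters; other cases would need a statistics library.
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    if df_diff == 1:
        p = math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 df
    elif df_diff == 2:
        p = math.exp(-stat / 2.0)             # chi-square tail, 2 df
    else:
        raise ValueError("only df_diff of 1 or 2 is supported in this sketch")
    return stat, p

# Hypothetical log-likelihoods: a model without and with one extra predictor.
stat, p = lrt_p_value(-100.0, -98.0, 1)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

A predictor would be retained when the p value falls below the chosen threshold, and dropped otherwise, as with age and FSM above.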
Final mixed-regression model for the contrastive task.
Note. Intercept is the control condition. N_BPVS = raw vocabulary scores; bilB = bilingual group; condcrit = non-stressed condition; condcrita = stressed condition.
‘*’: significance at 0.05 level; ‘**’: significance at 0.01 level; ‘***’: significance at 0.001 level.

Plotted fitted values for fixed effects of contrastive task mixed-regression model.
Pairwise comparisons for all three conditions and the two language groups for the contrastive task.
Note. crit = non-stressed; ctrl = control; crita = stressed; M = monolingual; B = bilingual.
‘**’: significance at 0.01 level; ‘***’: significance at 0.001 level.
The control condition was used as the reference level. Performance in the control condition was significantly correlated with higher vocabulary scores and monolingualism. Performance was significantly lower in both the stressed (est = −2.26, SE = 0.19, z = −11.93, p < .0001) and non-stressed conditions (est = −2.62, SE = 0.19, z = −13.83, p < .0001) compared to control. In addition, there was a significant interaction between condition and bilingual status, with bilinguals performing significantly better than expected from their control performance and demographic variables compared with monolinguals in both stressed and non-stressed conditions. This can be seen from the scores including only relevant (i.e., semantically correct) answers: while the difference between monolingual and bilingual performance in control is almost 10 points (0.92 vs. 0.83), this gap drops to only 3 points in the stressed condition (0.58 vs. 0.55). This suggests that a demographically matched sample of bilinguals would have outperformed the monolingual group.
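To make the size of the condition effects easier to read, the sketch below converts the reported log-odds estimates into Wald z values, odds ratios, and approximate 95% confidence intervals. It assumes a logistic link and a normal approximation. Because the model includes an interaction with bilingual group, these estimates presumably describe the reference group only. The estimates and standard errors are the reported ones; the rest is illustrative.

```python
import math

def wald_summary(estimate, se, z_crit=1.96):
    """Wald z, odds ratio, and approximate 95% CI for a log-odds estimate."""
    z = estimate / se
    odds_ratio = math.exp(estimate)
    ci = (math.exp(estimate - z_crit * se), math.exp(estimate + z_crit * se))
    return z, odds_ratio, ci

# Reported estimates relative to the control condition.
stressed = wald_summary(-2.26, 0.19)
non_stressed = wald_summary(-2.62, 0.19)

for label, (z, odds_ratio, ci) in (("stressed", stressed), ("non-stressed", non_stressed)):
    print(f"{label}: z = {z:.2f}, OR = {odds_ratio:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The recomputed z values differ slightly from the reported ones because the estimates and standard errors are rounded to two decimals.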
Discussion
The contrastive inference was drawn reliably (i.e., above chance) in the stressed condition only. This may be because the inference is relatively weak and is reinforced by the additional cue of prosodic focus, which helps direct children’s attention towards the use of the modifier, making it more ostensive and raising its informative potential. Since vocabulary scores and SES were both shown to significantly predict target choices, it was unsurprising to find that the bilingual group (who were younger, with lower vocabulary scores and lower SES) performed significantly worse in the control condition than the monolingual group. However, when this delay in structural language was accounted for by examining target (i.e., relevant) answers in both critical conditions relative to performance in the control condition, a significant interaction between bilingual status and condition was found, with bilingual children performing significantly better in the critical conditions than would have been expected given their control performance and demographic variables. As noted in the results, the gap between the groups among semantically correct answers narrowed from almost 10 points in control to 3 points in the stressed condition. In this instance, bilinguals appear to be ‘doing more with less’. While bilinguals were slightly disadvantaged in the regular use of structural language and semantics for reference resolution in the control condition, they were almost as sensitive and efficient as older, more vocabulary-proficient monolinguals at using the pragmatic cue (i.e., the use of a modifier) to reason about the speaker’s communicative intentions in providing that cue.
This indicates that the differences in exerting pragmatic skills between bilinguals and monolinguals might go further than a simple ‘pragmatic bias’ whereby this particular type of cue is preferred to other types such as semantic content or object similarity.
We thus see a positive effect of bilingualism on relative performance in a task where no inhibition of an irrelevant cue was involved but where participants had to reason about the amount of information provided by the speaker. The next task (emotional affect) examines the performance of the same group of children in a case where (still with no irrelevant cue to inhibit) success was based on a similar type of (prosodic) cue but did not require reasoning about the amount of information provided. In this next case, success could be achieved directly by recognising and pairing emotional affect with an object state on the basis of a ‘natural’ conceptual relationship (e.g., sad tone and damaged object). If our hypothesis is correct that enhanced pragmatic competence in bilinguals derives from a better ability to use informativeness to derive meaning, and not simply from better attention to speaker cues, performance should not differ between monolinguals and bilinguals.
Emotional affect task
Stimuli
In this task, children were shown a display with two novel objects and heard a prompt containing a novel word (e.g., ‘Oh, look at the dax! Have you seen the dax?’). They were then shown the same two objects, one of which had been negatively or positively modified (i.e., broken or dirtied, decorated with a star or coloured pink, etc.), and heard another prompt with the novel word, this time uttered with sad, happy, or neutral emotional affect (e.g., ‘[sad voice] Oh look at the dax! Can you touch the dax?’).
We predicted that this time there would be no difference in performance between the monolingual and bilingual groups, since this task simply required pairing an emotional affect with a referent that had been made more salient by said affect, and not reasoning about the communicative intentions behind providing the cue. That is, the cue was self-explanatory or self-sufficient, and there was no need to reason about its informativeness.
The task was adapted from Berman et al. (2013). Stimuli were 12 pictures, each displaying 2 novel objects: 1 damaged object (i.e., dirtied or broken: mud splash, green splash, hole, or dismantled parts) and 1 enhanced object (i.e., featuring a flower or a star, or brightly lit up). An example stimulus can be seen in Figure 4.

Example stimulus emotional affect ‘(sad/happy/neutral voice) Can you touch the figoo?’.
Procedure
Stimuli were implemented and presented on a touchscreen laptop using Superlab 5.0 (Cedrus, San Pedro, CA). Clicks were recorded as raw X and Y pixel coordinates of the point where the screen had been touched and answers were subsequently coded by matching the coordinates to the corresponding answer area among the two possible choices (left or right). Children were tested in a quiet room at their school. The task was part of Mr. Puppet’s story and his meeting with aliens who played with strange new objects and damaged some of them but also made some of them ‘look prettier’. Children first completed two test trials, one mutual exclusivity trial (to ensure they paid attention to the linguistic information) and one to test their understanding of the task. In each trial, they were presented first with the two novel objects in an unaltered state and prompted with ‘Oh! Look at these, have you seen these?’ and then the altered objects along with an instruction of the type ‘Oh! Look at the nurmy, can you touch the nurmy?’ recorded with a positive, neutral, or negative voice (emotional affect). The instructions were recorded by a female native speaker of English in a soundproof room using a Sennheiser ME 64 cardioid microphone connected to a Tascam HD-P2 Compact Flash Audio Recorder. Recordings were made in 24-bit mono with a sample rate of 44.1 kHz and checked for the prosodic and amplitude features characteristic of each type of affect. Experimental design was within-subject with 4 trials in each condition (positive, neutral, and negative, 12 in total). There were two lists of items counterbalanced for word/object pairings and target position. Trials were randomised.
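The coding of raw touch coordinates into answers can be sketched as follows. This is a hypothetical illustration: the screen resolution and the assumption that the two answer areas are the left and right halves of the screen are not given in the text, and `code_touch` is a name introduced here.

```python
def code_touch(x, y, screen_width=1920, screen_height=1080):
    """Map a touch at pixel (x, y) to 'left', 'right', or None if off-screen.

    The screen size and the midpoint split are assumptions for illustration;
    the actual answer areas may have been smaller regions around each object.
    """
    if not (0 <= x < screen_width and 0 <= y < screen_height):
        return None
    return "left" if x < screen_width / 2 else "right"

# Hypothetical touches recorded as raw pixel coordinates.
touches = [(310, 540), (1500, 620), (-4, 200)]
coded = [code_touch(x, y) for x, y in touches]
print(coded)
```

Each coded answer would then be scored against the target position for that trial, which was counterbalanced across lists.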
Results
We again use R and the lme4 package. We start with a baseline model with participant and item intercepts (adding slopes again leads to non-convergence). Adding condition significantly improves the model, but adding bilingual group does not. Adding vocabulary raw scores also improves the model, but neither age nor FSM percentage does. The final model thus has condition and BPVS scores as fixed effects. The final model can be seen in Table 3 and fitted values of fixed effects in Figure 5. Negative responses were above chance in the control condition; relative to control, they were significantly more frequent in the negative condition but did not differ significantly in the positive condition. There was also a significant effect of vocabulary scores, with higher scores associated with a negative bias in control. No other effects were significant.
Final mixed-regression model for the affect task.
Note. Intercept is the control condition. N_BPVS = raw vocabulary scores; condneg = negative affect condition; condpos = positive affect condition.
‘*’: significance at 0.05 level; ‘**’: significance at 0.01 level; ‘***’: significance at 0.001 level.

Plotted fitted values for fixed effects of affect task mixed-regression model.
Discussion
The full summary of results for both tasks can be found in Table 4. Plots of performance for the contrastive task can be found in Figures 6 and 7 and for the affect task in Figures 8 and 9. There was a negative bias in the neutral affect control condition, whereby negative choices (i.e., dirty or broken objects) were preferred to positive ones (i.e., enhanced/decorated objects). This default preference for a negative interpretation or outcome is found in the literature and potentially results from a higher salience of negative versus positive events (Berman et al., 2013), which is why answers in the negative and positive conditions were compared with performance in the control condition rather than with chance performance in the regression model. Children succeeded in picking the correct answer only in the negative condition and seemed to fail to perceive or use positive affect to direct their referent choices in the positive condition. This effect has been found previously and is potentially also the result of a higher salience of negative versus positive. Just as negative events appear to be more salient than positive ones, negative prosody or emotional affect might have higher salience than positive, which would lead to children being able to interpret and use it earlier (Berman et al., 2010; Nelson & Russell, 2011). The higher salience of negative events and affect appears rational from a biological/survival instinct point of view, since ignoring warnings about bad outcomes might have direr consequences than ignoring good news.
Summary of results.
Note. Raw scores for the contrastive task include all types of errors, i.e., chance = 0.2.

Contrastive task: all children.

Contrastive task: bilinguals and monolinguals.

Affect task: all children.

Affect task: bilinguals and monolinguals.
No interaction between bilingual status and performance in the critical conditions relative to the control condition was found this time, i.e., bilinguals performed both worse than monolinguals and on par with what was expected from demographic variables. However, performance in this task relied entirely on being able to recognise and associate the specific valence of the cue (negative or positive) with the corresponding event (negative, i.e., damaged object, or positive, i.e., enhanced object). Since the prompts always had the same linguistic form, there was no advantage in having stronger vocabulary knowledge of modifiers as in the first task, and, in contrast to exposure to a specific language, there is no principled reason why bilingually exposed children should have had more or less experience with, or in interpreting, different types of emotional affect. The psychological literature has generally distinguished between cognitive and affective perspective-taking on the grounds that they appear to rely on different abilities, and there has recently been neuroimaging evidence to support this distinction, with different substrates being used in the regions of the brain involved in related tasks, and executive function playing a greater role in the former (Healey & Grossman, 2018). We hypothesise that our contrastive task taps into cognitive perspective-taking, while the affect task engages more affective perspective-taking. Furthermore, we hypothesise that solving the specific challenges involved in learning language in a bilingual environment may rely more heavily on developing cognitive than affective ability and thus be more likely to affect the development of the former rather than the latter kind.
Unlike in the contrastive task, success in this task did not require reasoning about why the speaker was providing a certain cue since emotional affect, contrary to prosodic stress, has an intrinsic value or valence (negative or positive) which can be directly linked to the corresponding (damaged or enhanced) referent. We further develop this explanation in the general discussion.
General discussion
The goal of this paper was to investigate monolingual and bilingually exposed children’s use of prosodic cues for reference resolution of a novel word. Previous literature has examined bilingual and monolingual children’s performance in reference resolution tasks that involved using a socio-pragmatic cue such as pointing, eye gaze, or emotional affect and found bilingual children generally performing better than monolinguals (e.g., Yow et al., 2017; Yow & Markman, 2015). However, given that the experimental paradigms used required ignoring an irrelevant cue (e.g., semantic meaning, object similarity, or object salience) in favour of the pragmatic one, it remained unclear whether these results were driven (a) by differences in executive functions and specifically the ability to inhibit dispreferred cues, (b) by differences in attentional biases (i.e., more attention, in general, towards social cues or a preference for them over other types of cues), or (c) by differences in pragmatic reasoning per se (i.e., reasoning about the speaker’s communicative intentions). Given that the general bilingual cognitive advantage has been difficult to replicate (cf., Paap & Greenberg, 2013) and that none of the studies that found a socio-pragmatic advantage found a difference in executive control between monolinguals and bilingually exposed children (cf., Champoux-Larsson & Dylman, 2018; Fan et al., 2015; Yow & Markman, 2015, 2016), the first possibility did not seem promising. Instead, here we focused on investigating the latter two possibilities.
To do so, we tested monolingual and bilingually exposed children aged 4–6 years old in two tasks using different types of prosodic cues for reference resolution of a novel word: contrastive stress (e.g., ‘Touch the WET gorp’), where pragmatic inference is facilitated by the stress marking, and emotional affect (e.g., ‘[sad tone] Oh look at the figoo, can you touch the figoo?’), where there is no need for pragmatic inference beyond recognising the affective content of the cue. Importantly, in both tasks, successful performance did not rely on ignoring an irrelevant cue and therefore did not rely on inhibitory control abilities, which might explain some of the previous findings.
We demonstrated that prosodic stress does significantly improve the performance of a contrastive inference for novel word fast mapping in children as well as adults (an open question, cf. Kronmuller et al., 2014), with children appearing to be unable to reliably perform such an inference when the use of a contrastive modifier was not emphasised by contrastive stress. This is in line with previous work with adults demonstrating a facilitating effect of intonational focus for contrastive inference in referent resolution (Sedivy et al., 1995). We also found a gender difference, with males producing fewer pragmatic answers than females, an effect that has been documented before (Stiller et al., 2015).
Moving to the findings that are critical to our research questions, we found a significant interaction between bilingualism and relative performance, with bilingually exposed children performing better than expected from control condition and demographic variables in the contrastive stress task (i.e., close to monolingual performance despite being younger, with lower SES and lower vocabulary levels) but not in the emotional affect task (where the ability to overcome a negative bias was as expected from demographic variables in both groups). There was, however, a key difference between the two types of prosodic cues in our two tasks: contrastive stress required pragmatic reasoning to be interpreted (since it had no intrinsic valence or meaning, i.e., the hearer will know that the modifier ‘wet’ has been emphasised, but since there are two wet aliens this by itself will not be enough to resolve reference without asking why it was emphasised). Emotional affect, on the contrary, intrinsically contained enough information for reference resolution by simple association (i.e., pairing negative emotion with negative event). As we have said, this can be related to Grice’s (1975) dichotomy between natural meanings and non-natural meanings, the difference being the need to recognise a communicative intention behind the cue. That a number of significant effects related to performance in the first task (such as gender or SES) were not significant in the latter further emphasises that the two tasks involve different sets of skills.
Implications for cognitive development
A growing number of studies have found evidence of a bilingual advantage in pragmatic reasoning, which we appear to find here in a relative form. Given that our task did not involve the need to actively inhibit an irrelevant cue, we wish to suggest that, if such an advantage exists, it derives from a better capacity to reason about the speaker’s intentions, rather than better executive skills in general. In the ‘Introduction’ section, we suggested a number of factors in the development of bilingual children which both pose a challenge to bilingual children compared with monolinguals and help hone bilingual children’s skills in specific advantageous ways: first, lack of some learning strategies that monolinguals have access to (e.g., mutual exclusivity); second, more varied and challenging linguistic input (e.g., code switching); and third, a more challenging communicative environment (e.g., communicative breakdowns and predicting linguistic behaviour of interlocutors). We propose that these aspects of the bilingual child’s experience sharpen their use of some other tools they have in their repertoire, and in this study, we showed that these include pragmatic reasoning. In this respect, bilingual children are able to compensate for some of the challenges of their environment by being particularly adept at inferring their interlocutors’ meaning. While, in terms of raw numbers, the bilingual group in our study did not outperform their monolingual peers, they did do better in the critical conditions that required pragmatic inference than would have been expected from control performance and demographic variables. This is potentially a case of ‘doing more with less’.
Implications for pragmatic skills
The lack of a relative bilingual advantage in the second task suggests that these two sets of skills should perhaps be distinguished systematically in research in pragmatics. The ability to use social cues (or speaker-related information) should not be confused with pragmatic reasoning (or making inferences about speakers’ intentions). They require different types of abilities (sensitivity to pragmatic norms, differential allocation of attention or theory of mind, and inferencing), and they may have separate developmental timelines. The distinction between these types of competence might help explain to some extent the variation of the age at which competence with pragmatics is evidenced. Some studies suggest that children are able to use social cues for word learning from a very young age (around 10–12 months) yet when it comes to computing demonstrably pragmatic inferences the evidence of child competence appears later, around 3–4 years old (Frank & Goodman, 2014; Gelman & Markman, 1985). It is possible that very young children’s first use of social cues (and possibly at least part of their subsequent use) relies on simple associative mechanisms and salience and allocating attention efficiently and appropriately to the social environment (i.e., focussing on speaker-related information) without requiring intention-reading through the theory of mind, or the actual computation of a pragmatic inference. This is not to say that children always use social cues as attentional devices and do not engage in inferences about intentions (studies have repeatedly shown that infants are able to find the referent of a novel word as the one the speaker has in mind but is not physically present or accessible, e.g., Akhtar et al., 1996; Tomasello et al., 1996), but simply that they do not necessarily do so in each case. 
As Ambridge and Lieven (2011) point out, while there have been many studies showing that infants are able to understand others’ intentions, only a few have actually demonstrated that they use this understanding to choose between potential referents for a word (Diesendruck, 2005; Diesendruck & Markson, 2001; Tomasello & Akhtar, 1995). It is possible that, in certain contexts, pragmatic understanding which has traditionally been seen as the result of intention-reading is in fact obtained through more general mechanisms using the linguistic information or social cues in a more direct way and ‘shortcutting’ the pragmatic process. The existence of such an ‘egocentric’ pragmatic competence would then contribute to explaining some pragmatic understanding displayed by children with autistic spectrum disorders (Kissine et al., 2012, 2015; see also Andrés Roqueta & Katsos, 2017, 2020; Katsos & Andrés Roqueta, 2021).
Conclusion
We presented the results of two tasks investigating monolingual and bilingually exposed children’s use of pragmatic inference for fast mapping a referent and novel word. We find the bilingually exposed group to perform above expected levels in the first task, where above chance performance could only be achieved by using a prosodic cue to reason about speaker’s communicative intentions. We conclude that this work provides suggestive evidence for a comparative bilingual advantage in pragmatic reasoning which is not due to better inhibitory skills or to a generally higher sensitivity to social cues, but to performing pragmatic inference by reasoning about communicative intentions in the context of word learning. We propose that these differences are the result of adapting to the challenging aspects of a bilingual learning environment, such as higher risks of miscommunication and the need to efficiently and quickly acquire words from complex and variable input. This is therefore another instance where the highly adaptive nature of child cognition is evidenced, and a case of ‘doing more with less’. Moreover, empirically validating the distinction between using social cues and reasoning about intentions underlying social cues is a fundamental cornerstone of pragmatic theory and may help provide insights about separate developmental timelines for pragmatic competence.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The support of the following research funding bodies is gratefully acknowledged: Isabelle Lorge is grateful to the Wiener-Anspach Foundation and St John’s College, Cambridge for MPhil and PhD studentships respectively, while Napoleon Katsos was supported by a UK Arts and Humanities Research Council grant (Multilingualism: Empowering Individuals, Transforming Societies, AH/N004671/1).
