Abstract
The present study experimentally investigates the role of epistemic vigilance in children’s comprehension of verbal irony. We tested 186 children aged from 4/5 to 7/8 using an offline irony comprehension task and an epistemic vigilance task. In the vigilance task, we assessed children’s tendency to trust reliable informants and mistrust unreliable ones. The same informants appeared in the irony task as the target of the ironical remarks, allowing us to examine whether prior information about their reliability influenced irony interpretation. Our findings indicate that children’s epistemic vigilance significantly predicted irony comprehension when irony was directed at an unreliable target. These results provide empirical support for the hypothesis that epistemic vigilance enhances understanding of the dissociative, critical stance expressed by the ironical speaker.
Introduction
Whether it is a playful remark from a parent, a humorous line in a cartoon, or a mocking comment from an older sibling, children, like adults, encounter instances of irony regularly. Despite its frequent occurrence in everyday communication, they often fail to comprehend it and mistake it for an error or a lie (Matsui, 2019). To illustrate this, consider the following example: A mother asks her daughter to clean up a messy table. If the child does not fulfil this request, the mother could later utter: ‘Well done, the table is really clean!’. In principle, several readings of this utterance are possible: it could be interpreted as a genuine mistake (the mother has not realised that the table is still messy), a polite lie (she does not want to hurt her daughter’s feelings), or an ironical remark (she is expressing a critical stance). These alternative interpretations map onto a developmental trajectory of irony (mis)understanding. Children typically interpret such remarks as errors at first, assuming that the speaker is simply unaware of the true state of affairs. Later, they tend to view them as lies, attributing to the speaker an intention to deceive. Only around the age of 6/7 do children come to recognise that these utterances are intentional but non-deceptive falsehoods, used to convey a critical or mocking attitude. This opens up the question of what makes older children able to understand the communication of a deliberate, but not deceptive, falsehood (for a discussion, see Mazzarella & Pouscoulous, 2021). An ironical speaker intentionally utters a proposition that is false or contextually inappropriate, while remaining epistemically competent (being aware of its falsity) and communicatively sincere (not intending to deceive). Crucially, their main goal is to implicitly communicate a critical or dissociative stance towards the literal content of the utterance (Wilson & Sperber, 2012).
In the example above, the mother’s comment does not reflect her actual belief but signals a deliberate rejection of the literal meaning and the underlying expectation that the child should have cleaned the table. By presenting praise in a context where it is clearly undeserved, the speaker highlights the gap between expectation and reality, prompting the listener to infer their evaluative stance. Therefore, to grasp irony and exclude alternative interpretations, children must actively assess the speaker’s competence and benevolence and infer the speaker’s implicit attitude. Given this, it is not surprising that irony understanding emerges relatively late in development, with substantial evidence indicating it is rarely understood before the age of 6 (for a review, see Fuchs, 2023; Milosavljevic, 2024).
This prolonged developmental trajectory also suggests that irony comprehension is likely to depend on a range of socio-cognitive abilities underlying belief, intent and attitude recognition (Pexman, 2023). Developmental studies show that successful irony interpretation often correlates with children’s vocabulary and general language skills (Matthews et al., 2018), as well as executive functions—particularly cognitive flexibility, which may support shifting between literal and nonliteral interpretations (Zajączkowska & Abbot-Smith, 2020). In addition, irony often involves a dual emotional intent—simultaneously critical and humorous—which can be difficult for children to interpret as it relies on emotion recognition skills, particularly the ability to understand that a speaker can express conflicting emotions, such as disapproval and amusement, at the same time (for further discussion, see Pexman, 2023). However, irony comprehension is most strongly linked to the development of higher-order Theory of Mind (ToM) abilities, especially as reflected in second-order belief reasoning (for a discussion, see Mazzarella & Pouscoulous, 2023). This connection arises from the shared demand to represent embedded mental states—for example, understanding that Mum thinks the table is messy and that she wants the child to recognise this as well. Crucially, though, irony comprehension goes beyond belief attribution alone: it requires recognising the speaker’s dissociative attitude towards the literal meaning—an interpretive step that hinges on understanding not only what a speaker believes, but how they position themselves relative to what they say.
The present study contributes to research on the socio-cognitive foundations of irony comprehension by examining one specific component of this repertoire, epistemic vigilance, which is hypothesised to support children’s developing understanding of irony (Matsui, 2019; Mazzarella & Pouscoulous, 2021, 2023; Wilson, 2009). Epistemic vigilance refers to the exercise of critical alertness towards incoming information. It relies on the capacity to assess the reliability of a source of information, by considering both its competence (whether the source possesses accurate knowledge) and its benevolence (whether the source is willing and motivated to provide truthful information), to assess the credibility of the content of the information (its truthfulness and relevance), and to calibrate trust accordingly (Sperber et al., 2010). Research shows that even pre-schoolers do not trust informants indiscriminately, but instead calibrate their trust based on cues such as the informant’s knowledge and honesty (for a review, see Sobel & Finiasz, 2020). Nevertheless, the relationship between irony and epistemic vigilance has not yet been systematically investigated (but see Milosavljevic et al., 2025). In what follows, we shall first elaborate on the nature of this relationship and then present and discuss our experimental findings on irony understanding and epistemic vigilance in children aged between 4/5 and 7/8.
Epistemic Vigilance and Irony Comprehension
Irony comprehension may rely on the exercise of epistemic vigilance in different ways (for an in-depth discussion, see Mazzarella & Pouscoulous, 2023). First, to rule out alternative interpretations, such as error or deception, children need to critically appraise the speaker’s epistemic and moral reliability. More specifically, by actively assessing the speaker’s competence, children can exclude the possibility that they are mistaken (that they falsely believe that the literal proposition is true), and by assessing the speaker’s honesty, they can exclude the possibility that they are deceptive (that they wish to persuade the addressee that it is true). Second, as irony often targets propositions that are manifestly false or contextually inappropriate, irony understanding may also be facilitated by the evaluation of the credibility of the proposition literally expressed. By exercising vigilance towards the content of the ironical statement, children can detect its incongruence with the available contextual information or background knowledge, which represents one of the strongest cues to irony (see, e.g., Deliens et al., 2018; Rivière et al., 2018). Finally, to comprehend irony, children must infer the speaker’s dissociative stance, recognising that the speaker is not endorsing the literal content but is instead distancing themselves from it. Its recognition hinges on the attribution of epistemic vigilance to the speaker: children must infer that the speaker has identified the falsity or absurdity of the targeted proposition, and the unreliability of its source, and is communicating a dissociative, evaluative stance. In line with this view, Mazzarella and Pouscoulous (2021) argue that irony comprehension is facilitated when the unreliability of the target is made salient in the discourse context. 
The irony target refers to the individual or entity about whom the ironical remark is made, that is, the person whose behaviour, statement, or expectation is implicitly criticised or mocked through irony (e.g., the child who has failed to clean the table properly). When the irony target has previously been framed as an incompetent or untrustworthy source of information, the speaker’s utterance may be more readily interpreted as an ironical critique or mockery. For all these reasons, the emergence of irony understanding is hypothesised to be closely intertwined with the development of children’s epistemic vigilance (Mazzarella & Pouscoulous, 2021, 2023; Scianna, 2023).
In line with this hypothesis, Milosavljevic et al. (2025) conducted a pioneering study of children’s epistemic vigilance, specifically their ability to evaluate the reliability of a source of information, and found that this ability predicts irony understanding. Children were exposed to two informants who systematically provided conflicting testimonies and were then asked which of the two informants (accurate vs. inaccurate) they would approach to obtain a piece of relevant information. Children’s epistemic vigilance was thus assessed through a selective-trust task, measuring their ability to prefer information from a reliable over an unreliable source. Furthermore, they were tested on irony comprehension via a picture-selection task. The results provided the first experimental evidence that 5/6- and 6/7-year-old children’s performance in a selective-trust task predicted their success in irony understanding. Vigilant children who selectively trusted a reliable informant over an unreliable informant were more likely to understand verbal irony than their less vigilant peers, thus corroborating the hypothesis that epistemic vigilance may buttress irony comprehension. However, because the analysis showing that epistemic vigilance predicts irony understanding was exploratory, this finding calls for a conceptual replication.
The Present Study
This study aims to investigate the developmental trajectory of irony understanding and the role of epistemic vigilance in irony comprehension by (a) manipulating the reliability of the target of the irony and (b) assessing whether children’s epistemic vigilance is a good predictor of irony comprehension. To this end, we included a broad age range, from 4/5 to 7/8, which encompasses but is not limited to the age group in which irony understanding is most likely to emerge, thereby allowing for a more comprehensive exploration of its developmental trajectory.
The experiment comprised an Induction Phase and a Test Phase. In the Induction Phase, children were presented with short stories involving an informant whose reliability (reliable vs. unreliable) was explicitly manipulated. To assess children’s epistemic vigilance, we used a false communication task measuring their tendency to trust reliable and mistrust unreliable informants based on both dispositional (trait-based) and behavioural (performance-based) cues. Trait-based cues reflect stable characteristics, whereas performance-based cues concern observable, contingent behaviour within the task, such as prior accuracy. This allowed assessing their ability to monitor an informant’s past accuracy, integrate multiple epistemic cues and adaptively calibrate their trust in response. In the subsequent Test Phase, the same informant (previously framed as reliable or unreliable) was portrayed as the target of an ironical remark. Children’s pragmatic ability to interpret the target utterances as literal or ironical was measured using the same picture-selection task as in Milosavljevic et al. (2025), originally inspired by Köder and Falkum (2021). This design allowed us to examine whether children’s prior assessments of an informant’s reliability influenced their interpretation of irony, shedding light on how epistemic tracking contributes to pragmatic understanding.
We expected to track a developmental trajectory in irony understanding, with older children performing better in the irony comprehension task than younger ones. Furthermore, we predicted that irony understanding would be facilitated in the unreliable condition, where the dissociative attitude communicated by the ironical speaker could be more promptly inferred. We also predicted that more vigilant children would be more accurate in the irony comprehension task. The study was preregistered on the Open Science Framework (OSF) at the following link [https://osf.io/nms5p/].
Methods
Participants
One hundred and eighty-six Swiss French-speaking children took part in the study, distributed across the four school levels: 1 HarmoS¹ (4/5-year-olds), 2 HarmoS (5/6-year-olds), 3 HarmoS (6/7-year-olds) and 4 HarmoS (7/8-year-olds). Twelve participants were tested but were not included in the analysis: four participants were excluded due to having an official diagnosis of autism spectrum disorder, one participant was not a native French speaker and thus did not meet the language requirement, and seven participants did not meet the age criterion (i.e., a maximum of 12 months age range between participants within a given class level with the critical window set on 1.08 of the relevant year). Thus, the final sample included 174 participants with no history of speech and language difficulties or any known visual, hearing, or cognitive impairment: 37 4/5-year-olds (girls = 24, Mage = 5;21, range = 4;8–5;7 years), 52 5/6-year-olds (girls = 33, Mage = 6;27, range = 5;8–6;8 years), 39 6/7-year-olds (girls = 19, Mage = 7;30, range = 6;8–7;8 years) and 46 7/8-year-olds (girls = 26, Mage = 8;25, range = 7;8–8;7 years). They were recruited and tested in primary schools in the Canton of Neuchâtel, Switzerland, that agreed to participate. Written informed parental consent and participants’ verbal assent were obtained before testing. Each child received an age-appropriate book as a reward for participating in the study. An adult control group of 20 native Swiss French-speaking graduate and undergraduate students (female = 15, Mage = 23; range: 20–29 years) was included to check that the materials and experimental design were sound and that adult performance was at ceiling. The study was approved by the Research Ethics Committee of the University of Neuchâtel.
Materials and Procedure
Each child participated in a single 15-min session consisting of two blocks of short stories presented on a computer using PowerPoint. The stimuli involved pre-recorded stories depicting interactions between a mother and a child (a boy or a girl), accompanied by illustrations. After each story, children answered questions targeting different aspects of the narrative. Each participant saw two Induction stories, two Familiarisation stories and five Test stories in each block, presented in that order. The induction stories served to introduce the reliability manipulation, while also providing an independent measure of vigilance. The familiarisation trials were designed to familiarise participants with the narrative format and to ensure they could reliably differentiate between the two emoticons used throughout the test phase. The test trials were always presented last and were intended to assess children’s comprehension of various utterance types, including irony. The only differences between the two blocks were the gender and reliability of the child character: one block featured stories involving a girl character, and the other block featured stories with a boy character. Also, one block involved stories with a child being presented as reliable, and the other block involved stories with a child being portrayed as unreliable. These were fully counterbalanced across participants. All sessions were video recorded for subsequent coding (See Figure 1 for an overview of the structure of the experiment).

Figure 1. An overview of the experimental session with both blocks.
To maximise ecological validity while maintaining experimental control, in all the stories, all character utterances—including the full set of the mother’s statements and children’s lines (voiced in childlike tones)—were pre-recorded by a professionally trained actress. Only the neutral narrative segments (story descriptions and comprehension questions) were delivered live by the experimenter, to allow for natural pacing and interactive engagement with the child.
Induction Stories
Four stories with images were created involving an interaction between a mother and a child (either a boy or a girl) (See Figure 2 for an example of two Induction stories). Before seeing the first story, the experimenter provided an explicit description of the child’s reliability, who was presented either as reliable (e.g., ‘Emma is really attentive, and often says things that are true.’) or as unreliable (e.g., ‘Emma is a bit forgetful, and often says things that are wrong.’). Then, the participants would be presented with the story setting. For each story, the reliable/unreliable child would answer a question of the mother (e.g., question ‘How did the race end?’ —answer ‘The rabbit won’/ ‘The rabbit lost’, see Picture 2 in Figure 2). Participants were asked to choose between the two images representing different story outcomes (e.g., ‘What do you think happened in the end? Did the rabbit win the race or did the rabbit lose the race? Can you point to the right picture?’). This served as a first measure of epistemic vigilance, assessing the children’s ability to consider the informant’s reliability trait when providing their response, that is, whether they are able to trust the reliable informant’s testimony and mistrust the unreliable informant’s testimony and select the correct outcome accordingly (Epistemic vigilance measure 1) (See Picture 3 in Figure 2). Then, participants received feedback about the (in)accuracy of the testimony of the (un)reliable informant: the mother in the story shows a piece of relevant evidence. The feedback was accompanied by a visual depiction of the piece of evidence (e.g., the image of a rabbit holding the trophy) and was explicitly confirmed by the experimenter (‘Look at the picture, what Emma said is wrong. The rabbit won the race, Emma is really forgetful, she made a mistake again.’) (See Picture 4 in Figure 2). The participants would then see a second story with the same structure. 
Participants’ choice to endorse or reject the testimony of the (un)reliable child was used as a second measure of epistemic vigilance, based on both past (in)accuracy and trait description (Epistemic vigilance measure 2) (See Picture 6 in Figure 2).

Figure 2. An example of two Induction stories (translated into English) in the Unreliable condition. The utterances produced live by the experimenter are in bold, while the mother and child’s statements were pre-recorded.
Familiarisation Stories
Children were then exposed to two familiarisation stories, each featuring the same two characters (the mother and the child) and accompanied by one illustration showing the outcome of the child’s behaviour, which could be either positive (e.g., the child making a beautiful drawing) or negative (e.g., the child spilling water all over the terrace). The mother would react with a positive comment (e.g., ‘What a beautiful drawing! You’ve really done a nice job!’) or a negative comment (e.g., ‘What a mess! There is water everywhere!’). Children were asked to select one of the two face emoticons that best depicted the feelings of the mother (‘Can you tell me how is Mum feeling? Can you please point to the image?’). The experimenter would provide feedback to children when they made a mistake and explain why a specific emoticon was appropriate for each situation, ensuring that the children could distinguish between them accurately.
Test Stories
Ten pre-recorded and illustrated stories were created that involved interactions between the same mother and the child. The structure of each story was consistent (see Figure 3 for a test story timeline). Each story began with the mother making a request that involved adhering to a specific social norm, such as cleaning up the sofa, tidying a messy table or a room, dressing appropriately, or placing items in their proper location. A control question was then administered to check children’s understanding of the mother’s request by asking them to choose between the picture expressing the desired state of affairs and the one expressing the undesired one. The child character was then shown to either fulfil or fail to fulfil the mother’s request. In both cases, the child would claim to have met the request. The mother’s reaction varied depending on the outcome. More specifically, in case of a positive outcome, she would compliment the child’s positive action by uttering a Literal praise (e.g., ‘Well done! The table is really clean!’). In case of a negative outcome, she could criticise the child’s unsatisfactory behaviour by uttering a Literal criticism (‘That’s bad! The table is still messy!’), or she could react with an Ironical criticism that had the same surface structure as Literal praise but was delivered with a subtle ironical intonation.² A key addition to the study design was the inclusion of a Control Condition, in which the incongruity between the mother’s expectation and the actual state of affairs would prompt a positive literal statement (e.g., ‘The table is not quite tidy, but it’s ok, you have helped me a lot to do the shopping this morning.’). Its purpose was to check that children did not rely solely on the (positive/negative) state of affairs to reply but also considered the mother’s utterance.
Importantly, the picture representing the negative outcome differed from the one representing the undesirable state of affairs in the control question by presenting an intermediate negative outcome and demonstrating that the child made some effort to comply with the mother’s request, though still being far from the desired one (thus avoiding that the child would be perceived as deceptive). After hearing the target utterance, children were presented with the picture-selection task and were asked to judge the inner feelings of the mother by pointing to one of the two face emoticons (angry vs. happy). Each participant saw only one version of each story, although Figure 3 illustrates all possible variants.

Figure 3. A test story timeline (translated into English) for all four utterance conditions. The narrative text and questions (in bold) were produced by the experimenter, while the statements of the mother and the child were pre-recorded by a native Swiss French speaker.
The stories used for the Literal criticism and Literal praise were preceded by additional narrative elements meant to reinforce the reliability manipulation. Each story would start with the mother asking a question about a given state of affairs that was always negative (e.g., the mother asks Emma: ‘Is the table clean?’ and the table is dirty), followed by the child’s response, which was inaccurate in the Unreliable condition (‘Yes, the table is clean!’) or accurate in the Reliable condition (‘No, the table is messy!’). A control question was then asked to check the understanding of the story and the reliability manipulation (e.g., ‘Look at the picture, can you tell me if what Emma said was right?’). The experimenter would correct the children if they failed to provide the correct answer to this question to ensure they understood the reliability manipulation. After that, we would proceed with the story as described above, beginning with the mother’s request to the child to act upon this negative state of affairs (e.g., to clean up the messy table) (see Figure 3).
In the Induction Phase, the order and valence of the stories were fixed for all participants: the boy block always began with a negative-outcome story, while the girl block started with a positive-outcome story. This structure was chosen for simplicity. Meanwhile, block order and reliability assignment were fully counterbalanced across participants to ensure balanced representation and prevent systematic bias. In the Familiarisation Phase, the order of the two stories representing a positive or negative outcome was counterbalanced.
In the Test Phase, participants were presented with five stories per block in four Utterance Conditions (Literal Criticism, Irony, Literal Praise, Control). Each story included a comprehension question and ended with a target utterance. The position of the correct emoticon was counterbalanced across participants, ensuring that each participant saw it on the same side for every story. The order of presentation of the Utterance Conditions was fixed (Literal Criticism-Irony-Literal Praise-Irony-Control).³ Furthermore, the position of the image representing the correct answer in the comprehension questions was also fixed (Right, Left, Left, Right, Left). Each participant saw two stories in the Irony Condition per Block, or four in total.
All the stories in French and their translations in English are available at the OSF link: [https://osf.io/kgdnr/].
Coding
Correct answers across various tasks in the experiment were coded as 1, while incorrect answers were coded as 0.
Induction Stories
Participants’ answers were coded as correct if they endorsed the testimony of the reliable informant and rejected the testimony of the unreliable informant. Children’s epistemic vigilance performance was scored from 0 to 4, with one point assigned for each correct response across the four vigilance stories. This composite score served as the vigilance measure in the main analyses.
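The scoring rule just described can be illustrated with a short sketch. The function name `vigilance_score` and the `(informant_reliable, endorsed)` data layout are assumptions made for this illustration only; the scripts actually used for the analyses are the R scripts on the OSF.

```python
from typing import List, Tuple


def vigilance_score(responses: List[Tuple[bool, bool]]) -> int:
    """Return the number of correct induction responses (0 to 4).

    Each response is a pair (informant_reliable, endorsed): whether the
    informant was presented as reliable, and whether the child endorsed
    the informant's testimony. This data layout is hypothetical.
    """
    # A response is correct when a reliable informant is endorsed
    # or an unreliable informant is rejected.
    return sum(1 for reliable, endorsed in responses if reliable == endorsed)


# Hypothetical child: endorses the reliable informant twice, rejects the
# unreliable informant once, and endorses the unreliable informant once.
example = [(True, True), (True, True), (False, False), (False, True)]
print(vigilance_score(example))  # 3
```

Under this layout, a child who always endorses the informant, regardless of reliability, obtains a score of 2 rather than 0.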
Test Stories
The answers to the control question were considered correct if the picture representing the desired state of affairs was chosen and incorrect if the picture depicting the undesired one was selected. The answers to the utterance comprehension questions (happy vs. angry emoticon) were coded differently depending on the condition. The choice of the happy emoticon to depict the mother’s feelings was considered correct in the Literal praise and Control Conditions, while the angry emoticon was considered correct in the Literal criticism and Irony Conditions. Irony comprehension performance was also scored from 0 to 4, but responses were analysed at the item level rather than as a composite score to examine performance patterns across conditions.
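As an illustration of this condition-dependent coding, the following sketch maps each utterance condition to the emoticon counted as correct. The condition labels, the function name `code_response` and the example trials are hypothetical and are not taken from the study's data or scripts.

```python
# Emoticon counted as correct in each utterance condition, as described above.
CORRECT_EMOTICON = {
    "literal_praise": "happy",
    "control": "happy",
    "literal_criticism": "angry",
    "irony": "angry",
}


def code_response(condition: str, chosen_emoticon: str) -> int:
    """Return 1 for a correct emoticon choice and 0 otherwise."""
    return int(CORRECT_EMOTICON[condition] == chosen_emoticon)


# Hypothetical responses for one block, in the fixed presentation order
# (Literal Criticism, Irony, Literal Praise, Irony, Control).
trials = [
    ("literal_criticism", "angry"),
    ("irony", "happy"),
    ("literal_praise", "happy"),
    ("irony", "angry"),
    ("control", "happy"),
]
coded = [code_response(condition, emoticon) for condition, emoticon in trials]
print(coded)  # [1, 0, 1, 1, 1]
```

Keeping the item-level codes, rather than summing them, matches the item-level analysis of irony responses described above.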
Results
Data processing, analyses and plotting were conducted using the R software (version 4.2.3). All materials, including R scripts for generalised linear mixed-effects models (GLMMs), descriptive statistics, diagnostic plots and supplemental files, are available on the OSF platform at the following link: https://osf.io/kgdnr/. For the GLMM analyses, we started by fitting theoretically driven models aligned with our main hypotheses. In case of convergence issues, we proceeded by simplifying the random part and, if needed, the fixed-effect structure. We checked model assumptions using the DHARMa package for diagnostic inspection (Hartig, 2018). Analysis of Deviance with type III Wald Chi-square tests was used to assess each term; only those with statistically significant results were further examined. To clarify specific contrasts and facilitate interpretation, Tukey-corrected least square means post hoc tests were conducted using the emmeans package (Lenth, 2016).
As a preliminary step, we checked for potential effects of Gender and Block. We found no association between either of the two variables and Performance, so we excluded them from further analyses (see the R script on the OSF for details, https://osf.io/kgdnr/).
Children’s Comprehension of Ironical Versus Literal Utterances
We then conducted descriptive analyses and performed multiple comparison t-tests to determine whether performance exceeded chance level for each Utterance Type (Literal criticism (LC), Literal praise (LP), Irony, Control) and School Year group (4/5-year-olds, 5/6-year-olds, 6/7-year-olds and 7/8-year-olds), applying the Benjamini-Hochberg procedure to control the False Discovery Rate (FDR) at 5%. The results of the t-tests showed that all children were at ceiling for LP, and all children except the 4/5-year-olds (who performed above chance level, p < .001) were at ceiling for LC. As expected, all four groups performed significantly above chance level but not at ceiling in the Control Condition (p < .001). In the Irony Condition, the 5/6-year-olds (p = .028) and 7/8-year-olds (p = .003) performed significantly above chance, while 4/5-year-olds (p = .216) and 6/7-year-olds (p = .738) performed at chance. The control group of adults was at ceiling in all literal conditions and performed significantly above chance level (albeit not at ceiling: M = 91%) in the Irony Condition (p < .001).⁴
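For readers unfamiliar with the Benjamini-Hochberg correction, the following is a minimal sketch of how adjusted p-values are obtained. The function name and the input p-values are illustrative assumptions and are not taken from the study's data; the reported analyses were run in R.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values from four chance-level tests.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print([round(p, 3) for p in adjusted])  # [0.04, 0.053, 0.053, 0.2]
```

With the FDR controlled at 5%, a test counts as significant when its adjusted p-value falls below .05; in this hypothetical example only the first test does.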
Irony Comprehension in Participants Who Passed Control
Following Milosavljevic et al. (2025), to assess irony understanding more stringently, we created a data subset that included only the participants who passed the Control Condition (i.e., with the Control score = 2). This included 147 participants (27 4/5-year-olds; 44 5/6-year-olds; 35 6/7-year-olds; and 41 7/8-year-olds). We ran multiple comparisons t-tests to check whether the Irony performance for each School year group was above the chance level using the Benjamini-Hochberg procedure to control the FDR at 5%. The results showed that only 7/8-year-olds performed significantly above chance level (p = .029). The performance for 4/5-year-olds was now significantly below chance level (p = .029), while the performance for both 5/6-year-olds (p = .297) and 6/7-year-olds (p = .723) was now at chance level (See Figure 4).

Figure 4. Percentage of accurate answers in the Irony Condition for participants who passed the Control Condition, by School year group (4/5-year-olds, 5/6-year-olds, 6/7-year-olds and 7/8-year-olds).
Vigilance Score Distribution
As a measure of children’s epistemic vigilance, we created a variable Vigilance with five levels (ranging from 0 to 4), corresponding to the number of correct answers that children provided across the epistemic vigilance measures in all four Induction stories (‘0’ = 0/4 correct answers and ‘4’ = 4/4 correct answers). It reflects both their tendency to trust a reliable informant and their ability to withhold trust from an unreliable one. Figure 5 illustrates the proportions of each Vigilance score for different School year groups. To further explore differences in Vigilance between different School year groups, pairwise comparisons were performed using t-tests. The FDR correction was applied using the Benjamini-Hochberg procedure (p < .05) to control for multiple comparisons. Significant differences were found between 4/5-year-olds and 6/7-year-olds (p = .022) and between 4/5-year-olds and 7/8-year-olds (p = .013), while no significant difference was observed between 4/5-year-olds and 5/6-year-olds (p = .321), between 5/6-year-olds and 6/7-year-olds (p = .110), between 5/6-year-olds and 7/8-year-olds (p = .055) or between 6/7-year-olds and 7/8-year-olds (p = .755).⁵

Figure 5. Proportions of each Vigilance score for the different School year groups.
Mixed-Effects Model of Irony Comprehension (Full Sample)
For our main analyses, we ran GLMMs with Irony Performance as the dependent variable. The initial, theoretically motivated model included Vigilance, Reliability and School year group, along with their three-way interaction, as fixed effects and random intercepts for subjects and items, plus a by-subject random slope for Reliability. Following Bates et al. (2015), we simplified the random-effects structure to achieve convergence. The final model included fixed effects of Vigilance, Reliability and School year group, and the two-way interaction between Vigilance and Reliability, with random intercepts for subjects only. The Analysis of Deviance indicated an effect of School year group (χ² (3) = 60.38, p < .001) and a non-significant trend towards an interaction between Vigilance and Reliability (χ² (1) = 3.18, p = .075).
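One way to write down the final model is sketched below. It assumes a logit link for the binary outcome (the text specifies a GLMM without naming the link) and treats Vigilance as a numeric predictor, which is consistent with its single degree of freedom in the Analysis of Deviance. All symbols are introduced here for illustration only and do not appear in the original analysis scripts.

```latex
\[
\operatorname{logit}\Pr(Y_{ij} = 1)
  = \beta_0 + \beta_1 V_i + \beta_2 R_{ij} + \beta_3 \, V_i R_{ij}
  + \sum_{k=2}^{4} \gamma_k \, \mathbf{1}[G_i = k] + u_i,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
\]
```

Here \(Y_{ij}\) is the coded response (1 = correct, 0 = incorrect) of participant \(i\) on irony item \(j\), \(V_i\) is the participant's Vigilance score (0 to 4), \(R_{ij}\) indicates whether the item belonged to the Unreliable or the Reliable block, \(G_i\) is the School year group (with one group, here assumed to be the youngest, as the reference level), and \(u_i\) is the by-subject random intercept.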
Mixed-Effects Model of Irony Comprehension (Participants Who Passed Control)
As a second step, to assess irony understanding more stringently, we fitted the same theoretically driven generalised linear mixed-effects model on the subset of participants who passed the Control Condition. The dependent variable was Irony Performance, and the fixed effects included Vigilance, Reliability, School year group and their three-way interaction. The random structure initially contained random intercepts for subjects and items and a by-subject random slope for Reliability. Following Bates et al. (2015), we simplified the random- and fixed-effect structures to achieve convergence. The final model included fixed effects of Vigilance, Reliability and School year group and the two-way interaction between Vigilance and Reliability, with random intercepts for subjects only. The Analysis of Deviance indicated an effect of School year group (χ² (3) = 68.23, p < .001), an effect of Reliability (χ² (1) = 4.18, p = .041), and a significant interaction between Vigilance and Reliability (χ² (1) = 5.57, p = .018). No main effect of Vigilance was found (χ² (1) = 0.48, p = .487). To further inspect these results, we used Tukey-corrected least-squares means, which revealed a significant difference in Irony Performance between 4/5-year-olds and 5/6-year-olds (B = −16.62; z = −7.29; p < .001), between 4/5-year-olds and 7/8-year-olds (B = −17.44; z = −7.61; p < .001), between 5/6-year-olds and 6/7-year-olds (B = 15.27; z = 6.16; p < .001) as well as between 6/7-year-olds and 7/8-year-olds (B = −16.08; z = −6.66; p < .001). To further explore differences in Irony Performance across the Reliable and Unreliable conditions at different levels of Vigilance, pairwise comparisons of estimated marginal means were conducted, with Tukey adjustment applied to control the family-wise error rate across multiple comparisons.
The results revealed that when Vigilance equals 0, participants in the Reliable condition performed significantly better than those in the Unreliable condition (B = 2.20; z = 2.04; p = .041), although the limited number of observations available (12) undermines the robustness of this finding, which we thus refrain from interpreting. The analyses further revealed that at the highest level of vigilance (Vigilance score = 4), participants performed significantly better in the Unreliable condition than in the Reliable condition (B = −1.39; z = −2.08; p = .037).
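The z statistics reported for these two contrasts can be related to their p-values with a short computation. The sketch below assumes a two-sided test against the standard normal distribution and applies no further adjustment; `two_sided_p_from_z` is a hypothetical helper, and small discrepancies from the reported values are to be expected because the published z statistics are rounded.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# z statistics reported for the Reliable vs. Unreliable contrasts.
p_vigilance_0 = two_sided_p_from_z(2.04)    # Vigilance score = 0
p_vigilance_4 = two_sided_p_from_z(-2.08)   # Vigilance score = 4
print(f"{p_vigilance_0:.3f}, {p_vigilance_4:.3f}")
```

The sign of z only indicates the direction of the difference (which condition performed better); the p-value depends on its absolute value.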
Discussion
The principal aims of the study were to examine the developmental trajectory of irony comprehension and to determine if epistemic vigilance supports it. Our results indicate that irony comprehension emerges relatively late, as we found clear evidence of irony understanding only at 7/8 years of age. To examine the role of epistemic vigilance in irony understanding, we investigated whether prior information about a character’s reliability would facilitate irony comprehension—specifically, whether irony would be easier to detect when the target of the ironical remark had previously been presented as an unreliable source of information. Furthermore, we assessed whether children’s epistemic vigilance was a good predictor of their performance in the irony comprehension task. We found that vigilant children did show an advantage in irony understanding when the target was unreliable. In what follows, we shall discuss these findings in turn.
When Does Irony Understanding (Really) Emerge?
The design of the present study allowed us to establish that children’s performance in the irony comprehension task could not be driven by a mere sensitivity to the presence of a mismatch between the speaker’s expectations and the context. Indeed, while this mismatch was present in both the Control and the Irony Condition, it was only in the Irony Condition that participants were justified in selecting the angry face emoticon. This distinction allowed us, through the inclusion of our Control Condition, to restrict our analysis to those children whose behavioural responses were not reducible to a simple reaction to mismatch detection. When considering only the participants who passed the Control Condition, and thus applying a more stringent criterion for irony understanding compared to previous studies (e.g., Köder & Falkum, 2021), our findings showed that only 7/8-year-olds understood irony above chance level. These findings align well with developmental literature suggesting a rather prolonged trajectory of irony understanding (for a review, see Milosavljevic, 2024).
It is important to stress that our results do not exclude the possibility that younger children may be sensitive to some irony cues and detect the incongruity between expectations and reality much earlier (in the preschool period), as evidenced by a handful of studies relying on implicit measures such as eye gaze (e.g., Köder & Falkum, 2021). However, the ability to successfully translate this sensitivity into accurate responses in explicit task formats (by choosing the correct emoticon) may emerge much later, along with some other important developmental milestones, including the ability to process multiple cues simultaneously. This ability is essential for integrating lexical information (i.e., the literal meaning of the utterance) with the contextual information (situational outcome, nonverbal cues) available in ironical scenarios. In its absence, younger children often default to a single cue, exhibiting either a lexical bias, interpreting utterances based on their literal wording, or a contextual bias, relying primarily on contextual mismatch or emotional context (e.g., Aguert et al., 2010; Matsui, 2019).
Our findings highlight the importance of differentiating the children who possess a sophisticated irony understanding from those who merely recognise the presence of a mismatch between the context and the speaker’s expectations. Hence, the role of the Control Condition is crucial and should be considered in future research to ensure an accurate assessment of children’s nuanced understanding of irony.
Does Epistemic Vigilance Buttress Irony Comprehension?
To experimentally investigate the role of epistemic vigilance in irony understanding, we chose to manipulate the reliability of the target of the ironical remark, operating on the premise that an ironical speaker expresses a dissociative, critical attitude towards a proposition deemed false or irrelevant and towards its source, who is perceived as unreliable. In line with this, we anticipated that the speaker’s ironical intent and attitude would be more readily recognised when the target of irony was previously portrayed as unreliable (Mazzarella & Pouscoulous, 2023). Based on Milosavljevic et al. (2025), we also predicted that performance in the epistemic vigilance task would serve as a predictor of children’s performance in the irony comprehension task. Specifically, we expected that children who demonstrated the ability to actively evaluate the informant’s reliability and adjust their trust accordingly in the Induction Phase would also be more likely to critically assess the competence and benevolence of the ironical speaker in the irony comprehension task. This active evaluation would, in turn, facilitate the recognition of the speaker’s ironical attitude.
Our findings revealed an effect of reliability manipulation, underscoring its relevance to the analysis. However, this effect cannot be meaningfully interpreted in isolation, as it was involved in a significant interaction with vigilance. Indeed, children who demonstrated strong epistemic vigilance skills towards the source of information (as evidenced by their performance in all four induction vigilance stories) showed a better understanding of irony when the target was presented as unreliable. This provides some empirical support for the hypothesis that epistemic vigilance plays an important role in the recognition of the speaker’s critical, ironical attitude (Mazzarella & Pouscoulous, 2021, 2023). It confirms that the most vigilant children (those who were consistently able to mistrust the unreliable and trust the reliable informant) were less likely to interpret irony as a factual error or deliberate falsehood when the speaker had been previously established as unreliable. The observed interaction highlights the specific role of epistemic vigilance in irony understanding. If general cognitive abilities were the main driver of performance, we would expect a main effect of Vigilance across both conditions, rather than an interaction limited to the Unreliable condition. Importantly, our additional analyses showed no relationship between children’s epistemic vigilance and performance in the Control condition, further reinforcing its specific relevance to irony. Thus, while general cognitive or linguistic abilities may certainly contribute to success in the comprehension task proposed (in both the Irony and the Control Conditions), our Vigilance measure cannot be reduced to the contribution of these factors alone.
Our findings extend the results of Milosavljevic et al. (2025) and should be discussed with respect to two methodological modifications implemented in the present study. First, because reliability information based solely on past accuracy did not significantly affect irony performance in their study, we decided to explicitly manipulate not only the informant’s history of accurate testimonies but also information about their general behavioural tendencies (reliability trait), to make children’s perception of the target’s (un)reliability more stable throughout the task. This was motivated by the developmental literature suggesting that children may still consider an informant as generally trustworthy despite a history of past inaccuracies (Vanderbilt et al., 2014). In the present study, the descriptions of reliability traits were consistently aligned with the informant’s history of (in)accuracy. However, it would be interesting to explore how children would navigate their decisions if these were incongruent or less predictable (e.g., Bhatti et al., 2024). Furthermore, as in Milosavljevic et al. (2025), we relied exclusively on epistemic cues (past accuracy) and avoided manipulating the speaker’s moral traits, as these, due to the greater consequences that they may carry, may not be perceived as conducive to ironical responses in a natural setting (Hukker et al., 2024).
Second, we extended the results of Milosavljevic et al. (2025) to a new epistemic vigilance measure. Instead of using a selective-trust task that assesses children’s ability to trust a reliable over an unreliable informant, we employed a false communication task that measured participants’ ability to evaluate the reliability of a single informant and calibrate trust accordingly. This task is considered more challenging for younger children, as it requires them to potentially reject the only information available and, in the case of an unreliable source, to infer that the opposite is true in order to avoid misinformation (e.g., Vanderbilt et al., 2011). While children start quite early to selectively trust the reliable over the unreliable source based on different reliability cues, they succeed in false communication tasks only around the age of 6 (for a review, see Mascaro & Morin, 2014; Mazzarella & Vaccargiu, 2024), as evidenced by the developmental trajectory related to our Vigilance measure. However, although false communication tasks may be more ecologically valid than selective-trust tasks (since in everyday exchanges children seldom interact with multiple interlocutors providing conflicting information), they still do not fully mirror children’s natural experiences in communication, as they assign children the more passive role of overhearers. Therefore, future studies should consider directly involving children as active participants in tasks (e.g., as addressees) to establish an even more naturalistic setting.
Although the present paradigm yielded robust findings, it does not allow us to fully disentangle the respective contributions of contextual and prosodic cues to irony comprehension. While the Control Condition rules out the possibility that participants’ responses were driven solely by contextual incongruency, no comparable control was included for prosodic variation. However, it is plausible that both contextual incongruency and prosody jointly contributed to irony understanding, and that prosody alone was insufficient to drive it. Indeed, developmental evidence on the role of prosody in irony comprehension remains inconsistent and inconclusive: most studies suggest that children begin to use prosodic cues only around the age of 8 (for a review, see Fuchs, 2023), and that consistent detection of irony based on prosody alone emerges only around the age of 9 (Panzeri & Giustolisi, 2025). Moreover, our ironical intonation was deliberately very subtle to ensure it sounded natural, suggesting that children relied on integrating multiple communicative cues rather than prosody alone.
Conclusion
The present study provides direct empirical support for the relationship between epistemic vigilance and irony understanding. It corroborates the hypothesis of Mazzarella and Pouscoulous (2021) that more vigilant children are in a better position to recognise that the information provided by the unreliable informant is false or irrelevant and that the speaker’s remark may be motivated by an ironical intent. Our findings also extend previous claims that epistemic vigilance may be part of the rich network of socio-cognitive capacities that support irony understanding and that develop gradually and interactively, thus contributing to explaining its prolonged developmental trajectory (Pouscoulous, 2023). More specifically, they confirm that irony requires not only advanced language skills and the ability to reason about others’ mental states, such as beliefs, intentions and emotions, but also the capacity to critically evaluate the reliability of a source of information and attribute this ability to the speaker. As a result, these findings underscore the importance of delving deeper into the interplay between pragmatics and epistemic vigilance.
Acknowledgements
We are grateful to all members of the Cognitive Science Centre for their feedback at all stages of this research and to Alexis Stawarz for his help with data collection. We thank Giorgio Arcara for his guidance in conducting the statistical analyses and Lucio Ruvidotti for the illustrations for our experimental material. We also extend our warmest thanks to the school consortia CERISIERS and CESCOLE of the Canton of Neuchâtel, as well as to the teachers, parents and, above all, the children who participated in this project.
Funding
The authors disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This research was supported by a Swiss National Science Foundation Eccellenza Grant (186931) awarded to Diana Mazzarella.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
