The Influence of Prior Semantic Knowledge in Noisy Channel Interpretation

Abstract

How do comprehenders interpret semantically implausible sentences? Previous studies proposed a noisy-channel framework of sentence comprehension, in which communication between a speaker and a comprehender takes place over a noisy channel. The comprehender rationally adopts an interpretation of a sentence based on how likely the interpretation is (the semantic prior) and how likely it is that the interpretation was corrupted into the perceived sentence by noise (the likelihood). The theory predicts that comprehenders are more likely to adopt a literal interpretation of an implausible sentence if their prior for implausible sentences is higher. To test this hypothesis, Gibson et al. (2013) manipulated the proportion of implausible test sentences in two sets of experiments, in which participants read a number of sentences and answered a comprehension question following each sentence. Although their results supported the hypothesis, the experiments could have been confounded (a) by participants’ adaptation (due to different experiment lengths) and (b) by different participants adopting different strategies for the task (due to the between-subject design). In our study, we manipulated the semantic prior and controlled for these potential confounds. We found that participants exposed to more implausible sentences were indeed more likely to interpret implausible sentences literally. Our results hence offer additional support for the noisy-channel framework.

Keywords

noisy-channel, rational inference, sentence comprehension

Introduction

Messages are constantly conveyed between a speaker – who encodes their intended meaning in the message – and a comprehender – who decodes the message to recover the speaker’s intended meaning. This might sound like a trivial process: after all, as speakers of a human language, we are used to constantly alternating between the role of speaker and comprehender and exchanging thoughts and ideas using language. However, it is not as easy as one might think, as there is often noise in the communication process (Brehm, 2023). For example, the speaker might utter disfluent speech, the environment of the conversation might be noisy, or the comprehender might not be paying close attention to the speaker, and as a result, the meaning that the comprehender decodes at their end is not always the intended meaning of the speaker. Remarkably, despite the presence of noise, comprehenders often manage to recover the speaker’s intended meaning.

How do we as language users achieve this? Past studies have offered models on how a comprehender extracts meaning given a signal (e.g. Ferreira, 2003; Gibson, 2000; Hale, 2001; Levy, 2008a; Lewis et al., 2006; MacWhinney & Bates, 1989; Tabor et al., 2004). However, most of the proposals treat the communication as taking place in a noise-free environment. In contrast, some recent proposals (e.g., Gibson et al., 2013; Levy, 2008b) integrate the presence of noise in their models. These proposals model the communication between a speaker and a comprehender as happening in a noisy channel (Shannon, 1948). In particular, they model that the speaker has an intended utterance $s_{i}$ , and the comprehender gets a perceived utterance $s_{p}$ as a result of the noisy channel. The comprehender’s goal is to infer the intended utterance $s_{i}$ given the perceived utterance $s_{p}$ , and the confidence can be modeled by a probability distribution $p (s_{i} | s_{p})$ . According to Bayes’ Rule, the probability $p (s_{i} | s_{p})$ can be calculated by Equation 1:

p (s_{i} | s_{p}) \propto p (s_{i}) \cdot p (s_{i} \to s_{p})

(1)

The first term $p (s_{i})$ is the prior, which is the probability of the intended utterance, whereas the second term is the likelihood, representing how likely it is for the intended utterance $s_{i}$ to result in the perceived utterance $s_{p}$ .

This noisy-channel framework has been experimentally tested in numerous studies. Gibson et al. (2013) tested the framework by investigating comprehenders’ interpretation of different syntactic alternations, such as the active-passive alternation or the double-object/prepositional phrase object alternation as in (2) below. These are syntactic constructions that are very close to one another in meaning and in form, varying only in word order and some morphology. All languages have such alternations: they allow us to express the same idea in different word orders, depending on which elements are already introduced and which are yet to be introduced. People like to start with information that is already part of the discourse and proceed to new information (Chafe, 1970; Givón, 1984; Givón, 1987; Lambrecht, 1994; Birner and Ward, 1998; Clifton & Frazier, 2004).

Gibson et al. (2013) focused on syntactically well-formed but semantically implausible versions of these materials and examined how participants interpreted them. For example, when a comprehender encounters the sentence “the mother gave the candle the daughter” (utterance $s_{1}$ ), they have to consider two possibilities: one where the intended utterance is indeed “the mother gave the candle the daughter,” which is not corrupted by noise; another where the intended utterance is “the mother gave the candle to the daughter” (utterance $s_{2}$ ) but was corrupted into the perceived utterance by noise. The comprehender needs to compute the quantities $p (s_{i} = s_{1} | s_{p} = s_{1})$ and $p (s_{i} = s_{2} | s_{p} = s_{1})$ . From Equation 1, the relative magnitude of these two quantities depends on the following: (a) $p (s_{i} = s_{1})$ - the prior probability of the utterance $s_{1}$ ; (b) $p (s_{i} = s_{2})$ - the prior probability of the utterance $s_{2}$ ; (c) $p (s_{1} \to s_{1})$ - the probability of utterance $s_{1}$ not being corrupted by noise; and (d) $p (s_{2} \to s_{1})$ - the probability of utterance $s_{2}$ being corrupted into $s_{1}$ . After the calculation, if $p (s_{i} = s_{1} | s_{p} = s_{1})$ is larger, the comprehender interprets the sentence literally, whereas if $p (s_{i} = s_{2} | s_{p} = s_{1})$ is larger, the comprehender makes an inference and adopts a non-literal interpretation of the perceived sentence.
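The comparison described above can be sketched in a few lines of Python. This is only an illustration of Equation 1 restricted to two candidate intended utterances; the function name and all numerical values are hypothetical and are not estimates from any study.

```python
def noisy_channel_posterior(priors, likelihoods):
    """Return p(s_i | s_p) for each candidate intended utterance.

    priors:      dict mapping candidate -> p(s_i)
    likelihoods: dict mapping candidate -> p(s_i -> s_p)
    """
    # Equation 1: the posterior is proportional to prior times likelihood.
    unnormalized = {s: priors[s] * likelihoods[s] for s in priors}
    total = sum(unnormalized.values())
    return {s: value / total for s, value in unnormalized.items()}


# Hypothetical values chosen only for illustration.
priors = {"s1": 0.001, "s2": 0.999}      # s1 is implausible, s2 is plausible
likelihoods = {"s1": 0.99, "s2": 0.01}   # p(s1 -> s1) and p(s2 -> s1)

posterior = noisy_channel_posterior(priors, likelihoods)
interpretation = "literal" if posterior["s1"] > posterior["s2"] else "non-literal"
```

With these assumed values the plausible candidate wins despite requiring a corruption, so the sketch returns a non-literal interpretation; raising the prior of `s1` or lowering the noise likelihood would tip the comparison the other way.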

Gibson et al. (2013) tested the noisy-channel framework in several ways. First, they manipulated the plausibility of the test sentences: a plausible utterance has a higher prior probability than an implausible utterance. In the previous example, since the candle is inanimate and hence cannot receive anything or anyone, the utterance $s_{1}$ has a low prior probability, whereas $s_{2}$ has a high prior probability. Second, they manipulated the type of noise operations needed to corrupt one structure into another. If one corruption is more likely to take place than another, the comprehender is more likely to make inferences on sentences that are the result of the more likely corruption. Gibson et al. (2013) considered two types of noise operations: insertion and deletion, and they argued that deletions had a higher likelihood than insertions, based on a size principle (Tenenbaum, 2000): deletions can only act on words that are present in the intended utterance, whereas insertions can act on virtually any word in a language’s vocabulary. In their experiment, Gibson et al. (2013) considered sentences under several grammatical constructions, including the double object (DO) and prepositional object (PO) alternation, which varied in plausibility:

(2) a. DO, plausible: The mother gave the daughter the candle.

b. DO, implausible: The mother gave the candle the daughter.

c. PO, plausible: The mother gave the candle to the daughter.

d. PO, implausible: The mother gave the daughter to the candle.

Comprehension question: did the daughter receive someone/something?

Literal interpretation: “Yes” for plausible, “No” for implausible.

Participants read one version of the sentences in (2) and were asked the comprehension question. For each version, the proportion of responses where participants interpreted the sentence literally was calculated. Two predictions were made by Gibson et al. (2013): first, plausible sentences (e.g., 2a, 2c) are more likely to be interpreted literally than implausible sentences (e.g., 2b, 2d); second, implausible PO sentences like (2d) are more likely to be interpreted literally than implausible DO sentences like (2b). This is because an implausible PO sentence (2d) is potentially a result of an insertion of the preposition “to” from a plausible DO sentence (2a), whereas an implausible DO sentence (2b) is potentially a result of a deletion of “to” from a plausible PO sentence (2c)¹. Since deletions are more likely to take place than insertions, participants are predicted to make more inferences when they encounter (2b) than (2d), hence less likely to adopt a literal interpretation. Results in Gibson et al. (2013) were in line with these two predictions.
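The size-principle argument behind the second prediction can be made concrete with a toy calculation. The sketch below is our own illustrative parameterization, not the model used by Gibson et al. (2013): the noise rate `p_noise` and the vocabulary size are hypothetical, and each edit is assumed to pick its target uniformly at random.

```python
def deletion_likelihood(intended_words, p_noise=0.1):
    # A deletion can only remove one of the words actually present,
    # so a specific deletion has probability p_noise / sentence length.
    return p_noise / len(intended_words)


def insertion_likelihood(intended_words, vocab_size, p_noise=0.1):
    # An insertion can place any vocabulary word in any of the
    # len + 1 slots, so a specific insertion is far less probable.
    return p_noise / (vocab_size * (len(intended_words) + 1))


plausible_po = "the mother gave the candle to the daughter".split()  # (2c)
plausible_do = "the mother gave the daughter the candle".split()     # (2a)

# (2c) -> (2b) requires deleting "to"; (2a) -> (2d) requires inserting "to".
p_deletion = deletion_likelihood(plausible_po)
p_insertion = insertion_likelihood(plausible_do, vocab_size=10_000)
```

Under these assumptions the deletion route is several orders of magnitude more likely than the insertion route, which is the asymmetry that leads to more inferences for (2b) than for (2d).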

Third, Gibson et al. (2013) also manipulated the presence of noise in the filler sentences. In one experiment, the filler sentences given to the participants were plausible and syntactically good English sentences, such as “The colonel was knighted by the queen because of his loyalty.” In contrast, in another experiment, half of the filler sentences were replaced with syntactically illicit sentences caused by various noise operations (e.g., “The colonel was knighted for by the queen because of his loyalty”). Gibson et al. (2013) predicted that these syntactically illicit filler sentences would raise the likelihood of noise operations $p (s_{2} \to s_{1})$ , and therefore participants would make more inferences. As predicted, they found that in almost all constructions, participants given fillers containing syntactic errors were more likely to make inferences.

Fourth, and most relevant to the current study, Gibson et al. (2013) manipulated the semantic prior by changing the proportion of implausible sentences. A higher proportion of implausible sentences raises the prior of implausible utterances, and the framework predicts that participants will be more likely to interpret implausible sentences literally. In Experiments 1A-1E of Gibson et al. (2013), participants read 20 test sentences of one of five types of syntactic alternations, together with 60 filler sentences. For example, one of these sets of materials investigated the double-object/prepositional phrase object alternation as in (2). In Experiment 3 of Gibson et al. (2013), participants read 100 test sentences, consisting of materials from all five alternations in one experiment, together with the same 60 filler sentences. As a result, the proportion of implausible sentences was higher in Experiment 3 than in Experiment 1 (50/160, or 31.25% vs. 10/80, or 12.5%, since half of the test sentences were implausible). They found that, as predicted, participants were more likely to interpret implausible sentences literally in Experiment 3 than in Experiment 1.
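The two proportions quoted above follow directly from the design counts. A minimal check, using a hypothetical helper function and assuming (as stated) that half of the test sentences and none of the fillers were implausible:

```python
from fractions import Fraction


def implausible_proportion(n_test, n_filler, implausible_share=Fraction(1, 2)):
    # Implausible test sentences divided by all sentences in the session.
    return (n_test * implausible_share) / (n_test + n_filler)


exp1 = implausible_proportion(n_test=20, n_filler=60)    # 10 / 80
exp3 = implausible_proportion(n_test=100, n_filler=60)   # 50 / 160
```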

However, the manipulation of the semantic prior in Gibson et al. (2013) was potentially confounded. Their approach was to pool plausible and implausible sentences across different alternations in one single experiment (Experiment 3). This increased the proportion of implausible sentences, and they showed that the manipulation led to an increase in participants’ literal interpretation of implausible sentences. Their approach introduces two potential confounds as to why the literal interpretation rate increased: a confound of experiment length and a confound of participant idiosyncrasy. The first potentially confounding factor is a result of an adaptation effect: as participants see more and more implausible test sentences throughout the experiment, they could adjust their semantic prior to expect more implausible sentences and hence become more likely to interpret them literally. Indeed, this effect was found in Delaney-Busch et al. (2019): the change in participants’ N400 amplitude (a measure of a participant’s semantic processing) over a prime-target matching task can be largely predicted by a model of their by-trial target word probability estimate, suggesting that participants’ semantic prediction shifted throughout an experiment to adapt to the statistical structure of the task. It is possible that participants in Experiment 3 of Gibson et al. (2013) were exposed to more implausible sentences and adapted their semantic prior more than participants in Experiment 1. Because the results reported in the original study were averaged over the course of the experiment, they could not show whether the reported effect was actually due to the difference in semantic prior or due to the difference in experiment length.

The second potentially confounding factor was what we would label as idiosyncrasy of strategy across participants: each participant had a different strategy to complete the task. Some participants may choose to consistently interpret sentences literally, even though they are implausible (“consistently literal” henceforth); some participants may switch back and forth, interpreting some implausible sentences literally and others non-literally (“switching” henceforth); and others may choose to consistently interpret sentences non-literally (“consistently non-literal” henceforth). Since the experiment to measure the effect of semantic prior is between-participant in nature, it was unclear to what extent the effect observed in Gibson et al. (2013) was actually due to the effect of semantic prior manipulation or due to differences in participant strategies. To measure the effect of the semantic prior manipulation in a between-participant design, one should minimize the influence of individual differences by recruiting a large number of participants.

To see if the effect of semantic prior found in Gibson et al. (2013) was in fact confounded, we plotted the original results in Figure 1a and the average by-trial literal interpretation rate in Figure 1b. If semantic prior has an effect on sentence interpretation, we expect no noticeable difference between participants’ literal interpretation rates in the initial trials of Experiment 3 (blue line in Figure 1b) and Experiment 1 (red line in Figure 1b). Then, as trials progress, the effect of the semantic prior becomes more pronounced, and we should see a consistent difference in literal interpretation rate between the two conditions. Results in Figure 1b were not in line with this prediction: the difference in literal interpretation rate between the two conditions was already present in the beginning trials of the experiment. This implies that the results found in Gibson et al. (2013) could still be because of the idiosyncrasy of participant strategies in Experiment 3. This also implies that the difference in the literal interpretation rate in Gibson et al. (2013) could be due to an adaptation effect: as shown in Figure 1b, participants were more likely to interpret implausible sentences literally later in the experiment, as they encountered more implausible sentences. In addition, the adaptation effect was stronger in Experiment 3 than in Experiment 1 due to differences in experiment length.

Figure 1.

Experiment length and participant idiosyncrasy could be confounds in Gibson et al. (2013) regarding the effects of semantic prior. (a) A reproduction of the results reported in Gibson et al. (2013), showing the literal interpretation rate (y-axis) for stimuli under each construction (x-axis), grouped by syntactic alternations. Red bars in each panel correspond to Experiment 1 in Gibson et al. (2013), and blue bars in each panel correspond to Experiment 3, where the rate of implausible sentences was increased. (b) Average by-trial interpretation rate for stimuli under each construction in Experiment 1 (red) and Experiment 3 (blue). Acronyms: actpass - active/passive alternation, locinv - locative inversion alternation, transintrans - transitive/intransitive alternation, dopo_goal - double object/prepositional object alternation involving the preposition to, dopo_ben - double object/prepositional object alternation involving the preposition for.

To further examine the effect of participant idiosyncrasy, we compared four groups of participants who were presented with the same sentences and were asked to do the same task. The first two groups were taken from Experiment 1 in Gibson et al. (2013), where participants were presented with active/passive sentences and DO-goal/PO-goal sentences, respectively. The remaining two groups were taken from Experiment 3 in Chen et al. (2023), where participants were also presented with active/passive sentences and DO-goal/PO-goal sentences, respectively². For each type of sentence material, we calculated the proportions of different types of participants (i.e., consistently non-literal, switching, and consistently literal). The results are shown in Figure 2: despite being presented with the same sentences and asked to do the same task, different groups of participants have different distributions of behaviors. A higher proportion of participants in Gibson et al. (2013) consistently interpreted implausible active/passive sentences literally, whereas a higher proportion of participants in Chen et al. (2023) than in Gibson et al. (2013) switched between literal and non-literal interpretations of active/passive sentences. A lower proportion of participants consistently interpreted implausible DO-goal sentences non-literally in Chen et al. (2023) than in Gibson et al. (2013), whereas a higher proportion of participants switched back and forth when interpreting implausible DO-goal sentences in Chen et al. (2023). Therefore, to ensure that the effect observed in Gibson et al. (2013) is actually due to the effect of interest (semantic prior) rather than participant idiosyncrasy, one needs to recruit enough participants that individual idiosyncrasies are mostly smoothed out.

Figure 2.

Different groups of participants had different distributions of responses when given the same materials. This plot shows how participants responded to implausible sentences across two identical experiments (Chen et al., 2023; Gibson et al., 2013). Red: those who always interpreted implausible sentences non-literally (i.e., 0% literal interpretation rate); blue: those who switched back and forth between literal and non-literal interpretation when presented with implausible sentences (i.e., between 0% and 100% literal interpretation rate); green: those who consistently interpreted implausible sentences literally (i.e., 100% literal interpretation rate). The numbers in each panel indicate the number of participants.
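The three participant types can be operationalized from each participant's literal interpretation rate on implausible sentences. The sketch below follows the 0% / between / 100% definitions given in the caption of Figure 2; the function name and the input format (a list of booleans per participant) are hypothetical.

```python
def classify_participant(literal_responses):
    """Classify one participant from their responses to implausible sentences.

    literal_responses: list of booleans, True when the implausible
    sentence was interpreted literally.
    """
    if not literal_responses:
        raise ValueError("participant has no implausible-sentence responses")
    n_literal = sum(literal_responses)
    if n_literal == 0:
        return "consistently non-literal"   # 0% literal interpretation rate
    if n_literal == len(literal_responses):
        return "consistently literal"       # 100% literal interpretation rate
    return "switching"                      # strictly between 0% and 100%
```

For example, `classify_participant([True, False, True])` returns `"switching"`.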

The current study serves to address these confounds, and hence test the effect of semantic prior on noisy-channel sentence interpretation. As an overview, one group of participants is given the filler sentences in Gibson et al. (2013), whereas another group of participants is given filler sentences that are syntactically licit but semantically implausible. We address the potential confounds in Gibson et al. (2013) in two ways: first, participants in different experimental conditions read the same number of sentences. We also track the literal interpretation rate across participants throughout the course of the experiment. In addition, each experimental condition has the same number of participants, and in the second experiment, we recruited a large number of participants (200 per condition), hoping to smooth out the idiosyncrasy in their semantic prior. If participants are indeed more likely to interpret implausible sentences literally given a more implausible semantic prior, we should expect such a difference to be present once participants were adapted to the new semantic prior. Instead, if the results in Gibson et al. (2013) were indeed caused by a difference in experiment length, we should expect no difference in literal interpretation rate in the two groups of participants. In addition, if the results in Gibson et al. (2013) were due to participant idiosyncrasy, we should expect the literal interpretation rate in two groups of participants to be consistently different throughout the experiment.

Experiment 1

Methods

We followed the methods from previous studies (e.g., Gibson et al., 2013): participants read sentences varying in constructions and plausibility and were asked a comprehension question. One group of participants was given plausible filler items, and another group was given implausible filler items. In each condition, we calculated the proportion of the trials where participants interpreted the sentences literally. Below is a detailed account of the methods.

Participants in both groups were asked to read 80 sentences: 20 test sentences and 60 filler sentences. The 20 test sentences were taken from the DO-goal/PO-goal materials in Gibson et al. (2013), systematically varying in syntactic construction (DO-goal or PO-goal; for simplicity, we will refer to them simply by DO and PO) and plausibility (plausible or implausible), with 5 sentences in each combination [See (2)]. Critically, the semantic prior is manipulated by the type of filler sentences participants read: one group of participants were given 60 plausible filler sentences, taken from Gibson et al. (2013), such as “the chef fried the rice” (henceforth the plausible filler condition), whereas another group of participants were given 60 implausible filler sentences, generated from one-word substitution from the plausible fillers, such as “the shrimp fried the rice” (henceforth the implausible filler condition). Substitutions make these implausible sentences unlikely to be a noisy version of plausible sentences (Poliak et al., 2024), forcing participants to alter their prior. The prior for implausible interpretations is higher under the implausible filler condition, compared with the plausible filler condition. Half of the literal answers to these sentences are “yes”, and the other half are “no.” Table 1 provides a summary of the materials.

Table 1.

Example Sentence Stimuli Used in This Study.

Plausible filler condition	Implausible filler condition
60 plausible filler sentences (e.g., “the chef fried the rice”)	60 implausible filler sentences (e.g., “the shrimp fried the rice”)
Test sentences (identical in both conditions):
5 implausible DO sentences (e.g., “the mother gave the candle the daughter”)
5 implausible PO sentences (e.g., “the mother gave the daughter to the candle”)
5 plausible DO sentences (e.g., “the mother gave the daughter the candle”)
5 plausible PO sentences (e.g., “the mother gave the candle to the daughter”)

We recruited 120 participants from Prolific³. All were native English speakers located in the United States, with an approval rate higher than 95%. The study was hosted on Qualtrics⁴. 60 participants were in the plausible filler condition, and 60 participants were in the implausible filler condition. Before the experiment, participants were asked to complete 5 English sentences in a grammatical way as a proficiency check. After the proficiency check, all 80 trials were presented on the same webpage. Each trial contained a sentence, followed by a comprehension question and two buttons, one for “Yes” and another for “No”. Participants were free to edit their responses. There was no time limit for the experiment. The expected completion time for the experiment was 15 min, and participants were paid $3.00 for their submission, regardless of how long it took them to complete it. Only participants with a filler accuracy rate higher than 75% were included in our analysis.
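The inclusion criterion can be expressed as a simple filter. This is a minimal sketch, not the analysis script released with the study: the function names and the data format (parallel lists of "yes"/"no" strings for the filler trials) are hypothetical, and accuracy is assumed to mean agreement with the literal answer to each filler question.

```python
def filler_accuracy(responses, literal_answers):
    # Share of filler trials answered in line with the literal answer.
    if not responses or len(responses) != len(literal_answers):
        raise ValueError("responses and literal_answers must be non-empty and aligned")
    n_correct = sum(r == a for r, a in zip(responses, literal_answers))
    return n_correct / len(responses)


def is_included(responses, literal_answers, threshold=0.75):
    # Strictly higher than 75%, as stated in the Methods.
    return filler_accuracy(responses, literal_answers) > threshold
```

Note that the comparison is strict: a participant with exactly 45 of 60 fillers correct (75%) would be excluded under this reading.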

This study was not pre-registered. The data and the analysis scripts are available at https://osf.io/k5vqj.

Results

One hundred twenty-two participants in total were originally recruited, and two participants were excluded from the data analysis due to low filler accuracy. The median completion time of the experiment was 13 min. In all conditions, plausible sentences were interpreted literally in more than 90% of the trials and were hence not analyzed further.

We ran a Bayesian mixed-effects logistic regression using the MCMCglmm package (Hadfield, 2010) in R (R Core Team, 2013). We coded the sentence construction (DO vs. PO) and the filler condition (plausible vs. implausible fillers) as fixed effects. Following the maximum random effect structure under our experimental design (Barr et al., 2013), we included random intercepts for participants and items, random by-participant and by-item slope for construction, and random by-item slope for filler condition. In lme4 (Bates et al., 2014) syntax, the formula would be written as in (3):

\begin{matrix} \text{Literal response} \sim \text{condition} * \text{construction} + \\ (1 + \text{condition} * \text{construction} \mid \text{item}) + \\ (1 + \text{construction} \mid \text{participant}) \end{matrix}

(3)

In the analysis, the priors were set to be uninformative (Baayen et al., 2008), and the number of iterations was set to 10000, with a thinning interval of 10 and a warm-up period of 3000 iterations. For each main effect parameter and the interaction, we report the 2.5% percentile, the mean, and the 97.5% percentile of the posterior distribution. We also report the value $p_{M C M C}$ , the Bayesian counterpart of the $p$-value in frequentist statistics: we say a result is significant if $p_{M C M C} <$ .05.
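To make the reported quantities concrete, the sketch below summarizes a vector of posterior samples for one coefficient. It is not the MCMCglmm implementation: the helper names are hypothetical, the warm-up iterations are assumed to be discarded before thinning, and $p_{M C M C}$ is assumed to be twice the smaller of the posterior proportions above and below zero, floored at $1/N$ for $N$ retained samples.

```python
def retained_samples(n_iterations, warm_up, thin):
    # Number of posterior samples kept after warm-up and thinning.
    return (n_iterations - warm_up) // thin


def summarize_posterior(samples):
    ordered = sorted(samples)
    n = len(ordered)

    def percentile(q):
        # Linear interpolation between the two nearest order statistics.
        position = q * (n - 1)
        lower = int(position)
        upper = min(lower + 1, n - 1)
        weight = position - lower
        return ordered[lower] * (1 - weight) + ordered[upper] * weight

    share_positive = sum(x > 0 for x in ordered) / n
    p_mcmc = 2 * max(0.5 / n, min(share_positive, 1 - share_positive))
    return {
        "lower": percentile(0.025),
        "mean": sum(ordered) / n,
        "upper": percentile(0.975),
        "p_mcmc": p_mcmc,
    }


n_kept = retained_samples(n_iterations=10000, warm_up=3000, thin=10)
```

Under these assumptions the settings above leave 700 retained samples per parameter, so the smallest reportable $p_{M C M C}$ would be 1/700.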

The results are presented in the upper facet of Figure 3, and the results of the statistical analyses are shown in Table 2. First, implausible PO sentences were more likely to be interpreted literally than implausible DO sentences ( $β$ = 1.473, $p_{M C M C} <$ .001). Critically, participants were more likely to interpret implausible sentences literally when they were given implausible filler sentences than when they were given plausible filler sentences ( $β = - 1.128$ , $p_{M C M C} = 0.011$ ). This effect is uniform across both constructions, as we found no interaction between the filler condition and the sentence construction ( $β = - 0.094$ , $p_{M C M C} = 0.823$ ).

Figure 3.

Participants with higher implausible semantic prior are more likely to interpret implausible sentences literally. Percentage of literal interpretation of implausible double-object (DO, red) and implausible prepositional object (PO, blue) sentences, faceted by filler conditions (implausible vs. plausible fillers) and experiments. The numbers in each panel indicate the number of participants.

Table 2.

Results of the Mixed-Effect Logistic Regression in Experiment 1 and Experiment 2, Including the 2.5% Percentile, the Mean, the 97.5% Percentile, and the $p_{M C M C}$ . *, **, and *** Indicate $p_{M C M C}$ Is Below .05, .01, and .001, Respectively.

	Variable	2.5% percentile	Mean	97.5% percentile	$p_{M C M C}$
Experiment 1 (n = 120)	Filler condition	−2.042	−1.128	−0.304	.011*
Experiment 1 (n = 120)	Construction	0.707	1.473	2.138	<.001***
Experiment 1 (n = 120)	Interaction	−1.031	−0.094	0.826	.823
Experiment 2 (n = 404)	Filler condition	−1.636	−1.183	−0.621	<.001***
Experiment 2 (n = 404)	Construction	1.621	2.051	2.433	<.001***
Experiment 2 (n = 404)	Interaction	−0.783	−0.097	0.462	.754

Experiment 2

Methods

Experiment 2 is an exact replication of Experiment 1, except that the number of participants was 400. Those who participated in Experiment 1 were ineligible for this experiment. The data and the analysis scripts are available at https://osf.io/k5vqj.

Results

Four hundred twenty-nine participants in total were originally recruited, and 25 participants were excluded from the data analysis due to low filler accuracy. Two hundred three participants from the implausible filler condition and 201 participants from the plausible filler condition were included in the analysis. The median completion time for the study was 12.5 min. In all conditions, plausible sentences were interpreted literally in more than 90% of the trials and were hence not analyzed further. We adopted the same statistical analysis procedures as in Experiment 1.

The results are presented in the lower facet of Figure 3, and the statistical analysis results are shown in Table 2. First, just as in Experiment 1, implausible PO sentences were more likely to be interpreted literally than implausible DO sentences ( $β$ = 2.051, $p_{M C M C} <$ .001). Critically, as in Experiment 1, participants were more likely to interpret implausible sentences literally when they were given implausible filler sentences than when they were given plausible filler sentences ( $β = - 1.183$ , $p_{M C M C} < 0.001$ ). This effect is again uniform across both constructions, as we found no interaction between the filler condition and the sentence construction ( $β = - 0.097$ , $p_{M C M C} = 0.754$ ).

By-Trial Analysis

In Experiments 1 and 2, a group of participants was presented with plausible filler sentences, whereas another group was presented with implausible filler sentences. We found that, as predicted, participants given implausible filler sentences were overall more likely to interpret implausible sentences literally than those given plausible filler sentences. We also replicated results reported in Gibson et al. (2013): PO sentences were more likely to be interpreted literally than DO sentences. However, it remains unclear whether such a difference is consistent throughout the experiment. In this section, we analyzed participants’ responses by trial, under different conditions and sentence constructions.

Figure 4 shows the mean proportion of literal interpretation across trial numbers, grouped by sentence constructions and experiments. Similar to Experiments 1 and 2, we ran a Bayesian generalized linear mixed-effects regression in each sentence construction and experiment. Conditions (implausible vs. plausible fillers) and trial numbers are coded as fixed effects. We also included random intercepts of participants and items and random by-participant and by-item slopes for trial number and random by-item slope for conditions, as shown in Equation 4.

\begin{matrix} \text{Literal response} \sim \text{condition} * \text{trial number} + \\ (1 + \text{condition} * \text{trial number} \mid \text{item}) + \\ (1 + \text{trial number} \mid \text{participant}) \end{matrix}

(4)

Figure 4.

The difference in literal interpretation rate is consistent under different filler conditions. The proportion of literal interpretation (y-axis) under different filler conditions (green for implausible fillers, orange for plausible fillers) is plotted against the trial number (x-axis).
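The by-trial proportions plotted in Figure 4 amount to averaging literal responses within each combination of filler condition and trial number. A minimal sketch of that aggregation follows; the function name and the tuple-based data format are hypothetical and do not reflect the released analysis scripts.

```python
from collections import defaultdict


def by_trial_literal_rate(trials):
    """Mean literal interpretation rate per (condition, trial number).

    trials: iterable of (condition, trial_number, literal) tuples, where
    literal is True when the implausible sentence was read literally.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [literal count, total count]
    for condition, trial_number, literal in trials:
        cell = counts[(condition, trial_number)]
        cell[0] += int(literal)
        cell[1] += 1
    return {key: n_literal / total for key, (n_literal, total) in counts.items()}


# Hypothetical toy data for illustration only.
toy_trials = [
    ("implausible", 1, True), ("implausible", 1, False),
    ("plausible", 1, False), ("plausible", 1, False),
    ("implausible", 2, True), ("plausible", 2, True),
]
toy_rates = by_trial_literal_rate(toy_trials)
```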

The statistical analysis results are shown in Table 3. First, similar to Figure 1, we found that participants were adapting their semantic prior as the experiment progressed, as implausible sentences presented later in the experiment were more likely to be interpreted literally compared to those presented earlier in the experiment. This was also shown as a significant, positive effect of trial number in the statistical analysis ( $p_{M C M C} s < 0.001$ ).

Table 3.

Results of the Mixed-Effect Logistic Regression in the By-Trial Analysis, Including the 2.5% Percentile, the Mean, the 97.5% Percentile, and the $p_{M C M C}$ . *, **, and *** Indicate $p_{M C M C}$ Is Below .05, .01, and .001, Respectively.

	Variable	2.5% percentile	Mean	97.5% percentile	$p_{M C M C}$
Experiment 1, DO sentences (n = 120)	Filler condition	−1.580	−0.251	1.170	.694
Experiment 1, DO sentences (n = 120)	Trial number	0.028	0.060	0.084	<.001***
Experiment 1, DO sentences (n = 120)	Interaction	−0.059	−0.027	0.009	.106
Experiment 1, PO sentences (n = 120)	Filler condition	−2.108	−0.502	1.152	.503
Experiment 1, PO sentences (n = 120)	Trial number	0.017	0.046	0.078	<.001***
Experiment 1, PO sentences (n = 120)	Interaction	−0.061	−0.022	0.013	.300
Experiment 2, DO sentences (n = 404)	Filler condition	−2.296	−1.472	−0.631	<.001***
Experiment 2, DO sentences (n = 404)	Trial number	0.016	0.030	0.454	<.001***
Experiment 2, DO sentences (n = 404)	Interaction	−0.019	−0.002	0.021	.854
Experiment 2, PO sentences (n = 404)	Filler condition	−1.371	0.352	0.411	.402
Experiment 2, PO sentences (n = 404)	Trial number	0.029	0.051	0.072	<.001***
Experiment 2, PO sentences (n = 404)	Interaction	−0.056	−0.033	−0.009	.002**

Second, apart from the adaptation effect, the graph shows that in both experiments, participants given implausible fillers were consistently more likely to interpret implausible sentences literally than those given plausible fillers; the trend held for both types of sentences and across trials. This observation was only partially supported by the statistics: there was a significant main effect of filler condition for DO sentences in Experiment 2 ( $p_{MCMC} < 0.001$ ), but not in Experiment 1 or for PO sentences in Experiment 2 ( $p_{MCMC} \geq 0.402$ in all three models). The lack of statistical significance in Experiment 1 was possibly due to the low number of participants, whereas the lack of significance for PO sentences in Experiment 2 was due to a significant interaction between condition and trial number ( $p_{MCMC} = 0.002$ ). This can be seen in the lower-right panel of Figure 4: participants presented with plausible fillers had an initially higher but later lower literal interpretation rate than those presented with implausible fillers. In addition, in Experiment 2, participants in both conditions were equally likely to interpret implausible sentences literally at the beginning of the experiment, but as the experiment progressed, they adapted to the semantic prior of their respective condition and their literal interpretation rates diverged. In Experiment 1, such a trend was less clear, possibly because of the low number of participants.
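The columns of Table 3 summarize posterior samples of each coefficient. The sketch below shows one way such a summary could be computed, assuming the two-sided $p_{MCMC}$ convention used by the MCMCglmm package (twice the smaller of the posterior proportions above and below zero, floored at one over the number of samples). The function name, the simple index-based percentile, and the simulated samples are hypothetical and are not the study's data.

```python
import random
import statistics


def summarize_posterior(samples):
    """Summarize posterior samples of one regression coefficient."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one posterior sample")

    def percentile(q):
        # Simple index-based percentile on the sorted samples.
        index = min(n - 1, max(0, round(q * (n - 1))))
        return ordered[index]

    # Two-sided p_MCMC (assumed convention): twice the smaller tail
    # proportion around zero, never below 1 / n.
    frac_positive = sum(s > 0 for s in ordered) / n
    p_mcmc = 2 * max(0.5 / n, min(frac_positive, 1 - frac_positive))

    return {
        "lower": percentile(0.025),
        "mean": statistics.fmean(ordered),
        "upper": percentile(0.975),
        "p_mcmc": p_mcmc,
    }


# Simulated samples standing in for a real posterior (illustration only).
rng = random.Random(0)
fake_samples = [rng.gauss(0.05, 0.015) for _ in range(1000)]
print(summarize_posterior(fake_samples))
```

Under this convention, the smallest attainable value with 1,000 samples is .001, which is why very strong effects are typically reported as a bound rather than as an exact value.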

Distribution of Responses

Following the procedures in Figure 2, we plotted the distribution of participant responses from Experiment 1 (120 participants in total) and Experiment 2 (404 participants in total) in this study. Since Experiment 2 is a replication of Experiment 1 with more participants, this gives us a direct comparison to investigate participant idiosyncrasy. The results are presented in Figure 5, showing the distribution of different types of participants (i.e., consistently non-literal, switching, and consistently literal) in each filler condition and sentence construction.

Figure 5.

Across the two experiments in this study, the distribution of participant responses is relatively stable. This plot shows how participants responded to implausible sentences in Experiment 1 (120 participants in total) and Experiment 2 (404 participants in total). The data are organized by construction (DO vs. PO, columns) and filler condition (implausible vs. plausible fillers, rows). Red: those who always interpreted implausible sentences non-literally (i.e., 0% literal interpretation rate); blue: those who switched back and forth between literal and non-literal interpretation when presented with implausible sentences (i.e., between 0% and 100% literal interpretation rate); green: those who consistently interpreted implausible sentences literally (i.e., 100% literal interpretation rate). The numbers in each panel indicate the number of participants.

The results suggest that across the two experiments, the distribution of participant responses is relatively stable within each combination of filler condition and sentence construction. In both experiments, most participants switched between interpreting implausible DO sentences literally and making inferences about them, regardless of filler condition. For implausible PO sentences, more participants in the implausible filler condition consistently interpreted the sentences literally than switched between literal and non-literal interpretation, while the opposite was true in the plausible filler condition.
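The three participant types used in Figure 5 follow directly from each participant's literal interpretation rate. A minimal Python sketch of that classification is given below; the function names, the data layout (a mapping from participant to a list of True/False responses), and the toy responses are hypothetical and serve only as an illustration.

```python
from collections import Counter


def classify_participant(responses):
    """Classify one participant from responses to implausible sentences.

    responses: iterable of booleans, True for a literal interpretation.
    """
    responses = list(responses)
    if not responses:
        raise ValueError("participant has no responses")
    literal_rate = sum(responses) / len(responses)
    if literal_rate == 0:
        return "consistently non-literal"
    if literal_rate == 1:
        return "consistently literal"
    return "switching"


def response_distribution(responses_by_participant):
    """Count participants of each type within one panel of the figure."""
    return Counter(classify_participant(r)
                   for r in responses_by_participant.values())


# Toy data for illustration only.
toy_panel = {
    "p01": [True, True, True],
    "p02": [False, False, False],
    "p03": [True, False, True],
    "p04": [False, True, False],
}
print(response_distribution(toy_panel))
```

Applying `response_distribution` separately to each combination of filler condition and sentence construction would give the counts shown in each panel.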

Discussion

Numerous studies have tested various aspects of the noisy-channel framework (e.g., Bader & Meng, 2018; Buxó-Lugo & Slevc, 2024; Cai et al., 2022; Chen et al., 2023; Gibson et al., 2013; Gibson et al., 2017; Liu et al., 2020; Paape, 2024; Poliak et al., 2024; Poppels & Levy, 2016; Ryskin et al., 2018; Zhan et al., 2023; see also Traxler, 2014, for a review), but few have tested the prior component by manipulating a comprehender’s semantic prior. Gibson et al. (2013) conducted such a test, but their experiments may have been confounded in two ways. First, the experiment intended to elicit a higher prior for implausible interpretations was also longer, so their results could be due to experiment length rather than to a difference in semantic prior. Second, it was unclear whether the difference was actually due to the proportion of implausible sentences, as the noisy-channel framework would predict, or simply due to the participants in Experiment 3 being more likely to interpret implausible sentences literally, since Experiment 3 had far fewer participants.

Our study addressed these two issues by controlling for experiment length and by recruiting more participants. In our experiments, two groups of participants read the same test sentences in double-object (DO) and prepositional-object (PO) constructions, but one group was presented with implausible filler sentences and the other with plausible filler sentences. This design manipulated the semantic prior without also varying the experiment length. We also recruited a larger number of participants than Gibson et al. (2013), while keeping the number of participants the same in each condition, in order to mitigate the effects of participant idiosyncrasy. We predicted that exposing a comprehender to more implausible sentences would raise the comprehender’s prior for implausible utterances, so that they would be more likely to interpret an implausible sentence literally when they encountered one. We also predicted that this effect should persist after participants have adapted to the new semantic prior. Our findings were consistent with these predictions: in both experiments, participants exposed to implausible filler sentences were more likely to interpret implausible test sentences literally than those exposed to plausible filler sentences. In addition, this difference was consistent at the trial level once participants had adapted to the semantic prior of their experimental condition. We also replicated the finding of Gibson et al. (2013) that PO sentences were interpreted literally more often than DO sentences, plausibly because insertions are less likely to take place than deletions. Our results indicate that if a comprehender repeatedly receives messages that sound implausible, it might be rational for the comprehender to assume that the sender simply tends to send implausible messages, rather than to keep assuming that the sender is saying something plausible.
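The direction of this prediction can be illustrated with a deliberately simplified Bayesian calculation. The sketch assumes only two candidate intended sentences: the implausible sentence itself, transmitted without corruption, and a plausible alternative that noise turned into the perceived sentence. The function name and the prior and noise values are hypothetical and chosen only to show the direction of the effect, not to model the data.

```python
def posterior_literal(prior_implausible, p_corruption):
    """Posterior probability of the literal reading of an implausible sentence.

    prior_implausible: prior probability that the speaker intended the
                       implausible sentence (the semantic prior)
    p_corruption:      probability that noise turned the plausible
                       alternative into the perceived sentence (the likelihood)
    Both values must lie strictly between 0 and 1.
    """
    if not (0 < prior_implausible < 1 and 0 < p_corruption < 1):
        raise ValueError("both probabilities must be strictly between 0 and 1")
    # Literal reading: implausible meaning intended, no corruption.
    literal = prior_implausible * (1.0 - p_corruption)
    # Non-literal reading: plausible meaning intended, then corrupted.
    non_literal = (1.0 - prior_implausible) * p_corruption
    return literal / (literal + non_literal)


# Hypothetical values: the same noise rate under a lower and a higher prior.
for prior in (0.05, 0.20):
    print(f"prior {prior:.2f} -> P(literal) = "
          f"{posterior_literal(prior, 0.10):.3f}")
```

Holding the noise term fixed, a higher prior for implausible sentences yields a higher posterior probability of the literal interpretation, which is the pattern the filler manipulation was designed to test.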

Our study shows the dynamicity of the human semantic prior and the rationality of comprehenders in sentence interpretation. This finding is broadly in line with previous studies such as Ryskin et al. (2018), which showed the dynamicity of the noise likelihood: comprehenders adapted the noise likelihood to the noisy sentences they were exposed to. For example, comprehenders exposed to more sentences with deletion errors were more likely to infer deletion as the noise operation. Both this study and Ryskin et al. (2018) showed that comprehenders can quickly adapt to the semantic prior and the noise model of speakers and rationally interpret the sentences they perceive. This ability is critical because the conditions of communication are always changing: different speakers have different semantic priors and noise likelihoods, depending on their native language and even the modality in which communication takes place, and different environments have different levels of noise. Therefore, comprehenders need to quickly adapt to the parameters specific to the setting where the conversation takes place, in order to maximally recover the speaker’s intended message.

Our study is the first to raise the issue of individual idiosyncrasy in noisy-channel comprehension: different participants may have different strategies for completing the task. In many previous noisy-channel studies (e.g., Gibson et al., 2013; Poliak et al., 2024; Poppels & Levy, 2016; Zhan et al., 2023), the effects of interest were mostly studied in within-participant experiments, where the same group of participants was presented with sentences manipulated in various ways and interpreted them accordingly. In these experiments, since the same group of participants read the different sentences, the effect of individual idiosyncrasy was small. However, in between-participant experiments (e.g., Chen et al., 2023; Gibson et al., 2017), where different groups of participants were presented with different sets of sentences, the effect of participant idiosyncrasy cannot be ignored, especially when the effect size of interest is small. There are two takeaways from this. First, future studies should ensure that the effect of participant idiosyncrasy is mitigated in between-participant experiments. The results in Figure 5 give some initial insight into how many participants are enough. Since the distribution of participant responses does not seem to change much when the number of participants per condition increases from 60 (Experiment 1) to about 200 (Experiment 2), we speculate that 200 participants per between-participant condition could be an upper bound on the number needed to mitigate idiosyncrasy effects. Second, future studies could look into the causes of individual variation in noisy-channel comprehension. We speculate that there are at least two factors: different participants may rely on different ways to complete the experiment efficiently, and, given the same sentence, different participants may assign it different plausibility.

A limitation of the study is that our predictions about the effects of the semantic prior are so far inexact: we predict only that participants exposed to a higher proportion of semantically implausible sentences are more likely to interpret implausible sentences literally. The extent to which plausibility in the experiment influences the comprehender’s semantic prior is still unclear (Ryskin & Fang, 2021). Future work should develop a more detailed account of how comprehenders update their semantic prior by integrating information at different timescales.

Acknowledgements

We would like to thank Rachel Ryskin for her feedback on the draft. We would also like to thank the audience of the 2022 MIT MSRP-bio poster session and the 36th Annual Conference on Human Sentence Processing.

ORCID iDs

Sihan Chen

Edward Gibson

Ethical Considerations

Experiments in this work have been approved by MIT’s Committee on the Use of Humans as Experimental Subjects, Protocol 403000040 (Title: Principles of Language Processing).

Consent to Participate

We obtained written informed consent from each participant at the beginning of the experiment.

Consent for Publication

We have removed identifying information from participants.

Author Contributions

SC: data curation, formal analysis, investigation, methodology, project administration, software, supervision, validation, visualization, writing – original draft, writing – review and editing. LW: data curation, formal analysis, investigation, methodology, software, validation, visualization, writing – original draft. EG: conceptualization, methodology, project administration, supervision, writing – review and editing.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by NSF Award 2121074 “CompCog: Noisy-channel processing in human language understanding” to Gibson.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data and the analysis scripts are available at .

References

Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.

Bader & Meng (2018). The misinterpretation of noncanonical sentences revisited. Journal of Experimental Psychology: Learning, Memory, and Cognition, 44(8), 1286.

Barr, D. J., Levy, Scheepers, & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.

Bates, Mächler, Bolker, & Walker (2014). Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823.

Birner, B. J., & Ward, G. L. (1998). Information status and noncanonical word order in English (Vol. 40). John Benjamins Publishing.

Brehm (2023). Chapter one – what’s an error anyway? Speaker- and listener-centered approaches to studying language errors. In Federmeier, K. D., & Montag, J. L. (Eds.), Speaking, writing and communicating, Volume 78 of The Psychology of Learning and Motivation (pp. 1–39). Academic Press.

Buxó-Lugo & Slevc, L. R. (2024). Integration of input and expectations influences syntactic parses, not just sentence interpretation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 50(3), 500–508.

Cai, Z. G., Zhao, & Pickering, M. J. (2022). How do people interpret implausible sentences? Cognition, 225, 105101.

Chafe, W. L. (1970). Meaning and the structure of language. University of Chicago Press.

Chen, Nathaniel, Ryskin, & Gibson (2023). The effect of context on noisy-channel sentence comprehension. Cognition, 238, 105503.

Clifton & Frazier (2004). Should given information come before new? Yes and no. Memory & Cognition, 32(6), 886–895.

Delaney-Busch, Morgan, Lau, & Kuperberg, G. R. (2019). Neural evidence for Bayesian trial-by-trial adaptation on the N400 during semantic priming. Cognition, 187, 10–20.

Ferreira (2003). The misinterpretation of noncanonical sentences. Cognitive Psychology, 47(2), 164–203.

Gibson (2000). The dependency locality theory: A distance-based theory of linguistic complexity. In Marantz, Miyashita, & O’Neil (Eds.), Image, language, brain: Papers from the first mind articulation project symposium (pp. 94–126). The MIT Press.

Gibson, Bergen, & Piantadosi, S. T. (2013). Rational integration of noisy evidence and prior semantic expectations in sentence interpretation. Proceedings of the National Academy of Sciences, 110(20), 8051–8056.

Gibson, Tan, Futrell, Mahowald, Konieczny, Hemforth, & Fedorenko (2017). Don’t underestimate the benefits of being misunderstood. Psychological Science, 28(6), 703–712.

Givón (1984). Syntax: A functional-typological introduction. John Benjamins.

Givón (1987). Beyond foreground and background. Coherence and Grounding in Discourse, 11, 175–188.

Hadfield, J. D. (2010). MCMC methods for multi-response generalized linear mixed models: The MCMCglmm R package. Journal of Statistical Software, 33, 1–22.

Hale (2001). A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, USA, 1–8.

Lambrecht (1994). Information structure and sentence form. Cambridge University Press.

Levy (2008a). Expectation-based syntactic comprehension. Cognition, 106(3), 1126–1177.

Levy (2008b). A noisy-channel model of human sentence comprehension under uncertain input. In Proceedings of the 2008 conference on empirical methods in natural language processing. Association for Computational Linguistics, USA, 234–243.

Lewis, R. L., Vasishth, & van Dyke, J. A. (2006). Computational principles of working memory in sentence comprehension. Trends in Cognitive Sciences, 10(10), 447–454.

Liu, Ryskin, Futrell, & Gibson (2020). Structural frequency effects in noisy-channel comprehension. In Proceedings of the 26th Architectures and Mechanisms for Language Processing.

MacWhinney & Bates (1989). The crosslinguistic study of sentence processing. Cambridge University Press.

Paape (2024). How do linguistic illusions arise? Rational inference and good-enough processing as competing latent processes within individuals. Language, Cognition and Neuroscience, 39(10), 1334–1365. https://doi.org/10.1080/23273798.2024.2387226

Poliak, Ryskin, Braginsky, & Gibson (2024). It is not what you say but how you say it: Evidence from Russian shows robust effects of the structural prior on noisy channel inferences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 50(4), 637–649. https://doi.org/10.1037/xlm0001244

Poppels & Levy (2016). Structure-sensitive noise inference: Comprehenders expect exchange errors. Proceedings of the 38th Annual Meeting of the Cognitive Science Society, 378–383.

R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Ryskin & Fang (2021). The many timescales of context in language processing. Psychology of Learning and Motivation, 75, 201–243.

Ryskin, Futrell, Kiran, & Gibson (2018). Comprehenders model the nature of noise in the environment. Cognition, 181, 141–150.

Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423.

Tabor, Galantucci, & Richardson (2004). Effects of merely local syntactic coherence on sentence processing. Journal of Memory and Language, 50(4), 355–370.

Tenenbaum (2000). Rules and similarity in concept learning. In Solla, S. A., Leen, T. K., & Muller, K.-R. (Eds.), Advances in neural information processing systems 12 (pp. 59–65). MIT Press.

Traxler, M. J. (2014). Trends in syntactic parsing: Anticipation, Bayesian estimation, and good-enough parsing. Trends in Cognitive Sciences, 18(11), 605–611.

Zhan, Chen, Levy, & Gibson (2023). Rational sentence interpretation in Mandarin Chinese. Cognitive Science, 47(12), e13383.