Abstract
The precise nature of the prosodic contribution to disambiguating open and polar questions with indefinite content pro-forms in Korean (e.g.,
1 Introduction
This article contributes to the debate surrounding the nature of the prosodic contribution to disambiguating otherwise identical strings in Korean that can be interpreted as open questions, polar questions, or statements. Grammatically, the different meanings are accounted for by differences in the scope of
We report the findings of two experiments: a pilot comprehension study testing the application of the gating paradigm to naturally produced stimuli, followed by a large-scale speech comprehension experiment with natural and manipulated stimuli. The experiments were designed on the assumption that expanded pitch range during a focus-bearing constituent was the critical prosodic feature in disambiguation and aimed to specify more closely what level of F0 range would be interpreted by hearers as pitch expansion. Naturally produced stimuli were manipulated to reduce the range of F0 in the AP containing the focus-bearing constituent. The placement of AP boundaries was not controlled. Participants heard repeated, incrementally longer fragments of the stimuli and were asked to identify whether they had heard an open question (wh-question), a polar question (yes-no question), or a statement, or whether they did not yet know.
Our results suggest that F0 range alone does not determine hearers’ interpretation of an ambiguous utterance. Although a gradient effect of the size of the F0 range at the verb was found when interpreting polar question stimuli, this competed with a strong tendency for hearers to interpret indefinite pro-forms as signaling open questions, whether or not they were associated with a greater F0 range. These findings have implications for our understanding of the role of prosody within the overall content of spoken language. Furthermore, they suggest that in attempting to account for the prosodic contribution of the meaning of an utterance within formal theories of grammar, it is insufficient to add prosody only as a module alongside syntax and semantics. Instead, grammatical models must also take account of an interaction between prosody and lexical semantics.
The article is organized as follows. We begin with an introduction to the role of prosody in Korean and the ways in which it has previously been examined and analyzed. We then describe the broader context in which our research is situated, at the interface of syntax and prosody. We present both the theoretical and experimental motivations of our study, followed by the study itself, detailing the methods and results, and a discussion of the implications of our findings and of methodological issues. We also touch on issues regarding formal analysis of the Korean syntax-prosody interface using Lexical Functional Grammar (LFG) and conclude by proposing avenues for future research.
2 Background
Prosody in Korean is associated with grammatical mood, with characteristic prosodic patterns for declarative, interrogative, propositive, and imperative moods (Jun, 2005). It can also disambiguate open and polar questions in sentences with ambiguous
madam-top when/sometimes feel.dizzy-pst-pol Declarative: “Madam sometimes feels dizzy.” Open: “Madam, when do you feel dizzy?” Polar: “Madam, is there any time that you feel dizzy?”
2.1 Korean prosody
Jun (2005) provides an account of standard Korean prosody without lexical tone, stress, or pitch accent, in which prosodic phrases are marked by tone patterns at their boundaries. Above the level of the prosodic word, the core building block is the AP, which Jun defines as “a tonally demarcated unit which can contain more than one lexical item” (p. 205). The underlying tonal pattern for an AP is THLH, with T-H at the left edge of the AP and L-H at the right end of the AP. The first tone, underspecified for the value of “Tone” and given here as T, is usually L but can appear as H when it is associated with a syllable that has either an aspirated (/kh/, /th/, /ph/) or tense (/k*/, /p*/, /t*/) initial obstruent (Jun, 1996; Jun & Oh, 1996, p. 39). It is also possible for the final tone to be realized as L (Jun, 2000). The four tones that specify an AP are associated autosegmentally with the syllables contained in the AP, illustrated schematically in Example (2) taken from Jun and Oh (1996, p. 40). Where an AP has four syllables, each tone is associated with its own syllable, as shown in Example (2a). If there are more than four syllables, the pitch declines from the T-H on the left edge to the L-H on the right edge across the AP, as shown in Example (2b). Where an AP has three or fewer syllables, either or both of the second and third tones of the underlying form are not realized. This makes the specifying pattern T-(H)-(L)-H, so for a three-syllable AP, tonal options are T-H-H or T-L-H, as shown in Example (2c), and for a two-syllable AP, the specifying pattern is T-H, as shown in Example (2d):
(2) a. T H L H
       σ σ σ σ
    b. T H L H
       σ σ σ σ σ
    c. T H/L H
       σ σ σ
    d. T H
       σ σ
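The mapping from syllable count to surface tone pattern described above can be sketched in a few lines of Python. This is only an illustration of the description in the text, not part of Jun's model or of our analysis: the function name `ap_tone_patterns` and the flag `initial_high` are hypothetical, and the sketch ignores the possibility that the final tone is realized as L.

```python
def ap_tone_patterns(n_syllables: int, initial_high: bool = False) -> list[str]:
    """Return the possible surface tone strings for an AP of a given length.

    initial_high stands for an AP-initial syllable with an aspirated or
    tense obstruent, in which case T surfaces as H rather than L.
    """
    if n_syllables < 2:
        # The description in the text covers APs of two or more syllables only.
        raise ValueError("AP length below two syllables is not covered here")
    t = "H" if initial_high else "L"
    if n_syllables >= 4:
        # All four tones are realized; longer APs interpolate between H and L.
        return [f"{t}-H-L-H"]
    if n_syllables == 3:
        # One of the two medial tones is not realized.
        return [f"{t}-H-H", f"{t}-L-H"]
    # Two syllables: both medial tones are unrealized.
    return [f"{t}-H"]


for n in (2, 3, 4, 6):
    print(n, ap_tone_patterns(n))
```

Calling the function with `initial_high=True` replaces the initial L with H in every pattern, mirroring the segmental conditioning of the first tone.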
Jun (2005) defines Intonation Phrases (henceforth IPs) as one or more APs that have a final boundary tone pattern, indicated in text with the symbol % after the tones. The final boundary tone replaces the final H of the last AP in the IP, which no longer appears. This boundary tone pattern is observed on the final syllable of the IP, which is lengthened. From Jun’s inventory of nine IP-final boundary tones (Jun, 2005, p. 216), the relevant tonal patterns are HL%, characteristic of declarative statements, and LH%, which is characteristic of questions. Jun and Oh (1996, p. 44) found that, in Seoul Korean, polar questions mostly ended in H% with occasional LH% cases, and wh-questions mostly ended with an LH%, although H% and HL% were also observed. These findings suggest that statements end with a low-tone target and that both types of questions end with a high-tone target in most cases.
2.2 Studies examining question focus in Seoul Korean
An early investigation of question focus in Korean was the production study reported by Jun and Oh (1996), who accounted for the disambiguation of sentences such as (1) by the placement of AP boundaries. In their study, stimuli were presented as a two-sentence dialogue. The ambiguous target utterance contained an initial element, a CPF, and a verb. This was either preceded or followed by a sentence that indicated the intended reading, which could be an open question, a yes/no question, or an incredulity question, where the speaker is surprised by the preceding statement. For the polar question reading, an AP boundary is predicted before the final verb, whereas for an open question, this boundary is predicted to be absent. Alongside their analysis of final boundary tones, Jun and Oh (p. 46) found in their experimental data that where a CPF functions as a question word in an open question, it appears in a single AP with the following verb, whereas when the CPF functions as an indefinite pronoun in a polar question, there is an AP boundary before the verb (Figure 5, p. 48).
Jones (2016), responding to Jun and Oh (1996), carried out a speech production experiment where native speakers of Seoul Korean
younger.brother-sbj what/something.obj drink.pst-pol Declarative: “Younger brother drank something.” Open: “What did younger brother drink?” Polar: “Did younger brother drink something?”
younger.brother-sbj what/something.obj outdoor festival-loc drink.pst-pol Declarative: “Younger brother drank something at the open-air festival.” Open: “What did younger brother drink at the open-air festival?” Polar: “Did younger brother drink something at the open-air festival?”
For the short questions, 66% of open questions were produced with no boundary between the CPF and the verb, as predicted by Jun and Oh (1996), and 34% were produced with an intervening boundary. Polar questions showed the converse pattern, with 34% of utterances having no boundary between the CPF and the verb (
In contrast, Yun (2019) contends that only dephrasing after the CPF contributes to the reading of an utterance, whereas the raising of the pitch of the CPF does not contribute to that interpretation. Sentences in which the CPF itself was manipulated to have a higher pitch were only interpreted as open questions 10% of the time. Sentences in which pitch points following the CPF were erased (dephrased), however, were interpreted as open questions 66% of the time. The interaction between pitch raising and dephrasing after the CPF had statistical significance
In later research, Yun and Lee (2022) argue that three prosodic factors, namely the F0 peak height of the CPF, an L tone following the CPF, and the IP boundary tone, are the cues that speakers exploit when distinguishing between polar and open readings of ambiguous questions. In their study, which comprised both a perception and a production experiment, Yun and Lee found that F0 values of the H tone pitch peak in the CPF are higher when reading open questions than when reading polar questions. Female speakers tended to show a greater difference in pitch across the readings than male speakers. In their perception experiment, for stimuli that were read as a polar question with natural prosody, the removal of the L tone following the CPF and the changing of the sentence boundary tone from H% to LH% increased the likelihood of an open question interpretation. Figure 1 shows the effect of the manipulation, which removed the L tone but maintained the H% boundary tone. Representations of all their manipulations are given in Yun and Lee (2022, Figures 7, 8, pp. 33–34). Likewise, for stimuli originally read as an open question, changing the sentence boundary tone from LH% to H% was effective in eliciting a polar question response. However, the addition of an L tone after the CPF was not, by itself, effective. They further found that changing only the F0 height, without also changing other factors, did not alter the participants’ interpretation; that is, when the pitch of the CPF in a sentence originally read as a polar question was raised, and no other factors were manipulated, participants rarely interpreted the manipulated sentence as an open question.

Schematic of one of Yun and Lee’s experimental manipulations. Diagram p1 shows the naturally produced stimulus; p3 shows the effect of removing the L tone. Reproduced from Yun and Lee (2022, Figure 7, p. 33).
Yun and Lee describe these findings as surprising, in particular the lack of effect of raising the pitch of the CPF on perceiving a question as an open question. They offer two possible accounts: that the raised tone was potentially under-represented in the experiment, or that the raised tone is not significant when disambiguating question types. They posit that increasing the F0 and intensity of the CPF would possibly enhance perceptual salience. They also note the interaction between the pitch peak of the CPF and the final IP boundary tone; when both a raised pitch peak and an LH% boundary tone are present, perception of an open question increases.
2.3 Related studies
Related studies have examined the prosodic realization of question focus in other Korean dialects and the realization of contrastive focus in Seoul Korean. Hwang (2009), investigating question focus in Kyungsang Korean, posits that the critical difference between open and polar questions is the final boundary tone. A perception experiment was conducted in which sentences were constructed from one interrogative clause being embedded within another. In some cases, the wh-phrase occurred in the embedded clause (in situ), whereas in others, it occurred in the matrix clause (scrambled). In addition, clauses varied by prosody; in some cases, both clauses would have the same prosody (both indicative of either an open or a polar question), whereas in others, clauses would differ in prosody (i.e., Clause 1 having open question prosody, with Clause 2 having polar question prosody). Participants
The results from Hwang indicate that in questions that also contained embedded questions, the prosody of the matrix clause (cueing either an open or a polar question) influenced the interpretation of the sentence as the corresponding question type. In addition, these interpretations arose regardless of the position of the wh-phrase itself (either embedded or in the matrix clause). In Kyungsang Korean, which contains distinct particles for open questions (
Turning to contrastive focus, production studies of Seoul Korean show that a raised F0 peak at the focus site is generally important, although often observed alongside other elements. Jun and Lee (1998) found that the start of the focus scope was marked with an AP boundary and that there was a tendency for the AP with the focused constituent to extend into following words. Pitch expansion was also observed in the focused AP, and this signal was more important than duration or post-focus compression in Korean. Lee and Xu (2010) reported F0 expansion during the focused AP, but in this case, it was reliably followed by F0 compression. Hatcher et al. (2024) also found that contrastive focus was expressed primarily through F0 modulation rather than through phrase boundaries. In this study, the nature of prosodic expression depended on the position of the focused constituent in an AP: focus at the start of the AP could result in an elevated F0 peak. They found no clear evidence for the impact of contrastive focus on phrase formation and argue that focus is “just one of several potentially competing structures that determine a sentence’s phrasing” (p. 1). However, Lee et al. (2015), comparing prosodic patterns in Seoul Korean, Mandarin Chinese, and English, found no conclusive evidence for the role of F0 or other prosodic elements, commenting that prosody was “neither clearly marked in production nor accurately recognised in perception” (p. 4754).
2.4 Summary
The role of prosody in perceiving and processing ambiguity in Korean remains a source of robust debate, with expanded and compressed F0 range and lexical biases being seen as factors in disambiguation alongside the AP boundaries. The present study aims not only to contribute further to the debate but also to sharpen our conception of the role of prosodic cues in syntactic analyses.
The gating paradigm (Grosjean, 1980, 1996a) has been used to investigate perceptions of prosody and its contribution to hearers’ interpretation and predictions of audio stimuli, including questions (e.g., Grosjean, 1996b; Hansen et al., 2023; Petrone & Niebuhr, 2014a). Accordingly, we chose this paradigm to explore the potential contribution of the proposed expanded pitch range feature.
3 Experiment 1
We began with a pilot study to test the practical application of the gating paradigm to this question and to validate the claims that prosody alone allows hearers to identify the scope of focus and thus disambiguate between statements, open questions, and polar questions.
Participants were played repeated, incrementally longer fragments of six naturally produced utterances, starting each time at the beginning of the utterance, and were asked to categorize the utterance as statement/polar/open/unknown on the basis of what they had heard.
3.1 Method
3.1.1 Stimuli
Six stimulus sets were created. For each set, a three-way ambiguous sentence containing a CPF was used. All of the ambiguous sentences followed the template in Figure 2. Open and polar question stimuli were created from recordings collected as part of Jones (2016), from eight native speakers of Seoul Korean (six female and two male) aged between 18 and 35, studying at the University of Oxford. For each recording, participants were presented with a screen showing contextual information and the ambiguous sentence. They were asked to read the sentence aloud as a question in such a way that it made sense in context, and such that a hearer would be able to infer the context. Where the target was an open question, the context was “You know some, but not all, details of an event,” and where the target was a polar question, the context was “You don’t know whether an event happened.”

Template for preparing stimuli for Experiment 1.
A native speaker of Seoul Korean reviewed the recordings from Jones’ experiment, and for the open and polar question types, selected the recording that most clearly portrayed the associated meaning, resulting in recordings made by five female speakers and one male speaker. The native speaker identifying the utterance types was naïve to the experiment and the underlying research. The native speaker also had broad training in linguistics but very minimal training in syntax and/or prosody.
Statement stimuli were recorded at the time of this experiment by a further female native speaker of Seoul Korean who was asked to read the sentence aloud so that it would be unambiguously understood as a statement. All of the above-mentioned stimuli were recorded digitally with a sampling rate of 44.1 kHz using a professional-grade microphone in a sound-attenuated room.
Using Praat (Boersma & Weenink, 2023) and scripts provided by Lennes (2017), the selected recordings were divided into audio segments following Row 2 of the template in Figure 2. From these segments, a series of incrementally longer utterances was generated, each utterance adding one segment. Thus, for participants, the first constituent and the intermediate constituents were each presented as word-length fragments, and the CPF and verb were presented as fragments that increased one syllable at a time.
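The construction of the gated fragments can be sketched as follows. This is a minimal illustration in Python rather than the Praat scripts we used, and the segment labels and boundary times are hypothetical values chosen for the example: each fragment starts at utterance onset and ends at the next segment boundary.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    label: str   # hypothetical segment name
    end: float   # end time in seconds, measured from utterance onset


def gating_fragments(segments: list[Segment]) -> list[tuple[float, float, list[str]]]:
    """Return (start, end, labels) for each incrementally longer fragment."""
    fragments = []
    for i, seg in enumerate(segments):
        labels = [s.label for s in segments[: i + 1]]
        # Every fragment restarts at utterance onset and adds one segment.
        fragments.append((0.0, seg.end, labels))
    return fragments


# Hypothetical utterance: one word-length gate, then syllable-sized gates
# through the CPF and the verb.
example = [
    Segment("first_constituent", 0.62),
    Segment("cpf_syll_1", 0.80),
    Segment("cpf_syll_2", 0.97),
    Segment("verb_syll_1", 1.15),
    Segment("verb_syll_2", 1.34),
]

for start, end, labels in gating_fragments(example):
    print(f"{start:.2f}-{end:.2f} s: {' + '.join(labels)}")
```

The number of fragments always equals the number of segments, so finer segmentation of the CPF and verb directly increases the number of presentations per trial.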
3.1.2 Procedure
Participants accessed the stimuli via the PsyToolkit website (Stoet, 2010, 2017), after confirming that they were native speakers of Korean and giving their consent to participate. Stimuli were divided into three cohorts using a Latin square design such that each participant was presented with six trials, two of each stimulus type (open, polar, statement). These trials were presented in a random order. Within each trial, the individual utterances were presented in turn, in increasing length (see Figure 6, which illustrates the similar procedure for Experiment 2). After each presentation, participants selected a checkbox to indicate whether they had heard a statement, an open question, or a polar question, or whether they did not know. Participants then had to click a button to confirm their choice and move to the next item.
3.1.3 Participants
In total, 26 participants started the experiment, of whom 12 completed all six trials and a further participant completed three trials. The remaining 13 participants dropped out of the experiment during the first trial; we discuss this further in Section 3.3. All participants were native speakers of Seoul Korean residing in Korea or the United States, recruited through social networks. Further demographic information was not collected.
3.2 Results
Results are presented for the 75 completed trials. Table 1 shows the responses given by participants to the stimuli during their presentation. The figures for CPF and verb, where the incremental step was by syllable, include all of the steps.
Hearers’ Disambiguation of the Stimuli.
The pilot results show that statements (19/25 trials) and open questions (24/25 trials) were ultimately reliably disambiguated, and that for open questions, the raised F0 peak at the focused CPF often allowed disambiguation even before the possibility of post-focus compression was available. For statements, there was a tendency to erroneously disambiguate at the CPF or subsequently, with evidence for the correct meaning building during the later stages of the utterance. The picture for polar questions is more complicated. Similar to statements, there was a tendency to identify the utterance as an open question once the CPF had been heard, but ultimately disambiguation was not reliably successful, with only 8/25 trials correctly identified and 16/25 trials incorrectly identified as open questions.
3.3 Interim discussion
The gating paradigm worked, but the high participant dropout rate suggested that the nature of the interface and the size of the incremental steps needed to be improved for the full experiment. Feedback from participants suggested that the detailed operation of the pilot experimental website, where multiple actions were required between each presentation of an utterance, may have had an effect. Our method for Experiment 2 was therefore amended by reducing the number of increments within each trial and by building a smoother interface in which only one click was needed to progress to the next utterance.
One possible explanation for the incorrect disambiguation for statements and polar questions at the CPF could be that these indefinite pro-forms are preferentially parsed as a question word in the absence of an easily accessible antecedent. Because the utterances are out-of-the-blue, this leads to a default question reading for the word, which is associated with an open question reading for the whole utterance. For statements, the unambiguous HL% boundary tone is inconsistent with an open question reading and so forces the correct reanalysis, but for questions, the cue from the LH% boundary tone is still consistent with an open question reading. For reanalysis to occur, the hearer must also pay attention to the expanded pitch range at the verb, and if they have already committed to the open reading, this may be less likely.
4 Experiment 2
The results from Experiment 1, the pilot study, were only partially in line with our predictions. For open questions, the presence of a CPF (and potentially its associated prosody) seemed to provide a strong early cue to disambiguation, but for polar questions, this seemed to be a distractor, even without the prosody associated with open questions. We also had concerns about the high dropout rate of participants early in the experiment and that the cumbersome experimental interface might have provided a confound.
In light of this, we went on to investigate the impact of manipulating F0 levels, using an improved web interface and reducing the number of repeated presentations for each stimulus. Our research question was:
4.1 Hypothesis and predictions
Our main hypothesis, and the basis on which we started the study, was that expanded pitch range is the primary cue used by hearers to decide which constituent of the question is in focus, and therefore, how to disambiguate the utterance. Expanded pitch range, following Jones (2016), was assumed to be present at the focused constituent. This gave us specific regions of interest in our test stimuli. For open questions, the region of interest was the AP containing the CPF, and for polar questions, the region of interest was the AP containing the verb plus the sentence-final question tune LH%.
The predictions according to this hypothesis for the two extremes of the variation continuum (naturally produced stimuli vs. stimuli with expanded pitch range removed) are shown in Table 2. We predicted that for the intermediate levels of variation, a gradient response would be observed,
Main Hypothesis Predictions—Expanded Pitch Range (EPR) Assumed as the Determiner.
However, from the pilot study, there was a tendency for polar questions to be mistakenly identified as open questions. We thus had an additional hypothesis that there is a lexical preference for questions with CPFs to be interpreted as open questions, and that this preference may override the effect of polar question prosody. The predictions for this second situation are shown in Table 3. Again, we predicted a gradient effect for those situations where prosody is involved in hearers’ decision-making, larger F0 ranges being associated with more successful disambiguation of the utterance type.
Additional Hypothesis Predictions—Interaction Between Expanded Pitch Range (EPR) and Lexical Preference.
4.2 Method
4.2.1 Participants
Participants were recruited through Prolific. 1 All participants reported their first language as Korean. A total of 124 participants completed the experiment, of whom 85 identified as female, 38 as male, and 1 did not state a gender. The age range of participants was 18–68 years, with a mean age of 32.08 years and an interquartile range of 25–36 years. In total, 102 participants were born in Korea, 17 in the United States, 3 in Canada, and 1 in Germany; for 1 participant, these data were unavailable. In total, 57 participants were residing in the United States, 26 in Korea, and 19 in Canada. A further 14 participants were residing in other English-speaking countries, and 8 were residing in other countries. Participants were paid GBP 1.75 for participation, which represented a payment of GBP 12.00 per hour at the median length of time to carry out the experiment.
4.2.2 Stimuli
Twenty-one sets of stimuli were generated by a native speaker of Seoul Korean, recorded in a sound-attenuated room at a sampling frequency of 44.1 kHz. Each set consists of the same ambiguous sentence, read aloud three times with the speaker asked to produce the sentence as a statement, an open question, or a polar question, respectively, as in Example (5):
school.days during who/someone-obj secretly love-pst-pol a. “(I) secretly loved someone when I was at school.” (statement) b. “Who did you secretly love when you were at school?” (open question) c. “Did you secretly love someone when you were at school?” (polar question)
Once the recordings were generated, we then created the corresponding TextGrids in Praat (Boersma & Weenink, 2023) using a script readily available online. 2 Working within Jun’s (2005) description of the Korean AP, we followed Jones (2016) in assuming that focus is associated with expanded pitch range. The schematic diagram in Figure 3 shows an idealized version of the prosodic patterns that we are assuming; the placement of phrase boundaries in the naturally produced stimuli is shown in Table 4. All 63 naturally produced stimuli had an AP boundary after the first constituent, and no stimuli had AP boundaries within either the CPF or the verb. In total, 23 of the 63 stimuli had a boundary pattern matching the schematic diagram (pattern F in Table 4).

Schematic diagram of the prosody for the three types of stimuli showing constituent boundaries and declination. The regions of interest for open and polar questions are shown in gray.
AP Boundary Placement Within Stimuli.
There are no significant differences in the distribution of patterns between the three stimulus categories.
The remaining 40 stimuli showed some variation from the idealized pattern in the placement of AP boundaries after the first constituent. Table 5 shows how the presence or absence of AP boundaries at specific points during the stimuli was distributed between the different categories. Open question stimuli were significantly different from the other two categories in the presence of an AP boundary immediately after the CPF
Differences Between Stimulus Categories in the Placement of AP Boundaries at Specific Points During the Utterances.
In 10 of the 21 stimulus sets, the AP boundary patterns were identical across all three categories. Of the remaining 11 sets, 10 had polar questions and statements patterning together; one set had open and polar questions patterning together; and one set had open questions and statements patterning together.
Having recorded and analyzed the baseline stimuli, we proceeded to generate test stimuli by manipulating the F0 contour in the region of interest, which was the CPF for open question stimuli and the verb for polar question stimuli. Stimuli for declarative statements were not manipulated. All manipulations were based around F0 measurements from the entire AP that included the region of interest. Up to four points were measured, depending on how Jun’s T-H . . . L-H tone pattern was realized. An example of the measurement points for one AP in one stimulus is shown in Figure 4.

Praat-generated F0 contour for the final AP of a polar question stimulus. The four reference points T, H1, H2, and L are used in calculating F0 values for the variant stimuli.
From each baseline open and polar question utterance, we created a set of five test stimuli. The sets had four equal steps between the full extent of pitch expansion in the region of interest and a baseline F0 contour measured in the corresponding region of the comparator stimulus. For open questions, the comparator was the polar question, and for polar questions, the comparator was the open question. All manipulations
For open questions, the region of interest was not at the edge of the sentence, and so initial and final F0 were the same for original and manipulated stimuli. For polar questions, the verb was in focus, and so the region of interest was the AP containing the verb. Because this AP was also IP-final, the AP-final H tone was replaced by the LH% boundary tone associated with questions. Because the statement had a sentence-final HL% boundary tone, the pitch expansion contrast for polar questions was with the corresponding open question. Again, we created test stimuli with four equal steps in the region of interest, shown in Figure 5.

Creation of variants for the open question:
For polar questions, we followed Jones (2016) and assumed that there was also an expanded pitch range at the final LH% tone. We therefore also created four equal steps in the utterance-final verb and particle. Here, the extent of manipulation was the natural F0 difference between open questions and polar questions.
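The idea of equal steps between the natural contour and the comparator baseline can be illustrated with a short Python sketch. It assumes, for illustration only, a single F0 target (such as an AP peak) and steps that are equally spaced in log F0; the function name `f0_steps` and the Hz values are hypothetical, and the sketch does not reproduce our exact range-based calculation.

```python
import math


def f0_steps(natural_hz: float, comparator_hz: float, n_steps: int = 4) -> list[float]:
    """Return n_steps + 1 F0 targets from the natural value to the comparator.

    Interpolation is linear in log F0, so successive targets differ by a
    constant frequency ratio, that is, by a constant pitch interval.
    """
    log_nat = math.log(natural_hz)
    log_cmp = math.log(comparator_hz)
    return [
        math.exp(log_nat + (log_cmp - log_nat) * k / n_steps)
        for k in range(n_steps + 1)
    ]


# Hypothetical peak values: 360 Hz in the natural stimulus and 250 Hz in
# the corresponding region of the comparator stimulus.
targets = f0_steps(360.0, 250.0)
for level, hz in enumerate(targets):
    print(f"level {level}: {hz:.1f} Hz")
```

With four steps, the middle target is the geometric mean of the two endpoints rather than their arithmetic mean, which is the practical consequence of spacing the variants in pitch rather than in frequency.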
The aforementioned variants were produced by manipulating the F0 contour in Praat (Boersma & Weenink, 2023), following this procedure:
1. For each stimulus in a set, the regions of interest were identified.
2. For each element of the region of interest, the log F0 was taken at the start, the maximum, the minimum, and the end of the phrase. Logarithms were used so that the manipulated variants would be equally spaced in terms of pitch rather than frequency. The start-maximum F0 range for APs, here called
For the open and polar questions, the
The
3. For the open question, there was one element of the region of interest and one point to manipulate: the
The F0 contour of the natural open question was streamlined to remove all points except the start, the maximum, the minimum (if this was not also the end of the phrase), and the end of the phrase. The maximum point was then changed to the revised F0. Four variants were produced for each open question, with the proportion
4. For the polar question, the AP forming the region of interest includes the sentence-final LH% tone. Within the region of interest, there are three points to manipulate: the maximum of the AP before the sentence-final LH% tone, the pitch at the start of the sentence-final LH% tone, and the maximum of the sentence-final LH% tone. (a) For the maximum of the AP, the focus F0 range
(b) For the pitch at the start of the sentence-final LH% tone, the focus pitch range
(c) For the maximum of the sentence-final LH% tone, the focus pitch range
(d) Having calculated the manipulated values, the F0 contour of the natural polar question baseline was streamlined to remove all pitch points except the start, the AP maximum, the AP minimum (if this was not also the end of the phrase), the boundary between the AP and the LH% tone, the maximum of the LH% tone, and the end of the LH% tone (if this was not also the maximum). The AP maximum, the boundary pitch, and the LH% tone maximum points were then changed to the revised pitches. Again, four variants were produced for each polar question, with
We expected that manipulation might reduce the audio quality, and thus, the intelligibility of the stimuli. All 168 manipulated stimuli were validated by asking native speakers of Seoul Korean
Once the full set of manipulated utterances was ready, the individual gating stimuli were prepared, with one set of stimuli for each manipulated utterance. The stimuli were segmented using Praat following the model in Figure 6; there were five segments for each stimulus. The open question region of interest with the CPF was first presented in Stimulus 2, and the polar question region of interest was first presented in Stimulus 4. Only in Stimulus 5 did participants hear the tune associated with either a question or a declarative statement.

Figure 6. Content of individual stimuli within a set. Regions of interest are marked in gray.
Following segmentation, the gating stimulus files were produced using a script amended from the Speech Corpus Toolkit for Praat (Lennes, 2017).
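The gating structure can be sketched as follows, assuming that each gate is a cumulative fragment running from the start of the utterance to the end of the current segment. The boundary times are hypothetical, and the code does not reproduce the Praat script used in the study.

```python
def gate_intervals(segment_ends):
    """Given ascending end times (in seconds) of the segments, return one
    (start, end) interval per gate; every gate starts at 0.0."""
    if any(later <= earlier for earlier, later in zip(segment_ends, segment_ends[1:])):
        raise ValueError("segment end times must be strictly increasing")
    return [(0.0, end) for end in segment_ends]

# Segment labels follow the figure captions; the times are invented for illustration.
labels = ["introduction", "CPF", "adverbial", "verb", "sentence-final particle"]
segment_ends = [0.62, 1.05, 1.48, 1.97, 2.21]

gates = gate_intervals(segment_ends)
for number, (label, (start, end)) in enumerate(zip(labels, gates), start=1):
    print(f"Stimulus {number}: {start:.2f}-{end:.2f} s (up to and including the {label})")
```

Under this assumption, Stimulus 5 is the complete utterance, which is why the sentence-final tune is only available at the last gate.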
4.2.3 Procedure
Participants were presented with stimuli via a website written using OpenSesame (Mathôt et al., 2012) and jsPsych (de Leeuw et al., 2023), which was powered by a JATOS server (Lange et al., 2015) hosted at the University of Groningen. After giving consent to participate, participants were shown instructions, which explained what statements, open questions, and polar questions are. Having confirmed that they had read the instructions, participants continued to the data collection screen. Four buttons were presented in a horizontal row at the center of the screen with labels in Korean
Stimuli were played automatically when the page loaded, and once the participant had made a choice, the page re-loaded to play the next stimulus. It was not possible for participants to replay the stimuli.
Because each stimulus set contained 11 members (five variants of open questions, five variants of polar questions, and one declarative statement), participants were randomly allocated to one of 11 cohorts. Each cohort heard one member of each of the stimulus sets in a Latin Square design, a total of 21 trials with no repetition of stimulus sets. During the experiment, each participant was presented with a mixture of open questions, polar questions, and statements, and for the open and polar questions, there was a mixture of the five variant levels of prosody. The order of presentation of the stimulus sets was random for each participant.
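One way to realize such an allocation is a cyclic Latin Square, sketched below. The rotation rule, the condition labels, and the seeding scheme are assumptions made for illustration; the study's actual assignment procedure is not specified beyond the description above.

```python
import random

N_SETS = 21
# Hypothetical labels for the 11 members of each stimulus set.
CONDITIONS = (
    [f"open_v{i}" for i in range(1, 6)]
    + [f"polar_v{i}" for i in range(1, 6)]
    + ["statement"]
)

def condition_for(cohort: int, set_index: int) -> str:
    """Cyclic rotation: each cohort hears a different member of each set."""
    return CONDITIONS[(cohort + set_index) % len(CONDITIONS)]

def trials_for(cohort: int, seed: int) -> list:
    """All 21 (set, condition) trials for one participant, in random order."""
    trials = [(s, condition_for(cohort, s)) for s in range(N_SETS)]
    random.Random(seed).shuffle(trials)
    return trials

example = trials_for(cohort=3, seed=42)
print(len(example), "trials; first three:", example[:3])
```

With this rotation, every member of every stimulus set is heard by exactly one cohort, and no participant hears the same stimulus set twice.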
For each trial, participants were presented with the five stimuli in the utterance set, in increasing order of length. Once all five stimuli had been heard, the next utterance set was presented. Four times during the experiment, at the end of an utterance set, participants were asked a question to confirm they were paying attention, in line with guidance from Prolific. Each of these was a multiple-choice question whose wording stated the answer that participants were required to give. Participants who answered two or more of these attention questions incorrectly were excluded from the study.
4.3 Results
We begin with a presentation of the data in Section 4.3.1 before introducing a descriptive statistical model in Section 4.3.2.
4.3.1 The data
In this section, we present the raw experimental data. Participants who failed the attention checks as described above were excluded (2/126). All data points from the remaining 124 participants were included in the analysis.
Did participants accurately disambiguate the stimuli? Figure 7(a) shows how participants disambiguated open question stimuli after the whole of the stimulus had been heard, and Figure 7(b) shows the same for the polar question stimuli.

Figure 7. The impact of variant on participants’ responses to question stimuli after the stimulus had been completely heard. (a) Open stimuli. (b) Polar stimuli. X-axis = the percentage of natural prosody present. Y-axis = the number of responses.
Open question stimuli were reasonably reliably disambiguated (range = 79%–84%). However, polar question stimuli were disambiguated much less reliably (range = 29%–55%). Only with 100% of natural prosody were more than 50% of the stimuli reported as polar questions. For all other manipulations, there was a preference to report the stimuli as open questions.
Statement stimuli had no prosodic variants, so Figure 8 shows how participants’ responses to these stimuli changed over time. In some trials, statements were identified as open questions at the CPF and following adverbial, and at the verb, the responses were spread between don’t know, statement, and open question. However, by the end of the stimulus set, statement stimuli were being reliably identified.

Figure 8. Responses to statement stimuli during iterative presentation. X-axis = segments, 1: introduction; 2: CPF; 3: adverbial; 4: verb; 5: sentence-final particle.
How did prosody affect disambiguation during the utterances? Figure 9(a) shows the proportion of open question stimuli that were correctly identified with different levels of natural prosody, and Figure 9(b) shows the same for polar question stimuli.

Figure 9. The impact of manipulating prosody on the timing of correct responses. X-axis = segments, 1: introduction; 2: CPF; 3: adverbial; 4: verb; 5: sentence-final particle.
For the open questions, accuracy increases as more of the stimulus is heard, most strongly after Segment 3, with a smaller increase from Segment 2 to Segment 3. There is some gradient effect associated with the proportion of natural prosody at the CPF and the subsequent adverbial, but this disappears by the end of the utterance. It appears that Segment 2 is more important in disambiguation than Segment 3. However, this is not to the extent that would support a claim that expanded pitch range at the CPF (Segment 2) is unambiguously associated with open questions. If this were the case, we would have expected a higher level of disambiguation at Segment 2, modulated by the proportion of focus prosody present.
For the polar questions, disambiguation seems to begin at the verb (Segment 4), but the highest level of disambiguation takes place once the sentence-final LH% tone at the particle
Figure 10(a) shows how prosody affected the incorrect disambiguation of open question stimuli during the repeated presentations. Figure 10(b) shows the same for polar question stimuli.

Figure 10. The impact of manipulating prosody on the timing of incorrect responses. X-axis = segments, 1: introduction; 2: CPF; 3: adverbial; 4: verb; 5: sentence-final particle.
For open questions, there is a slight increase toward the end of the sentence but no discernible gradient effect of prosody. For polar questions, some participants identify the stimulus as an open question as soon as the CPF is heard (Segment 2), and misidentifications continue to increase as more of the stimulus is heard. In this case, there appears to be a gradient effect of prosody at Segment 5 but not earlier, with increasing natural prosody leading to fewer inaccurate disambiguations. A gradient effect would be in line with theoretical predictions, but the level of incorrect disambiguations is not, particularly the continuing increase at Segments 4 and 5, once the verb has been heard.
4.3.2 Statistical model
Five generalized additive models (GAMs; Hastie & Tibshirani, 1990) were constructed using the packages
The maximal formula used for the models is shown at (6). The left-hand side term
For statements, right-hand side terms (a) and (c) were not used in the model because no pitch manipulation was used in preparing statement stimuli, and the value of the parameter
Figure 11 shows the effect of the segment on participants’ correct identification of statements. By the region of interest, Segment 5, statements are being correctly identified as expected. The model predicts a slight reduction in accuracy at Segment 2, when the CPF is heard.

Effect of segment on the correct identification of statement stimuli. The region of interest is
The pairs of models with and without the fixed interaction between segment and variant were compared. The amount of deviance explained was similar between pairs (63.0% for the open category, 51.8% and 51.7% for the polar category). Using the Akaike information criterion (AIC), for the polar category, the model with the interaction between variant and segment was preferred (AIC 2464 vs. 2481), and so this model was selected. For the open category, the model
The open category model explained 66.9% of the variance in the data, with a significant
Figure 12(a) shows the effects of segment and variant according to the preferred model on participants’ correct prediction of open question stimuli, and Figure 12(b) shows the same for polar question stimuli. For open questions, the stimuli were correctly identified significantly above chance levels (0.25) after Segment 4. There is no evidence of an effect of the variant on the time that stimuli were correctly identified. In other words, there is no evidence that expanded pitch range played a role in participants’ decisions.

Effect of segment and variant on the correct identification of question stimuli. Region of interest for open questions is
For polar questions, correct identification was never significantly above chance levels. There is nevertheless evidence of an interaction between variant and segment; in other words, a greater amount of expanded pitch range increased the likelihood that participants would correctly identify the stimuli. Even so, it was only at Segment 5 and with 75% or higher expanded pitch range that correct identifications were at chance levels; earlier in the sentence for all levels of pitch expansion, and at Segment 5 for 50% or lower pitch expansion, participants were significantly more likely to identify the stimulus incorrectly (as an open question, a statement, or unknown).
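To give a sense of the size of the AIC difference reported above for the polar category (2464 with the segment-by-variant interaction vs. 2481 without), the sketch below computes the relative likelihood of the higher-AIC model, exp((AIC_min − AIC_i)/2). This is a standard way of reading AIC differences; it is offered as an illustration and is not a statistic reported in the study. The function name is an assumption.

```python
import math

def aic_relative_likelihood(aic_candidate: float, aic_best: float) -> float:
    """Relative likelihood of a candidate model compared with the
    lowest-AIC model: exp((AIC_best - AIC_candidate) / 2)."""
    return math.exp((aic_best - aic_candidate) / 2.0)

aic_with_interaction = 2464.0     # polar model with segment x variant interaction
aic_without_interaction = 2481.0  # polar model without the interaction

delta = aic_without_interaction - aic_with_interaction
rel = aic_relative_likelihood(aic_without_interaction, aic_with_interaction)
print(f"Delta AIC = {delta:.0f}; relative likelihood of the simpler model = {rel:.2e}")
```

A difference of 17 AIC units corresponds to a relative likelihood well below 0.001 for the model without the interaction, which is consistent with selecting the interaction model.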
5 Discussion
We undertook a large-scale online study where participants listened to recordings of syntactically ambiguous utterances that had been produced using prosodic patterns that are canonically associated with statements, open questions, or polar questions. We manipulated the size of F0 variation in the stimuli, to explore the role of a proposed feature
The results confounded our expectations. Statements, where the stimuli had no prosodic variation, were reliably disambiguated at the end of the utterance, as predicted. However, for the two question types, predictions were not met, but in different ways for each question type. Open questions were ultimately reliably disambiguated, but disambiguation rose above chance levels only once the verb had been heard. There was no gradient effect arising from the prosodic manipulation. Polar questions were never reliably disambiguated, and there was a significant effect of the prosodic manipulation only in interaction with the position in the sentence. But even the most accurate disambiguation, with 75% or more of natural prosody once the whole utterance had been heard, was not significantly above chance levels. Accordingly, we cannot support a position that prosody is the primary determinant of disambiguation in this case.
5.1 The role of prosody
5.1.1 The nature of the prosodic expression
Although Jones (2016) takes expanded pitch range as applying across a number of syllables in the focused constituent, the method we used to construct the stimuli used the F0 peak within the relevant AP, streamlining the contour between this point and the boundaries of the phrase. This approach is more in line with the F0 peak as described by Yun and Lee (2022), where a positive association was seen between the height of the F0 peak and an open question reading. However, we did not see a reduction in open question interpretation as the F0 peak at the CPF decreased, which would have been predicted by Yun and Lee’s results.
5.1.2 Post-focus compression
It is also possible that post-focus compression is a necessary element of prosody, alongside expanded pitch range. In a similar case involving disambiguating wh-interrogatives from wh-declaratives in Mandarin, Yang et al. (2020) found that open questions showed a more compressed F0 range relative to their declarative counterparts. The stimuli for open questions all had the natural prosody of an open question after the AP containing the CPF. Thus, even for the variants where the expanded F0 range had been removed, the subsequent F0 range compression may have been detectable. We did not control for this, and so the question remains open for further investigation.
5.1.3 The status of AP boundaries
Our study did not set out to explore the role of AP boundaries in disambiguation, but it is possible to make some inferences about their impact. If the placement of AP boundaries is the crucial determinant of disambiguation, then we would expect to see no gradient effect in either the open or polar categories, because only the pitch peaks within the regions of interest were manipulated, and the low tones at phrasal edges were unchanged. For open questions, we would also expect to see successful disambiguation at or shortly after the region of interest, Segment 2; the study design means that there is time to fully process the sentence fragment before making a decision. However, a gradient effect
5.2 Factors other than prosody
5.2.1 The status of CPFs
Our design assumed that there is no preferred reading for CPFs, but the data suggest that there is a degree of lexical preference for interpreting them as question words rather than as indefinite pronouns. For statement stimuli, where there was no manipulation of natural prosody, the 10%–15% of participants who identified an utterance type at the CPF or the subsequent adverbial largely thought the utterance was an open question. Even when the sentence-final boundary tune had been heard, just over 10% of participants continued to identify the statement stimuli as open questions. For the polar category, at least 75% of natural prosody at the verb, Segments 4 and 5, was required to bring disambiguation up to chance levels, and even with full natural prosody at the verb, an open question reading was as likely as a polar question reading. Within the limits of the study, it was not possible to carry out a corpus investigation to explore this further.
5.2.2 The role of context
Cross-linguistic evidence shows that context interacts with prosody in disambiguating ambiguous utterances. Snedeker and Trueswell (2003) studied syntactic ambiguity in prepositional phrase attachment in English and found that speakers produced strong prosodic differences when contextual information was insufficient to disambiguate between syntactic structures. These prosodic cues significantly contributed to listeners’ ability to disambiguate. However, when speakers were unaware of the ambiguity, they produced weaker prosodic cues, making it more difficult for listeners to rely on them for disambiguation. Similarly, Hansen et al. (2023), investigating prosodic grouping in coordinated name sequences in German, examined how prosodic cues such as F0 range, final lengthening, and pause signaled internal grouping within three-name sequences. Using a gating paradigm, they tested whether listeners could predict these groupings based on boundary-related prosodic information. They found that only minimal prosodic information related to grouping was necessary; most of the listeners were able to disambiguate after the first name before the grouping information was available. Interestingly, listeners used different disambiguation strategies: some preferred to wait for as much information as possible, whereas others started the identification process early on. In our study, where utterances were presented out of the blue with no supporting context, we also found late disambiguation, which underscored the interpretation that listeners tried to wait for as much information as possible before they attempted disambiguation. Moreover, we saw that listeners relied little on the prosodic information from the CPF; instead, the lexical meaning of the CPF appeared to have biased the listeners’ interpretation toward questions. These findings echo Song et al. (2022), showing that prosodic features are not the only factors in disambiguating Korean polar and wh-questions, as subtle lexical meaning can also influence the interpretations. A similar phenomenon has also been reported in other languages (e.g., see Zhang, 2018, p. 146, for Tianjin Mandarin).
5.2.3 Other methodological points
Our sample size is relatively large, and this may have contributed to unexpected patterns being revealed. However, we note that our pilot study
One possible confound is that throughout our study, participants potentially forgot the meanings of the different terms
An aspect that may have prevented the early identification of open questions was the presence of statements in the stimuli. Because the defining distinction in Korean between statements and questions is the utterance-final tune (HL% for statements vs. LH% for questions), which in the polite speech style is associated with the particle
5.3 Implications for the prosody–syntax interface
Jones (2016) presents a model of this phenomenon in LFG (Bresnan & Kaplan, 1982). LFG is a modular, declarative, constraint-based, computationally robust grammar theory that supports analyses of language from the spoken or written utterance through to representations of meaning and discourse. Different elements of language such as syntax, semantics, prosodic structure, and information structure are represented in distinct modules; LFG analyses propose constraints both within individual modules and also at the interface between modules. The theory is thus well-suited for developing accounts of the relationship between prosody and meaning.
Two main approaches to analyzing the interface between prosody and meaning have been proposed in LFG. Bögel (2015, 2022) takes a bottom-up approach where information on F0 and syllable duration is combined with lexical representations such that the interface is modeled on a word-by-word basis. Dalrymple and Mycock (2011) and Mycock and Lowe (2013) take a top-down approach, which models the interface at the edges of prosodic and syntactic constituents. More information about the formal treatment of prosody in LFG can be found in Bögel (2023).
Jones’s (2016) model follows the edge-based approach. In his analysis of the data, F0 expansion was seen not only at the F0 peak in an AP but also at the following syllables in the phrase. Accordingly, he assumed that F0 expansion spread leftwards from the right edge of an AP. He also assumed that the constituent edge associated with this expansion was aligned with the right edge of the syntactic element bearing question focus, from which he derived a formal account of the different readings of open and polar questions. Our results do not support that analysis. In our experiment, F0 expansion was linked to an H tone within the AP, rather than at the right edge. There also seems to be a lexical preference for a question-word reading of the CPF, whether or not the CPF is produced with F0 expansion (see Table 6).
Table 6. Participants’ Interpretations of Question Stimuli.
However, the contribution of prosody is also not entirely absent; the presence of natural canonical polar question prosody at the verb partially inhibited participants from interpreting the stimuli as open questions and brought decisions to chance levels. A successful analysis needs to allow for this interaction while recognizing that there are differences—whether individual or situational—in the weight that is given to prosodic evidence in making a decision. An initial analysis of these data using lexical preferences in the edge-based approach is presented in Jones et al. (2024).
6 Conclusion
We began our research for this article with an assumption—based on the results of a previous production experiment and in line with the prevailing view in the literature—that prosody was central to the correct perception of Korean sentences containing indefinite content pro-forms that are ambiguous between statements, open questions, and polar questions. Our results lead us to believe that the situation is considerably more complex. Although we found that prosody does play a role in hearers’ correct identification of polar questions, it appears that in the absence of other contextual information, the presence of an indefinite content pro-form creates a strong bias toward an utterance being interpreted as an open question.
Acknowledgements
Our thanks go to Lillian Phillips for her help in creating the pitch-manipulated stimuli and recruiting participants, to Jacolien van Rij for her support in constructing and interpreting the statistical model, and to the editor, Hae-Sung Jeon, and three anonymous reviewers for their comments, which have substantially improved the paper. No external funding was used to carry out the study.
