Abstract
This study investigates the processing of wh-dependencies in English by native speakers and advanced Mandarin Chinese-speaking learners. We examined processing at a filled gap site that was in a licit position (non-island) or located inside an island, a grammatically unlicensed position. Natives showed N400 in the non-island condition, which we take as evidence of gap prediction; no N400 emerged within the island. Learners yielded P600 in the non-island condition, suggesting learners did not predict a gap, but rather experienced syntactic integration difficulty. Like natives, learners showed no effects inside the island. Island sensitivity was also observed for both natives and learners in an offline acceptability judgment task. We also explored whether event-related potential (ERP) responses were related to attentional control (AC), a cognitive ability that has been related to predictive processing in native speakers, in order to examine whether variability in processing in learners and native speakers is similarly explained. Results showed that increased AC was associated with larger N400s for natives and larger P600s for learners in the non-island condition, suggesting that increased AC may be related to prediction for natives and to integration effort for learners. Overall, learners demonstrated island sensitivity offline and online, suggesting that second language (L2) processing is indeed grammatically-guided. However, ERP results suggest that predictive processing in the resolution of wh-dependencies may be limited, at least for learners whose first language (L1) does not instantiate overt wh-movement.
I Introduction
In languages such as English, questions and relative clauses have been argued to involve wh-movement (e.g. Chomsky, 1981, 1986). In the literature, the relationship between the fronted wh-word and the position in the sentence from which it originated is known as a wh-dependency. In (1b), the wh-element (what), also called the ‘filler,’ has been displaced from its original position in the syntactic structure, called the gap site, and moved to the beginning of the clause.
(1) a. Harry ate chocolate.
    b. I wonder what Harry ate ___.
During real-time processing, it is not possible to integrate a fronted wh-element into the syntactic and semantic representation immediately upon encountering it. Instead, the wh-expression must be structurally integrated and interpreted at the actual gap site. This process has been characterized as ‘active’ or predictive, such that the parser attempts to resolve a wh-dependency (i.e. predict a gap) in a top-down manner before confirmation in the bottom-up signal as to the location of the actual gap site (i.e. a missing constituent) (e.g. Crain and Fodor, 1985; de Vincenzi, 1991; Frazier, 1987; Frazier and Clifton, 1989; Nicol, 1993; Nicol and Swinney, 1989; Pickering and Traxler, 2001, 2003). During processing, the parser prioritizes wh-dependency resolution, such that the search for a potential gap site begins immediately upon encountering a wh-filler and continues until the wh-dependency is successfully resolved. The parser is hypothesized to predict potential gap positions at each grammatically licensed position in the sentence until the dependency is completed (e.g. Clifton and Frazier, 1989; Frazier, 1987; Frazier and Clifton, 1989; Frazier and Flores D’Arcais, 1989). The current study examines whether wh-dependency resolution proceeds similarly in second language (L2) processing, testing whether learners engage in predictive processing.
We also investigate to what extent grammatical information is used to predict potential gap sites, examining whether the parser avoids predicting gaps within positions that are not licensed by the grammar. Wh-movement in English is constrained, such that extraction is barred from specific positions called ‘islands’ (Ross, 1967). For example, (2a) contains a relative clause, one type of island domain, and (2b) demonstrates that extraction from the relative clause renders the sentence ungrammatical.
(2) a. Meghan likes the store [that sells dark chocolate].
    b. * Which chocolate does Meghan like the store [that sells __]?
Psycholinguistic evidence indicates that the parser is sensitive to island constraints and does not attempt to resolve the wh-dependency within an island (e.g. Stowe, 1986; Traxler and Pickering, 1996). However, it is still unclear whether L2 learners similarly show evidence of the real-time application of island constraints during processing (e.g. Aldwayan et al., 2010; Boxell and Felser, 2017; Canales, 2012; Felser et al., 2012; Johnson et al., 2016; Kim et al., 2015; Omaki and Schulz, 2011). The current study builds on this body of psycholinguistic research to investigate these issues using electroencephalography (EEG), a technique that can shed light on the time-course of wh-dependency resolution and can also provide information about whether natives and L2 learners process wh-dependencies using qualitatively similar mechanisms.
1 Wh-dependencies and predictive processing
Many studies have examined the processing of wh-dependencies using a ‘filled-gap’ paradigm (e.g. Boland et al., 1995; Bourdages, 1992; Crain and Fodor, 1985; Frazier and Clifton, 1989; Stowe, 1986; see also Fujita and Cunnings, 2020). In Stowe’s (1986) seminal study, participants read sentences such as those in (3). Reading times in sentences with wh-extraction as in (3a) were compared to declarative sentences without extraction (3b).
(3) a. My brother wanted to know who Ruth will bring us home to __ at Christmas.
    b. My brother wanted to know if Ruth will bring us home to Mom at Christmas.
Stowe found that native speakers showed reading time slowdowns at us in (3a) as compared to (3b), an effect referred to as a filled-gap effect, which has been argued to reflect the prediction of a gap in a position that is already filled with lexical material (e.g. Clifton and Frazier, 1989; Stowe, 1986). The idea is that the parser engages in active gap prediction prior to confirming that the direct object position is available.
Alternative mechanisms which characterize filled-gap effects as integrative as opposed to predictive have been proposed under different theoretical accounts (e.g. Pickering and Barry, 1991; Pickering and Gambi, 2018), making it difficult to unambiguously tease apart the precise mechanism underlying the processing of filler-gap dependencies. We adopt the analysis that filled-gap effects provide evidence of predictive processing due to evidence such as that provided by Omaki et al. (2015), which suggests that post-verbal gaps are posited before the verb is even encountered. In their design, the transitivity of an intermediate verb was manipulated (e.g. The book that the author wrote/chatted), and native English speakers showed evidence of experiencing a processing disruption upon encountering the intransitive verb, which is only expected if they were predicting an upcoming object gap. Omaki et al. refer to this as hyper-active gap filling, a prediction mechanism initiated by pre-verbal information. As we discuss below, a strength of addressing this question in the present study using EEG is that we are able to examine two distinct event-related potential (ERP) components which have been argued to index prediction and integration, using this evidence to shed further light on the mechanisms underlying the processing of wh-dependencies.
There is mixed evidence with respect to whether L2 learners similarly engage in gap prediction, and whether prediction occurs on a similar time-course as for native speakers. In a large-scale self-paced reading study, Johnson (2015) found that both native English speakers and Korean-speaking learners of English showed filled-gap effects in a pre-verbal, subject position, providing evidence of gap prediction. Notably, Johnson found that individual differences in processing capabilities modulated the ability to predict gaps; natives and learners who performed better on a number Stroop task yielded a larger filled-gap effect. This may suggest that during the processing of wh-dependencies, individuals with greater attentional control resources may be better able to focus attention on predicting that a gap is forthcoming, yielding a larger reading time slowdown upon finding the potential gap position filled.
This finding is in line with proposals which suggest that the ability to predict in the L2 may be dependent on a range of factors, including proficiency and individual differences in cognitive abilities (e.g. Hopp, 2013; Kaan, 2014). That is, it may be the case that only learners with advanced proficiency and sufficient cognitive resources will show evidence of prediction. The current study builds on this previous literature, focusing on L2 learners at an advanced level of proficiency, and following Johnson (2015) by investigating the role of attentional control. Attentional control has been shown to capture variability in L2 syntactic gap prediction during wh-dependency resolution (Johnson, 2015), lexical-semantic prediction (Zirnstein et al., 2018) and language processing more broadly (e.g. Akhavan et al., 2020; Boudewyn et al., 2012, 2013; Hutchison, 2007). Attentional control is argued to be involved in cognitive processes such as maintaining attention in the presence of distractors, retrieving correct information during interference, and inhibiting habitual responses, capabilities arguably involved in prediction, a process that entails generating and maintaining a predicted element while simultaneously processing the bottom-up input during comprehension (e.g. Kane and Engle, 2003). Brain-imaging studies have suggested that the prefrontal cortex is engaged in tasks that target attentional control as well as during expectation-driven processing in general (e.g. Bar, 2009; Kane and Engle, 2002).
Although some L2 studies have provided evidence of native-like predictive processing (e.g. Foucart et al., 2014; Hopp, 2013; Johnson, 2015; Leal et al., 2017; Zirnstein et al., 2018), others have failed to show predictive effects for L2 learners in contexts in which native speakers predict (e.g. Grüter et al., 2012, 2017; Kaan et al., 2016; Lew-Williams and Fernald, 2010; Martin et al., 2013; Mitsugi and MacWhinney, 2016). Grüter and colleagues (Grüter and Rohde, 2021; Grüter et al., 2017) have proposed that adult L2 learners have a Reduced Ability to Generate Expectations (RAGE Hypothesis) compared to native speakers, such that learners are unlikely to show effects of prediction during language processing in general. A more nuanced discussion regarding L2 prediction considers the utility of predictive processing for learners, as well as the nature of information L2ers may use to generate predictions (Kaan and Grüter, 2021). For example, Grüter et al. (2020) investigated how L2 listeners weigh different linguistic cues during processing. Results indicated that while native Chinese listeners used a Chinese classifier as a grammatical cue to generate an expectation for an upcoming noun, L2 learners showed a preference for using the classifier as a semantic cue. Thus, the ability to rapidly use grammatical cues to generate predictions may be limited in L2 processing.
The present study contributes to this literature by examining whether individual differences in attentional control may shed light on the variability that has previously been observed. It is possible that predictive processing is more limited in L2 learners with more limited cognitive resources, a possibility that studies which focus only on group-level analyses cannot examine. To this end, we examine the role of attentional control in the processing of wh-dependencies in both L2 learners and native speakers, allowing us to investigate whether attentional control abilities explain variability in predictive processing and whether the variability is similarly explained in both populations.
2 Island sensitivity
Islands provide a strong test case to examine whether grammatical information is utilized during wh-dependency resolution. Using self-paced reading, several studies have shown that native speakers do not attempt to posit gaps inside islands and the search for potential gap sites is grammatically constrained (e.g. Johnson et al., 2016; Stowe, 1986; see also Phillips, 2006; Traxler and Pickering, 1996). In the L2 literature, some studies have argued that learners are sensitive to island constraints online (e.g. Aldwayan et al., 2010; Cunnings et al., 2010; Johnson et al., 2016; Omaki and Schulz, 2011), while others have argued that sensitivity is limited to learners whose first language (L1) also instantiates overt wh-movement (Kim et al., 2015) or that the crucial difference between native and non-native sensitivity to island constraints lies in the time-course of processing, with L2ers showing delays utilizing syntactic structure information to constrain processing as compared to natives (Boxell and Felser, 2017; Felser et al., 2012).
Much of this research has focused on addressing the Shallow Structure Hypothesis (Clahsen and Felser, 2006, 2018), which proposes that L2 sentence processing is qualitatively different than native processing. During real-time processing L2 learners are said to rely primarily on non-grammatical information, such as semantic or pragmatic information, to resolve non-local syntactic dependencies (e.g. Felser and Roberts, 2007; Marinis et al., 2005). This processing strategy results from ‘a reduced or delayed ability to extract relevant grammatical information from the input [which] may make L2 comprehenders more likely than L1 comprehenders to initially compute incomplete or shallow syntactic representations’ (Felser, 2019: 17). If learners underutilize abstract syntactic information during processing, then L2ers would not be expected to posit syntactic gaps. Instead, reliance on semantic/pragmatic information may result in L2 processing of wh-dependencies being verb-driven, with processing focused on linking thematic arguments (e.g. Pickering and Barry, 1991). This processing strategy would result in learners appearing to resolve wh-dependencies in an object position by associating the wh-element with a subcategorizing verb during the assignment of thematic roles.
Marinis et al. (2005) examined whether L2 English learners utilize intermediate gaps during the processing of filler-gap dependencies. In sentences with long-distance cyclic movement, landing sites at clause boundaries provide an intermediate gap site where the filler can be reactivated (Chomsky, 1973). For example, (4a) involves wh-extraction from a complement clause, and it has been proposed that there is an intermediate gap at the clause boundary. In (4b), however, no intermediate gap is present because the wh-extraction occurs across a complex NP.
(4) a. The actress who the journalist suggested __ that the talented writer had inspired __ will go on stage tonight.
    b. The actress who the journalist’s suggestion about the talented writer had inspired __ will go on stage tonight.
Gibson and Warren (2004) found that native English speakers indeed posit gaps at intermediate landing sites, showing faster reading times at the actual gap site (e.g. inspired) in (4a) as compared to (4b). In contrast, Marinis et al. found that for L2 learners from a variety of L1 backgrounds (Greek, German, Chinese, and Japanese), there was no evidence that an intermediate gap was posited, with no reading time difference at the actual gap site. The lack of an intermediate gap effect is taken to suggest that late L2 learners build shallow structures during processing (i.e. not positing syntactic gaps) in comparison to native speakers.
The Shallow Structure Hypothesis also makes direct claims regarding whether learners show sensitivity to islands. With limited or delayed use of syntactic cues in the course of processing wh-dependencies, learners should not show sensitivity to islands online, and object filled-gap effects would be expected to emerge in both grammatical positions and islands. More recent studies arguing in support of the Shallow Structure Hypothesis have proposed that the learners may be sensitive to island constraints, but that this sensitivity emerges at a delay (Boxell and Felser, 2017; Felser et al., 2012).
Another key issue is the role of the L1. Kim et al. (2015) tested Spanish- and Korean-speaking learners of English matched for proficiency. While Spanish instantiates overt wh-movement, Korean does not exhibit overt wh-movement and therefore does not instantiate island constraints similarly to English. Reading time evidence showed that Korean learners attempted to posit a gap within an island, suggesting that Korean learners were not able to utilize structural information to constrain wh-processing, in contrast to the Spanish learners. Thus, Kim et al. argued that only learners whose L1 exhibits overt wh-movement may show sensitivity to islands online. This proposal was challenged by Johnson et al. (2016), who found that both natives and Korean learners of English showed island sensitivity online. Thus, the role of the L1 in the processing of wh-dependencies remains an open question.
Finally, a debate remains regarding the source of island effects. One interpretation is that the parser avoids positing gaps within islands due to the syntactic constraints which govern wh-movement (e.g. Sprouse et al., 2012; Phillips, 2006, 2013; Wagers and Phillips, 2009). However, a second account has argued that island structures may simply present processing bottlenecks, and thus the parser avoids positing gaps within islands due to increased difficulty resolving wh-dependencies in such complex structures (Hofmeister and Sag, 2010; Hofmeister et al., 2012a, 2012b, 2013; Kluender, 2004; Kluender and Kutas, 1993).
One approach to tease apart predictions of these two accounts has been to examine the role of individual differences. Sprouse et al. (2012) argue that processing accounts predict that individuals with higher processing abilities (e.g. greater working memory) should be more likely to posit gaps within islands as they have greater cognitive resources available to attempt to resolve the wh-dependency inside complex island structures. Grammatical accounts, on the other hand, do not predict such a relationship. Much of this literature has focused on offline measures of acceptability (e.g. Aldosari et al., 2022; Sprouse et al., 2012), although we aim to contribute to this debate by examining the role of individual differences in the processing of wh-dependencies using EEG.
3 Neurolinguistic approaches to the processing of wh-dependencies
Event-related potentials (ERPs) have been argued to index different kinds of linguistic processing (e.g. Kaan, 2007; Kutas et al., 2006) and thus can shed light on the qualitative nature of the mechanisms involved. The N400, a negative-going centro-parietal waveform that peaks at around 300–500 ms, was originally linked to the processing of semantic anomalies (e.g. Kutas and Hillyard, 1980), but has more recently been argued to reflect lexical prediction (Federmeier, 2007; Lau et al., 2008, 2013; Michel, 2014; Van Berkum et al., 2005). These studies have shown that the amplitude of the N400 is modulated by lexical pre-activation, with more expected words showing reduced N400 amplitude compared to unexpected continuations. The literature investigating the role of prediction in the N400 has primarily focused on lexical-semantic prediction, as several studies have shown that in sentences that are highly semantically constraining, readers anticipate upcoming lexical material during processing (e.g. Federmeier and Kutas, 1999; Federmeier et al., 2002).
A study by Michel (2014) suggests that the N400 may also reflect prediction of gap sites during wh-dependency resolution, based on evidence that encountering an actual gap in an unexpected position in the sentence elicits a larger N400 than encountering an actual gap in an expected position. Michel compared brain responses at gap sites located either within an island, rendering the sentence ungrammatical, or in a licit position. For example, (5a) contains an island violation because a gap site is located inside a whether-island. Brain responses at the gap site following the verb befriended (i.e. at openly) were compared to the same region in a sentence without an island (5b):
(5) a. * Who had the sailor inquired [whether the captain befriended __ openly before the final mutiny hearing]?
    b. Who had the sailor assumed that the captain befriended __ openly before the final mutiny hearing?
At the gap site, an N400 effect emerged, with the island condition (5a) yielding a greater N400 than the non-island condition (5b). This effect was attributed to prediction, in that when the parser is in the process of resolving a wh-dependency, encountering an island boundary (i.e. at whether) forced the parser to revise its prediction that a gap is forthcoming. Thus, when evidence of a gap was encountered within an island, an increased N400 emerged because this gap site was not expected. The study also found that individuals with higher reading span scores, a measure of working memory, yielded greater negativities in the island condition. This is attributed to the fact that high-span readers are arguably better able to revise their prediction in order to avoid positing gaps in illicit positions; therefore, encountering evidence of a gap within an island is particularly unexpected, leading to an increased N400. From these findings, Michel (2014) proposed the Gap Predictability Account of island sensitivity, arguing that the N400 reflects predictability of upcoming gap sites, with those in islands being less predictable.
Relatively few L2 ERP studies have examined the processing of filler-gap dependencies. One approach employs a plausibility mismatch paradigm, in which stimuli contain a filler that is either a plausible or an implausible object of a verb. In a recent study, Jessen and Felser (2019) employed sentences such as those in (6).
(6) a. Bill liked the house that Bob
    b. Bill liked the women that Bob
In the ERP experiment, native speakers of English and German-speaking learners of English yielded N400 at the target verb region (e.g. built) when attempting to integrate the implausible filler (6b) as compared to the plausible filler (6a). This plausibility mismatch N400 effect indicates that both natives and learners evaluate the plausibility of the filler upon encountering a potential subcategorizing verb. Related work from Dallas et al. (2013) did not report a group-level N400 for Chinese L2 learners, but in a follow-up analysis, English proficiency was shown to be related to N400 amplitude, with higher L2 proficiency related to larger N400 effects, suggesting that native-like processing of filler-gap dependencies is possible with increasing L2 proficiency.
ERP studies which have used a filled-gap paradigm have shown mixed results. Several studies reported the emergence of a late positivity, referred to as a P600, at a filled gap site (e.g. Dong, 2014; Hestvik et al., 2012; Jessen et al., 2017; Schremm, 2012, 2013). Early studies elicited the P600 for syntactic anomalies such as grammatical violations and garden path sentences (e.g. Friederici et al., 1996; Frisch et al., 2002; Osterhout and Holcomb, 1992; for a review see Molinaro et al., 2011). However, studies have also shown that P600 can be elicited for well-formed sentences, suggested to index syntactic integration during the processing of long-distance dependencies (e.g. Felser et al., 2003; Gouvea et al., 2010; Kaan et al., 2000; Phillips et al., 2005). Schremm (2012, 2013) reported P600s for Swedish learners of English and native speakers at an object filled gap site. The onset of the natives’ P600 was 200 ms earlier than the learners’, and Schremm suggested that syntactic processing may be delayed in learners, compatible with predictions of Felser et al. (2012). Jessen et al. (2017) also reported P600s for native English speakers and German-speaking learners of English at a filled gap, which was realized as a resumptive indirect object phrase; notably, the latency of the P600 was the same for both groups.
A limitation of these ERP studies is that the target stimuli are ultimately ungrammatical (Dong, 2014; Hestvik et al., 2007, 2012; Jessen et al., 2017; Schremm, 2012, 2013). This makes it difficult to make inferences about the ERP components elicited, since, unlike grammatical sentences with filled gaps where an actual gap site is ultimately encountered later in the sentence (as in Stowe, 1986 in (3)), encountering a filled gap in these sentences serves as an indicator of whether the sentence is well-formed overall.
II Current study
The current study uses ERPs to examine island sensitivity in Mandarin Chinese-speaking learners of English, investigating whether both native speakers and learners show evidence of grammatically-guided gap prediction during online processing. To examine whether L2 learners engage in prediction during wh-dependency resolution, we utilize a filled-gap manipulation with fully grammatical sentences. If readers predict potential gap sites at each grammatically licensed position within the sentence, then following Michel’s (2014) Gap Predictability Account, an N400 should emerge at the filled gap position. Recall that in Michel’s study, an N400 emerged at a region in which a gap site was not predicted because it was located within an island; in the present study, if the parser predicts a gap but then finds it is filled with lexical material, an N400 should also emerge. That is, as Michel argues, N400 would emerge when encountering unexpected lexical material. However, in line with previous filled-gap ERP studies, it is also possible that we will observe P600s at the filled gap site (e.g. Dong, 2014; Hestvik et al., 2012; Jessen et al., 2017; Schremm, 2012, 2013). Accounts which propose that predictive processing is limited in L2 learners would predict that an N400 (or P600) may not emerge at the filled-gap position (e.g. Grüter et al., 2017; Grüter and Rohde, 2021; Martin et al., 2013). However, if predictive processing is modulated by individual differences in cognitive resources (e.g. Hopp, 2013; Kaan, 2014), a relationship between ERP responses to the filled gap and scores on an attentional control task may emerge. For example, learners with increased attentional resources may show more nativelike ERP effects as compared to learners with fewer cognitive resources, who may exhibit a delay or inability to deploy predictive mechanisms. 
These analyses also allow us to examine whether individual variability in predictive processing in natives and learners is modulated by the same source, specifically by the cognitive resource attentional control, or whether predictive processing in the domain of wh-dependency resolution is simply beyond the capacity of L2 learners.
Our design manipulated whether the potential gap is grammatically licensed by including a condition in which the potential gap site is located inside an island. Psycholinguistic studies which have found evidence of native-like processing for advanced L2 learners, even in cases where the L1 and L2 differ with respect to whether or not there is overt wh-movement (e.g. Aldwayan et al., 2010; Johnson et al., 2016; Omaki and Schulz, 2011), would predict a native-like pattern of L2 ERP results. Specifically, learners should not attempt to posit a gap inside an island, with no N400 emerging in this condition. In contrast, the Shallow Structure Hypothesis predicts that learners use different cues during the processing of syntactic dependencies, relying on primarily semantic and pragmatic information (Clahsen and Felser, 2006). If learners are unable or delayed in their ability to utilize abstract syntactic information to constrain processing, learners should not show sensitivity to island constraints, potentially yielding N400 in both grammatically licit and illicit domains.
Finally, it is important to point out that Mandarin Chinese, the L1 of our learners, does not instantiate overt wh-movement (Huang, 1982), allowing us to address proposals regarding the role of the L1. For example, Kim et al. (2015) suggest that only learners whose L1 instantiates overt wh-movement may show sensitivity to island constraints online. However, certain phenomena in wh-in-situ languages, such as topicalization, may arguably involve movement (e.g. Cheng, 2009; Lin, 2006; Qu, 1994) and thus, it is difficult to completely rule out transfer from the L1 if the L2 learners are successful. It is also important to point out that island constraints have been claimed to be universal, making it difficult to determine whether L2 island sensitivity is ultimately derived from universal constraints or through the language-specific properties of the L1 (Hale, 1996). Nevertheless, Mandarin Chinese-speaking learners present an interesting test case for whether the processing of wh-dependencies is grammatically constrained, similar to native speakers.
III Method
1 Participants
The participants were 36 native speakers of Mandarin Chinese (9 males, mean age 24, range 18–36 years). Participants were right-handed and had normal or corrected-to-normal vision. All participants considered themselves native speakers of Mandarin, although many participants also spoke another dialect of Chinese. Table 1 provides a summary of background information. Age of first exposure was considered the age at which the participants began English classes in China. Learners did not arrive in the U.S. until after puberty, and on average had spent around two years in the U.S. at time of testing.
Table 1. Descriptive statistics for L2 learners’ background information (n = 36).
English proficiency was assessed via the Lexical Test for Advanced Learners of English, which is a test of vocabulary knowledge (LexTALE; Lemhöfer and Broersma, 2012) and the Examination for the Certificate of Proficiency in English (University of Michigan, 2003), a test of English grammar. Scores from the two measures were used to create a composite variable. The average score on the composite proficiency measure was 80.73 out of a possible 100 points (range 69.8–92.6), indicating that participants were overall highly proficient. Note that the range of proficiency scores was purposefully truncated due to our recruiting strategy, in which we only tested learners at an advanced level of proficiency given the length and complexity of our stimuli. Forty-four native English speakers (12 males, mean age 20.25, range 18–30) were also tested; all were right-handed and had normal or corrected-to-normal vision.
2 EEG experiment
A sample set of target sentences is shown in Table 2. Each set consisted of four target sentence types, crossing the factors Extraction (no extraction, wh-extraction) and Island (non-island, island). A total of 160 sets of sentences were constructed. The target sentences were divided into four Latin-Square lists (40 targets per condition in each list), such that every participant read exactly one sentence from every set.
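The list construction described above can be sketched as follows. This is a minimal illustration, assuming a standard rotation of conditions across item sets; the paper does not state which rotation was used, and the names `CONDITIONS` and `latin_square_lists` are hypothetical.

```python
from collections import Counter
from itertools import product

# Factor levels as named in the text; their ordering here is an assumption.
CONDITIONS = list(
    product(("no extraction", "wh-extraction"), ("non-island", "island"))
)
N_SETS = 160
N_LISTS = 4


def latin_square_lists(n_sets=N_SETS, n_lists=N_LISTS):
    """Rotate conditions across item sets so each list sees every set once."""
    lists = []
    for list_idx in range(n_lists):
        # Set s in list l receives condition (s + l) mod 4. This rotation is
        # a common choice, assumed here rather than taken from the paper.
        assignment = {
            set_idx: CONDITIONS[(set_idx + list_idx) % len(CONDITIONS)]
            for set_idx in range(n_sets)
        }
        lists.append(assignment)
    return lists


lists = latin_square_lists()
# Tally how many items each list contains per condition.
counts_per_list = [Counter(assignment.values()) for assignment in lists]
```

Under this rotation, each list contains 40 items per condition, and each item set appears in a different condition in each of the four lists, which matches the counts reported in the text.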
Table 2. Design of stimulus materials.
Note. Critical regions are highlighted.
Target sentences began with a first name followed by one of four verbs that take a sentential complement (wondered, questioned, revealed, asked). In ‘no extraction’ sentences, this verb was followed by the complementizer if; in sentences containing wh-extraction, the first verb was followed by the wh-item who. A determiner phrase (e.g. the editor) occupied the embedded subject position across conditions. In non-island sentences, a main verb (e.g. interviewed) immediately followed the determiner phrase; in island sentences, the main verb was preceded by that, beginning a relative clause island. Forty transitive main verbs, each repeated four times across the 160 sets of sentences, were used to ensure that there was a potential object gap position. Main verbs were required to be able to take a prepositional phrase with an animate object (e.g. with the reporter) so that they could provide an actual gap site in the non-island condition. The critical filled-gap region consisted of a two-word proper name (e.g. Dave Campbell).
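As a rough illustration of how the four conditions differ up to the critical region, the sketch below assembles only the sentence onset from the pieces named above. The first name "Jamie" and the function name `build_onset` are hypothetical, and the continuation after the filled-gap region is omitted because it is not described here.

```python
def build_onset(name, matrix_verb, subject, main_verb, filled_gap,
                extraction, island):
    """Assemble the sentence onset through the critical filled-gap region."""
    words = [name, matrix_verb]
    # 'who' introduces wh-extraction; 'if' introduces the no-extraction baseline.
    words.append("who" if extraction else "if")
    words.append(subject)
    # 'that' before the main verb opens a relative clause island.
    if island:
        words.append("that")
    words.append(main_verb)
    words.append(filled_gap)
    return " ".join(words)


# "Jamie" is a hypothetical first name; the other fragments are the examples
# quoted in the text.
onsets = {
    (extraction, island): build_onset(
        "Jamie", "wondered", "the editor", "interviewed", "Dave Campbell",
        extraction, island,
    )
    for extraction in (False, True)
    for island in (False, True)
}
```

The proper name follows the main verb in all four onsets, so the filled-gap region is lexically identical across conditions; only the preceding if/who and the presence of that vary.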
Each list also included 80 filler sentences matched in length to the target sentences. 20 sentences included wh-extraction from a subject position and 20 sentences had wh-extraction from a direct object position, the two positions which contain filled gaps in the target sentences. The remaining fillers were biclausal sentences that did not contain wh-movement. Each list was comprised of 160 target and 80 filler sentences for a total of 240 sentences. 2
Participants were instructed to read all sentences carefully and were informed that they would be prompted to answer comprehension questions following some of the sentences. Yes-no comprehension questions followed one third (n=80) of the stimuli to encourage participants to attempt to derive meaning from the sentences as they would during natural language comprehension (e.g. Was it the editor who interviewed Dave Campbell?). The questions did not target the resolution of the wh-dependency. Each experimental session began with six practice trials. Sentences were presented in random order using rapid serial visual presentation (RSVP), with 450 ms for each word and a 300 ms pause between words. The inter-trial interval was 500–1,000 ms, pseudorandomly varied at 50 ms increments to ensure that the ERP did not become time-locked with the presentation rate.
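The presentation timing described above can be illustrated with a short sketch. It assumes that the 500–1,000 ms inter-trial interval includes both endpoints and that each interval is drawn uniformly from the 50 ms steps; the names `word_onsets` and `draw_iti` and the fixed random seed are hypothetical and are not taken from the presentation software.

```python
import random

WORD_DURATION_MS = 450  # each word stays on screen for 450 ms
PAUSE_MS = 300          # pause between consecutive words
# Candidate inter-trial intervals: 500, 550, ..., 1000 ms (endpoints assumed inclusive).
ITI_CHOICES_MS = list(range(500, 1001, 50))


def word_onsets(n_words):
    """Onset time (ms from sentence start) of each word under RSVP."""
    onset_to_onset = WORD_DURATION_MS + PAUSE_MS
    return [i * onset_to_onset for i in range(n_words)]


def draw_iti(rng):
    """Pick one jittered inter-trial interval (uniform choice is an assumption)."""
    return rng.choice(ITI_CHOICES_MS)


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
onsets = word_onsets(8)
itis = [draw_iti(rng) for _ in range(5)]
```

Under these assumptions, consecutive words begin 750 ms apart, and the jittered interval keeps trial onsets from aligning with that fixed rhythm.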
IV Tasks
1 Offline acceptability judgment task
We also assessed whether L2 learners have offline knowledge of island constraints by including an acceptability judgment task which has been commonly used in the L2 literature on islands (e.g. Johnson and Newport, 1991; Li, 1998; Martohardjono, 1993; Schachter, 1990; White and Juffs, 1998) as well as in recent work from Aldosari et al. (2022) (see also Pham et al., 2020). We included this task, which was adapted from Aldosari et al. (2022), in order to have a more comprehensive picture with respect to whether the learners were sensitive to islands both offline and online. Four island types were tested: whether islands, complex noun phrase islands, subject islands, and adjunct islands. The stimuli crossed the factors Island (island, non-island) and Wh-Dependency Length (matrix, embedded) in a 2×2 factorial design, as shown in (7), an example set containing an adjunct island. Sentence type (7d), which contains a long-distance dependency involving movement from within an island, constitutes an island violation.
(7) Background sentence:
The helpful worker thinks that the boss left her keys in the car.
a. Non-island/Matrix: Which worker ___ thinks that the boss left her keys in the car?
b. Non-island/Embedded: Which keys does the worker think that the boss left ___ in the car?
c. Island/Matrix: Which worker ___ worries [if the boss leaves her keys in the car]?
d. Island/Embedded: *Which keys does the worker worry [if the boss leaves ___ in the car]?
In total, 16 sets of sentences were constructed for each of the four conditions; the sets were distributed across four lists in a Latin-Square design. Each participant read 64 target sentences, divided into four blocks and randomized. Unlike Aldosari et al. (2022), no filler sentences were included in order to make the length of the single testing session manageable for participants. The context sentence was presented first, and on a subsequent screen, the target sentence was presented along with a seven-point rating scale ranging from ‘totally unnatural’ to ‘perfectly natural.’
2 Attentional control task
Following Johnson (2015), we measured attentional control via a number Stroop task (Bush et al., 2006). We created two versions of this task, in English and Mandarin Chinese, and participants took the task in their L1. Participants were instructed to count the total number of words presented on the screen and to press the corresponding number on a button box, which ranged from 1 to 4. In congruent trials, participants were presented with a screen containing 2 to 4 monosyllabic, common animal words (e.g. cat cat), for which the correct response was the number of words on the screen (e.g. 2). In incongruent trials, participants were presented with 2 to 4 monosyllabic number words, and crucially the number words did not match the quantity of words on the screen (e.g. four four four). Participants had to enter the quantity of words on the screen, inhibiting the semantic meaning of the words (e.g. enter 3). The speed and accuracy of participants’ responses were measured for each trial. 3
V Procedure
All participants provided informed consent to participate, and completed a background questionnaire and handedness inventory (Oldfield, 1971) prior to the experiment. The number Stroop task was administered first, followed by the EEG experiment. Participants took the offline acceptability judgment task last. Participants received $10 per hour for participating in one testing session which lasted approximately three hours. All tasks were administered using Paradigm presentation software (Tagliaferri, 2005).
VI Data analysis
1 EEG analysis
The electroencephalogram was continuously recorded using an elastic electrode cap (Electro-Cap International, Inc.) containing 32 Ag/AgCl scalp electrodes arranged in a modified 10–20 layout (midline: FPZ, FZ, FCZ, CZ, CPZ, PZ, OZ; lateral: FP1/2, F3/4, F7/8, FT7/8, FC3/4, T3/4, C3/4, TP7/8, CP3/4, T5/6, P3/4, O1/2; AFZ was used as the ground). Three bipolar montage electrode pairs were placed on the outer canthi and above and below each eye to monitor blinks and horizontal eye movements, and an electrode was placed on each mastoid. Impedances for all scalp electrodes were kept below 5 kΩ. Data were sampled at a rate of 1 kHz and referenced online to the left mastoid. Recordings were filtered with a bandpass of 0.1–200 Hz, and amplified with Neuroscan SynAmps 2 (Compumedics Neuroscan, Inc.). The remaining processing steps were completed using MATLAB (Mathworks, Inc.) and the EEGLAB toolbox (Delorme and Makeig, 2004).
EEG was re-referenced offline to the average of both mastoids. Trials containing blinks, horizontal eye movements or muscle artifacts were manually rejected, resulting in the exclusion of 18.4% of target trials (L1: 16.1%, L2: 20.1%). The data were then epoched using a –300 ms to 1,200 ms interval, time-locked to the presentation of the critical word, baseline corrected using a –300 to 0 ms pre-stimulus interval, filtered with a 30 Hz low-pass filter, and averaged. Electrodes that did not record successfully were replaced with interpolated values prior to averaging. Electrodes were coded for hemisphere (left, midline, right) and anteriority (frontal, central, posterior), creating the following groups: Left Anterior (FP1, F7, F3, FT7, and FC3), Right Anterior (FP2, F8, F4, FT8, and FC4), Left Posterior (TP7, CP3, T5, P3, and O1), Right Posterior (TP8, CP4, T6, P4, and O2), Midline Anterior (FPZ, FZ, and FCZ), and Midline Posterior (CPZ, PZ, and OZ). EEG data were analysed using linear mixed-effects models with the lme4 package (Bates et al., 2014) in R, with p-values calculated using the lmerTest package (Kuznetsova et al., 2016). ERPs were time-locked to the onset of the first word of the phrase (e.g. Dave for Dave Campbell). Two time-windows were selected for analysis to examine N400 effects (300–500 ms) and P600 effects (500–900 ms).
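To make the baseline correction and time-window measures concrete, the following sketch computes mean amplitudes for a single epoch at one electrode. It is illustrative only and does not reproduce the EEGLAB pipeline: filtering, artifact rejection and trial averaging are omitted, the windows are treated as half-open intervals (an assumption), and the helper names and the toy epoch are hypothetical.

```python
SAMPLING_RATE_HZ = 1000  # data were sampled at 1 kHz
EPOCH_START_MS = -300    # epochs run from -300 ms to 1,200 ms


def ms_to_index(t_ms):
    """Convert a latency relative to word onset into a sample index."""
    return int((t_ms - EPOCH_START_MS) * SAMPLING_RATE_HZ / 1000)


def mean(values):
    return sum(values) / len(values)


def baseline_correct(epoch):
    """Subtract the mean of the -300 to 0 ms pre-stimulus interval."""
    baseline = mean(epoch[ms_to_index(-300):ms_to_index(0)])
    return [v - baseline for v in epoch]


def window_mean(epoch, start_ms, end_ms):
    """Mean amplitude in [start_ms, end_ms); half-open bounds are an assumption."""
    return mean(epoch[ms_to_index(start_ms):ms_to_index(end_ms)])


# Toy epoch (microvolts): constant 2.0 offset with a -1.0 deflection at 300-500 ms.
epoch = [2.0] * 1500
for i in range(ms_to_index(300), ms_to_index(500)):
    epoch[i] -= 1.0

corrected = baseline_correct(epoch)
n400_window = window_mean(corrected, 300, 500)  # N400 time window
p600_window = window_mean(corrected, 500, 900)  # P600 time window
```

In the toy epoch, the constant offset is removed by the baseline correction, so only the deflection placed in the 300–500 ms window contributes to the N400 measure.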
The dependent variable in each model was the mean amplitude at a given electrode, for a given condition. Model fitting began by including the following fixed factors and all possible interactions: Extraction (no extraction, wh-extraction), Island (non-island, island), Hemisphere (left, midline, right), and Anteriority (anterior, central, posterior). The model included a random intercept for Subject. The model was progressively backwards-fit, such that interactions and fixed effects which did not explain a significant portion of the variance were removed. The baseline condition was the no extraction, island condition ((c) in Table 2) at the midline central region; all comparisons are made in relation to this condition. For Hemisphere and Anteriority, which have three levels each, analyses tested differences between the baseline level and the two other levels. For each analysis, the best model was identified using the R package LMERConvenienceFunctions (Tremblay and Ransijn, 2015), which utilizes a series of iterative log-likelihood ratio tests to arrive at the simplest model that best fits the data. The final LME model always included a by-participant random intercept, which allows the individual means to vary while having a common slope for explanatory effects. For the final models for each time-window, only significant main effects and interactions involving the factors Extraction and Island are reported in text. We consider p < .05 to be significant. 4
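For orientation, the core of such a model can be written out as below. This is a sketch under stated assumptions rather than the authors' specification: the symbols are introduced here for illustration, the predictors are assumed to be treatment-coded indicators relative to the stated baseline (no extraction, island), and the electrode factors are abbreviated.

```latex
\begin{aligned}
A_{sk} &= \beta_0
        + \beta_1\,\mathrm{Ext}_{k}
        + \beta_2\,\mathrm{NonIsl}_{k}
        + \beta_3\,\mathrm{Ext}_{k}\,\mathrm{NonIsl}_{k}
        + (\text{Hemisphere, Anteriority, and interaction terms})
        + u_s + \varepsilon_{sk}, \\
u_s &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{sk} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

Here A_{sk} is the mean amplitude for subject s in observation k, Ext = 1 for wh-extraction and NonIsl = 1 for non-island sentences, and u_s is the by-subject random intercept. Under this coding, β1 is the effect of extraction at the baseline level of Island (inside the island), and β1 + β3 is the effect of extraction in the non-island condition, which is why a significant Extraction × Island term is followed up with separate models for each level of Island.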
The second step of the analysis investigated the potential moderating influence of attentional control scores. In a subsequent LME model, the composite variable Attentional Control was included as a fixed effect and as an interaction term with Extraction, Island, and any electrode factors (Anteriority, Hemisphere) which were significant in the best-fitting overall model. Each model was fitted following the guidelines outlined above. To create the Attentional Control variable, two measures from the number Stroop task were used: interference effects for reaction times and for accuracy, which were marginally correlated (L1: r = −0.28, p = .06; L2: r = −0.30, p = .07). On the Attentional Control variable, higher values reflect better performance on the task.
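One way a composite of this kind could be built is sketched below. The text does not specify how the two interference effects were combined, so everything here is an assumption made for illustration: interference is taken as incongruent minus congruent, each measure is z-scored across participants, the reaction-time score is sign-flipped so that higher values mean better performance, and the two are averaged. The function names and toy values are hypothetical.

```python
from statistics import mean, stdev


def interference(congruent, incongruent):
    """Interference effect: incongruent mean minus congruent mean (assumed direction)."""
    return mean(incongruent) - mean(congruent)


def z_scores(values):
    """Standardize a list of per-participant scores."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite(rt_interference, acc_interference):
    """Average of standardized scores, oriented so that higher = better control.

    rt_interference: larger values mean more slowing on incongruent trials (worse).
    acc_interference: more negative values mean a bigger accuracy drop (worse).
    Equal weighting of the two measures is an assumption.
    """
    z_rt = z_scores(rt_interference)
    z_acc = z_scores(acc_interference)
    return [(-zr + za) / 2 for zr, za in zip(z_rt, z_acc)]


# Toy data for three hypothetical participants.
rt_effects = [40.0, 80.0, 120.0]      # ms
acc_effects = [-0.02, -0.05, -0.08]   # proportion correct
scores = composite(rt_effects, acc_effects)
example_rt_effect = interference([500.0, 520.0], [560.0, 580.0])
```

In the toy data, the participant with the smallest slowing and the smallest accuracy drop receives the highest composite score, matching the stated orientation of the variable.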
2 Acceptability judgment analysis
Data from the offline acceptability judgment task were analysed with linear mixed-effects models; the dependent variable was a participant’s acceptability judgment rating per item, which was z-score transformed prior to analysis. For each island type, the full model included the fixed effects Island Structure (non-island, island), Dependency Length (matrix, embedded), and the interaction term Island Structure × Dependency Length, random intercepts for participant and item, as well as by-participant and by-item random slopes for each factor and the interaction term. The full model was simplified stepwise.
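The two quantities at the heart of this analysis, z-transformed ratings and the Island Structure × Dependency Length interaction, can be illustrated with a small sketch. It assumes that z-scores are computed within each participant, which the text does not state, and it expresses the interaction as a differences-in-differences score over cell means rather than as a fitted mixed-effects model. The helper names and toy cell means are hypothetical.

```python
from statistics import mean, stdev


def z_transform(ratings):
    """Standardize one participant's ratings (by-participant scaling is an assumption)."""
    m, s = mean(ratings), stdev(ratings)
    return [(r - m) / s for r in ratings]


def island_effect(cell_means):
    """Differences-in-differences over the 2x2 design.

    The cost of a long (embedded) dependency is compared between island and
    non-island structures; a positive score means the island violation is rated
    lower than the two main effects alone would predict.
    """
    d_embedded = cell_means[("non-island", "embedded")] - cell_means[("island", "embedded")]
    d_matrix = cell_means[("non-island", "matrix")] - cell_means[("island", "matrix")]
    return d_embedded - d_matrix


# Toy cell means on the raw 1-7 scale (invented for illustration only).
toy_means = {
    ("non-island", "matrix"): 6.0,
    ("non-island", "embedded"): 5.5,
    ("island", "matrix"): 5.8,
    ("island", "embedded"): 2.3,
}
dd_score = island_effect(toy_means)
z_example = z_transform([1, 4, 7, 6, 2])
```

A score near zero would indicate that the low rating of the island/embedded condition is fully explained by the separate costs of islands and of long dependencies, whereas a clearly positive score corresponds to the superadditive pattern tested by the interaction term.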
VII Results
Mean accuracy for the comprehension questions in the EEG experiment was 75.2% for the learners and 76.7% for natives. 5 To investigate if comprehension accuracy in the target conditions was influenced by group or experimental conditions, a generalized linear model with a binomial distribution was fit, with Accuracy (0, 1) as the dependent variable, and Extraction (no extraction, wh-extraction), Island (non-island, island), and Group (L1, L2) included as fixed effects. This model revealed no significant effects, showing that accuracy was similar across conditions.
For the ERP analysis, our initial statistical approach included natives and L2ers in one model, testing for an effect of Group. In the overall models for the N400 and P600 time windows, the three-way interaction between Extraction × Island × Group was significant (300–500 ms time window: β = 0.956, t(9810) = 3.353, p < .001; 500–900 ms time window: β = 0.334, t(9820) = 1.997, p < .05). These interactions provided statistical motivation to analyse the native and L2 groups separately.
1 Native ERP results
a 300–500 ms time-window
For the best-fitting overall model for the 300–500 ms time window, the critical interaction Extraction × Island was significant (β = −0.629, t(5395) = −3.273, p < .01). To interpret this interaction, follow-up analyses were carried out by running separate models for the two conditions of the factor Island (non-island, island). As in initial model-fitting, follow-up models were first fit with every possible fixed effect and interaction and then progressively backwards-fit to arrive at the simplest model that best fit the data.
Results for the best-fitting non-island model revealed a significant main effect of Extraction (β = −0.472, t(2675) = −5.676, p < .001), indicating that the amplitude for the target wh-extraction condition was more negative compared to the baseline no extraction condition. The final model did not reveal an interaction between Extraction and the electrode factors (Anteriority, Hemisphere), indicating the negativity was broadly distributed. Figure 1 shows the N400 emerging at the object filled-gap position.

Figure 1. Native speakers’ object filled-gap effect for the non-island conditions at representative electrodes.
During model-fitting for the island condition for the 300–500 ms time window, the term Extraction was removed, indicating that this variable did not explain a significant amount of variance in the final model. Thus, no significant difference emerged between the wh-extraction and no extraction conditions inside the island (i.e. no N400). Figure 2 shows the comparison between the two conditions for the island condition.

Figure 2. Native speakers’ object filled-gap effect for the island conditions at representative electrodes.
b 500–900 ms time window
Results from the overall best-fitting model for the second time window revealed a significant main effect of Extraction (β = −0.111, t(5404) = −2.182, p < .05), such that amplitude for the wh-extraction condition was more negative as compared to the baseline no extraction condition. Crucially, the interaction of Extraction × Island was removed during model-fitting, indicating that there was no significant difference in the negativity for island and non-island conditions. In sum, a late negativity which was significant across the scalp emerged for both island and non-island conditions at the filled object position.
c Individual differences
We next explored whether the ERP responses at the object filled-gap sites, both within and outside islands, were significantly modulated by attentional control. In the best-fitting model for the N400 time window (300–500 ms) including the attentional control composite scores, the interaction term Extraction × Island × Attentional Control was significant (β = −0.3278, t(5406) = −4.233, p < .001). To follow up on this interaction, separate models were run for each level of the factor Island. In the best-fitting non-island model, the critical interaction Extraction × Attentional Control was significant (β = −0.1976, t(2682) = −3.390, p < .001), revealing that higher Stroop scores were associated with a more negative amplitude in the wh-extraction condition as compared to the baseline. In other words, better attentional control resources were associated with larger N400 amplitude (i.e. larger filled-gap effect) in the licit, non-island condition. In the best-fitting model for the island condition, the critical interaction Extraction × Attentional Control was significant (β = 0.1303, t(2682) = 2.733, p < .01), and indicated that increasing attentional control was associated with a less negative ERP effect. Thus, individuals with better attentional control resources showed smaller N400-like effects within the island.
For the 500–900 ms time window, the best-fitting model including attentional control revealed a three-way interaction, Extraction × Island × Attentional Control (β = −0.4024, t(5406) = −6.222, p < .001). Separate models were then run for each level of the factor Island. In the non-island model, the interaction Extraction × Attentional Control (β = −0.3320, t(2682) = −7.352, p < .001) indicated that with increasing attentional control, the amplitude for the wh-extraction condition became more negative as compared to the no extraction condition. This interaction indicates that higher attentional control was associated with a larger late negativity in the non-island condition. In the best-fitting model for the island condition, all fixed effects involving Extraction were removed during model fitting. Thus, attentional control did not modulate processing inside the island in the 500–900 ms time window.
To summarize the results for the native English speakers, an N400 emerged for the non-island condition only, suggesting that the parser posited a gap only in the grammatically licensed position. Increasing attentional control was associated with larger N400s in the non-island condition. There was no evidence of gap prediction for the island condition in the 300–500 ms time window, and increased attentional control was associated with smaller N400-like effects inside the island. In contrast to previous studies examining filled-gap effects in ungrammatical sentences, we did not observe a P600 for filled gaps in the 500–900 ms time window. Instead, we observed an unexpected late negativity in this time window, which was present in both island and non-island conditions, but only modulated by attentional control in the non-island condition.
2 L2 ERP analysis
a 300–500 ms time-window
The best-fitting overall model for the 300–500 ms time window revealed a significant interaction between Extraction × Island (β = 0.2935, t(4421) = 2.458, p < .05). Follow-up analyses were carried out by running separate models for the two conditions of Island. The best-fitting non-island model showed a significant main effect of Extraction (β = 0.3324, t(2191) = 4.029, p < .001), such that the ERP amplitude in the wh-extraction condition was more positive compared to the no extraction condition. This positivity was broadly distributed across all electrodes, with the model showing no interactions with the factors Anteriority or Hemisphere. In contrast, during model-fitting for the island condition, the term Extraction was removed, indicating that there was no significant difference between the amplitude of the wh-extraction and no extraction conditions inside the island.
b 500–900 ms time window
In the overall best-fitting model for the second time window, the interaction term Extraction × Island was significant (β = 0.3767, t(4419) = 2.757, p < .01); separate models for each level of Island were then run. In the non-island condition, a positivity emerged for wh-extraction sentences (β = 0.3867, t(2191) = 4.265, p < .001). The final non-island model revealed no interactions between Extraction and the electrode factors, indicating a broadly-distributed positivity (Figure 3). However, the fixed effect Extraction was removed during model-fitting of the island model, indicating there was no significant effect inside the island in the later time window (shown in Figure 4).

Figure 3. L2 learners’ object filled-gap effect for the non-island conditions at representative electrodes.

Figure 4. L2 learners’ object filled-gap effect for the island conditions at representative electrodes.
c Individual differences
Next, attentional control composite scores were added as a factor to the LME models. In the 300–500 ms time window, the Extraction × Island × Attentional Control interaction was significant (β = 0.1886, t(4422) = 2.249, p < .05), and thus we examined separate models for each level of the factor Island. For the non-island model, the interaction Extraction × Attentional Control (β = 0.2038, t(2194) = 3.480, p < .001) indicated that increasing attentional control scores were associated with increasingly positive amplitude for the wh-extraction condition. That is, the amplitude of the positivity in the non-island condition was larger for learners with greater attentional control. In the best-fitting model for the island condition, the fixed effects Extraction, Attentional Control, and Extraction × Attentional Control were removed during model-fitting, indicating that attentional control did not modulate ERP effects within the island.
In the best-fitting model for the 500–900 ms time window, the fixed effect Attentional Control and all interactions involving Attentional Control were removed during model-fitting. Thus, attentional control scores did not modulate processing in the later time window for L2 learners.
To summarize the results for the Mandarin Chinese learners of English, a positivity emerged for the non-island condition which was significant from 300–900 ms. We refer to this extended positivity as P600, as P600 effects with similar onset latency have been widely reported for native speakers as well as learners (e.g. Gouvea et al., 2010; Kaan et al., 2016, 2000; Phillips et al., 2005). Individual differences in attentional control were shown to modulate the positivity in the non-island condition from 300–500 ms, such that learners with greater attentional control scores yielded larger positivities. Learners’ processing was crucially island-sensitive, with no significant effects emerging inside the relative clause island in either time window; furthermore, attentional control did not modulate processing within the island.
3 Acceptability judgment task results
Mean acceptability ratings for each condition and each island type are provided in Table 3. Ratings ranged from 1 to 7, with higher scores reflecting greater acceptability. For native speakers, the interaction between wh-dependency length and island structure was significant for each island type: whether (est = −0.79, SE = 0.12, t = −6.36, p < .001), complex NP (est = −1.16, SE = 0.14, t = −8.11, p < .001), subject (est = −1.96, SE = 0.13, t = −14.73, p < .001), and adjunct (est = −1.58, SE = 0.12, t = −13.31, p < .001). This interaction resulted from low acceptability ratings of the ungrammatical island violation condition compared to the other three grammatical conditions for each island type, indicating that native English speakers were sensitive to island effects in all four island types. Interaction plots for each island type are shown in Figure 5.
Table 3. Means and standard deviations of raw acceptability ratings for each condition.
Note. Ratings ranged from 1 to 7, with higher numbers indicating greater acceptability.

Figure 5. Native interaction plots for each island type on the acceptability judgment task.
In the models for the L2 learners, the critical interaction term Dependency Length × Island Structure was significant for all island types tested: whether (est = −0.48, SE = 0.19, t = −2.51, p < .05), complex NP (est = −1.15, SE = 0.14, t = −8.19, p < .001), subject (est = −1.64, SE = 0.19, t = −8.42, p < .001), and adjunct (est = −1.07, SE = 0.16, t = −6.80, p < .001). Interaction plots for each island type are shown in Figure 6. Similar to natives, this interaction revealed that learners rated the ungrammatical island violation sentences significantly lower than the other three conditions. Thus, Mandarin Chinese-speaking learners demonstrated island sensitivity to all four island types.

Figure 6. L2 interaction plots for each island type on the acceptability judgment task.
To compare the size of island effects observed in native speakers and L2 learners, we conducted additional analyses where group (L1, L2) was included as a factor. In the models for adjunct and subject islands, a significant three-way interaction between group, wh-dependency length, and island structure emerged, indicating that the ‘island effect’ for natives was larger than for L2 learners for adjunct (est = 0.51, SE = 0.18, t = 2.824, p < .01) and subject islands (est = 0.32, SE = 0.15, t = 2.151, p < .05). A three-way interaction with Group was not significant in the complex NP (est = 0.003, SE = 0.20, t = 0.013, p = .99) or whether island models (est = 0.31, SE = 0.16, t = 1.925, p = .06). In sum, this follow-up analysis revealed that native speakers showed more robust island effects for adjunct and subject islands, islands generally categorized as ‘strong’ (e.g. Huang, 1982), whereas acceptability judgments were similar across groups for complex NP and whether islands, a pattern which was also observed in Aldosari et al. (2022). Despite differences in the robustness of the effects for the strong islands, the patterns for the L2 learners and native speakers are qualitatively similar across island types.
VIII Discussion
Our results suggest that L2 processing is grammatically-guided, as no significant effects emerged inside the relative clause island. Learners also demonstrated island sensitivity offline. However, the learners’ processing of wh-dependencies was revealed to be different from native speakers in the licit non-island condition, with distinct brain responses emerging for natives (N400) and learners (P600). Our analyses further explored whether processing was related to individual differences in attentional control, which has been shown to capture variability in the processing of wh-dependencies and language processing more broadly (e.g. Akhavan et al., 2020; Boudewyn et al., 2012, 2013; Johnson, 2015; Zirnstein et al., 2018); attentional control indeed modulated processing for both groups, albeit in different ways.
1 Native processing
a The N400
Previous studies reporting prediction-related modulation of the N400 examined lexical prediction, or specific predictions for features of particular words (e.g. Federmeier, 2007; Lau et al., 2013; Van Berkum et al., 2005). Our results suggest that the N400 also reflects gap prediction, as indicated by the N400 effect yielded by lexical material appearing where a gap has been predicted (in the current study), as well as encountering an actual gap where it was not predicted (Michel, 2014). The N400 additionally sheds light on the nature of filled-gap effects, since some researchers have hypothesized that these effects reflect prediction, while others have interpreted filled-gap effects observed in reading-time studies as reflecting a costly reanalysis process (Lee, 2004). Given that N400s emerged in the licit filled-gap position rather than P600s, our findings do not straightforwardly support accounts attributing filled-gap effects solely to structural reanalysis; instead, findings suggest that filled-gap effects may reflect, at least in part, encountering unpredicted lexical material. The fact that N400 emerged for native speakers is more in line with proposals that argue for a predictive account of wh-dependency resolution (e.g. Clifton and Frazier, 1989; Omaki et al., 2015), although further studies are needed to confirm our characterization of the processing of filled gaps as predictive.
Our results are in line with psycholinguistic evidence that native English speakers avoid predicting gaps inside islands, where wh-extraction is not licensed by the grammar (e.g. Johnson et al., 2016; Stowe, 1986; see also Phillips, 2006; Traxler and Pickering, 1996). In the critical N400 time window, the results indicated that natives did not attempt to posit a gap within the island, and we note that this evidence for island sensitivity holds regardless of whether filled gap effects are characterized as reflecting predictive processing. Although previous studies have shown sensitivity at island boundaries such as at a complementizer (e.g. Kluender and Kutas, 1993; Michel, 2014) and at actual gap positions within islands in ungrammatical sentences (Michel, 2014), this study provided some of the first electrophysiological evidence of island sensitivity at a filled-gap position located within a fully grammatical sentence.
We also observed that native processing during the N400 time window was modulated by attentional control. In the non-island condition, where gap prediction is grammatically licensed, greater attentional control was associated with increased N400 amplitude. In line with Johnson (2015), this suggests that attentional control resources play an important role in gap prediction during wh-dependency resolution. This finding is also in line with research showing that attentional control modulates prediction in language processing (e.g. Boudewyn et al., 2012; Hutchison, 2007), as well as expectation-driven processing in general (e.g. Bar, 2009; Kane and Engle, 2002). Hutchison (2007), for example, found that individuals with increased attentional control (assessed via operation span, anti-saccade, and a Stroop task) were more sensitive to a context cue, yielding larger prediction effects in a word-pair semantic priming study. In the processing of wh-dependencies, attentional control resources may modulate an individual’s ability to utilize a cue (i.e. the wh-filler) that a potential gap site is forthcoming (e.g. Johnson, 2015).
We also observed that individual differences in attentional control modulated processing inside the island, such that increased attentional control was associated with less negative ERP effects (or, less N400-like effects). This suggests that native speakers with greater attentional resources may have successfully inhibited gap prediction within the island. Attentional control resources may be involved in recognizing the presence of an island domain (i.e. the relative clause marker that) to constrain processing, therefore avoiding ungrammatical gap prediction. Individuals with higher attentional control resources may also be better able to suppress information related to the verb inside the island (e.g. ‘interviewed’ in that interviewed Dave Campbell) given the syntactic context. This possibility is in line with the description of attentional control as an ability involved in maintaining attention on the task at hand despite distracting information in the bottom-up input (e.g. Kane and Engle, 2003; see also Hutchison, 2007). Note that this pattern is the opposite of what might be expected under processing-based accounts of island sensitivity. As Sprouse et al. (2012) argue, the processing account of islands should expect that increased cognitive abilities would be associated with increased gap-filling inside islands. If higher attentional control simply allowed participants to overcome the processing burden and posit gaps inside island structures, higher attentional control should have led to greater N400s within the island, the opposite of what was observed. Our results are more in line with a grammatical view of island effects, which emerge due to syntactic constraints rather than a lack of processing resources (e.g. Phillips, 2006, 2013; Sprouse et al., 2012; Wagers and Phillips, 2009).
b Late negativity
Native speakers yielded a negativity in the 500–900 ms time window, which was significant in both the non-island and island conditions. This late negativity was unexpected. One possible interpretation is that this component serves as an index of thematic role assignment. The filled gap position directly follows a subcategorizing verb (e.g. interviewed), which licenses thematic arguments; several studies have reported an increased negativity in sentences with argument-induced conflicts (e.g. Frenzel et al., 2011; Frisch and Schlesewsky, 2001, 2005). For example, an enhanced negativity has been shown to emerge when encountering two animate noun phrases which can grammatically serve as the same thematic argument (e.g. Frisch and Schlesewsky, 2001). In the current study, the filled-gap position may indeed involve thematic role assignment difficulty due to the presence of the subcategorizing verb. Specifically, the lexical NP in object position (Dave Campbell) and the displaced wh-phrase (who) are both possible arguments for the verb; it is possible that increased negativity for the wh-extraction sentences reflects the availability of two potential arguments for one argument position. Although increased negativities for thematic argument conflicts have been reported in earlier time windows (e.g. 400–550 ms), these studies utilized a different linguistic manipulation, such as contexts with ‘double case’ ungrammaticality (e.g. Frisch and Schlesewsky, 2005). Using a filled-gap paradigm, Hestvik et al. (2012) reported a negativity at a filled-gap site with a similar latency (500–800 ms), suggested to be linked to processing argument structure operations.
As the object position following interviewed is not a grammatically-licit extraction site in the island condition, it should not be the case that a gap is predicted in this position. The fact that a late negativity is also seen in this condition may suggest that holding in memory more than one argument compatible with the argument structure of the verb is sufficient to engender some amount of processing difficulty related to argument structure. It is noteworthy that attentional control scores modulated processing in this time window only for the non-island condition, such that natives with better Stroop scores showed larger late negativities. If, as we have argued, natives with increased attentional control resources are better able to engage in anticipatory processing, predictively positing a gap in the direct object position following interviewed may have led to increased conflict between the lexical NP Dave Campbell and the wh-phrase, leading to an increased late negativity. Importantly, attentional control did not modulate this component in the island condition: individuals with greater processing resources did not yield a larger late negativity there, as would be expected under a processing-based account of island sensitivity (e.g. Hofmeister and Sag, 2010). Instead, attentional control was shown to increase the late negativity in the non-island condition only, indicating that greater attentional control resources are associated with a greater ability to link thematic arguments only in grammatically licensed positions.
2 L2 processing
a Island sensitivity
There was no evidence that L2 learners attempted to posit a gap or, as might be expected according to Clahsen and Felser’s (2006) account, link thematic arguments within the relative clause island. In addition, attentional control scores did not modulate processing inside the island. Thus, it was not the case that learners with increased cognitive capabilities attempted to establish a wh-dependency within the island, as might be suggested by processing accounts of islands, as reviewed above (e.g. Hofmeister and Sag, 2010). Our ERP results are in line with psycholinguistic studies which report online island sensitivity for L2 learners, even those whose L1 does not instantiate overt wh-movement (e.g. Aldwayan et al., 2010; Cunnings et al., 2010; Felser et al., 2012; Johnson et al., 2016; Omaki and Schulz, 2011), and are in contrast to Kim et al.’s (2015) proposal, which expects that only learners whose L1 instantiates overt wh-movement would show sensitivity. Taken together, the results from the acceptability judgment task, the ERP evidence, and the lack of a role for individual differences in cognitive abilities modulating island sensitivity on either task are in line with proposals arguing that the L2 grammar and L2 learners’ ability to utilize syntactic constraints online are qualitatively similar to those of native speakers (e.g. Aldosari et al., 2022; Aldwayan et al., 2010; Martohardjono, 1993; Schwartz and Sprouse, 1994).
These results are not in line with the original Shallow Structure Hypothesis (Clahsen and Felser, 2006, 2018), which expects that learners underuse abstract syntactic information and instead rely on semantic/pragmatic information during processing or use syntactic information on a different time-course (Boxell and Felser, 2017; Felser et al., 2012). Note that Felser (2019) makes a more fine-grained argument regarding the type of cues that might potentially facilitate parsing, such that learners may be better able to utilize what Felser (2019: 17) refers to as a more ‘obvious’ cue, such as a relative pronoun at the onset of a relative clause, which may more saliently indicate the boundary of a relative clause island. In our stimuli, the onset of the island is overtly marked by the word that, making it a relatively reliable and salient cue which may have aided learners during processing. Future studies which systematically manipulate the salience of the cues to island boundaries will shed light on the extent to which the robustness of cues modulates island sensitivity in L2 learners.
b Prediction
Whereas native speakers yielded an N400 at the filled gap site, there was no evidence to suggest that the learners similarly engaged in gap prediction. Learners’ lack of N400 effects is broadly in line with Grüter et al.’s (2017) RAGE Hypothesis, which proposes that L2 predictive processing is limited in certain contexts. Interpreting L2 processing at the licit, non-island filled gap is complicated by the fact that an unexpected P600 component emerged, which has multiple interpretations. Before interpreting the component, we note that the positivity for the L2 group is interpreted as a singular P600 component, despite its appearance in both our early and late time windows, based on several factors. There is ample evidence that the onset of the P600 can begin as early as 300 ms, the onset of our first time window (e.g. Gouvea et al., 2010; Kaan et al., 2000; Phillips et al., 2005). Moreover, modeling of the early and late time windows suggests that the positivity patterned the same across the time windows, appearing only in the non-island conditions, consistent with our interpretation of the positivity as a single response despite spanning the two time windows. As we discuss below, the individual difference analyses revealed that attentional control (AC) modulated processing similarly in both time windows: increasing scores on the Stroop task were associated with larger positivities. Together, this suggests that there is a singular source for the L2 positivity linked to processing a licit filled gap.
A large body of ERP research has shown that the P600 is yielded in contexts in which syntactic integration is difficult, such as at an actual gap site in sentences with long-distance filler-gap dependencies (e.g. Felser et al., 2003; Gouvea et al., 2010; Kaan et al., 2000; Phillips et al., 2005), or unsuccessful, such as phrase structure violations and garden path sentences (e.g. Friederici et al., 1996; Frisch et al., 2002; Osterhout and Holcomb, 1992; for a review, see Molinaro et al., 2011). One possibility, then, is that learners’ P600 reflects the fact that they have processed the filled gap as a temporary syntactic anomaly. Another related possibility is that the P600 is linked to syntactic reanalysis processes arising from increased difficulty in integrating the NP as the object of the verb. One observation we wish to highlight is that while learners’ brain responses are ultimately non-native-like, the P600 emerging for the L2 learners suggests that some type of syntactic processing or integration difficulty related to the filled gap has occurred, and crucially this only occurred in the licit context. Other studies have reported P600s at a filled-gap position for L2 learners, as well as native speakers (e.g. Dong, 2014; Jessen et al., 2017; Schremm, 2012). In general, P600s at filled gaps have been argued to reflect syntactic integration difficulty, although it is difficult to draw direct comparisons to our study, given that the filled-gap position in filled-gap studies thus far renders the sentence ungrammatical, as there was not an actual gap located further downstream in the sentence. In our study, learners were not required to integrate and interpret two noun phrases at the filled gap position because this position was ultimately always followed by an actual gap site. Further investigation is needed to directly test whether and how processing a filled potential gap site is affected by a learner’s L1 and the overall grammaticality of the sentence.
Finally, we found that individual variability in both natives and learners was at least in part explained by the same cognitive resource: attentional control. The individual differences analyses revealed that attentional control modulated learners’ processing in the licit, non-island condition largely in the same way as it did for native speakers. Specifically, we found that increased attentional control was associated with bigger filled-gap effects for both groups, increasing the amplitude of the N400 and P600 for natives and learners respectively. Despite the qualitative differences in the brain responses for the two groups, the pattern of increased cognitive resources associated with larger filled-gap effects has been observed in psycholinguistic studies, operationalized as increased reading time slowdowns (e.g. Johnson, 2015). Regarding the modulation of the P600 for our learners, it may be the case that increased attentional control resources are associated with greater syntactic integration efforts. That is, learners with better Stroop scores may have attempted to integrate the wh-word into the structure, or engaged in syntactic reanalysis processes to a greater degree, resulting in larger P600s at the licit filled-gap site. However, given the multiple interpretations underlying the P600 yielded at the filled gap, future studies are needed to further clarify the role of attentional control in the processing of wh-dependencies by L2 learners.
IX Conclusions
This study investigated the processing of wh-dependencies by native speakers and Mandarin Chinese-speaking learners of English. Overall, L2 learners demonstrated island sensitivity both online and offline. However, while native speakers yielded an N400 in the non-island condition, suggesting that they engaged in gap prediction, learners showed a P600 response in this condition, suggesting that learners did not engage in prediction but rather experienced difficulty with syntactic integration. This study additionally explored whether processing was modulated by individual differences in attentional control, and results indicate that increased attentional control is related to prediction abilities for native speakers and may be related to integration effort in learners. Ultimately, the qualitatively different ERP responses for learners suggest that predictive processing in the resolution of wh-dependencies may be limited, at least for learners whose L1 does not instantiate overt wh-movement.
Footnotes
Acknowledgements
We wish to acknowledge Catherine Pham, Delaney Wilson, Haley Schippers, Ran Lu, and Justin Nguyen for their help in conducting this research. Thanks to Saad Aldosari for sharing his materials for the offline acceptability judgment task. We are also grateful to the three reviewers who provided feedback that helped us strengthen this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science Foundation BCS Doctoral Dissertation Improvement Grant [#1728019]; the William Orr Dingwall Foundation [2017 Dissertation Fellowship in the Cognitive, Clinical, and Neural Foundations of Language]; and the Language Learning Dissertation Grant.
