Abstract
We investigate sensitivity to island constraints in English native speakers and Najdi Arabic learners of English, examining (1) whether second language (L2) learners whose native language (L1) does not instantiate overt wh-movement are sensitive to island constraints and (2) the source of island effects. Under a grammatical account of islands, these effects arise due to violations of syntactic constraints. Under the resource-limitation account, island effects arise due to processing difficulty. The source of island effects is interesting to examine in L2 learners because it is possible that reduced processing abilities in the L2 may lead to the low acceptance of sentences with island violations simply due to the complexity of the sentences themselves as opposed to an adherence to grammatical constraints. To tease apart these accounts, we followed Sprouse et al. in focusing on individual differences in working memory (WM). We used an acceptability judgment task (AJT) to quantify island sensitivity and an automated operation span task to measure WM. Building on Sprouse et al., the AJT tested four island types, but we made several modifications to the task design to address concerns raised by Hofmeister et al.: the stimuli included a ‘context’ sentence to improve the naturalness of the complex wh-sentences. The stimuli also included complex wh-fillers (e.g. which worker) as opposed to bare fillers (who), as semantically rich wh-phrases have been found to be easier to process. Our results showed that learners, like natives, exhibited island sensitivity, and there was no evidence that individual differences in WM modulated island sensitivity for either natives or learners. Our results are compatible with the grammatical view of island effects and suggest that wh-dependencies in both L1 and L2 grammars are similarly constrained by syntax.
I Introduction
In English, it has been observed that wh-phrases cannot move out of certain syntactic constituents, called islands (Ross, 1967). These islands include, but are not limited to, adjunct clauses (1a), relative clauses (1b), complex NPs (1c), and embedded questions, which are referred to as wh-islands (1d).
(1) a. * b. * c. * d. *
Since the work by Ross (1967), several syntactic theories have been proposed to account for island constraints on wh-movement, such as the Subjacency Condition (Chomsky, 1973), the Condition on Extraction Domains (Huang, 1982), the Barriers System (Chomsky, 1986), the Minimal Link Condition (Chomsky, 1995) and the Phase Impenetrability Condition (Chomsky, 2000, 2001). What holds across all of these theories is that island constraints are argued to be innate and part of a native speaker’s mental grammar.
Several studies investigating acquisition of island constraints have shown that both natives and second language (L2) learners give low acceptability ratings to sentences with island violations, suggesting island sensitivity in both populations. However, the issue is much more debated for learners, with some proposals arguing that sensitivity is only possible for L2 learners whose native language (L1) also instantiates overt wh-movement (e.g. Hawkins and Chan, 1997; Tsimpli and Dimitrakopoulou, 2007) and others arguing that sensitivity is ultimately possible for all learners regardless of L1 (e.g. Martohardjono, 1993; Schwartz and Sprouse, 1996). There is a current debate as to the source of island effects (e.g. Kluender and Kutas, 1993; Hofmeister and Sag, 2010; Sprouse et al., 2012a). In contrast to the grammatical theories of islands referenced above, which attribute island effects to syntactic constraints on wh-extraction, resource-limitation accounts of islands, first proposed by Kluender and Kutas (1993) and expanded in Hofmeister and Sag (2010), attribute the low acceptability of sentences with island violations to processing difficulty due to resource limitations. That is, islands are rejected because they are complex and require additional processing resources, beyond the capacity of most native speakers (Hofmeister and Sag, 2010; Hofmeister et al., 2012a, 2012b, 2013; Kluender, 2004; Kluender and Kutas, 1993).
Sprouse et al. (2012a) propose that one approach to teasing apart these theories is to examine the relationship between sensitivity to islands and an individual’s processing resources, which play a crucial role under the resource-limitation accounts. Sprouse et al. (2012a) argue that these accounts would predict greater acceptability of island violations for individuals with enhanced cognitive resources, as those individuals would have a greater ability to process complex sentences in general. In contrast, grammatical accounts do not predict such a relationship, as island violations should be ruled out across the board regardless of cognitive abilities. Building on Sprouse et al. (2012a), which tested only English native speakers, the present study also examines L2 learners, a population that is particularly interesting to examine from this perspective.
The extension of this issue to L2 learners is well motivated by active debates in the literature as to whether L2 learners and native speakers process sentences in a qualitatively similar way. For example, it has been suggested by Clahsen and Felser (2006, 2018) that L2 learners demonstrate a reduced sensitivity to abstract grammatical information, such as the constraints on wh-extraction (e.g. Boxell and Felser, 2017). L2 learners have also been shown to have reduced processing abilities in the L2 (see McDonald, 2006) and thus, it is possible that the low acceptance of sentences with island violations in L2 learners is simply due to the complexity of the sentences themselves. It is possible that, even in studies where L2 learners and native speakers show similar sensitivity to islands, the source of the island effects is different. If that is the case, then following Sprouse et al.’s (2012a) logic, those learners with enhanced processing abilities should be better able to comprehend the complex sentences and thus potentially be more willing to rate sentences with island violations higher.
The primary goal of the present study is to examine whether island sensitivity is similar in nature in L2 learners and native speakers. Our first step is to examine whether L2 learners show sensitivity to island constraints, testing Najdi Arabic native speakers who have learned English as an L2. The second question, building directly on Sprouse et al. (2012a), is whether island sensitivity is related to individual differences in working memory (WM) in native speakers and L2 learners, allowing us to shed light on whether the source of island effects is similar in the two populations.
In what follows, we first review studies that used acceptability judgment tasks (AJTs) to examine L2 acquisition of island constraints. Next, we review the studies that have tested the predictions of the grammatical accounts and the processing accounts of island effects. Finally, we discuss the details of the present study.
1 Studies of island constraints in L2 acquisition
Many earlier studies examined L2 learners’ acceptance of sentences with island violations to argue for (e.g. Epstein et al., 1996; Martohardjono, 1993; Li, 1998; White and Juffs, 1998) or against (Johnson and Newport, 1991; Schachter, 1990) the claim that L2 learners’ grammars are constrained by Universal Grammar. Hawkins and Chan (1997) argued that native-like acquisition of constraints on wh-movement in L2 learners may be limited to those whose L1 instantiates overt wh-movement (but see Martohardjono, 1993; Li, 1998; White and Juffs, 1998). In a review of this earlier work, Belikova and White (2009) pointed out that although researchers tended to argue for or against island sensitivity, the results of previous studies showed that learners treated specific types of islands differently. For example, Johnson and Newport (1991), who tested Chinese learners of English, found that learners correctly rejected only 60% of subjacency violation sentences. However, as noted by Belikova and White (2009), learners performed more accurately on the relative clause structure than on the complex NP and wh-island structures. Similarly, Schachter (1990) and Li (1998) found that L2 learners performed more accurately on the relative clause and sentential subject structures than on the complex NP and wh-island structures.
This kind of variation across island types was actually predicted by Martohardjono (1993) who adopted Chomsky’s (1986) Barriers framework, which distinguishes among island types in terms of the number of barriers crossed. Extractions out of adjunct clauses, relative clauses, and sentential subjects that involve crossing two barriers are considered strong subjacency violations. Extractions out of wh-islands and complex NPs that involve crossing one barrier are considered weak subjacency violations. As predicted, Martohardjono (1993) showed that Chinese and Indonesian learners of English rejected wh-questions with strong subjacency violations more strongly than wh-questions with weak subjacency violations.
Belikova and White (2009) more recently presented an alternative account of why L2 learners perform well on strong islands and perform poorly on weak ones. Their proposal is based on a revised version of Huang’s (1982) Condition on Extraction Domains (CED), which has been proposed to account for syntactic islands across languages (e.g. Horvath and Siloni, 2003; Müller, 2007). Based on Huang’s (1982) revised CED, extraction out of non-complements is not possible universally. Therefore, extraction out of strong islands (i.e. adjunct clauses, relative clauses, and sentential subjects) is not possible universally because strong islands are non-complements. However, as pointed out by Belikova and White (2009), this entails that the ungrammaticality of extraction from weak islands (i.e. wh-islands and complex NPs) needs to be attributed to different reasons. For example, Belikova and White (2009) noted that the ungrammaticality of extraction from wh-islands can be attributed to parametric variation that depends on how many landing sites are available in the specifier of CP for the extracted wh-phrase.
Based on Huang’s (1982) revised CED, Belikova and White (2009) proposed that L2 learners are expected to do well on wh-extractions from strong islands (i.e. adjunct clauses, relative clauses and sentential subjects) because strong islands are universal constraints on extraction. In contrast to strong islands, L2 learners are expected to perform less accurately on wh-extractions from weak islands (i.e. wh-islands and complex NPs) because weak islands are not universal constraints. If we adopt Belikova and White’s proposal, results of many studies show that L2 learners, like natives, are sensitive to island constraints (e.g. Johnson and Newport, 1991; Li, 1998; Martohardjono, 1993; Schachter, 1990).
More recent studies have examined whether the properties of the L1 modulate sensitivity to island violations (e.g. Kim et al., 2015; Kush and Dahl, 2022) and whether L2 learners can utilize knowledge of grammatical constraints on wh-dependencies online, with many studies arguing that L2 learners are sensitive to island constraints during processing (e.g. Aldwayan et al., 2010; Boxell and Felser, 2017; Covey et al., submitted; Johnson et al., 2016; Omaki and Schulz, 2011; Perpiñán, 2020) but with some proposals arguing that sensitivity may be restricted to learners whose L1 instantiates overt wh-movement (Kim et al., 2015) or that sensitivity may emerge at a delay (Felser et al., 2012).
To our knowledge, Johnson et al. (2016) is one of the few L2 studies to examine the source of island effects. Johnson et al.’s (2016) self-paced reading experiment included sentences with a relative clause island such as (2a/b), manipulating whether or not the sentences involved wh-extraction.
(2) a. ISLAND, NO EXTRACTION
       My father asked if the actress that married
    b. ISLAND, WH-EXTRACTION
       My father asked who the actress that married
The experiment compared reading times at the filled object position Tyler, which is embedded within a relative clause island. Evidence of a reading-time slowdown or ‘filled-gap effect’ in (2b) as compared to (2a) would suggest that learners indeed posited a gap within the island, interpreting the wh-filler who as an argument of the verb married. However, the results showed that both natives and Korean learners of English avoided positing a gap within the island. Furthermore, Johnson et al. (2016) examined the relationship between individuals’ filled-gap effect size and their averaged scores on counting span and reading span tasks, which measure WM. Following Sprouse et al.’s (2012a) logic, they predicted that under the processing account of islands (e.g. Hofmeister and Sag, 2010; Hofmeister et al., 2012a, 2012b, 2013; Kluender, 2004; Kluender and Kutas, 1993), there should be a positive relationship between WM and the size of the filled-gap effect within the island. No such relationship is predicted by the grammatical account. In support of the grammatical account, the results of both natives and Korean learners of English showed no relationship between WM and filled-gap effects within relative clause islands. Importantly, Johnson et al. (2016) did observe significant relationships between WM and licit wh-dependencies for both natives and L2 learners (although the precise effects differed for the two populations), suggesting that WM can indeed capture variability in the processing of wh-dependencies, but only at positions in which extraction is licensed by the grammar.
2 Grammatical vs. processing accounts of islands
As introduced above, under the resource-limitation accounts, island sensitivity is attributable to a number of processing pressures that combine to overload the parser’s limited processing resources (e.g. Hofmeister and Sag, 2010; Hofmeister et al., 2012a, 2012b, 2013; Kluender, 2004; Kluender and Kutas, 1993). Advocates of the processing account argue that island effects can be ameliorated by manipulating one or more non-structural factors to facilitate processing. Hofmeister and Sag (2010: Experiment 2), for example, manipulated the linguistic properties of the extracted wh-filler phrase in a self-paced reading task to show that semantically more complex wh-fillers (e.g. which employee) as opposed to bare wh-fillers (e.g. who) can facilitate processing of wh-island violation sentences and improve their acceptability. Their argument is based on the idea that the more complex the wh-filler, the stronger its mental representation will be in WM. When a wh-filler has a stronger mental representation in WM, its retrieval will be easier at the gap site, i.e. the subcategorizing verb. Participants first read a declarative background sentence, and then read either a wh-island violation sentence with a bare wh-filler (3a), a wh-island violation sentence with a more complex wh-filler (3b), or a baseline sentence with no island violation (3c).
(3) BACKGROUND SENTENCE
    Albert learned that the managers dismissed the employee with poor sales after the annual performance review.
    a. BARE CONDITION
       *
    b. WHICH CONDITION
       *
    c. BASELINE CONDITION
       Who did Albert learn that they dismissed after the annual performance review?
Results showed faster reading times for the complex wh-filler condition (3b) at the three regions (‘after the annual’) that immediately follow the embedded verb ‘dismissed’, as compared to the bare wh-filler condition (3a). Interestingly, there was no difference between the complex wh-filler condition (3b) and the baseline condition (3c) at these three regions. These results suggest that processing of wh-island violation sentences can be facilitated by manipulating the semantic complexity of the wh-filler phrase.
In a follow-up AJT where the stimuli were presented in embedded questions instead of direct questions, the wh-island violation sentences with complex wh-fillers received higher acceptability judgments than wh-island violation sentences with bare wh-fillers. Based on the parallel results of the acceptability judgments and the reading times, Hofmeister and Sag (2010) argued that island effects are due to processing difficulty because manipulating non-structural factors, such as the complexity of the wh-filler phrase, can facilitate the processing of wh-island violation sentences and improve their acceptability.
Psycholinguistic studies arguing for a grammatical account of islands have taken several different approaches (e.g. Phillips, 2006; Wagers and Phillips, 2009; for a review, see Sprouse and Villata, 2021). The approach that we build on was inspired by Sprouse, Wagers, and Phillips (2012a) who examined the relationship between individual differences in processing resources and sensitivity to island effects in two acceptability experiments. They tested four island types: adjunct islands, subject islands, complex NP islands, and whether islands. The experiments manipulated the wh-dependency length and the presence of an island structure in four conditions using a 2 × 2 factorial design to measure the strength of island effects, as in (4).
(4) ADJUNCT ISLAND
    a. Who ___ suspects that the boss left her keys in the car? NONISLAND/MATRIX
    b. What do you suspect that the boss left ___ in the car? NONISLAND/EMBEDDED
    c. Who ___ worries
    d. * What do you worry
To measure sensitivity to island effects, they used a measure called the differences-in-differences (DD) score (Maxwell and Delaney, 2003). To measure WM capacity, they used a serial-recall task. Sprouse et al. (2012a) argued that the processing account of islands predicts a negative relationship between serial-recall scores and DD scores. That is, as WM scores increase, island sensitivity scores will decrease. The grammatical account of islands, on the other hand, does not predict such a relationship. Although the relationship between recall scores and DD scores was significant for some island types, Sprouse et al. (2012a) argued that this relationship was very weak, as indicated by the small R² value (between 0.00 and 0.06), which measures the proportion of the variance in DD scores that can be accounted for by recall scores. A similar pattern of results, also supporting the grammatical account, was observed in their second experiment, which used magnitude estimation and included two WM measures, a serial-recall task and an n-back task.
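The logic of relating WM scores to DD scores can be illustrated with a simple least-squares regression. The following is only a minimal sketch: the function name `regress` and the data values are hypothetical and are not taken from Sprouse et al. (2012a), whose actual analysis may have differed in its details. It shows how a slope and an R² value (the proportion of variance in DD scores accounted for by recall scores) are obtained for one island type.

```python
from statistics import mean

def regress(x, y):
    """Least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)              # variability in the predictor
    syy = sum((b - my) ** 2 for b in y)              # variability in the outcome
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)            # squared Pearson correlation
    return slope, intercept, r_squared

# Hypothetical data: one recall score and one DD score per participant.
recall_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
dd_scores = [1.2, 0.8, 1.1, 0.9, 1.0]

slope, intercept, r2 = regress(recall_scores, dd_scores)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r2:.3f}")
```

Under the processing account the slope is expected to be reliably negative, whereas a slope near zero together with a very small R² is what the grammatical account leads one to expect.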
However, in a response paper, Hofmeister et al. (2012a) made several criticisms of the Sprouse et al. (2012a) study. The first criticism was that Sprouse and colleagues misinterpreted their results when they relied on R² values as a means of hypothesis testing, instead of p-values. The second criticism was that Sprouse and colleagues used complex stimuli that made it difficult to observe a relationship between recall scores and DD scores. More specifically, Hofmeister et al. (2012a) claimed that the critical island violation sentences in the ISLAND/EMBEDDED condition (5) received low acceptability ratings because they were too hard to process, even for individuals with high WM capacity.
(5) * What do you worry
Hofmeister et al. (2012a) argued that because these sentences, which are direct questions, were presented to participants without a context, they sounded pragmatically odd and were difficult to process. They also argued that these sentences had vague wh-fillers (e.g. what) rather than specific wh-fillers (e.g. which-NP), making their processing even more difficult. Hofmeister et al. (2012a) claimed that the extreme processing difficulty did not allow individual differences in WM capacity to emerge in acceptability ratings of island violation sentences. They argued that more variability would potentially emerge if the island violation sentences were less complex. The third criticism was that the serial-recall and n-back tasks chosen by Sprouse and colleagues may not be appropriate measures of WM capacity. A study conducted in our own lab, Pham et al. (2020), addressed the methodological criticisms raised by Hofmeister et al. (2012a, 2012b) and examined island sensitivity in a large-scale study of English native speakers. The results supported the initial argument of Sprouse et al. (2012a), revealing no significant relationship between cognitive abilities and island sensitivity. The present study extends this line of investigation to L2 learners.
II The present study
The present study investigates two research questions. The first research question, in line with previous studies, is whether adult Najdi Arabic learners of English can show sensitivity to syntactic island constraints. The answer to this question contributes to the literature addressing whether adult learners have access to UG constraints, in line with theories like the Full Transfer/Full Access Hypothesis (Schwartz and Sprouse, 1996). However, as pointed out by Hale (1996), if learners indeed show sensitivity to syntactic constraints that have been claimed to be universal, it is extremely difficult, if not impossible, to determine whether that sensitivity is derived from UG via universal constraints or through the language-specific properties of the L1. Recent psycholinguistic research taking an approach similar to the one we take here has shown evidence of island effects in wh-in-situ languages such as Chinese (e.g. Lu et al., 2021). Although the same kind of systematic investigation has yet to be conducted on island effects in Najdi Arabic, which is also a wh-in-situ language, we cannot rule out the possibility that sensitivity to island effects is derived from the L1. Nevertheless, the core question of our study addresses whether any observed sensitivity to syntactic island constraints in L2 learners is indeed due to grammatical constraints at all (whether derived from UG or the L1), in line with grammatical accounts (e.g. Phillips, 2006; Sprouse et al., 2012a), or whether apparent sensitivity is due to processing difficulty, in line with resource-limitation accounts (e.g. Hofmeister and Sag, 2010; Kluender, 2004; Kluender and Kutas, 1993).
As we discussed above, we believe this question is particularly interesting to address from the perspective of L2 learners as it has been proposed that L2 learners are less sensitive to abstract grammatical constraints, such as those governing wh-dependencies (Boxell and Felser, 2017; Clahsen and Felser, 2006, 2018), and more susceptible to processing limitations (e.g. McDonald, 2006). If the processing account is on the right track, learners with enhanced WM should be better able to process sentences with island violations and should therefore show weaker island sensitivity than learners with lower WM. Thus, examining the relationship between WM and island sensitivity will help us to tease apart the source of island effects for both natives and L2 learners.
The present study uses a revised version of the stimuli of Sprouse et al. (2012a) using a task originally developed by Aldosari (2015) that was designed to address the concerns raised by Hofmeister et al. (2012a, 2012b). First, each test sentence (6b) was preceded by a declarative background sentence (6a) to make the processing of the test sentence easier, avoiding the potential pragmatic oddity of presenting questions without a context.
(6) a. BACKGROUND SENTENCE
       The worker worries if the boss leaves her office keys in the car.
    b. TEST SENTENCE
       *
Second, we used a complex wh-filler phrase (e.g. which keys) instead of a bare wh-filler phrase (e.g. what) in the test sentences, as in (6b). It has been argued that complex wh-filler phrases facilitate the processing of wh-dependencies (Goodall, 2015; Hofmeister and Sag, 2010).
To measure WM capacity, the present study uses the automated operation span task. Unlike the serial-recall and n-back tasks that Hofmeister et al. (2012a, 2012b) claimed may not be measures of WM because they are simple span tasks, the automated operation span task is considered a true measure of WM because it is a complex span task (Conway et al., 2005).
III Linguistic facts in Najdi Arabic
In Najdi Arabic, wh-questions and relative clauses are not formed via overt movement and involve resumptive pronouns (Aldwayan et al., 2010). In (7a), for example, the wh-phrase min ‘who’ originates in its surface position, which is the specifier of the complementizer phrase.
(7) a. min alli shif-t-
       who c see.PERF.2SG.MASC-him
       ‘Who did you see?’
    b. alli shif-t
       c see-PERF.2SG.MASC who
       ‘Who did you see?’
That is, the wh-phrase min ‘who’ does not move from the object position after the verb shif-t ‘see-perfective’ because this position is already filled by an obligatory resumptive pronoun. If a resumptive pronoun is not used in wh-questions, the wh-phrase remains in situ, as in (7b). Since Najdi Arabic does not have overt wh-movement, the grammaticality of sentences in Najdi Arabic, unlike in English, is not affected by syntactic constraints on wh-movement. In (8), for example, the wh-questions are grammatical in Najdi Arabic although their English counterparts are ungrammatical due to a violation of an adjunct island constraint (8a) and a wh-island constraint (8b) on wh-movement.
(8) a. ADJUNCT ISLAND
       ayy mafateeh ya-glag Ali etha Fahd
       which keys 3SG.MASC-worry.IMPERF Ali if Fahd
       nisa-
       ‘Which keys does Ali worry if Fahd forgets them?’
       (cf. *Which keys does Ali worry if Fahd forgets ___?)
    b. WH-ISLAND
       ayy rjal 9alima-ni Ali mita zar-
       which man tell.PERF.3SG.MASC-me Ali when visit.PERF.3SG.MASC-
It is important to point out that the judgments provided here come from native speakers’ intuitions and not from systematic experimental investigations of whether there are indeed island effects in this spoken variety of Arabic. Recent research on Modern Standard Arabic (Tucker et al., 2019) and Jordanian Arabic (Al-Aqarbeh and Sprouse, submitted) has shown island effects to some extent in various domains of the grammar. Future research is needed to examine whether island effects are observed in Najdi Arabic.
IV Method
1 Participants
Seventy-two advanced Najdi Arabic learners of English voluntarily participated in the study. The Najdi Arabic learners (67 males, mean age = 27.4) started learning English at the age of 13 in public schools in Saudi Arabia and were all native speakers of Najdi Arabic who were in the USA at the time of testing. 1 Twenty-nine of these learners studied in an English-speaking country and had exposure to English for one to seven years. All Najdi Arabic learners completed the Michigan Listening Comprehension Test to assess their English proficiency. The test consisted of 45 listening comprehension questions that targeted various grammatical constructions. The learners’ scores ranged from 35 to 44 out of 45 possible correct answers (M = 40.05, SD = 2.31) indicating that the learners had reached an advanced level of proficiency.
Eighty-five monolingual native speakers of English also participated in the study. They were undergraduate students at the University of Kansas and received extra credit for participating in the study. The final data analysis included data from 82 English native speakers (63 females, mean age = 19.9). 2
2 Materials
Building on Sprouse et al. (2012a), the stimuli for the AJT in the present study were designed to test the effects of four island types: adjunct islands, subject islands, complex NP islands, and whether islands. To test each of the four island types, wh-dependency length and presence of an island structure were manipulated in four conditions using a 2 × 2 factorial design, as in (9).
(9) ADJUNCT ISLAND
    a. NONISLAND/MATRIX
       The helpful worker thinks that the boss left her keys in the car.
    b. NONISLAND/EMBEDDED
       The worker thinks that the boss left her office keys in the car.
    c. ISLAND/MATRIX
       The helpful worker worries if the boss leaves her keys in the car.
    d. ISLAND/EMBEDDED
       The worker worries if the boss leaves her office keys in the car.
       *
The wh-dependency is either short, as in (9a) and (9c), with a wh-extraction from a matrix clause, or long, as in (9b) and (9d), with a wh-extraction from an embedded clause. The island structure is either absent, as in (9a) and (9b), or present, as in (9c) and (9d). The first three conditions are grammatical, while the last condition is ungrammatical due to an island violation.
The context sentence in each condition was designed to match the test sentence in structure and lexical material. Importantly, the context sentence introduced the NP that was to be extracted in the target question (e.g. the helpful worker, her office keys) so as to make the test sentences more natural.
We constructed 64 sets of sentences in total, 16 sets for each of the four island types. Each set included sentences in four conditions, as in (9). The sentences from the 64 sets were distributed among four lists using a Latin square design, such that every participant was presented with only one sentence from every set. Each list had 64 target sentences, 16 sentences targeting each of the four island types (adjunct islands, subject islands, complex NP islands, and whether islands). Thus, for each island type, there were four sentences targeting each of the four conditions (non-island/matrix, non-island/embedded, island/matrix, island/embedded).
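The list construction described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the procedure actually used to build the materials: the rotation rule `(set_index + list_index) % 4` is one standard way to implement a Latin square, and the function name `build_lists` and the tuple representation of items are hypothetical.

```python
from collections import Counter

ISLAND_TYPES = ["adjunct", "subject", "complex NP", "whether"]
CONDITIONS = ["nonisland/matrix", "nonisland/embedded",
              "island/matrix", "island/embedded"]
SETS_PER_TYPE = 16
N_LISTS = 4

def build_lists():
    """Distribute 64 sentence sets over 4 lists with a Latin square rotation.

    Each item is a tuple (island_type, set_index, condition). Every list
    receives exactly one condition from every set (assumed rotation rule).
    """
    lists = [[] for _ in range(N_LISTS)]
    for island in ISLAND_TYPES:
        for set_index in range(SETS_PER_TYPE):
            for list_index in range(N_LISTS):
                condition = CONDITIONS[(set_index + list_index) % len(CONDITIONS)]
                lists[list_index].append((island, set_index, condition))
    return lists

lists = build_lists()
for number, items in enumerate(lists, start=1):
    # Count items per (island type, condition) cell on this list.
    cells = Counter((island, condition) for island, _, condition in items)
    print(f"List {number}: {len(items)} items, items per cell = {set(cells.values())}")
```

With this rotation, each list contains 64 target sentences, 16 per island type and four per condition within each island type, and each set contributes a different condition to each of the four lists.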
In order to balance the number of grammatical and ungrammatical stimuli on each list, 32 ungrammatical declarative filler sentences were added to each of the four lists. These filler sentences included ungrammatical relative clauses with a resumptive pronoun in subject, object, indirect object, oblique object, and object comparative positions. The filler sentences also included ungrammatical relative clauses with double complementizers, ungrammatical sentences with null subjects in embedded clauses, and sentences with ungrammatical passives. 3 Thus, the total number of sentences in each list was 96, including 64 experimental sentences (48 grammatical and 16 ungrammatical) and 32 ungrammatical filler sentences. 4
The sentences in each list were presented in four blocks. Each block consisted of 24 sentences that included 16 experimental sentences (12 grammatical and 4 ungrammatical) and 8 ungrammatical filler sentences. The experimental and filler sentences were randomized in each block. The order of blocks was also randomized across participants.
The experiment was presented using the experimental software Paradigm (Tagliaferri, 2005). On each trial, a declarative background sentence first appeared on the computer screen. After reading the declarative sentence, the participant pressed the space bar to advance to the next screen and was presented with only the test sentence. The participants then judged, with no time limit, whether the test sentence sounded natural or unnatural in English, using a seven-point rating scale displayed underneath the test sentence. The rating scale ranged from ‘totally unnatural’ to ‘perfectly natural’. The participants were also given the option to choose ‘I do not know’ if they could not make a judgment. The task began with 6 practice trials to familiarize participants with the task.
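One way to arrive at the block composition described above (16 experimental sentences, of which 4 are ungrammatical, plus 8 fillers per block) is sketched below. The text does not state how items were assigned to blocks, so the rule used here, one item from each island type by condition cell per block, is an assumption made for illustration, as are the function name `make_blocks` and the item labels.

```python
import random

ISLAND_TYPES = ["adjunct", "subject", "complex NP", "whether"]
CONDITIONS = ["nonisland/matrix", "nonisland/embedded",
              "island/matrix", "island/embedded"]
N_BLOCKS = 4
FILLERS_PER_BLOCK = 8

def make_blocks(seed=None):
    """Return 4 randomized blocks of 24 trials for one participant.

    Assumption: each block receives one item from every island type x
    condition cell, which yields 12 grammatical and 4 ungrammatical
    (island/embedded) experimental items per block.
    """
    rng = random.Random(seed)
    blocks = [[] for _ in range(N_BLOCKS)]
    for island in ISLAND_TYPES:
        for condition in CONDITIONS:
            cell = [("experimental", island, condition, i) for i in range(N_BLOCKS)]
            rng.shuffle(cell)                       # random item-to-block assignment
            for block, item in zip(blocks, cell):
                block.append(item)
    fillers = [("filler", None, None, i) for i in range(N_BLOCKS * FILLERS_PER_BLOCK)]
    rng.shuffle(fillers)
    for index, block in enumerate(blocks):
        start = index * FILLERS_PER_BLOCK
        block.extend(fillers[start:start + FILLERS_PER_BLOCK])
        rng.shuffle(block)                          # randomize trial order within block
    rng.shuffle(blocks)                             # randomize block order
    return blocks

blocks = make_blocks(seed=1)
for block in blocks:
    n_fillers = sum(1 for trial in block if trial[0] == "filler")
    n_violations = sum(1 for trial in block if trial[2] == "island/embedded")
    print(len(block), n_fillers, n_violations)
```

Passing a different seed for each participant gives each participant a different trial order and block order while keeping the composition of every block constant.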
3 Measure of sensitivity to island effects
To measure sensitivity to island effects, the study used the differences-in-differences (DD) score, following Sprouse et al. (2012a), as in (10).
(10) Rating (z-score units)
     a. D1 = (NONISLAND/EMBEDDED) – (ISLAND/EMBEDDED) = 2.1
        Which keys does the worker think that the boss left ___ in the car? 0.7
        * Which keys does the worker worry
     b. D2 = (NONISLAND/MATRIX) – (ISLAND/MATRIX) = 1.2
        Which worker ___ thinks that the boss left her keys in the car? 1.6
        Which worker ___ worries
     c. DD = D1 – D2 = 2.1 – 1.2 = 0.9
The DD scores measure how much greater the effect of an island structure is in a sentence with a long wh-dependency than in a sentence with a short wh-dependency. Higher DD scores reflect stronger sensitivity to island effects and lower acceptance of ungrammatical sentences with island violations while lower DD scores reflect weaker sensitivity to island effects and higher acceptance of ungrammatical sentences with island violations.
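The DD computation in (10) can be expressed as a short function. The sketch below is illustrative only: the function names and the example ratings are hypothetical, and the z-score transformation is assumed to be computed over each participant's own ratings using the sample standard deviation, a detail that the text does not specify.

```python
from statistics import mean, stdev

def z_scores(ratings):
    """Standardize one participant's raw ratings (assumed: sample SD)."""
    m, s = mean(ratings), stdev(ratings)
    return [(r - m) / s for r in ratings]

def dd_score(nonisland_matrix, nonisland_embedded, island_matrix, island_embedded):
    """Return (D1, D2, DD) from the four condition means in z-score units.

    D1 = NONISLAND/EMBEDDED - ISLAND/EMBEDDED  (island cost, long dependency)
    D2 = NONISLAND/MATRIX   - ISLAND/MATRIX    (island cost, short dependency)
    DD = D1 - D2
    """
    d1 = nonisland_embedded - island_embedded
    d2 = nonisland_matrix - island_matrix
    return d1, d2, d1 - d2

# Hypothetical condition means for one participant and one island type.
d1, d2, dd = dd_score(nonisland_matrix=1.0, nonisland_embedded=0.5,
                      island_matrix=0.75, island_embedded=-1.0)
print(d1, d2, dd)
```

A DD score near zero indicates that the island structure lowers ratings about equally for short and long dependencies, whereas a large positive DD score indicates that the long dependency into an island incurs an additional penalty.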
4 Measure of WM capacity
The present study measured WM capacity with a complex span task, the automated operation span task (Unsworth et al., 2005). In this task, participants first see a math operation (e.g. (1 × 2) + 1 = ?). After they solve the operation, a digit appears on the next screen (e.g. 3), and participants are instructed to choose ‘true’ if the digit is the correct answer or ‘false’ if it is not. After responding, they advance to the next screen and see a letter. After three to seven such operation-letter strings, participants are presented with a recall grid and asked to click on the letters in the order in which they saw them. Participants were presented with three sets of each set size, with set sizes ranging from three to seven operation-letter strings, so each participant saw a total of 75 letters and 75 math operations.
Before participants began the task, they completed three practice sessions during which a time limit for solving each math operation was calculated for each individual. The time limit was each participant’s mean time (in seconds) to solve the math operations in the practice session, plus 2.5 SD. We adopted the all-or-nothing scoring method, in which the score is the sum of the sizes of all sets that were recalled perfectly. For example, if a participant correctly recalled 4 letters in a set size of 4, 5 letters in a set size of 5, but 4 letters in a set size of 6, the operation span score would be 9 (4+5+0).
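The scoring rule and the time limit can be sketched as follows. This is a minimal illustration, not the code of the automated task; the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which SD estimator was used.

```python
from statistics import mean, stdev

def ospan_score(sets):
    """All-or-nothing score: add a set's size only if every letter was recalled in order.

    `sets` is a list of (set_size, letters_recalled_in_correct_position) pairs.
    """
    return sum(size for size, recalled in sets if recalled == size)

def time_limit(practice_times, k=2.5):
    """Individual time limit: mean practice solution time plus k SDs.

    Assumption: sample SD (statistics.stdev); the source does not specify.
    """
    return mean(practice_times) + k * stdev(practice_times)

score = ospan_score([(4, 4), (5, 5), (6, 4)])
print(score)  # 9

# Three sets at each set size from three to seven gives the maximum score.
max_score = 3 * sum(range(3, 8))
print(max_score)  # 75
```

The first call reproduces the worked example in the text (4 + 5 + 0 = 9), and the maximum of 75 matches the total number of letters presented.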
5 Procedure
English natives and Najdi Arabic learners were tested individually, using a computer. They signed a consent form and filled out a background information questionnaire. Najdi Arabic learners were provided with a vocabulary list of difficult words before the experiment began to ensure that they understood all lexical items in the stimuli; they were asked to look it over and were invited to ask questions if they were unfamiliar with any of the words. Both natives and learners first took the AJT and then completed the WM task. Learners completed the proficiency test at the end of the session.
V Results
We first present the results of English natives and L2 learners from the AJT. Then, we present the results of the relationship between WM capacity and island sensitivity.
1 AJT analyses
Prior to statistical analysis, each participant’s acceptability ratings were z-score transformed. We performed linear mixed-effects models using the R packages lme4 and lmerTest (Bates et al., 2015; Kuznetsova et al., 2017) to investigate whether participants were sensitive to island effects in the AJT. For each island type, a full model was constructed which included the fixed effects Island Structure (non-island, island), Dependency Length (matrix, embedded), and the interaction Island Structure × Dependency Length. Models included maximal random effects structures, with random intercepts for participants and items, as well as by-item and by-participant random slopes for each factor and the interaction term. The full model that converged was simplified stepwise using likelihood ratio tests to determine whether the inclusion of random and fixed effects improved model fit.
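As a rough illustration of the z-score transformation, the sketch below standardizes each participant's ratings against that participant's own mean and standard deviation. The ratings are hypothetical, the function name is illustrative, and the choice of the sample SD is an assumption. The mixed-effects models themselves were fitted in R with lme4 and are not reproduced here.

```python
from statistics import mean, stdev

def z_transform(ratings):
    """Standardize one participant's ratings to mean 0 and SD 1.

    Assumption: sample SD. A participant who gave the same rating on every
    trial would have SD 0 and would need to be handled separately.
    """
    m, s = mean(ratings), stdev(ratings)
    return [(r - m) / s for r in ratings]

# Hypothetical raw ratings on the 1-7 scale, keyed by participant.
raw = {"p01": [1, 4, 7], "p02": [5, 6, 7]}
z = {pid: z_transform(r) for pid, r in raw.items()}
print([round(v, 2) for v in z["p01"]])  # [-1.0, 0.0, 1.0]
print([round(v, 2) for v in z["p02"]])  # [-1.0, 0.0, 1.0]
```

The two hypothetical participants use different parts of the scale but receive identical z-scores, which is the point of the transformation: it removes individual differences in scale use before ratings are compared.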
a English native speakers
The English natives’ mean acceptability ratings and standard deviations for each condition in the four island types tested are reported in Table 1. Ratings ranged from 1–7, with higher scores reflecting greater acceptability.
Table 1. Natives (N = 82): Means and standard deviations for each condition.
Note. Ratings ranged from 1 to 7, with higher numbers indicating greater acceptability.
The best-fitting model for each island type revealed a significant main effect of wh-dependency length (whether: est = 0.64, SE = 0.05, t = 12.32, p < .001; complex NP: est = 1.45, SE = 0.12, t = 12.52, p < .001; subject: est = 1.56, SE = 0.08, t = 18.51, p < .001; adjunct: est = 1.46, SE = 0.10, t = 14.15, p < .001). A main effect of island structure for each island type was also significant (whether: est = 0.42, SE = 0.05, t = 9.14, p < .001; complex NP: est = 1.17, SE = 0.11, t = 10.35, p < .001; subject: est = 1.67, SE = 0.10, t = 16.57, p < .001; adjunct: est = 1.35, SE = 0.13, t = 10.02, p < .001). These main effects reflect the fact that sentences with longer wh-dependencies were rated lower than those with shorter (matrix) wh-dependencies, and sentences with islands were rated lower than non-island sentences.
Crucially, the interaction between wh-dependency length and island structure was significant for each island type (whether: est = −0.36, SE = 0.06, t = −6.29, p < .001; complex NP: est = −1.01, SE = 0.15, t = −6.64, p < .001; subject: est = −1.65, SE = 0.11, t = −14.98, p < .001; adjunct: est = −1.19, SE = 0.14, t = −8.48, p < .001). This interaction resulted from low acceptability ratings of the ungrammatical island violation condition compared to the other three grammatical conditions for each island type. In other words, the effect of the island structure was greater in sentences with a long wh-dependency than in sentences with a short wh-dependency, indicating that English natives were sensitive to island effects in all four island types. Interaction plots for each island type are shown in Figure 1.

Figure 1. Natives (N = 82): Interaction plots for each island type.
b L2 learners
L2 learners patterned similarly to English natives on the AJT. The L2 learners’ mean acceptability ratings and standard deviations for each condition in the four island types tested are reported in Table 2. The best-fitting model for each island type revealed a significant main effect of wh-dependency length (whether: est = 0.25, SE = 0.08, t = 2.97, p < .001; complex NP: est = 1.09, SE = 0.11, t = 9.70, p < .001; subject: est = 0.98, SE = 0.09, t = 10.55, p < .001; adjunct: est = 0.67, SE = 0.11, t = 5.85, p < .001). A main effect of island structure for each island type was also significant (whether: est = 0.34, SE = 0.08, t = 4.35, p < .001; complex NP: est = 0.95, SE = 0.10, t = 9.31, p < .001; subject: est = 0.90, SE = 0.09, t = 10.07, p < .001; adjunct: est = 0.79, SE = 0.11, t = 7.31, p < .001). These main effects reflect the fact that sentences with longer wh-dependencies were rated lower than those with shorter (matrix) wh-dependencies, and sentences with islands were rated lower than non-island sentences.
Table 2. Learners (N = 72): Means and standard deviations for each condition.
Note. Ratings ranged from 1 to 7, with higher numbers indicating greater acceptability.
Crucially, a significant interaction between wh-dependency length and island structure was also found for each island type (whether: est = −0.23, SE = 0.09, t = −2.55, p < .05; complex NP: est = −0.94, SE = 0.13, t = −7.26, p < .001; subject: est = −0.80, SE = 0.11, t = −7.32, p < .001; adjunct: est = −0.67, SE = 0.14, t = −4.92, p < .001). This interaction resulted from low acceptability ratings of the ungrammatical island violation condition compared to the other three grammatical conditions for each island type. In other words, the effect of the island structure was greater in sentences with a long wh-dependency than in sentences with a short wh-dependency, indicating that L2 learners were sensitive to island effects in all four island types. In short, similar to English native speakers, the Najdi Arabic learners demonstrated island sensitivity across all island types. Interaction plots for each island type are shown in Figure 2.

Figure 2. Learners (N = 72): Interaction plots for each island type.
Additional analyses were conducted to compare the size of island effects observed in native speakers vs. L2 learners. To address this question, Group (L1, L2) was included as a factor in models testing the four island types. As expected, all models showed a significant two-way interaction between wh-dependency length and island structure, resulting from lower acceptance rates in the island violation condition. In the models for adjunct and subject islands, a significant three-way interaction between group, wh-dependency length, and island structure emerged. This interaction term indicated that the ‘island effect’ for native speakers was larger than for L2 learners for adjunct islands and subject islands. No three-way interaction with the variable Group emerged for the complex NP or whether island models. In sum, we observed that for strong islands (adjunct, subject), native speakers showed more robust island effects. For weak islands (complex NP, whether), results showed that acceptability judgments were similar across groups.
2 Individual differences analyses
To investigate the source of the island effects reported above, we next examined whether island sensitivity (quantified by DD scores) was modulated by individual differences in WM (quantified by the operation span scores).
a English native speakers
English natives’ scores on the operation span task, which measures WM capacity, ranged from 7 to 75 (M = 42.57, SD = 16.40). DD scores, which measure sensitivity to island effects, were calculated for each participant for each island type. The English natives’ DD scores are plotted as a function of their operation span scores in Figure 3.

Figure 3. Natives (N = 82): Differences-in-differences (DD) scores plotted as a function of operation span scores. The solid line represents the line of best fit for all DD scores. The dashed line represents the line of best fit when DD scores below zero are removed from analysis (shaded gray). R² for each trend line is reported in the legend.
For each island type, we ran two sets of linear regressions following Sprouse et al. (2012a). The dependent variable for the first regression was the complete set of DD scores for each island type. The second regression was run with DD scores ⩾ 0 for each island type. We performed the second analysis because, as noted by Sprouse et al. (2012a), DD scores below zero suggest a ‘subadditive’ island effect, which is not predicted by the resource-limitation theory or grammatical theories. A subadditive island effect means that the effect of an island structure is less in sentences with a long wh-dependency than in sentences with a short wh-dependency. However, it could be the case, as Sprouse et al. (2012a) noted, that DD scores below zero represent people who actually do not have sensitivity to superadditive island effects. If this is the case, then including DD scores below zero may increase the possibility of finding a relationship between operation span scores and DD scores. The second regression analysis resulted in the exclusion of five participants for adjunct islands, one participant for subject islands, two participants for complex NP islands, and 17 participants for whether islands.
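The two-regression procedure can be sketched as below. The span and DD values are invented purely for illustration and are not the study's data; `fit_line` is a hypothetical helper that implements ordinary least squares by hand rather than the R routines the authors used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical operation span scores and DD scores for six participants.
span = [10, 25, 40, 55, 70, 75]
dd = [0.8, -0.3, 1.1, 0.5, -0.1, 0.9]

# First regression: the complete set of DD scores.
fit_all = fit_line(span, dd)

# Second regression: only participants with DD scores of zero or above.
kept = [(x, y) for x, y in zip(span, dd) if y >= 0]
fit_nonneg = fit_line([x for x, _ in kept], [y for _, y in kept])

print("all DD scores: slope=%.4f, R^2=%.3f" % (fit_all[0], fit_all[2]))
print("DD >= 0 only:  slope=%.4f, R^2=%.3f" % (fit_nonneg[0], fit_nonneg[2]))
```

Under a resource-limitation account, one would expect a reliably non-zero slope in at least one of the two fits; a slope near zero with a very small R² in both is what the grammatical account predicts.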
To assess the strength of evidence with respect to hypothesis testing (Dienes, 2014), Bayes factors (BF) were calculated for each island type using the Jeffreys–Zellner–Siow (JZS) prior with the R package BayesFactor (Morey and Rouder, 2018). A BF < .33 is considered substantial evidence for the null hypothesis, which predicts no relationship between DD scores and WM scores, over the alternative hypothesis. A BF > 3, as would be expected under a resource-limitation view, would be considered substantial evidence for the alternative hypothesis over the null hypothesis. A BF between .33 and 3 indicates that the data do not provide substantial evidence to distinguish the null and alternative hypotheses.
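The interpretation thresholds can be stated as a small decision rule. The sketch below only classifies a Bayes factor that has already been computed; it does not compute one (the study used the R package BayesFactor for that). It assumes the BF expresses evidence for the alternative relative to the null, which is the direction implied by the thresholds in the text, and treats the boundary values themselves as inconclusive. The function name is hypothetical.

```python
def interpret_bf(bf):
    """Classify a Bayes factor (alternative over null) using the .33 and 3 cut-offs."""
    if bf < 0.33:
        return "substantial evidence for the null hypothesis"
    if bf > 3:
        return "substantial evidence for the alternative hypothesis"
    return "no substantial evidence either way"

print(interpret_bf(0.2))    # substantial evidence for the null hypothesis
print(interpret_bf(1.342))  # no substantial evidence either way
print(interpret_bf(5.0))    # substantial evidence for the alternative hypothesis
```

The value 1.342 is the Bayes factor reported for the learners' second whether island regression; 0.2 and 5.0 are arbitrary illustrative inputs.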
Results from the linear regressions are reported in Table 3. For each island type, the line of best fit, goodness of fit, and significance of the slope are provided. In both sets of linear regressions, none of the best-fit slopes differed significantly from zero for any island type. Moreover, the R² value, which measures how much of the variance in DD scores can be explained by the operation span scores, was very low for each island type. Finally, Bayes factors provided substantial evidence for the null hypothesis in all of the linear regressions, with Bayes factors below .33 for all island types. Together, these results indicate that there is not a strong relationship between island sensitivity and processing resources.
Table 3. Natives (N = 82): Linear regression modeling differences-in-differences (DD) scores as a function of operation span score.
b L2 learners
L2 learners’ scores on the operation span task ranged from 0 to 75 (M = 42.19, SD = 18.76). We followed the same analysis procedure as described above. 5 The second regression analysis excluded 14 participants for adjunct islands, 10 participants for subject islands, four participants for complex NP islands, and 21 participants for whether islands. The L2 learners’ DD scores are plotted as a function of their operation span scores in Figure 4.

Figure 4. Learners (N = 72): DD scores plotted as a function of operation span scores. The solid line represents the line of best fit for all DD scores. The dashed line represents the line of best fit when DD scores below zero are removed from analysis (shaded gray). R² for each trend line is reported in the legend.
Results from the linear regressions are reported in Table 4. In both sets of linear regressions, none of the best-fit slopes differed significantly from zero for any island type. The R² value for each model was very low, and thus the goodness-of-fit results and significance tests of the slopes (p-values) indicate a lack of significant relationship between DD scores and operation span scores.
Table 4. Second language (L2) learners (N = 72): Linear regression modeling differences-in-differences (DD) scores as a function of operation span score.
Bayes factors similarly provided evidence in line with the null hypothesis for most of the linear regressions, with values below or around .33 for most island types. The exception was the second whether island model, for which the Bayes factor (BF = 1.342) provided no substantial evidence for either the null or the alternative hypothesis. Together, these results provide no evidence of a robust relationship between island sensitivity and WM. 6
VI Discussion
The present study investigated two research questions. The first research question is whether Najdi Arabic learners of English are sensitive to syntactic island constraints on overt wh-movement, a property that is not instantiated in their L1. The second question addresses the source of island effects in both natives and learners.
With respect to the first question, the results of the AJT showed that Najdi Arabic learners, like English natives, rejected the island violation condition and accepted the other three grammatical conditions for each of the four island types tested. This pattern of acceptability exhibited by both English natives and Najdi Arabic learners led to an interaction between wh-dependency length and island structure, suggesting that Najdi Arabic learners, like English natives, were sensitive to syntactic island constraints on wh-movement. These results are in line with Schwartz and Sprouse’s (1996) proposal that L2 learners can access innate syntactic constraints and with other studies reporting island sensitivity (e.g. Li, 1998; Martohardjono, 1993; White and Juffs, 1998). The results of the AJT also showed that Najdi Arabic learners patterned similarly to English natives in terms of the strength of their sensitivity to the four island types tested. Both English natives and Najdi Arabic learners showed a weaker sensitivity to whether islands as compared to their sensitivity to adjunct islands, subject islands, and complex NP islands. This pattern of sensitivity is largely consistent with Belikova and White’s (2009) proposal that L2 learners are expected to perform more accurately on wh-extraction from strong islands such as adjunct islands and subject islands (universal constraints on extraction) than on wh-extraction from weak islands such as whether islands, extraction from which is grammatical in some languages, such as Greek (Alexopoulou and Keller, 2003). As noted by Szabolcsi (2006), there is some variation within English natives with respect to the acceptability of extractions from wh-islands (for work on Norwegian, see also Kush et al., 2018). The variability observed in the native English speakers in the present study is in line with other studies, such as Johnson and Newport (1991), Martohardjono (1993) and Pham et al. (2020).
In our study, both natives and Najdi Arabic learners may have been even more likely to accept the whether islands because the wh-filler phrase that we included in the stimuli is a complex or discourse-linked wh-phrase, which arguably dilutes the effects of islands (Pesetsky, 1987; Rizzi, 1990). As noted by Phillips (2013), this type of wh-filler phrase is more likely to improve the acceptability of sentences that violate weaker island constraints like whether islands, but it cannot greatly improve the acceptability of sentences that violate stronger island constraints like relative clause islands. Importantly, the inclusion of the complex wh-filler phrase did not eliminate island effects in any condition, for either natives or L2 learners, in line with Sprouse et al. (2016).
Like whether islands, complex NP islands are also considered weak islands (Chomsky, 1986). However, in the present study, complex NP islands were strongly rejected by both English natives and Najdi Arabic learners. This clear rejection can possibly be attributed to a combination of factors in the sentences used to test this type of island, as in (11).
(11) * Which pie did the chef hear the message that Jeff baked?
As shown in (11), the complex NPs (e.g. the message that Jeff baked) are tensed. Moreover, the head of the complex NPs in these sentences (e.g. the message) is a definite noun. Tensed and definite complex NPs have been observed to cause stronger rejection of sentences with complex NP island violations (e.g. Chomsky, 1986; Szabolcsi and den Dikken, 2003). Another factor that may have led to the strong rejection of the sentences with complex NP island violations can be related to the structure of those islands. Although both complex NP islands and whether islands are weak islands, they differ in their structure. The structure of whether islands is a CP complement of a verb. However, it is still unclear whether complement clauses of nouns in complex NP islands are indeed complements or instead are adjuncts (see discussion in Belikova and White, 2009). Relatedly, Chomsky (1986) argued that complement clauses of nouns are more like adjuncts because nouns cannot properly govern their complements as verbs do. In sum, the patterns observed for the acceptability judgments across island types can likely be accounted for by a range of linguistic factors, but importantly are similar for both natives and L2 learners.
Our primary question addressed the source of the island effects, building on Sprouse et al.’s (2012a) approach to teasing apart grammatical accounts of islands (e.g. Phillips, 2006) from resource-limitation accounts (e.g. Hofmeister and Sag, 2010; Kluender, 2004; Kluender and Kutas, 1993). The results for English natives in the present study showed no relationship between operation span scores and DD scores for each of the four island types tested (p > .05), when DD scores below zero were included or excluded from the regression analysis. WM scores accounted for no more than 1% of the variance in the DD scores in any condition. In addition, Bayes factors for these analyses consistently supported the null hypothesis. These results contribute to a recent body of crosslinguistic evidence supporting the grammatical account of islands in native speakers (e.g. Kush et al., 2018, 2019; Pham et al., 2020; Sprouse et al., 2016; Yoshida et al., 2014).
Importantly, the Najdi Arabic learners showed a very similar pattern of results. For the analyses that included all data, the results showed no relationship between operation span scores and DD scores for each of the four island types tested (p > .05) and WM scores accounted for no more than 1% of the variance in the DD scores in any condition, similar to native speakers. In addition, Bayes factors for these analyses consistently supported the null hypothesis. The L2 learners differed from English natives only in the whether island type, where they showed a marginal relationship between operation span scores and DD scores, only in the analysis which included DD scores greater than or equal to zero (p = .059). This relationship is very weak, as demonstrated by the small R² value of 0.09, which suggests that operation span scores account for only 9% of the variance in DD scores. Recall that under Sprouse et al.’s (2012) interpretation of the resource-limitation theory, the only predictor of the strength of sensitivity to island effects is WM resources. However, the Bayes factor for this regression indicated that there was insufficient evidence to support the null hypothesis. Thus, for the L2 learners, the grammatical account of islands is most strongly supported by the results for the complex NP, subject, and adjunct islands, a pattern also observed by Pham et al. (2020) for English native speakers.
In line with Johnson et al. (2016), these results make an important contribution to the L2 literature on island constraints. As we discussed earlier, while previous studies have shown that L2 learners are sensitive to island constraints (e.g. Li, 1998; Martohardjono, 1993; White and Juffs, 1998) or that they avoid positing gaps in islands in the course of online processing (e.g. Aldwayan et al., 2010; Boxell and Felser, 2017; Covey et al., submitted; Omaki and Schulz, 2011; Perpiñán, 2020), these studies leave open the question of whether the source of the island effects is different in natives and learners. This is a theoretically important question as it has been proposed that L2 learners have a reduced ability to utilize abstract syntactic constraints during processing (Clahsen and Felser, 2006, 2018) and that learners have reduced processing abilities in the L2 (see McDonald, 2006). Thus, it is important to consider the possibility that L2 learners’ apparent sensitivity to island violations may be due to a simple overload of the processing system: learners may reject sentences with island violations or avoid positing gaps in islands simply because the sentences are too complex. To that end, the present study made several important methodological modifications to the Sprouse et al. (2012a) task design in order to make the processing of the complex wh-dependencies less burdensome, and to address several of the key criticisms raised by Hofmeister et al. (2012a, 2012b). Hofmeister et al. argued that the critical island violation sentences in Sprouse et al.’s experiments involved many sources of processing difficulty in addition to the island structure itself, which may have obscured the relationship between WM capacity and sensitivity to island effects. For example, they criticized the fact that the target sentences, which were direct questions, were presented to participants without a context, had referential NPs with no discourse antecedents (e.g. the boss), and had vague wh-fillers (e.g. what) instead of specific wh-fillers (e.g. which-NP). In our experiment, we preceded each test sentence with a declarative background sentence to make the processing of the test sentence easier, avoiding the pragmatic oddity of presenting questions without a context. We also used complex wh-fillers in the test sentences, which have been argued to facilitate processing of wh-dependency sentences at the gap site (e.g. Goodall, 2015; Hofmeister and Sag, 2010). Although the present study used less complex stimuli which reduced the processing burden of the task, the results from both natives and learners still showed no significant relationship between WM and island sensitivity. Thus, in line with Johnson et al.’s self-paced reading results (2016), we believe these results present strong evidence that island effects in both natives and L2 learners have a similar source, with the rejection of island violations being due to adherence to grammatical constraints in both populations.
As a final note, we address the WM measure that we used, as this is also a point of contention in the literature. Hofmeister et al. (2012a, 2012b) claimed that the failure of Sprouse and colleagues to find a relationship between WM and sensitivity to island effects was due in part to the measures of WM that were used. Hofmeister et al. (2012a, 2012b) argued that the serial-recall and n-back tasks that Sprouse et al. (2012a) used are simple span tasks and may not be considered measures of WM because they measure only storage capacity. In the present study, the automated operation span task that we used is a complex span task (Conway et al., 2005), which requires not only short-term storage but also simultaneous processing of additional information. However, it is still possible that we failed to observe a relationship between WM and sensitivity to island effects because we did not use a measure that perfectly captures the relevant processes used in processing island violation sentences. In their response to Hofmeister et al. (2012a), Sprouse et al. (2012b) suggest that this idea is improbable because although there are many WM tasks, the types of cognitive processes which these tasks can engage are limited. Thus, using a different WM measure is likely to yield results similar to those found in the present study and Sprouse et al. (2012a). Nevertheless, a recent study by Pañeda et al. (2020) raises the question of whether WM measures which assess capacity, such as the one we used here, are appropriate measures to use given that on some theoretical accounts, it is retrieval, as opposed to storage, that is critical for language comprehension (e.g. Lewis et al., 2006; McElree et al., 2003). This is an important issue that should be addressed in future studies.
VII Conclusions
This study makes two important contributions to the L2 literature on syntactic island constraints. First, our results provide further evidence that learners can indeed acquire the syntactic constraints on wh-extraction in the L2 regardless of whether or not the L1 instantiates overt wh-movement. Najdi Arabic learners, like native speakers, showed sensitivity to island constraints in English. Second, our study, which took several methodological criticisms of Sprouse et al. (2012a) into account, found no robust relationship between WM and island sensitivity, suggesting, at least for the island types examined here, that island constraints should not be reduced to capacity-based limitations in WM. Thus, our results suggest that the island effects observed for both natives and L2 learners are more likely to be due to adherence to grammatical constraints than to limitations in processing.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
