Abstract
Norwegian allows filler-gap dependencies into embedded questions, which are islands for filler-gap dependency formation in English. We ask whether there is evidence that Norwegian learners of English transfer the functional structure that permits island violations from their first language (L1) to their second language (L2). In two acceptability judgment studies, we find that Norwegians are more likely to accept ‘island-violating’ filler-gap dependencies in L2 English if the corresponding filler-gap dependency is acceptable in Norwegian: Norwegian learners variably accept English sentences with dependencies into embedded questions, but not into subject phrases. These results are consistent with models that permit transfer of abstract functional structure. Norwegians are still less likely to accept filler-gap dependencies into English embedded questions than Norwegian embedded questions. We interpret the latter finding as evidence that, despite transfer, Norwegian speakers may partially restructure their L2 English analysis. We discuss how indirect positive evidence may play a role in helping learners restructure.
I Introduction
This article addresses first language (L1) transfer in the acquisition of filler-gap dependencies in adult second language (L2) acquisition. We ask whether Norwegian learners of English transfer acceptable filler-gap dependencies from their L1 Norwegian to their L2 English, including dependencies that are unacceptable (and therefore unattested) in English. We also consider whether and how Norwegians might learn that English is more restrictive than Norwegian.
Norwegian and English allow long-distance filler-gap dependencies into embedded declarative clauses. For example, the relative clause (RC) head the signals / signalene can be interpreted as either the direct object (1a, 2a) or subject (1b, 2b) of an embedded verb.
(1) a. Those were the signals_i that the sailors said [(that) folks could understand ___i ].
    b. Those were the signals_i that the sailors said [ ___i meant danger].
(2) a. Det var signal-ene_i som sjømenn-ene sa [(at) folk kunne forstå ___i ].
       That was signal-
       ‘Those were the signals that the sailors said that folks could understand.’
    b. Det var signal-ene_i som sjømenn-ene sa [(at) __i betydde fare].
       That was signal-
       ‘Those were the signals that the seamen said meant danger.’
Norwegian and English differ, however, in subtle ways. Embedded questions are islands in English in that they block filler-gap dependency formation (Chomsky, 1977; Sprouse et al., 2012). Attempting to associate the filler the signals with the embedded verbs in (3) results in unacceptability. In Norwegian embedded questions are not islands (Maling and Zaenen, 1982). It is acceptable to associate the filler signalene with the embedded gaps in (4).
(3) a. * Those were the signals_i that the sailors knew [
    b. * Those were the signals_i that the sailors knew [
(4) a. Det var signal-ene_i som sjømennene visste [
       That was signal-
       ‘Those were the signals that the sailors knew who could understand.’
    b. Det var signal-ene_i som sjømennene visste [
       That was signal-
       ‘Those were the signals that the sailors knew what meant.’ ~ ‘Those were the signals that the sailors knew the meaning of.’
In the present study, we investigate if the acceptability of sentences like (4b) leads native Norwegian speakers to accept sentences like (3b) in their L2 English.
We expect Norwegians to accept sentences like (3b) if they inappropriately transfer to L2 English those features of their L1 grammar that render embedded questions non-islands. What could such features be? Under many generative syntactic analyses ‘long-distance’ movement out of an embedded clause as in (1) and (2) requires successive-cyclic movement through the left-periphery of the embedded clause (e.g. Chomsky, 1977, 2000). In languages like English, this movement uses the specifier of the complementizer phrase (henceforth spec,CP) as an intermediate landing site. Ordinary declarative clauses are not islands, because spec,CP is empty, allowing the moved element to transit through. Embedded questions are islands because spec,CP is already occupied by the wh-phrase – who/what in examples (3a) and (3b) – so an intermediate stop-over is blocked.
Cross-linguistic differences in the islandhood of embedded questions are assumed to reflect parametric variation in the functional structure of the left-periphery of the clause (e.g. Reinhart, 1983; Rizzi, 1982). 1 For the sake of concreteness we make use of a specific proposal made for Mainland Scandinavian languages like Norwegian: Recent work argues that such languages have multiple specifiers in the complementizer domain that would permit successive-cyclic movement through the edge of an embedded question (e.g. Lindahl, 2017; Kush et al., 2018, 2019; Vikner et al., 2017). The relevant specifiers are generated by an extra functional head (e.g. the head c under Vikner and colleagues’ proposal).
Under this analysis, Norwegians would treat English embedded questions as non-islands if they transfer the extra functional structure of their L1 complementizer domain to English. As we discuss below, whether such transfer is possible is a point of disagreement between models of L2 acquisition. In our investigation, we address three inter-related theoretical questions at the intersection of second-language influence and learnability:
To what extent does L1 functional structure transfer to L2?
Are L1 features transferred to L2 in a conservative fashion?
How do L2 learners restructure after erroneous L1 transfer?
We discuss each question in turn.
1 What can transfer?
Cases of L1–L2 transfer are well documented. For example, learners often produce or accept L1 word order patterns that are ungrammatical in L2 (Ayoun, 1999; Rankin, 2012; Trahey and White, 1993; Westergaard, 2003; White, 1991). Such instances suggest that L2 learners use some aspects of their L1 (as a starting point) to analyse their L2, but exactly what transfers is a matter of considerable debate.
Models of L2/Ln acquisition disagree on the degree to which L1 functional structure transfers (for review, see Rothman et al., 2019). The Minimal Trees approach of Vainikka and Young-Scholten (1994, 1996, 2006) admits no transfer of functional projections from L1 to L2, positing that learners transfer only lexical projections (VPs) during early acquisition. Higher-level functional projections (e.g. CP) are assumed to emerge later in development via the interaction of L2 input and principles of Universal Grammar (UG) without using L1 functional heads as templates. Most other models of transfer assume that functional projections from L1 transfer to L2, though they differ as to what this entails. Eubank’s (1993) Weak Transfer hypothesis holds that functional heads from L1 transfer along with their parameter settings (e.g. basic directionality), but L1-specific lexical feature-values associated with those heads do not transfer. Full Transfer models contend that L1 functional heads, their parameter settings, and their associated feature values serve as the initial interlanguage template for L2 development (e.g. Schwartz and Sprouse, 1994, 1996). 2 As far as island-insensitivity in L1 Norwegian can be attributed to the presence of extra functional structure, observing comparable island insensitivity in L2 English would constitute evidence for transfer of that functional structure.
2 Conservativity and transfer
Language learners often encounter input that is compatible with two (or more) analyses differing in generative capacity: a restrictive analysis that closely fits the observed data and another more powerful analysis that generates both the observed data and additional unattested sentences. In such cases the strings generated by the first analysis represent a subset of the strings generated by the more powerful analysis (with respect to a given phenomenon). 3 When learners must choose between the two analyses, they face a version of the classic subset–superset problem (e.g. Berwick, 1985; Wexler and Manzini, 1987; White, 1989a, 1989b; for cases in L2, see Judy and Rothman, 2010; Yuan, 1997): should they choose the more, or less, restrictive analysis? What if they choose the less restrictive analysis and it turns out to be incorrect? In that case, rejecting the superset analysis may be difficult since strings consistent with the subset analysis are equally consistent with the superset analysis.
Similar learnability considerations apply in L2 acquisition, where the problem is aggravated by the possibility of transfer: If the learner’s L1 supports the superset analysis and the analysis is transferred to L2, the result is an overly permissive L2 grammar that generates both acceptable and unacceptable L2 forms. The case of Norwegian is arguably such an instance: if Norwegian learners transfer their L1 functional structure to their analysis of L2 English filler-gap dependencies, they would be able to generate acceptable long-distance dependencies in English, but also island-violating dependencies that should be unacceptable in English.
Prior work in L1 acquisition has argued that learners can avoid erroneous overgeneralization by adopting conservative learning strategies that prefer restrictive analyses (e.g. Snyder, 2007; Westergaard, 2014). 4 In principle, it is possible that transfer is also conservative: L2 learners could eschew transferring features that would potentially over-generate or avoid transferring typologically marked structures (e.g. Mazurkewich, 1984) without direct evidence for those features. Previous research has shown that L2 learners are less conservative than L1 learners, but these studies have not directly considered the role that transfer might play in these situations (e.g. Anderssen et al., 2018; Clahsen and Muysken, 1986; White, 1989b). If Norwegians treat embedded questions as non-islands in L2 English, this would constitute evidence against conservative transfer.
3 Retraction and restructuring after transfer
L2 learners can undo transfer of an L1 feature (i.e. restructure) based on positive evidence of conflict between L2 input data and L1 analyses. Many models assume that direct positive evidence of conflict with the L1 analysis of a phenomenon P is required for restructuring the L2 analysis of P (see, e.g. Schwartz and Sprouse, 1996): metalinguistic-type negative evidence (e.g. correction) is believed not to be useful for prompting underlying grammatical restructuring (Schwartz, 1993; White, 2003). For some basic phenomena like determining head-directionality the relevant evidence is in abundance, so restructuring should happen quickly. As the similarity between or number of (surface) forms predicted by L1 and L2 increases, however, the possibility of direct conflict diminishes: the relevant positive data are scarce, if they exist at all. In the absence of (enough) conflicting data, inappropriately transferred features are expected to persist late into acquisition or become fossilized (see, amongst others, Franceschina, 2005; Hawkins et al., 1993; Judy and Rothman, 2010; Lardiere, 2007; Schwartz and Sprouse, 1996). As such, instances where L2 surface forms are a subset of acceptable L1 forms represent paradigm cases where ‘persistent’ or fossilized transfer should obtain. Given that (most) acceptable English filler-gap dependencies are compatible with a transferred Norwegian analysis, we predict that Norwegians are unlikely to have restructured their L2 English grammars if transfer has occurred.
4 Past work on learnability of islands/movement
Before proceeding to our experiments, we briefly consider past work that investigated islands in L2 acquisition to highlight the difference from our research questions. Most prior studies were framed as tests of access to principles of Universal Grammar (UG) during L2 acquisition rather than transfer.
Some earlier experiments explored whether L1 speakers of languages without overt wh-movement accept island-violating wh-movement in English (Johnson and Newport, 1991; Li, 1998; Martohardjono, 1993; White and Genesee, 1996; White and Juffs, 1998; Wolfe Quintero, 1992). For example, as part of a larger study, Martohardjono (1993) had L1 Chinese and L1 Indonesian participants judge sentences with long-distance wh-dependencies in their L2 English. Test sentences contained wh-dependencies into five types of constituents that are islands in English: embedded questions (wh-islands), RCs, complex NPs, adjunct clauses, and sentential subjects. Martohardjono found that participants in both groups correctly rejected island violations on a non-trivial portion of trials, 5 which was taken as evidence for access to UG constraints on wh-movement during L2 acquisition (for similar conclusions, see Li, 1998; White and Juffs, 1998).
Other experiments have tested whether learners accept island-violating L2 filler-gap dependencies that correspond to unacceptable dependencies in their L1. Martohardjono (1993) again provides an example. Martohardjono asked L1 Italian participants to rate the same English sentences as the native Chinese and Indonesian participants in the experiment above. In Italian, wh-movement from all five of the constituents is unacceptable, just as in English (Rizzi, 1982; Sprouse et al., 2016). Martohardjono found that Italian participants rejected the test sentences at rates comparable to L1 English natives. 6 These results demonstrate that participants do not allow island-violating dependencies in their L2 if those dependencies are unacceptable in L1, an empirical conclusion that is also supported by the growing body of research on the real-time processing of islands in L2 (Felser et al., 2012; Kim et al., 2015; Omaki and Schulz, 2012).
The results above do not directly address the limits of transfer because they are in principle compatible with transfer either having or not having occurred. If speakers of non-wh-movement languages initially transferred their L1 analysis of wh-dependencies to L2 English, observing overt movement dependencies would prompt them to restructure and generate a new analysis for the observed forms in the L2 input. If transfer did not occur, they would similarly base an analysis of English wh-dependencies on input forms. Judgments of English dependencies, then, would be based on their input-driven analyses. In the case of Italian, if participants conservatively learn the distribution of acceptable English dependencies from the L2 input alone or transfer their L1 analysis, they should reject island-violating wh-dependencies all the same.
Unlike prior experiments, our work tests whether transfer occurs by testing cases where the dependencies allowed by L1 constitute a larger set than is allowed in L2. If transfer occurs, we expect that ‘unlearning’ the L1 analysis should prove difficult because there is arguably little to no direct evidence that would contradict the transferred analysis. As such, we expect the transferred analysis to persist and to affect participant judgments despite otherwise high proficiency in the L2 and significant time knowing the L2 well. In particular, we expect participants to accept unacceptable L2 forms generable under the L1 analysis. We tested whether L2 speakers of English make such errors with two acceptability judgment studies. To preview our main results, we find evidence for the predicted non-conservative transfer from L1 Norwegian to L2 English. However, we also find evidence that suggests some degree of restructuring: Norwegians do not uniformly treat embedded questions as non-islands in English as they do in Norwegian. We consider the implications of these facts in the General Discussion.
II Experiments
We ran two acceptability judgment studies that tested Norwegian speakers’ intuitions about the acceptability of relative clause dependencies in configurations like (3b) and (4b) in both English and Norwegian. We henceforth refer to such examples as Wh-Trace Configurations to highlight two characteristics of the constructions: (1) the islands in question are embedded questions (wh-islands) and (2) the filler is associated with a subject gap/trace immediately adjacent to the embedded wh-word. Both aspects of the constructions are presumed to result in unacceptability in English: (1) because of an island violation, and (2) because it is unacceptable in (most dialects of) English to have a gap next to an overt element in the complementizer domain (so-called Comp-trace effects; Chomsky and Lasnik, 1977; Perlmutter, 1971). As both experiments had the same design, we present information about the materials, procedure, and analysis before discussing the specifics of each experiment.
1 Materials and design
Both experiments employed the factorial definition of island effects developed by Sprouse (2007) and used in many recent studies of island-sensitivity cross-linguistically (e.g. Kush et al., 2018, 2019; Sprouse et al., 2011, 2012, 2016). The standard 2×2 factorial design for island effects crosses the factors
(5) Sample Wh-Trace Item (English Conditions):
The sailors . . .
a. found someone that __ knew [ that the signal meant danger ].
b. saw the signal that they knew [ ___ meant danger ].
c. found someone that __ knew [
d. saw the signal that they knew [
In (5) the filler-gap dependency is a relative clause dependency.
We crossed the standard 2×2 manipulation above with an additional factor:
(6) Sample Wh-Trace Item (Norwegian Conditions):
Sjømennene . . .
Sailors.
a. fant noen som __ visste [at signal-et betydde fare].
   found someone that __ knew that signal-
b. så signal-et som de visste [at __ betydde fare].
   saw signal-def that they knew at __ meant danger
c. fant noen som __ visste [
   found someone that __ knew what signal-
d. så signalet som de visste [
   saw signal-def that they knew what ___ meant
In addition to Wh-Trace items, we tested sensitivity to another island type: Subject islands. The islandhood of subject phrases is determined by different syntactic constraints (e.g. the Condition on Extraction Domains of Huang, 1982) than that of embedded questions. As a result, the extra functional structure that permits Wh-Trace island violations should have no effect on the islandhood of subjects. Thus, subjects should be islands in both Norwegian and English. This prediction has been verified by previous studies using the factorial design (Kush et al., 2018, 2019; Sprouse et al., 2011, 2016).
We adapted materials from Kush et al. (2018, 2019) to test the acceptability of RC-dependencies into subject islands. The design crossed
(7) Sample Subject Island Item (English Conditions):
The judge . . .
a. met the lawyer that __ hoped that the report would confirm the suspicions.
b. read the report that the lawyer hoped would __ confirm the suspicions.
c. met the lawyer that __ hoped that the information in the report would confirm the suspicions.
d. read the report that the lawyer hoped that [the information in __] would confirm the suspicions.
(8) Sample Subject Island Item (Norwegian Conditions):
Dommeren. . .
a. møtte advokaten som __ håpet at rapporten ville bekrefte mistankene.
b. leste rapporten som advokaten håpet at __ ville bekrefte mistankene.
c. møtte advokaten som __ håpet at opplysningene i rapporten ville bekrefte mistankene.
d. leste rapporten som advokaten håpet at [opplysningene i __ ] ville bekrefte mistankene.
Subject island judgments provide an independent baseline of island-sensitivity that is not expected to be affected by the hypothesized transfer of functional structure.
2 Procedure
Test items were distributed across lists according to a Latin Square design and intermixed among filler sentences. The experiment was hosted on IbexFarm (Drummond, 2012). Participants completed the experiment on their own personal computers. Sentences were presented one at a time, and participants rated the acceptability of each sentence on a 7-point scale. All participants rated the English items before judging the Norwegian block, to minimize L1 interference. Instructions were presented in English and participants received a break between the English and Norwegian blocks.
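To illustrate what a Latin Square distribution amounts to, the following minimal sketch rotates 16 items through 8 conditions. The condition labels (including the label ‘Short’), the number of lists, and the function name `latin_square_lists` are assumptions made for illustration only; the actual list construction used in the experiments is not described at this level of detail.

```python
from collections import Counter

# Hypothetical condition labels: 2 languages x 2 distances x 2 structures.
CONDITIONS = [
    (language, distance, structure)
    for language in ("English", "Norwegian")
    for distance in ("Short", "Long")
    for structure in ("NoIsland", "Island")
]


def latin_square_lists(n_items, conditions):
    """Build one presentation list per rotation of the conditions.

    In list k, item i appears in condition (i + k) mod len(conditions), so
    each item occurs exactly once per list and cycles through every
    condition across the full set of lists.
    """
    n = len(conditions)
    return [
        [(item, conditions[(item + k) % n]) for item in range(n_items)]
        for k in range(n)
    ]


lists = latin_square_lists(16, CONDITIONS)

# Count how many tokens of each condition a participant on list 0 would see.
tokens_per_condition = Counter(condition for _, condition in lists[0])
print(len(lists), set(tokens_per_condition.values()))
```

Under these assumptions, a participant assigned to any one list sees two tokens of each condition and never sees the same item twice. The ordering of blocks within a session (English before Norwegian) is a separate matter that the sketch does not model.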
3 Analysis
Raw ratings were z-score transformed before analysis. We z-scored ratings by participant and language tested. Z-scoring by-participant helps to control for biases in how individual participants used the 7-point scale. Z-scoring by-language for each participant helps control for the fact that participants may use the scale differently in their L1 and L2 (Sorace, 1996; Spinner and Gass, 2019).
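The transformation can be sketched as follows. This is an illustration rather than the authors’ code: the original analysis was run in R, and the row format, the function name `zscore_by_participant_and_language`, and the use of the sample standard deviation are all assumptions. Whether filler ratings entered the computation is likewise not specified in the text.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_by_participant_and_language(rows):
    """Add a 'z' value to each rating row.

    Each row is a dict with the (hypothetical) keys 'participant',
    'language', and 'rating'. Ratings are standardized within each
    participant-language pair, so every participant's English and Norwegian
    ratings are each centred on 0 with unit spread.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["participant"], row["language"])].append(row["rating"])

    # Assumption: sample standard deviation (n - 1 denominator).
    stats = {key: (mean(vals), stdev(vals)) for key, vals in groups.items()}

    scored = []
    for row in rows:
        centre, spread = stats[(row["participant"], row["language"])]
        scored.append({**row, "z": (row["rating"] - centre) / spread})
    return scored


# Invented ratings: one participant who uses the full scale in English
# but only the upper end of the scale in Norwegian.
example = [
    {"participant": "P1", "language": "English", "rating": r} for r in (1, 4, 7)
] + [
    {"participant": "P1", "language": "Norwegian", "rating": r} for r in (5, 6, 7)
]

for row in zscore_by_participant_and_language(example):
    print(row["language"], row["rating"], row["z"])
```

In this invented example the raw ratings 7 (English) and 7 (Norwegian) both map to z = 1, as do 1 and 5 to z = -1, which is the sense in which the transformation removes differences in how the scale is used across languages.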
Z-scores were analysed with linear mixed effects models implemented using the lme4 (Bates et al., 2015) and lmerTest (Kuznetsova et al., 2017) packages in R (R Core Team, 2013). All models included fixed effects of
III Experiment 1
1 Materials
Sixteen items of 8 conditions apiece were created for each island type following the
2 Experiment 1a: Native English controls
Thirty-one native English volunteers recruited as control participants via social media (mean age = 38.0, SD = 11.9, 17 female; 27 from the United States) judged sentences in the English block of the experiment on Ibex Farm. One participant was excluded from analysis for having multiple response times < 500 ms. Because participants rated only the English sentences, participants rated 4 tokens per condition per island.
3 Results
Average acceptability judgments by condition are found in Figure 1. A summary of statistical analysis is found in Table 1. Native English speakers rated RC dependencies into both subject phrases and embedded Wh-Trace constructions much lower than RC dependencies into non-islands. There were clear island effects (

Average z-scored acceptability judgments from native English control participants in the Subject island (left panel) and Wh-Trace island (right panel) sub-experiments.
Summary of statistical analysis of native English Control judgments from Experiment 1a.
Notes. Significant effects are in bold face. Model: zscore ~
We also inspected by-participant ratings of the Subject and Wh-Trace Long–Island sentences to check for inter-trial consistency. Native English participants rejected Wh-Trace Long–Island sentences nearly uniformly. Twenty-eight of 30 participants rejected 4 out of 4 Wh-Trace Long–Island tokens, judging all below z = 0. The two remaining participants rated a single Wh-Trace Long–Island token above z = 0, but rejected the remaining 3 tokens.
Judgments of Subject Long–Island sentences showed slightly more variability. Fourteen participants rejected all four tokens that they rated. Twelve participants rejected three of four tokens. Three participants exhibited more variability: two of the three rejected only two of four Subject Long–Island sentences and one participant rejected only one of four. Overall, however, participants rejected RC-dependencies into subject phrases on the clear majority of trials.
4 Experiment 1b: Norwegian L1, English L2
a Participants
Twenty-seven native speakers of Norwegian took part (16 female). Two participants’ data were excluded because the participants reported exposure to English during infancy. Participants were students enrolled in the English Studies program at the Norwegian University of Science and Technology (NTNU) either at the bachelor’s or master’s level. Norwegian university students are assumed to have a proficiency in English at least commensurate to the Common European Framework of Reference for Languages (CEFR) at B2 level, as this is the minimum standard for enrollment for foreign students (see, for example, Samordna opptak, 2020). Participants filled out a short survey on their language background and their English exposure. An overview of responses to this survey is in Table 2.
Demographic information for Norwegian participants in Experiment 1b.
Unlike the English control participants, Norwegian participants rated 8 items per island in each language (2 tokens per condition per language). Average judgments by language and island type are plotted in Figure 2.

Average z-scored acceptability judgments from Norwegian participants in Experiment 1.
We first report the results of the omnibus
Omnibus statistical analysis of judgments from Experiment 1b.
Notes. Significant effects are in bold face. Model: zscore ~
b Subject islands
A statistical summary is given in Table 4. The size of Subject island effect was significantly larger in English (DD = 1.69) than in Norwegian (DD = 1.01), as indicated by a
Statistical analysis of judgments of the subject island items from Experiment 1b.
Notes. Significant effects are in bold face. Model: zscore ~
c Wh-Trace Islands
A statistical summary can be found in Table 5. Again, the three-way
Statistical analysis of judgments of the wh-trace island items from Experiment 1b.
Notes. Significant effects are in bold face. Model: zscore ~
The Wh-Trace island effect observed in English is smaller in magnitude than the subject island effects in either language, while the average rating of the English Wh-Trace Long–Island sentence is considerably higher (roughly ‒0.25) than the English Subject Long–Island sentence (roughly ‒0.80). Following Kush et al. (2018, 2019), we investigated whether the smaller effect reflected inconsistent judgments across trials. Figure 3 plots the distribution of z-scores in both Long conditions for each island–language combination. Response consistency is reflected in the degree to which judgments in a condition follow a unimodal distribution. Inconsistent judgments manifest as bimodal or uniform distributions. In each of the island–language pairs, the Long–NoIsland condition provides a baseline level of consistency against which to judge the responses in the Long–Island conditions. The extent of the overlap between the Long–NoIsland and Long–Island judgments provides a rough way of approximating the extent to which the RC-dependencies into islands were perceived as run-of-the-mill long-distance dependencies.

Distribution of judgments in Long–NoIsland and Long–Island conditions for each island and language pair in Experiment 1b.
Beginning with subject island sentences, we observe relatively little overlap between the ratings for Long–Island and Long–NoIsland sentences. Judgments of Subject Long–NoIsland sentences cluster unimodally around the higher end of the scale (z = +1), with a thicker left tail. Judgments of the Subject Long–Island condition, by contrast, cluster at the opposite end of the scale (z = ‒1) and exhibit less of a right skew.
Judgments of Long conditions in the Subject island sub-experiment provide a template for consistent judgment. Judgments in the Wh-Trace sub-experiments clearly do not conform to that template. Judgments in the Norwegian Wh-Trace Long–Island condition are consistent with general acceptability: z-scores are unimodally distributed about the high end of the scale with a fat left tail. The pattern of responses indicates that participants perceived the test sentences as unobjectionable on most trials. Bimodality in the corresponding Long–NoIsland condition suggests that participants were less consistent in judging those sentences. Turning to the English Wh-Trace sub-experiment we see bimodality in both Long conditions, though the larger mode falls on opposite ends of the range between conditions. Norwegian participants tended to accept Long–NoIsland sentences more often than reject them, but there were still a number of trials where they judged the sentences to be unacceptable. Most relevant to our purposes, the Norwegian participants often rejected Long–Island sentences, but there was a non-negligible number of trials on which they accepted structures that native English speakers reject.
5 Individual differences
Analysis of the rating distributions shows that there was inter-trial inconsistency in the ratings of English Wh-Trace Long–Island sentences, but it does not establish whether the cause was inter- or intra-participant inconsistency. To ascertain whether individual participants were inconsistent, we plotted each participant’s maximum judgment against their minimum judgment for each island–language combination (see Kush et al., 2019). In Figure 4, each dot corresponds to an individual participant.

Plots of by-participant minimum and maximum judgments for each island–language pair in Experiment 1b.
For the purposes of the analysis we adopt a crude definition of ‘acceptance’ and ‘rejection’: we treat all judgments that fall below z = 0 as rejections and all judgments that are above z = 0 as acceptances. Using this coarse categorization technique permits identification of three participant response types: Participants that rejected both tokens of an island type occupy quadrant 3 (bottom left). Those that accepted both tokens occupy quadrant 1 (top right). Those that occupy quadrant 4 (top left) rated island tokens inconsistently, accepting one and rejecting the other.
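The coarse categorization just described can be written down as a short sketch. The function name `classify_participant` and the category labels are introduced here for illustration. The treatment of a judgment falling exactly at z = 0 is an assumption, since the definition above covers only values strictly below or above zero.

```python
def classify_participant(z_scores, threshold=0.0):
    """Classify one participant's Long-Island judgments for one
    island-language pair from their minimum and maximum z-scores."""
    lowest, highest = min(z_scores), max(z_scores)
    if highest < threshold:
        # Even the best-rated token was rejected (quadrant 3 in the text).
        return "consistent rejecter"
    if lowest > threshold:
        # Even the worst-rated token was accepted (quadrant 1 in the text).
        return "consistent accepter"
    # Mixed judgments. Assumption: a score exactly at the threshold also
    # lands here, because the text does not classify that case.
    return "inconsistent rater"


# Invented judgments for three hypothetical participants.
print(classify_participant([-0.9, -0.4]))
print(classify_participant([0.3, 1.1]))
print(classify_participant([-0.7, 0.8]))
```

Because only the minimum and maximum enter the classification, the same function applies unchanged when participants contribute more than two tokens per condition.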
A few participants accepted one or both Norwegian subject island tokens, but subject island judgments otherwise exemplify consistent rejection: In Panels 1 and 2 of Figure 4 most participants fall into quadrant 3. Judgments of the Norwegian Wh-Trace sentences show a different pattern: all participants fell into quadrant 1 (consistent accepters) or quadrant 4 (inconsistent raters). Judgments of English Wh-Trace islands show more variability. Eleven participants consistently rejected Wh-Trace islands in English and 3 consistently accepted the constructions. The remaining 11 participants rated the sentences inconsistently. This level of inter- and intra-participant inconsistency stands in contrast to the relative uniformity of the same participants’ judgments of English Subject island tokens.
Given the differences in participant response patterns for the English Wh-Trace items, we conducted an exploratory analysis of whether individual variability correlated with self-reported proficiency, weekly hours of English spoken, or English media consumption. We used participants’ English Wh-Trace island DD score as the dependent measure of island sensitivity. A positive correlation between DD score and each individual measure would be expected on the assumption that increased exposure or proficiency made participants behave more like native English speakers. Hours of spoken English did not correlate with DD score (|t| < 1), nor did self-reported English proficiency (|t| < 1). There was a small, but significant negative correlation between DD score and English media consumption (t = ‒2.036, p < .05; adjusted R² = .116). As Figure 5 shows, this correlation indicates – counter-intuitively – that participants who consumed more English media showed reduced sensitivity to English Wh-Trace island effects.

Correlation between participant Wh-Trace DD scores and self-reported hours of English media exposure in Experiment 1b.
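For readers unfamiliar with the measure, a per-participant DD score can be sketched under the usual differences-in-differences definition from the factorial design literature: the cost of the island structure in the Long conditions minus its cost in the Short conditions. That this matches the exact computation behind the scores reported here is an assumption, as are the factor labels, the row format, and the function names `cell_means` and `dd_score`.

```python
from collections import defaultdict
from statistics import mean


def cell_means(rows):
    """Average z-scores within each (distance, structure) cell.

    Each row is a dict with the (hypothetical) keys 'distance',
    'structure', and 'z', drawn from one participant and one language.
    """
    cells = defaultdict(list)
    for row in rows:
        cells[(row["distance"], row["structure"])].append(row["z"])
    return {cell: mean(values) for cell, values in cells.items()}


def dd_score(means):
    """Differences-in-differences island effect from four cell means."""
    d_long = means[("Long", "NoIsland")] - means[("Long", "Island")]
    d_short = means[("Short", "NoIsland")] - means[("Short", "Island")]
    return d_long - d_short


# Invented z-scores for one hypothetical participant, two tokens per cell.
rows = [
    {"distance": "Short", "structure": "NoIsland", "z": 1.25},
    {"distance": "Short", "structure": "NoIsland", "z": 0.75},
    {"distance": "Short", "structure": "Island", "z": 0.5},
    {"distance": "Short", "structure": "Island", "z": 0.5},
    {"distance": "Long", "structure": "NoIsland", "z": 0.25},
    {"distance": "Long", "structure": "NoIsland", "z": 0.75},
    {"distance": "Long", "structure": "Island", "z": -1.5},
    {"distance": "Long", "structure": "Island", "z": -0.5},
]

print(dd_score(cell_means(rows)))
```

A DD score near zero indicates that the Long–Island condition is no worse than the independent costs of distance and structure would predict, whereas larger positive values indicate a larger island effect.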
6 Discussion
As expected, English participants rejected RC-dependencies into subject phrases. Norwegian participants also rejected subject island violations in their L1 Norwegian and L2 English. These results are expected, if subjects are islands in both languages and the extra functional structure that allows filler-gap dependencies into embedded questions does not amnesty subject island violations.
Participants diverged in their judgments of Wh-Trace items. Native English speakers exhibited large Wh-Trace island effects, rejecting RC-dependencies in Wh-Trace configurations. We failed to find a Wh-Trace island effect in Norwegian. Norwegian participants generally accepted RC-dependencies into Wh-Trace configurations in their L1 as readily as RC-dependencies into declarative complement clauses. Interestingly, we found a significant island effect with English Wh-Trace constructions, indicating that Norwegians rated RC-dependencies in Wh-Trace constructions less acceptable on average than RC-dependencies into non-island declarative complement clauses. However, the Wh-Trace island effect was smaller than subject island effects, because Norwegian participants rated English Wh-Trace islands inconsistently: participant ratings were a mix of ‘accept’ and ‘reject’ trials. We defer further discussion and interpretation of this finding to the General Discussion.
The number of trials where Norwegian participants accepted English Wh-Trace violations provides suggestive support for transfer from L1 Norwegian to L2 English. However, the experiment was relatively low-powered, with only two observations of the relevant configuration per participant. We wished to test if our findings would replicate, and whether participants would provide more consistent judgments of Wh-Trace island violations in English if given more trials. Therefore, we ran Experiment 2, in which we doubled the number of observations per participant. We also increased our sample size and drew from a wider pool.
IV Experiment 2
1 Participants
Forty-nine native speakers of Norwegian took part in Experiment 2 (29 female). Like the participants in Experiment 1, participants in Experiment 2 were enrolled as bachelor’s and master’s students at a Norwegian university. Unlike the previous participants, participants in Experiment 2 were enrolled in a wide range of degree programs, not only English. All these courses of study presuppose that students have studied English through upper secondary school and have achieved minimum proficiency at CEFR B2 level. Participants provided the same information as in Experiment 1. Table 6 provides an overview of descriptive statistics.
Demographic information for Norwegian participants in Experiment 2.
2 Materials
Participants rated the same items as in Experiment 1 plus 16 new Wh-Trace items. As a result, participants judged 4 tokens per condition per language in the Wh-Trace sub-experiment instead of 2 as in Experiment 1.
3 Results
Participants’ average judgments by island type and language are plotted in Figure 6. A summary of the omnibus statistical analysis can be found in Table 7. As in the analysis of Experiment 1, we focus only on the highest-order interaction effects.

Average z-scored acceptability judgments from Norwegian participants in Experiment 2. Rows correspond to the island judged and columns correspond to the language of presentation.
Omnibus statistical analysis from Experiment 2.
Notes. Significant effects are in bold face. Model: zscore ~
There was a significant
a Subject islands
A statistical summary is in Table 8. As in Experiment 1, Long sentences were rated lower on average than Short sentences (p < .001) and Island sentences were rated lower than NoIsland sentences (p < .001), collapsing across languages. The three-way
Statistical analysis of judgments of the subject island items from Experiment 2.
Notes. Significant effects in bold face. Model: zscore ~
b Wh-Trace islands
Table 9 presents a statistical summary. Long conditions were rated significantly lower on average than Short conditions (p < .001) and Island conditions lower than NoIsland conditions (p < .05). The
Statistical analysis of judgments of the wh-trace island items from Experiment 2.
Notes. Significant effects in bold face. Model: zscore ~
We again examined the distribution of ratings in Long conditions for all island–language pairs. Distributions are plotted in Figure 7. Subject Long–Island and Long–NoIsland distributions are roughly bimodal with modes at opposite ends of the rating scale. However, in the Norwegian Wh-Trace sentences, the distribution of ratings for Long–Island sentences is essentially indistinguishable from that for Long–NoIsland sentences: participants accepted the majority of test sentences in both conditions. English Wh-Trace judgments diverged from the ratings of their Norwegian counterpart sentences. Whereas participants generally accepted Long–NoIsland sentences, judgments of Long–Island sentences were bimodally distributed. The group as a whole appears to accept and reject English Wh-Trace sentences with near equal frequency.

Distribution of judgments in Long–NoIsland and Long–Island conditions for each island and language pair in Experiment 2.
Figure 8 plots individuals’ minimum and maximum ratings for each island–language pair, to visualize rating consistency. Participants consistently rejected Subject Island violations in English and were generally consistent in their judgment of Norwegian Subject Island violations, as evidenced by the clustering in quadrant 3 in panels 1 and 2 of Figure 8.

Plots of by-participant minimum and maximum judgments for each island–language pair in Experiment 2.
Panel 3 shows that every participant accepted at least one Norwegian Wh-Trace island token – most participants fall into quadrant 4 – while many accepted all four tokens. Panel 4 indicates that almost all participants accepted at least one English Wh-Trace island token.
Figure 8 only provides information about the range of individual participants’ judgments. We were also interested in how many of the 4 Long–Island Wh-Trace tokens each participant accepted. Therefore, we binned participants by how many tokens they rated above 0. The result is in Table 10. Forty of 49 participants accepted 3 or 4 Norwegian Wh-Trace island tokens and none rejected all 4 tokens. Judgments of English Wh-Trace tokens showed less consistency: fewer participants accepted most of the items (18 of 49). Five participants consistently rejected Long–Island tokens. Nevertheless, Norwegian participants clearly displayed a different response pattern from the native English speakers in Experiment 1a, where all participants either uniformly rejected the Wh-Trace island tokens, or rejected 3 of 4.
Participants binned by the number of wh-trace island violation items they rated above the midpoint of the scale.
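The binning step described above can be sketched as follows. This is a minimal illustration, not the analysis script used for Table 10: the function name, the participant labels and the ratings are hypothetical, and the only detail taken from the text is that a token counts as accepted when its z-scored rating is above 0.

```python
from collections import Counter


def bin_by_accepted(ratings_by_participant, threshold=0.0):
    """Map 'number of accepted tokens' -> 'number of participants'.

    `ratings_by_participant` maps a participant label to that participant's
    z-scored ratings of the Long-Island tokens in one language. A token counts
    as accepted if its rating is strictly above `threshold`.
    """
    bins = Counter()
    for ratings in ratings_by_participant.values():
        n_accepted = sum(1 for r in ratings if r > threshold)
        bins[n_accepted] += 1
    return bins


# Hypothetical ratings for three participants, four tokens each.
toy = {
    "p01": [0.8, 0.5, 0.9, 0.2],     # accepts all 4 tokens
    "p02": [0.4, -0.6, 0.7, -0.1],   # accepts 2 tokens
    "p03": [-0.9, -0.7, -1.1, -0.3], # accepts 0 tokens
}
bins = bin_by_accepted(toy)
```

Running the same function separately on the Norwegian and the English ratings would yield the two rows of counts that a table of this kind compares.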
One question that Table 10 leaves unaddressed is how strongly participants’ judgments in the Norwegian Wh-Trace experiment correlate with their judgments in English. We addressed this question in two follow-up analyses. First, we plotted participants’ Wh-Trace DD scores in Norwegian against their DD scores in English, to determine whether island effect sizes were correlated across languages. This plot is in Figure 9. Second, we looked for a correlation between individual participants’ probability of accepting a Wh-Trace island violation in Norwegian and in English. The correlation plot is provided in Figure 10.

Correlation between individual participants’ Wh-Trace DD scores in Norwegian and English in Experiment 2.

Correlation between individual participants’ probability of accepting a Wh-Trace island violation in Norwegian and English.
There was no reliable correlation between Norwegian and English Wh-Trace DD scores. As Figure 9 makes apparent, there were many participants who exhibited no island effects in Norwegian (z ⩽ 0), but nevertheless had a positive Wh-Trace DD score in English. Figure 10 shows a numeric trend such that participants who accepted a high proportion of Wh-Trace island violations in Norwegian were slightly more likely to accept Wh-Trace island violations in English, though this correlation was not significant (adjusted R² = .015; t = 1.32). The correlation was weakened by the group of 19 participants who readily accepted Wh-Trace island violations in Norwegian (> 50%), but were less likely to do so in English. Importantly, all but five of these participants still accepted Wh-Trace violations in English. Finally, we checked whether any of the three individual-level variables correlated with a participant’s Wh-Trace DD score. None of the measures correlated with DD score (ts < 1).
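The second follow-up analysis can be illustrated with a small sketch. It assumes, as a simplification, that each participant's acceptance probability is estimated as the proportion of tokens rated above 0 and that the relation between languages is assessed with a simple linear regression. The function names are hypothetical, and the code is not the script behind the reported statistics.

```python
from statistics import mean


def acceptance_rate(ratings, threshold=0.0):
    """Proportion of tokens rated strictly above `threshold`."""
    return sum(r > threshold for r in ratings) / len(ratings)


def simple_regression(xs, ys):
    """Ordinary least squares of y on a single predictor x.

    Returns (slope, intercept, r_squared, adjusted_r_squared).
    With one predictor, adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - 2).
    """
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, intercept, r2, adj_r2
```

Here `xs` would hold each participant's Norwegian acceptance rate and `ys` the corresponding English rate. An adjusted R² near zero, as reported above, means that knowing a participant's Norwegian rate says little about their English rate.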
V General discussion
Embedded questions are islands for filler-gap dependency creation in English, but not in Norwegian. The difference between the two languages has been linked to extra functional structure in the left-periphery of the Norwegian clause (Kush et al., 2018, 2019; Vikner et al., 2017). We were interested in determining whether Norwegians transfer this extra functional structure from their L1 to their L2, English. We reasoned that if Norwegians transfer the functional structure to English, they should erroneously treat embedded questions as non-islands in English. Insofar as the set of acceptable Norwegian filler-gap dependencies represents a superset of the acceptable English filler-gap dependencies, acquiring the appropriate generalization in English should prove difficult if transfer has occurred. The difficulty reflects the fact that there is arguably little if any direct evidence to counter-exemplify the less restrictive hypothesis (White, 1989a).
To test whether such transfer occurs, we tested whether adult Norwegian speakers accept filler-gap dependencies into embedded questions (wh-islands) in English. Our results provide evidence of transfer from L1 Norwegian to L2 English: Participants accept filler-gap dependencies into wh-islands in English even though they have never encountered those structures in their English input. Importantly, participants do not accept all island-violating filler-gap dependencies in L2 English, as evidenced by participants’ consistent rejection of subject-island-violating filler-gap dependencies. The fact that subject island violations were consistently rejected militates against an interpretation that attributes Norwegian participants’ acceptance of English Wh-Trace island violations to general island-insensitivity in L2. As predicted by transfer, our participants only accepted island-violations in English if the corresponding dependency was acceptable in Norwegian. Insofar as the non-island status of embedded questions is due to extra CP-level functional structure, our results are consistent with models that permit such transfer, including Weak Transfer (Eubank, 1993) or traditional Full Transfer models (Schwartz and Sprouse, 1994, 1996) over models that restrict transfer to minimal grammatical information (Vainikka and Young-Scholten, 1994, 1996).
We predicted that transfer of L1 functional structure could lead native Norwegians into a ‘superset trap’: having assumed an analysis that allows more filler-gap dependencies than are acceptable in English, learners would be unable to retreat to a more restrictive analysis. All else equal, we would therefore predict that Norwegian participants should accept island-violating dependencies as often in L2 English as they do in L1 Norwegian. Participant judgments yielded a more complicated picture: Almost all participants accepted Wh-Trace island violations in English on some portion of trials, but roughly one-third of our participants accepted the English structures less readily than in Norwegian.
1 The source of inconsistent judgments
Participants’ inconsistent judgments of English wh-island violations are consistent with two broad interpretations. First, participants may have rejected the sentences simply due to their increased complexity. This could occur if participants have greater difficulty processing wh-dependencies in their L2 than in their L1 (Juffs, 2005; Juffs and Harrington, 1995). We point out that this explanation presupposes that transfer must have occurred, otherwise the Norwegians would not accept the island-violations in English at all. The explanation holds, however, that Norwegians’ tendency to probabilistically reject island-violations does not constitute evidence of learning the appropriate English analysis: rejection occurs for orthogonal, extra-grammatical reasons.
The second option, which we favor, is that probabilistic rejection provides evidence of learning and partial restructuring. By ‘restructuring’ we simply mean that changes are made to some aspect of the holistic system or feature set transferred from L1. These changes could represent target-like restructuring, such that Norwegian speakers specifically reject the extra functional structure from Norwegian and adopt a simplified left-periphery identical to that of native English speakers. Alternatively, Norwegians could engage in ad-hoc compensatory restructuring wherein other grammatical changes are made to ensure closer surface alignment with acceptable English forms, without directly retracting the L1 functional structure. Our current results do not allow us to distinguish these two possibilities. Which of these outcomes is more likely depends, in part, on what types of L2 input learners receive as evidence that the set of filler-gap dependencies is different in English and how directly that evidence contradicts the L1 analysis. We consider the issue of evidence in the input presently.
Participants’ stochastic or inconsistent judgments are compatible with the notion that they have learned that the distribution of filler-gap dependencies differs between the two languages, but that learning or restructuring is not ‘complete’. Within a parameter-setting model of L2 acquisition (Schwartz and Sprouse, 1996; White, 2003) or a grammar competition/multiple grammars model (Amaral and Roeper, 2014; Rankin, 2014), this uncertainty could be modeled as a probabilistic competition between different grammars. Transfer would entail that Norwegian learners begin acquisition by assigning a high probability to their L1 analysis. Over time, however, they would accumulate evidence against that analysis and would shift probability to a more restrictive analysis (provided they could avoid the preemption problem; Rothman and Iverson, 2013; Trahey and White, 1993).
2 Evidence of difference
The question remains what cues learners could use in their English input as evidence in favor of the restrictive analysis. Negative evidence could, in principle, play a role. We consider direct negative evidence in the form of corrections an implausible mechanism given (1) the relative infrequency of relevant productions, (2) the unreliability and ambiguity of interlocutors’ corrections, and (3) the low probability that wh-island violations are ever addressed explicitly in the English classroom (see Carroll, 1995, 2001; Schwartz, 1993). Indirect negative evidence is another option: if Norwegian learners of English expect to encounter English wh-island violations at a rate comparable to Norwegian, then the absence of the structures could over time lead the learner towards the restrictive hypothesis. Prior research has argued that indirect evidence may play a role in L1 (Foraker et al., 2009; Perfors et al., 2011; Ramscar et al., 2013; Regier and Gahl, 2004; Rohde and Plaut, 1999) and L2 acquisition (Dahl, 2004; Plough, 1992). However, it is unclear whether the frequency of the island-violations is high enough in L1 to form the basis for strong predictions in L2.
Direct positive evidence of the unacceptability of wh-island violations does not occur, but some learning models allow indirect positive evidence to play a role (e.g. Pearl and Mis, 2016, or more traditional parameter-based models). Learners can rely on indirect positive evidence if there exist implicational relations between observed (non-island) structures and the possibility of island violations. Under the assumption that additional CP-level functional structure underlies the non-island status of embedded questions in Norwegian, Norwegians would require evidence that this structure is absent in English. Such evidence is only possible if some overt property of the English CP-domain conflicts with the Norwegian analysis. What could such cues be?
We assume that most sentences do not provide unambiguous evidence for deep differences in functional structure of the CP domain, given the similarity of surface word order patterns in the two languages. However, one piece of evidence might prove useful:
It has been suggested that evidence for an articulated CP-domain in Mainland Scandinavian comes from embedded V2 phenomena (e.g. Vikner et al., 2017). Mainland Scandinavian languages exhibit V2 word order in main clauses (9a; Holmberg and Platzack, 1995): the finite verb (skal) is the second constituent in the linear string regardless of whether a subject (9a) or non-subject (9b) occupies sentence-initial position:
(9) a. Han He shall presumably ‘He probably won’t sing tomorrow.’ b. I morgen tomorrow shall he presumably
The traditional analysis holds that V2 movement requires movement of the finite verb to C0. Canonical word order in embedded clauses is not V2, as evidenced by the position of the verb with respect to adverbs and negation in (10). This entails that the verb does not move to the embedded C position.
(10) Han er lei for at han antakeligvis ikke He is sad for that he presumably ‘He is sad because he probably won’t sing tomorrow.’
It has been observed, however, that V2 word order is possible in some embedded clauses (see, amongst others, Bentzen, 2014; Julien, 2007). For example, in (11) the frame adverbial i morgen (‘tomorrow’) has been fronted internal to the embedded clause and the verb has moved past the embedded subject:
(11) Han sa [at i morgen skal han ikke synge.] He said that tomorrow shall he ‘He said that
Sentences like (11) provide evidence for extra functional structure in the left-periphery of the clause under the assumption that skal has moved to a head in the CP-domain distinct from the head hosting the complementizer at (‘that’) and that there exists a specifier position between at and the verb that i morgen can occupy.
In English, embedded fronting of a non-subject does not result in V2/subject-auxiliary inversion. Thus, observing the absence of V2 in embedded clauses like (12) might provide evidence that the language lacks the extra functional structure.
(12) He said that tomorrow {he will not | *will he not} sing.
Indirect positive evidence might also come from input sentences that do not involve observing different complementizer-level functional structure: English speech errors might also provide relevant evidence of a difference. It is well known that English speakers produce resumptive pronouns inside islands to ‘rescue’ ill-formed sentences (e.g. Morgan and Wagers, 2018; Ross, 1967). Importantly, English speakers produce resumptives in precisely the locations where Norwegian would allow gaps. For example, the sentences in (13) were observed in natural discourse:
(13) a. There were a bunch of people at the party that I didn’t know [who b. ‘. . . the sale of the uranium that nobody knows what c. ‘Maybe it was a bad idea to get people together and try to record audio with some equipment that we didn’t know how
Based on examples such as those in (13), a learner with the knowledge that resumptive pronouns and gaps are in complementary distribution would be able to infer that embedded questions are islands in English. Importantly, drawing inferences based on indirect positive evidence requires non-trivial prior knowledge of the implicational relations between overt forms and (families of) underlying structures.
Indirect positive evidence of the type we describe above is likely to be relatively infrequent in the learner’s input. The relative infrequency of such structures may help explain why our participants appear not to have mastered the appropriate generalization and why there is significant inter-individual variation in outcomes despite long-term exposure to, and instruction in, English. Such effects follow under probabilistic models of grammar competition where conclusively shifting to the subset grammar would require repeated exposure to disconfirmatory evidence (e.g. Yang, 2018).
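The effect of evidence frequency under grammar competition can be made concrete with a toy simulation. The sketch below is an illustration only and is not a model fitted to our data. It assumes a linear reward-penalty learner that chooses between an L1-like (superset) analysis and a restrictive (subset) analysis on each input sentence, and it assumes that a proportion `q` of input sentences can be parsed only by the restrictive analysis (for instance, because they contain one of the cues discussed above). The learning rate `gamma`, the starting probability `p0`, and all function names are hypothetical.

```python
import random


def simulate_learner(q, steps=2000, gamma=0.05, p0=0.9, seed=0):
    """Return the final probability assigned to the L1-like (superset) analysis.

    q      assumed proportion of input that only the restrictive analysis parses
    gamma  assumed learning rate of the linear reward-penalty update
    p0     assumed initial probability of the superset analysis (transfer)
    """
    rng = random.Random(seed)
    p = p0
    for _ in range(steps):
        disconfirming = rng.random() < q   # sentence incompatible with superset
        chose_superset = rng.random() < p  # learner samples an analysis
        if chose_superset and not disconfirming:
            # The superset analysis parsed the sentence: reward it.
            p += gamma * (1.0 - p)
        else:
            # Either the superset analysis failed and is punished, or the
            # subset analysis was chosen and succeeded; both lower p.
            p *= 1.0 - gamma
    return p


def mean_final(q, runs=100, **kwargs):
    """Average final superset probability over several seeded runs."""
    return sum(simulate_learner(q, seed=s, **kwargs) for s in range(runs)) / runs
```

In this sketch the expected probability of the superset analysis shrinks by a factor of (1 − `gamma` · `q`) per input sentence, so the rarer the disconfirming evidence, the slower the shift towards the restrictive analysis. Individual runs with the same `q` can also end at very different values, which is one way inter-individual variation could arise under such a model.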
VI Conclusions
We have argued that native Norwegian speakers erroneously transfer the grammatical source of wh-island insensitivity from their L1 to their L2 English. Such effects are compatible with models of transfer that allow transfer of CP-level functional structure, but not those that restrict transfer to lexical information. We also found evidence that suggested that (some) learners may partially restructure, which we suggested could be triggered by indirect positive evidence. However, our data do not tell us whether the restructuring observed involves transition to the target English analysis or adoption of a divergent compensatory hypothesis that simply ensures closer surface alignment with the English forms.
Acknowledgements
Previous versions of this work were presented at UiT, UMASS, UC Santa Cruz, and at the 2019 CUNY Sentence Processing Conference. We thank audiences for helpful feedback. Special thanks to Jason Rothman for helpful comments on a previous draft. All errors or misrepresentations are our responsibility.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
