Abstract
Recent approaches to the that-trace phenomenon in English include syntactic analyses based on the principle of Anti-locality and a sentence production analysis based on the Principle of End Weight. These analyses have many similarities, but they differ in their predictions for second language (L2) speakers. In an Anti-locality analysis, we expect L2 speakers to show a pattern very similar to first language (L1) speakers, with substantial degradation in acceptability for extraction of a subject from an embedded clause with that. In the Principle of End Weight analysis, we expect L2 speakers to display this same subject extraction degradation whether or not the embedded clause has that. A sentence acceptability experiment with L1 English speakers and two groups of L2 English speakers (L1 Korean and L1 Spanish) confirms the prediction of the Principle of End Weight analysis: the L1 speakers show degradation with subject extraction from a that-clause, while the L2 speakers do the same with clauses with and without that. These results form an interesting contrast with studies of island effects, which have generally found substantial L1–L2 similarities, and show how L2 data can be used as evidence to decide between competing analyses of L1 phenomena.
Keywords
I Introduction
In general, extraction out of embedded complement clauses is freely available in English. As seen in (1), a subject, an object, and an adjunct can all be extracted.
(1) a. Who do you think [ __ met Sue]?
    b. Who do you think [Sue met __ ]?
    c. Where do you think [they met Sue __ ]?
There is nonetheless a curious restriction on this phenomenon, as has long been noted (Chomsky and Lasnik, 1977; Perlmutter, 1971). Cases like (1a), with a subject gap, are not possible when there is overt material in the CP layer of the clause, either in C, as in (2a–c), or in the specifier of C, as in (2d).
(2) a. * Who do you think [that [ __ met Sue]]?
    b. * Who would you prefer [for [ __ to meet Sue]]?
    c. * Who do you wonder [whether [ __ met Sue]]?
    d. * Who do you wonder [where [ __ met Sue]]?
This restriction only seems to arise when the gap immediately follows the material in the CP layer. Object gaps are possible, as in (3a), and subject gaps are possible when they are not adjacent to the material in the CP layer, as in (3b).
(3) a. Who do you think [that [Sue met __ ]]?
    b. Who do you think [that [after much deliberation, __ decided to meet Sue]]?
Presumably for this reason, the restriction does not occur in languages in which the subject gap is plausibly not at the left edge of the clause, as in the Spanish example in (4).
(4) ¿Quién crees [que [habló __ ]]?
    who think.2s that spoke.3s
    ‘Who do you think spoke?’
The generalization thus seems to be that it is not possible to have overt material in the CP layer that is followed immediately by the gap of an Aʹ-dependency. This is known as the that-trace or COMP-trace (or complementizer-trace) effect. Given its peculiarities, for which children would appear to receive no overt evidence, it is generally assumed that the phenomenon follows from some deeper properties of language and is thus not learned as such. This idea is supported by the fact that the effect has been observed in a wide range of languages (for overviews, see Cowart and McDaniel, 2021; Pesetsky, 2017).
Here we will explore the extent to which this phenomenon exists among speakers whose exposure to the language did not begin until later in life, i.e. among L2 speakers. On the one hand, we would expect these speakers to be essentially the same as native speakers in their sensitivity to the that-trace effect. If this effect follows from some deep property of human language (e.g. from an aspect of UG that is accessible to L2 learners or from a ‘third factor’ in the sense of Chomsky, 2005), then it is hard to see how or why L2 speakers would be immune to it. On the other hand, given the surface complexity of the phenomenon, as seen in (1)–(3) above, and the fact that the effect is not visible in some languages, as in (4), it might seem reasonable to expect that L2 speakers would not show the same sensitivity to the effect as native speakers.
We will see in this article that neither of these expectations is borne out straightforwardly. We show, by means of a formal sentence acceptability experiment with both native and L2 speakers of English, that the L2 speakers differ from the natives in some important ways but are similar in other ways. Specifically, the natives show a clear and robust that-trace effect, while the L2 speakers show what might be termed simply a ‘trace’ effect: they strongly disprefer a gap at the left edge of the embedded clause whether or not there is overt material in the CP layer. We will offer a characterization of L2 speakers’ ability under which these findings make sense. We will suggest that L1 and L2 speakers are broadly similar in the way that they represent and process Aʹ-dependencies, but that L2 speakers are more limited in their sentence planning abilities and this results in the ‘trace’ effect that we observe.
We begin with an overview, in Section I.1, of two prominent analyses of the that-trace phenomenon, and in Section I.2 we explore what predictions these make for L2 speakers. In Section I.3, we review what is currently known about L2 speakers in this regard, and in Section II, we present the results of our own experiment. We compare these results in Section III with what is known about island effects among L2 speakers, a related phenomenon. In Section IV, we conclude with remarks on what the that-trace phenomenon can tell us about L2 and, just as importantly, what the L2 results that we have seen here can tell us about the that-trace phenomenon.
1 Analyses of the that-trace effect
The that-trace effect has been studied for many decades and is very well known, but there is still no consensus view as to why it exists. There has been a resurgence of interest in this area over the last few years, though, and we present here two analyses that have become particularly prominent.
a Anti-locality
Perhaps the most influential idea concerning the that-trace phenomenon in recent years is the proposal that what goes wrong in sentences like (2a), repeated here as (5), is that movement out of the embedded subject position is ‘too short’ (e.g. Bošković, 2016; Erlewine, 2020). 1
(5) * Who do you think [that [ __ met Sue]]?
That is, movement out of this position goes first to the specifier of C, before being raised into the higher clause, as schematized in (6).
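(6) Who do you think [CP __ [C′ that [TP __ met Sue]]]?   (step 1: Spec,TP → Spec,CP; step 2: Spec,CP → matrix clause)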
The claim is that the first step in this derivation is not possible, because only anti-local movement, as defined in (7), exists in natural language.
(7) Anti-locality: Movement of a phrase from the Specifier of XP must cross a maximal projection other than XP. (Erlewine, 2020)
This rules out (5)/(6), because movement from the specifier of TP (the embedded subject) only crosses TP itself as it moves into the specifier of CP (see Bošković (2016) for a different formulation of this idea).
In a sentence like (3a), in contrast, repeated here as (8), movement into the specifier of C is possible.
(8) Who do you think [that [Sue met __ ]]?
Wherever the moved phrase starts, it is lower than the specifier of TP, so crossing TP counts as crossing ‘a maximal projection other than XP’ and Anti-locality is obeyed. Similarly, in a language where subjects may undergo Aʹ-movement from a low position within the clause, crossing TP allows Anti-locality to be obeyed, as in the Spanish example in (4).
Anti-locality is also obeyed in sentences like (1a), repeated here as (9), but for a different reason.
(9) Who do you think [ __ met Sue]?
In this case, there is no intermediate CP structure, either because C and T have been bundled into a single head (CT; Erlewine, 2020) or because there is simply a bare IP (Bošković, 1997, 2016), so instead of movement from Spec of T to Spec of C, as in (6), there is movement from Spec of TP (or CTP) directly into the higher clause, as in (10).
(10) Who do you think [TP __ met Sue]?   (one step: Spec,TP (or Spec,CTP) → matrix clause)
This movement will cross TP (or CTP), but it will also cross other maximal projections as it moves into the higher clause, so Anti-locality is respected.
An attractive feature of this approach is that what the child needs to learn is clearly evident in the input: independently of extraction, embedded clauses that have an overt C (that) are taken to have a CP structure, and embedded clauses without an overt C are instead taken to have simply a TP structure (or a CTP structure). Given this, and given the assumption that Anti-locality is an inescapable property of movement (i.e. it is a ‘third factor’ property of language in the sense of Chomsky, 2005; see Bošković, 2016; Douglas, 2017), everything else follows. (5) is impossible because it violates Anti-locality, while sentences like (8) and (9) are grammatical because they are in accord with Anti-locality. Moreover, in languages like Spanish that appear not to exhibit the that-trace effect – as in (4) – there would be abundant evidence for the child that subjects can appear in a low position, and if Aʹ-movement proceeds from this position, Anti-locality is obeyed and the sentence is grammatical.
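To make the logic of (7) concrete, here is a toy sketch, not Erlewine's or Bošković's formalism. It assumes that a derivation can be represented as a flat, bottom-up list of maximal projections (the embedded clause followed by the matrix clause), that a step from position i to the specifier of position j crosses the projections spine[i:j] (the source projection plus everything between it and the landing site), and that an object starting inside VP behaves like a phrase leaving VP. The labels "VP-matrix" and "CP-matrix", and the projection "FP" hosting the fronted adverbial of (3b), are hypothetical simplifications.

```python
from typing import List, Tuple

Step = Tuple[int, int]  # (index of source projection, index of landing projection)


def crossed(spine: List[str], source: int, target: int) -> List[str]:
    """Maximal projections crossed when a phrase leaves spine[source] and lands
    in the specifier of spine[target]: the source projection itself plus every
    projection strictly between source and target."""
    return spine[source:target]


def step_is_anti_local(spine: List[str], step: Step) -> bool:
    """(7): the step must cross some maximal projection other than the source XP."""
    source, target = step
    return any(p != spine[source] for p in crossed(spine, source, target))


def derivation_ok(spine: List[str], steps: List[Step]) -> bool:
    """A derivation is licit only if every movement step is anti-local."""
    return all(step_is_anti_local(spine, s) for s in steps)


# Embedded clause with 'that': CP structure, then the matrix clause.
WITH_THAT = ["VP", "TP", "CP", "VP-matrix", "CP-matrix"]
# Embedded clause without 'that': bare TP (or CTP), then the matrix clause.
WITHOUT_THAT = ["VP", "TP", "VP-matrix", "CP-matrix"]
# (3b): hypothetical projection FP for the fronted adverbial between TP and CP.
WITH_ADVERBIAL = ["VP", "TP", "FP", "CP", "VP-matrix", "CP-matrix"]

cases = {
    "(5) subject, that": derivation_ok(WITH_THAT, [(1, 2), (2, 4)]),
    "(8) object, that": derivation_ok(WITH_THAT, [(0, 2), (2, 4)]),
    "(9) subject, no that": derivation_ok(WITHOUT_THAT, [(1, 3)]),
    "(3b) subject, that + adverbial": derivation_ok(WITH_ADVERBIAL, [(1, 3), (3, 5)]),
}
for label, ok in cases.items():
    print(f"{label}: {'licit' if ok else 'violates Anti-locality'}")
```

Only (5) fails, and it fails at its first step, Spec,TP to Spec,CP, which crosses nothing but TP. The absence of that in (9) removes that step entirely, which is the sense in which the learner's structural choice for that-less clauses does the explanatory work.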
b Principle of end weight
McDaniel et al. (2015) propose an account of the that-trace effect that is similar to Anti-locality accounts in some ways, but that relies on principles of sentence processing rather than of grammar. Their account takes as its point of departure the idea that the clause is the default major planning unit in sentence production, but that under some circumstances, speakers may plan the matrix and embedded clauses as a single unit. McDaniel et al. make two further assumptions about this joint planning of the two clauses. The first is that it is associated with the absence of that in the (tensed) embedded clause. This is supported by the finding in Ferreira and Dell (2000) that speakers use that less when material in the embedded clause is more available. If availability makes planning of the embedded clause together with the matrix clause more feasible, then their finding suggests that this advance planning results in a reduced use of that. This conclusion also makes sense given the well-known generalization that both within and across languages, reduced clauses allow long-distance extraction more readily. If tensed clauses without that are a kind of reduced clause (relative to clauses with that), and if long-distance extraction is made easier by planning the matrix and embedded clauses as a unit, then it is not surprising, under this view, that that is dispreferred in cases of extraction.
The second assumption made by McDaniel et al. is that gaps are especially taxing on the sentence production system when they are at the beginning of a planning unit. This follows from their Principle of End Weight, as in (11), which they borrow from Wasow (2002).
(11) Principle of End Weight (PEW): Phrases are presented in order of increasing weight. (Wasow, 2002: 3)
‘Weight’ here is understood to refer to syntactic and semantic complexity, with the result that gaps, which are very complex syntactically, are at variance with PEW when they are at the beginning of the planning unit.
Putting these two assumptions together, we can now reach the following conclusions. Since embedded clauses with that are likely a separate planning unit, having a gap at the beginning of that clause will be worse than having one further to the right. This accounts for the well-known contrast between (5) and (8) seen above. When the embedded clause does not have that, though, this signals that the embedded clause is part of the same planning unit as the matrix clause, so the embedded subject gap is no longer at the beginning of that planning unit. This accounts for the fact that subject extraction out of an embedded clause without that is relatively acceptable, as in (9) above.
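The reasoning in the preceding paragraph can be sketched as a small procedure: segment the sentence into planning units and ask whether any unit begins with a gap. This is a toy under stated assumptions, not McDaniel et al.'s model: sentences are token lists, jointly planned clauses form one unit, and (hypothetically) the complementizer that is not counted as part of the embedded unit's lexical content, since the account places the gap at the beginning of the unit right after that.

```python
from typing import List

GAP = "__"  # marks the gap of the wh-dependency
Unit = List[str]


def planning_units(matrix: Unit, embedded: Unit, joint: bool) -> List[Unit]:
    """Split a two-clause question into planning units.

    Toy assumptions: jointly planned clauses form a single unit; otherwise the
    embedded clause is its own unit, and the complementizer 'that' is not
    counted as part of that unit's lexical content."""
    if joint:
        return [matrix + embedded]
    return [matrix, [w for w in embedded if w != "that"]]


def violates_pew(units: List[Unit]) -> bool:
    """Treat a gap in initial position of any planning unit as a PEW violation."""
    return any(unit and unit[0] == GAP for unit in units)


def l1_plans_jointly(embedded: Unit) -> bool:
    """Assumption from the text: for L1 speakers, no 'that' signals joint planning."""
    return "that" not in embedded


matrix = ["who", "do", "you", "think"]
sentences = {
    "(5)": ["that", GAP, "met", "Sue"],
    "(8)": ["that", "Sue", "met", GAP],
    "(9)": [GAP, "met", "Sue"],
    "(12)": ["that", "after", "much", "deliberation", GAP, "decided", "to", "meet", "Sue"],
}

l1 = {k: violates_pew(planning_units(matrix, e, l1_plans_jointly(e))) for k, e in sentences.items()}
no_joint = {k: violates_pew(planning_units(matrix, e, joint=False)) for k, e in sentences.items()}
print(l1)        # only (5) is flagged
print(no_joint)  # (5) and (9) are flagged
```

The second dictionary shows what the same logic yields if joint planning never happens: the that-less subject question (9) is then flagged too, because its gap opens the embedded unit.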
This analysis makes clear why a classical that-trace violation as in (5) would be relatively difficult to produce: the embedded that-clause is a separate planning unit and having a gap as the first element in that unit goes against PEW. It is less clear, though, why this difficulty in production would result in substantially lowered acceptability. Nonetheless, there are reasons to believe that in general, there is a relatively tight connection between processing difficulty and acceptability. The simple existence of an Aʹ-dependency, for example, especially when this dependency is long-distance, is well known to be taxing to the processor and to also cause large degradations in acceptability (see Fanselow (2021) and Goodall (2021) for reviews). These effects are usually discussed in terms of parsing difficulties, but it is not unreasonable to think that production difficulties would have similar effects on acceptability. McDaniel et al.’s claim, then, that the low acceptability of the classical that-trace violation results from properties of sentence production is not unreasonable.
c Overview of the two analyses
We have now seen two possible analyses of the that-trace phenomenon. In one, the use of that creates additional structure that makes it impossible for the embedded subject to escape the clause, given that movement in natural language, by hypothesis, only operates in an anti-local fashion. When that is not present, this additional structure is not present either and the embedded subject is able to move out of the clause while still preserving anti-locality. In the other analysis, that signals that the embedded clause is a separate planning unit, so moving the embedded subject results in a gap at the beginning of that planning unit, which is so taxing for the production system that the sentence is perceived as unacceptable. 2 When that is absent, this means that the matrix and embedded clauses are part of the same planning unit, so a gap in the embedded subject position is no longer at the beginning of the planning unit.
These two analyses are different in some important ways, including the obvious fact that one is based on principles of grammar and the other on considerations of processing, but they also share some interesting similarities. One is that neither analysis makes direct use of the notion ‘subject’. In the Anti-locality analysis, the element that cannot move is the structurally highest element in the clause, whereas in the sentence production analysis, it is the element at the beginning of the planning unit. In both cases, that element is typically the subject, but being a subject is neither necessary nor sufficient (for extensive discussion of this point, see Erlewine, 2020). This allows both analyses to account for the effect seen earlier in (3b), repeated here as (12), where an adverbial inserted between that and the gap seems to obviate the that-trace effect.
(12) Who do you think [that [after much deliberation, __ decided to meet Sue]]?
Movement of the embedded subject is allowed here, either because the movement is now anti-local (because of the insertion of intervening structure), or because the gap is now not at the beginning of the planning unit. Similar considerations explain why embedded subject extraction is permitted in languages like Spanish, as seen earlier in (4). If subjects in these languages may be lower in the structure and further to the right, then movement will be anti-local and the resulting gap will not be at the beginning of the planning unit. In the rest of this article, we will continue to refer to the illicit gap in the that-trace configuration as the embedded subject, but we do this for ease of exposition only; the analyses just examined appear to be correct in claiming that it is not the subject status of the gap that is at issue.
The two analyses are also similar in that both offer natural accounts of why that-clauses constrain subject extraction more than clauses without that, and not the other way around. In the Anti-locality analysis, this is because that-clauses have additional structure that makes anti-local extraction of the subject impossible, while in the production account, it is the that-clause that is associated with being a separate planning unit, presumably because that permits additional time for planning the embedded clause, so a gap in embedded subject position violates PEW. 3
2 Predictions for L2
Neither of the accounts that we are considering here is explicitly designed to make predictions regarding possible differences between L1 and L2 speakers in the that-trace phenomenon. Nevertheless, they each implicitly make such predictions, and even more intriguingly, their predictions differ. In this section, we explore what these predictions are, how they differ and how they might be tested.
We begin with the Anti-locality analysis. One of the virtues of this analysis is that it provides a straightforward account of how L1 speakers could come to display that-trace effects. If children notice, as they surely must, that tensed embedded clauses sometimes have that and sometimes do not, and if they make the reasonable assumption that presence of that means that there is a CP structure and that absence of that means that there is not, then the that-trace effect follows. The reason is that the child is only able to perform movement that is anti-local, under this analysis, so extraction of the subject from within a that-clause is impossible, since it would require a kind of local movement that is beyond the child’s (or an adult’s) ability (assuming that Anti-locality is a third-factor property of language, as mentioned earlier). Only when there is an intervening adverbial, as in (12), or when the language allows subjects to surface in a lower position within the clause, as in Spanish, would the child be able to extract the subject, by means of non-local movement.
For an L2 learner, the situation would be approximately the same. The anti-local nature of movement would make extraction of subjects out of that-clauses impossible, barring the types of special circumstances just noted. If L1 speakers do not have the ability to perform local movement, then L2 speakers are presumably also unable to do it, since it does not seem credible that they alone, and not L1 speakers, would somehow acquire this ability. One might speculate that L2 speakers would assign different structures to that- and non-that-clauses than L1 speakers do, but this too does not seem likely. There is abundant evidence in the input that that can be either present or absent in tensed embedded clauses, and taking the presence of that as evidence for C (and CP) and its absence as evidence for no C (and no CP) would appear to be the null hypothesis, so it seems unlikely that L2 learners would reach some other conclusion.
The Anti-locality analysis thus leads us to expect very strongly that L2 speakers will behave similarly to L1 speakers with regard to extraction out of that-clauses. Extraction of subjects out of these clauses will be severely degraded relative to extraction of objects, but this asymmetry will disappear when that is not present. L2 speakers are not expected to have abilities surpassing those of L1 speakers (i.e. they are not expected to be able to overcome third-factor design features), so there is little reason to think that L2 speakers would show a pattern of results different from L1 speakers.
The PEW analysis is similar, to the extent that it too gives us a way of understanding how L1 speakers would come to have the that-trace effect. As mentioned earlier, children receive abundant evidence that tensed embedded clauses sometimes have that and sometimes do not, but children are thought to be limited in their ability to do advance planning (e.g. McDaniel et al., 2010), so they presumably treat the matrix and embedded clauses as a single planning unit less often than adults do. As their ability to do this develops, though, it is reasonable to suppose that they would omit that in these cases, while using that when the two clauses are planned separately. Omitting that decreases the distance between the two clauses, so it facilitates treating them as a single unit, while including that allows for additional time before the lexical content of the embedded clause begins, so it facilitates separate planning of the embedded clause. Given this, an adult-like that-trace effect follows. When the embedded clause is planned separately and that is used, it will be very difficult to have a gap at the beginning of that clause/planning unit because of PEW (by assumption, a fundamental property of the production system that does not need to be learned).
Unlike very young children, adults are able to do advance planning of the type that allows them to treat the matrix and embedded clauses as a single planning unit. Put more precisely, adults can do this in their L1, but there are many reasons to think that their ability to do this in an L2 will be impaired, given the evidence available about the processing abilities of L2 speakers. L2 speakers are widely thought to have a shorter working memory span (Service et al., 2002) and to be slower and less efficient at sentence processing in general (McDonald, 2006; McElree et al., 2000), and they also show more specific deficiencies that would very plausibly impede their ability to do advance planning of an upcoming embedded clause. For example, they are less efficient at lexical access in a way that has many ‘downstream’ effects, making syntactic structure building slower and more difficult (Dekydtspotter and Renaud, 2014; Dekydtspotter et al., 2006; Hopp, 2014, 2016; Miller, 2014; Runnqvist et al., 2011). In addition, their ability to retrieve items from memory during sentence processing appears to be less efficient (Cunnings, 2017), as is their ability to generate expectations going forward (Grüter and Rohde, 2013, 2020; Mayo et al., 1997). Put simply, if lexical access, retrieval from memory, and generation of expectations are all slower and more labored, it would not be surprising if L2 speakers tended to produce one planning unit at a time and found it particularly difficult to plan a matrix and an embedded clause jointly.
We thus expect that L2 speakers will be much less able than L1 speakers to plan both the matrix and embedded clauses as a single planning unit, regardless of the presence or absence of that. If they are generally treating embedded clauses as a separate planning unit, then PEW effects will arise, and embedded subject extraction will be severely degraded. In short, under the PEW analysis of the that-trace phenomenon, if speakers are limited in their ability to plan embedded clauses in advance, as seems likely for L2 speakers, they should then find extracting the subject of the embedded clause to be very difficult both when that is present and when it is not.
At this point, we see that the Anti-locality analysis and the PEW analysis make different predictions regarding that-trace effects among L2 speakers. In the Anti-locality analysis, the effect arises because the CP structure of which that is head makes movement of the subject of that clause impossible, given the anti-local nature of movement. Since there is no reason to expect that L2 speakers would be any more capable of doing local movement than L1 speakers, and since nothing prevents movement of the subject when that (and the CP structure) is not present, our expectation is that L2 speakers would behave essentially like L1 speakers. In the PEW analysis, on the other hand, the that-trace effect arises because of the difficulty in having a gap at the beginning of a planning unit. L1 speakers are able to circumvent this by planning both the matrix and embedded clauses as a single unit when that is not present, but as we have seen, there are good reasons to think that L2 speakers will be much less able to do this. We thus expect that unlike L1 speakers, L2 speakers will have difficulty with embedded subject extraction in general, not just when that is present.
Concretely, the Anti-locality approach predicts that out of the crucial sentences of the that-trace paradigm, summarized here in (13), L1 and L2 speakers will be alike in perceiving a substantial degradation in (c) that is not found in the other sentences.
(13) a. Who do you think [ __ met Sue]?
     b. Who do you think [Sue met __ ]?
     c. * Who do you think [that [ __ met Sue]]?
     d. Who do you think [that [Sue met __ ]]?
The PEW analysis, however, predicts that unlike L1 speakers, L2 speakers will find both (a) and (c) degraded relative to (b) and (d). The crucial point of difference between L1 and L2 speakers, under this analysis, is predicted to be (a): L1 speakers will show no special degradation in this case, while L2 speakers will show the same degree of degradation in (a) that is seen in (c).
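The contrasting predictions for (13) can be tabulated mechanically. The sketch below reduces each account to a yes/no "degraded" judgment per condition; it ignores gradience and any independently motivated subject/object differences, and the joint-planning rule for PEW (only L1 speakers plan jointly, and only without that) is the simplifying assumption argued for in the text, not a claim about every individual speaker.

```python
from typing import Dict, List, Tuple

Condition = Tuple[bool, str]  # (has_that, gap position)

CONDITIONS: Dict[str, Condition] = {
    "(13a)": (False, "subject"),
    "(13b)": (False, "object"),
    "(13c)": (True, "subject"),
    "(13d)": (True, "object"),
}


def degraded_anti_locality(has_that: bool, gap: str, group: str) -> bool:
    # Only a subject gap inside a CP (a that-clause) requires a too-local step.
    # The account gives no role to the speaker group.
    return has_that and gap == "subject"


def degraded_pew(has_that: bool, gap: str, group: str) -> bool:
    # Simplifying assumption: only L1 speakers plan the two clauses jointly,
    # and only when 'that' is absent; otherwise a subject gap is unit-initial.
    joint = group == "L1" and not has_that
    return gap == "subject" and not joint


results: Dict[Tuple[str, str], List[str]] = {}
for account in (degraded_anti_locality, degraded_pew):
    for group in ("L1", "L2"):
        results[(account.__name__, group)] = [
            label for label, (has_that, gap) in CONDITIONS.items()
            if account(has_that, gap, group)
        ]

for key, degraded in results.items():
    print(key, degraded)
```

Three of the four account-by-group cells flag only (13c); the PEW account for L2 speakers is the odd one out, flagging (13a) as well, which is exactly the point of divergence described above.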
3 What is known about that-trace effects in L2
Both of the analyses that we have discussed, that based on Anti-locality and that based on PEW, are in accord with regard to classic that-trace violations like (13c): they both have ways of explaining why L1 speakers find them degraded and they both predict that L2 speakers will also perceive this degradation. Where the two analyses differ is in their predictions for (13a). Anti-locality predicts that it will not be degraded in the way that (13c) is, while PEW predicts that it will be.
To a large extent, the existing literature on the that-trace phenomenon among L2 speakers seems to corroborate the idea that L2 speakers find (13c) degraded, in accord with both of the above approaches. That is, many studies have found that L2 speakers find sentences like (13c) deviant to some degree, in a way similar to L1 speakers (e.g. Heil and Ebert, 2018; Martohardjono, 1993; White and Genesee, 1996), just as the Anti-locality and PEW approaches both predict (though some others have found different results; see, e.g. Bley-Vroman et al., 1988; Filiou, 2019; Reglero, 2005).
As valuable as these studies are, though, their results are difficult to interpret from our current perspective, and they do not allow us to test the differing predictions that the two analyses make for sentences like (13a). To see this, let us consider perhaps the best known of these studies, White and Genesee (1996). In this study, there were three groups of participants (L1, high-proficiency L2 and lower-proficiency L2), and all were presented with sentences and asked to decide whether each sentence was possible or not. Across 6 tokens of sentences like (13c), participants accepted them as possible at a rate of 22% (L1), 36% (high-proficiency L2), and 59% (lower-proficiency L2). 4 The fact that this sentence-type meets with substantially less than full acceptance among all three groups suggests that both L1 and L2 speakers may perceive some degree of deviance here. However, there are two obstacles to interpreting these results with confidence, given the issues we are addressing here. First, acceptability ratings are inherently relative, so results for a single sentence-type are not very informative (Cowart, 1997; Goodall, 2021; Myers, 2009; Schütze and Sprouse, 2014). We could interpret the 59% acceptance rate of lower-proficiency L2 speakers, for instance, to mean that the sentence-type is basically grammatical, and assume that the rating is being pushed down because all long-distance wh-dependencies are degraded to some degree, or we could take it to mean that the sentence-type is basically ungrammatical, and assume that the rating is being raised because participants nonetheless find these sentences better than some other stimuli within the experiment. Without a comparison to other sentence-types, we simply don’t know what 59% acceptance means. Even the comparison across groups (22% vs. 36% vs. 59%), which might look meaningful at first, is hard to interpret. It could reflect a genuine difference in the status of this sentence-type across participant groups, or it could be that some groups are more biased towards a ‘yes’ response than others (see footnote 6 below for discussion of the differences one might expect to find across proficiency groups with the type of experiment that we propose). Again, looking at this sentence-type in isolation does not allow us to know which is the right interpretation.
Second, the predictions that we aim to test here are ultimately predictions regarding the full paradigm of sentences in (13). The Anti-locality analysis, for instance, predicts that in clauses without that, comparison of subject extraction to object extraction – i.e. (13a) to (13b) – will not show the same degree of degradation that we see in the analogous cases with that; i.e. (13c) to (13d). The PEW analysis, on the other hand, predicts that there will be the same degree of degradation in the two cases. There may be independently motivated differences in the acceptability of extraction of embedded subjects vs. embedded objects that are not relevant to our concerns here, so if we measure the acceptability of a sentence like (13a), we want to be sure that we are measuring the part of the acceptability that relates to our predictions. We can do this if we first establish a baseline of the difference between embedded subject extraction and embedded object extraction out of a that-clause – (13c) and (13d) – and then compare this to the difference found between the analogous conditions in a clause without that; (13a) and (13b). The traditional way to do this is to design the experiment with factors, in such a way that each factor can be manipulated independently of the others (Fisher, 1935). In the case at hand, for instance, we could establish two factors, presence of that (present vs. absent) and gap position (subject vs. object), and cross them to yield the four conditions in (14).
(14) a. that absent, subject gap: Who do you think [ __ met Sue]?
     b. that absent, object gap: Who do you think [Sue met __ ]?
     c. that present, subject gap: * Who do you think [that [ __ met Sue]]?
     d. that present, object gap: Who do you think [that [Sue met __ ]]?
This design allows us to measure the effect of the presence of that and the effect of the gap position independently of each other, and crucially, it allows us to examine whether these two factors interact. For example, it allows us to measure the amount of degradation seen between (13c) and (13d) and compare this to whatever difference we find between (13a) and (13b). In order to avoid extraneous influences, the four conditions in a factorial design like (14) should be kept as similar to each other as possible, which means they need to be lexically matched, but each individual participant should see each condition in a different lexicalization, which means that many lexicalization sets like (14) need to be created and distributed among participants with full counterbalancing (Cowart, 1997; Goodall, 2021).
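One standard way to implement the counterbalancing just described is a Latin square: with four conditions, the lexicalization sets (items) are rotated across four presentation lists, so that each list shows every item once and every condition equally often, and each item appears in every condition across lists. The sketch below assumes a hypothetical set of 8 items; it illustrates the technique and is not a description of how the lists in this study were built.

```python
from collections import Counter
from typing import Dict, List

CONDITIONS = ["subject, no that", "object, no that", "subject, that", "object, that"]


def latin_square_lists(n_items: int, conditions: List[str]) -> List[Dict[int, str]]:
    """Build one presentation list per condition.

    List k puts item i in condition (i + k) mod C, so each list shows every
    item once, and each item appears in every condition across the lists.
    With n_items a multiple of C, each list has every condition equally often."""
    c = len(conditions)
    return [{item: conditions[(item + k) % c] for item in range(n_items)} for k in range(c)]


lists = latin_square_lists(n_items=8, conditions=CONDITIONS)
for k, assignment in enumerate(lists):
    # Each participant would see one list: 8 items, each condition twice.
    print(f"list {k}:", dict(Counter(assignment.values())))
```

Each participant sees each item in only one condition, so no participant rates two lexically matched versions of the same sentence.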
Studies like White and Genesee (1996) are suggestive, then, but they are not equipped to confirm or disconfirm the predictions that are our main concern here, because they are not designed with the kind of factorial analysis that allows us to determine whether there is an interaction or not. 5 There are a number of other studies that are suggestive in a different way: they report that L2 speakers, to a greater extent than L1 speakers, find subject-extraction sentences like (13a) less acceptable and/or more difficult to process than object-extraction sentences like (13b) (e.g. Dussias and Piñar, 2010; Jackson and van Hell, 2011; Juffs, 2005; Juffs and Harrington, 1995; Schachter and Yip, 1990; White and Juffs, 1998). These studies are interesting from our current perspective, in that they suggest that subject-extraction sentences like (13a) present special difficulties for L2 speakers, as the PEW account predicts, but they still do not fully confirm this prediction, since they do not examine the interaction; i.e. they do not tell us whether (13a) is worse than (13b) in the same way that (13c) is worse than (13d) for these speakers.
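The interaction at issue can be expressed as a difference of differences over the four cell means: the subject penalty with that minus the subject penalty without that. In the sketch below, the cell means are made-up numbers on a 9-point scale, chosen only to show the two shapes of result that distinguish the accounts; they are not data from any study, and in practice the interaction would be assessed statistically rather than read off raw means.

```python
def subject_penalty(mean_object: float, mean_subject: float) -> float:
    """Degradation of subject extraction relative to object extraction."""
    return mean_object - mean_subject


def dd_score(cells: dict) -> float:
    """Difference-in-differences: penalty with 'that' minus penalty without 'that'.

    Clearly positive: the penalty is concentrated in that-clauses (interaction).
    Near zero: the same penalty with and without 'that' (no interaction)."""
    with_that = subject_penalty(cells[("that", "object")], cells[("that", "subject")])
    without_that = subject_penalty(cells[("no-that", "object")], cells[("no-that", "subject")])
    return with_that - without_that


# Hypothetical cell means (9-point scale), for illustration only.
penalty_only_with_that = {("no-that", "subject"): 7.0, ("no-that", "object"): 7.0,
                          ("that", "subject"): 3.0, ("that", "object"): 7.0}
penalty_everywhere = {("no-that", "subject"): 3.0, ("no-that", "object"): 7.0,
                      ("that", "subject"): 3.0, ("that", "object"): 7.0}

print(dd_score(penalty_only_with_that))  # 4.0
print(dd_score(penalty_everywhere))      # 0.0
```

Both hypothetical patterns contain a degraded (13c)-type cell, so looking at that cell alone cannot tell them apart; only the comparison across the that factor does.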
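As will be discussed in more detail below, the PEW account predicts that for L2 speakers there will not be an interaction between the presence of that and gap position: subject extraction should be degraded relative to object extraction to the same degree whether or not that is present. The Anti-locality account, in contrast, predicts such an interaction for L1 and L2 speakers alike.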
II Experiment
In order to test the predictions of the Anti-locality account and the PEW account, we conducted a sentence acceptability experiment with L1 English speakers and two groups of L2 English speakers: L1 Korean and L1 Spanish.
1 Participants
A total of 121 L2 speakers of English, with either L1 Korean (n = 72) or L1 Spanish (n = 49), and 72 L1 speakers of English participated in the experiment. All L2 participants were born in their home countries, either Korea or a Spanish-speaking country, and moved to the U.S. between the ages of 6 and 15. All had resided in the U.S. for at least 7 years and were university students in California at the time of testing. The L2 participants ranged in age from 18 to 29 years (average 21), and the L1 English participants from 18 to 36 years (average 21). Table 1 summarizes the language experience and background of the L2 participants.
Background and experience of L2 participants (n = 121).
Korean and Spanish were chosen as the L1s of the L2 English speakers because they represent typologically distinct options: Korean is a wh-in-situ language, while Spanish employs wh-movement but, as discussed earlier, lacks the that-trace effect. It is reasonable to assume that the participants were highly proficient in English, given that all arrived in the U.S. in childhood or early adolescence, graduated from high school in the U.S., and were attending university in the U.S. at the time of testing.
2 Materials and procedure
Participants were presented with a series of sentences, each accompanied by a 9-point response scale, as in the sample in Figure 1. They were instructed not to analyse the sentences but to give their first reaction, indicating on the scale provided how good or bad each sentence sounded to them. The stimuli were presented on paper, and participants indicated their responses by circling the appropriate number on the scale. Upon completion of the experimental session, participants filled out a language background and experience questionnaire.

Experiment stimulus.
Test stimuli consisted of long-distance wh-questions varied by
(15) a. subject extraction with that: * Who did Bill think that ___ saw you? b. object extraction with that: Who did Bill think that you saw ___? c. subject extraction without that: Who did Bill think ___ saw you? d. object extraction without that: Who did Bill think you saw ___?
The wh-word was always who, the matrix subject was a proper name, and the non-extracted subject or object of the embedded clause was you. All verbs in the embedded clause were in the past tense. Twenty lexically matched sets like (15) were created and distributed among four counterbalanced lists using a Latin Square procedure. Each list contained five tokens of each condition, for a total of 20 experimental sentences, and was pseudo-randomized. Each list also contained 82 fillers (a filler-to-experimental ratio of slightly over 4:1), consisting of various types of grammatical and ungrammatical wh-questions (with violations involving subject–verb agreement, particle movement, islands, etc.). Fillers were chosen so that grammatical and ungrammatical stimuli appeared in the lists in roughly equal proportions.
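A Latin Square distribution of this kind can be sketched in Python as follows. The cyclic rotation used here is one standard way to build such a square and is an assumption, since the exact assignment scheme is not specified; the sketch also omits the pseudo-randomization and the interleaving of the 82 fillers. The function names are illustrative, and the labels a–d simply follow (15a–d).

```python
"""Cyclic Latin Square assignment of lexicalization sets to lists (sketch)."""
from collections import Counter
from typing import List, Tuple

# Condition labels follow (15a)-(15d).
CONDITIONS = ["a", "b", "c", "d"]


def latin_square_lists(n_sets: int, conditions: List[str]) -> List[List[Tuple[int, str]]]:
    """Assign each lexicalization set to every list, rotating its condition.

    List k receives set i in condition (i + k) mod len(conditions), so each
    list contains every set exactly once, and each set appears in a different
    condition on each list.
    """
    n_lists = len(conditions)
    lists: List[List[Tuple[int, str]]] = [[] for _ in range(n_lists)]
    for set_index in range(n_sets):
        for list_index in range(n_lists):
            condition = conditions[(set_index + list_index) % n_lists]
            lists[list_index].append((set_index + 1, condition))
    return lists


def condition_counts(experimental_list: List[Tuple[int, str]]) -> Counter:
    """Count how many tokens of each condition a list contains."""
    return Counter(condition for _set_number, condition in experimental_list)


if __name__ == "__main__":
    for number, exp_list in enumerate(latin_square_lists(20, CONDITIONS), start=1):
        # With 20 sets and 4 conditions, every list has 5 tokens per condition.
        print(f"List {number}:", dict(sorted(condition_counts(exp_list).items())))
```

With 20 sets and four conditions, each list prints `{'a': 5, 'b': 5, 'c': 5, 'd': 5}`: every participant sees all 20 sets and five tokens of each condition, but never the same set in two conditions.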
3 Predictions
For the L1 speakers, the two competing analyses that we are examining here both predict that there will be a clear interaction between

Possible result of experiment showing interaction between
For the L2 speakers, the Anti-locality account predicts that they too will show an interaction, since they will be constrained by the same restrictions on movement. The PEW account, however, predicts that there will be no interaction for these speakers, but that there will be a main effect for

Possible outcome of experiment showing main effect of
4 Results
To reduce scale biases among participants, the raw ratings were transformed to z-scores prior to analysis; the means are presented in Figure 4. The z-scores were submitted to linear mixed-effects models (lme4 package; Bates et al., 2015) in R (R Development Core Team, 2019), fitted separately for each group, with the z-score as the dependent variable. Fixed effects were the factors

Mean results of experiment in z-scores.
The results of this analysis are presented in Table 2. For the L1 English speakers, there was no main effect for
Linear mixed-effects results for acceptability judgments.
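The z-score transformation described at the start of this section can be sketched in Python as follows. We assume here, as is common in acceptability studies, that each participant's ratings are standardized against that participant's own mean and sample standard deviation; which sentences (experimental items only, or fillers as well) enter those statistics is not specified above, so the sketch leaves that choice to whoever builds the input. The function name and record layout are hypothetical, and the mixed-effects models themselves (fit with lme4 in R) are not reproduced here.

```python
"""Per-participant z-score transformation of raw ratings (hypothetical sketch)."""
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Tuple

# (participant ID, item ID, raw rating on the 1-9 scale)
Rating = Tuple[str, str, int]


def zscore_by_participant(ratings: List[Rating]) -> List[Tuple[str, str, float]]:
    """Replace each raw rating with its z-score within that participant's ratings."""
    scores: Dict[str, List[int]] = defaultdict(list)
    for participant, _item, raw in ratings:
        scores[participant].append(raw)

    # Mean and sample standard deviation per participant (needs >= 2 ratings each).
    params = {p: (mean(vals), stdev(vals)) for p, vals in scores.items()}

    transformed = []
    for participant, item, raw in ratings:
        m, sd = params[participant]
        # A participant who gives every sentence the same rating has sd == 0;
        # their ratings carry no scale information, so map them to 0.0.
        z = (raw - m) / sd if sd > 0 else 0.0
        transformed.append((participant, item, z))
    return transformed


if __name__ == "__main__":
    # Invented demo ratings for two hypothetical participants.
    demo = [("p1", "i1", 1), ("p1", "i2", 5), ("p1", "i3", 9),
            ("p2", "i1", 3), ("p2", "i2", 3), ("p2", "i3", 6)]
    for row in zscore_by_participant(demo):
        print(row)
```

For the invented participant p1 (ratings 1, 5, 9; mean 5, SD 4), the z-scores are -1.0, 0.0, and 1.0. After the transformation, a participant who uses only the top of the scale and one who uses all of it contribute comparable values.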
5 Discussion
The main question we addressed in this experiment was whether L2 speakers would show significant degradation for subject extraction only in the case of that-clauses or whether this degradation would show up regardless of the presence or absence of that. As we saw earlier, the PEW approach to the that-trace effect predicts the latter, since under this approach, a gap at the beginning of a planning unit poses great difficulty for sentence planning, which is reflected in the degradation of acceptability. L1 English speakers are able to treat embedded clauses without that as part of the same planning unit as the matrix clause, but L2 speakers are plausibly much less able to do this, given their reduced abilities for sentence planning. Under the Anti-locality approach to the that-trace effect, on the other hand, L2 speakers are predicted to behave in essentially the same way as L1 speakers. For both populations, extraction of a subject embedded within a CP structure (headed by that) is impossible, since this would require a kind of local movement that does not occur in natural language. Extraction of the embedded subject when there is no CP structure (i.e. no that) requires only non-local movement, so this is possible.
The results of our experiment clearly match the predictions made by the PEW analysis. In the L1 group, there is an interaction between
Without additional assumptions, our results do not accord with the predictions of the Anti-locality analysis. To see this, note first that the results for the L1 speakers are fully in line with what Anti-locality predicts. The decline in acceptability for subject extraction with that, which drives the interaction here, follows from the idea that this sentence type cannot be derived through purely non-local movement. This explanation could also account for the decline seen for this condition in the two L2 groups, but it leaves unexplained why subject extraction without that shows a similar decline. It is possible, of course, that some other factor is causing this decline. For example, subject extraction without that could be degraded because of the processing difficulty that L2 speakers experience with this structure (see section I.3 above). However, it would then be a striking coincidence that this processing difficulty produces a decline (for subject extraction without that) of the same size as the one caused by the violation of Anti-locality (for subject extraction with that), as evidenced by the lack of an interaction. In the PEW approach, in contrast, no additional mechanism is needed, only the assumption that L2 speakers have difficulty with advance planning, for which there is independent evidence. PEW accounts for both the L1 and the L2 results, and within the L2 results, it accounts for the degradation associated with subject extraction in both the that and no-that cases.
The results for both L2 groups are just as predicted by the PEW account, but that is not to say that they are identical. The L1 Korean group shows a main effect for
The results for the L1 Spanish group are also interesting for another reason: in contrast to what we see here with the L1 Spanish speakers, experimental studies of Spanish have consistently shown no asymmetry between subject and object extraction in structures like these (i.e. no main effect of
Finally, it is worth pointing out that in principle, our results could have provided evidence against both the Anti-locality account and the PEW account, but that did not happen. That is, in advance of the facts, one might have expected that the two L2 groups would show no main effect for
III Broader issues in constraints on extraction
The that-trace effect that we have been discussing here brings to mind island effects, which have been discussed more extensively in the L2 literature (for a review, see Belikova and White, 2009). These are different phenomena in important ways: an island is a structural domain (typically a clause) in which gaps are very difficult no matter where the gap is within the domain, while the that-trace effect concerns a specific position that cannot host a gap, even though the clause containing that position otherwise allows gaps freely. Nevertheless, the that-trace effect and island effects share the property of being constraints on extraction, and from that perspective, our findings about the that-trace effect in L1 and L2 might initially seem surprising. The reason is that many studies have found that, to a very large extent, L2 speakers (and bilinguals in general) show the same or a very similar type of sensitivity to islands as L1 speakers (e.g. Aldwayan et al., 2010; Johnson et al., 2016; Kim and Goodall, 2016; Kim et al., 2015; Martohardjono, 1993; Omaki and Schulz, 2011; White and Juffs, 1998), whereas we have found here that there are important differences between L1 and L2 with regard to the that-trace effect.
Why would there be this difference between islands and the that-trace effect? To answer this, let us consider the source of each. The source of island effects is notoriously contentious, but there is nonetheless broad consensus that island domains are more complex than non-islands and that this additional complexity impedes extraction. This restriction on extraction arises either because the grammar is unable to create or interpret the extraction structure (i.e. the dependency) or because the processor is unable to resolve the dependency due to capacity constraints on working memory (for relevant discussion, see Johnson et al., 2016; Kluender, 2021; Michel, 2014; Phillips, 2013). The former are often referred to as ‘grammatical’ approaches to islands, where the area of grammar involved may be syntax, semantics, or information structure, and the latter are referred to as ‘processing’ approaches, where this typically refers to parsing and working memory constraints. Under either type of approach, though, extraction out of an island represents a case that exceeds the ability of the system. That is, a dependency whose gap is inside an island domain exceeds (or severely strains) speakers’ grammatical resources, their working memory resources, or perhaps both. From this perspective, it would be very surprising if L2 speakers did not show sensitivity to islands, since this would mean that, in effect, they are able to perform linguistic feats that native speakers cannot. More specifically, it would mean that their grammars carry out operations that L1 grammars cannot and/or that their working memory capacity is somehow more expansive in L2 than in L1. Neither of these ideas seems plausible, so it is reasonable to take as the null hypothesis that L1 and L2 speakers will be very similar in their sensitivity to islands.
This is not to say that L1/L2 differences in islands are not worth exploring or that intriguing differences have not been found (for relevant discussion of L1/L2 similarities and differences, see Clahsen and Felser, 2006; Cunnings, 2017; Kim, 2015; Kim et al., 2015; Kush and Dahl, 2022; Li, 1998; Perpiñán, 2015; and for discussion of crosslinguistic variation in island phenomena, Goodall, 2022), but it does give us a way of understanding the very general finding in the literature that L1 and L2 speakers are both sensitive to islands in very similar ways.
Against this backdrop, the L1/L2 differences that we have seen in the that-trace effect, and the fact that these contrast with the L1/L2 similarities that one sees in island effects, now make more sense. As with islands, the that-trace effect represents a case where L1 speakers are unable to perform extraction. Under the analysis that we have presented evidence for here, this is because the extraction in question would involve a gap at the beginning of a planning unit, which speakers find very difficult to do (because of the general principle expressed in PEW). L2 speakers are no better than L1 speakers in this regard and are subject to the same general planning pressures, but in addition, their sentence planning is slower and less efficient overall, so they are less able to find a solution to this planning problem. As we have discussed, L1 speakers are able to treat the matrix and embedded clauses together as a single planning unit, thus avoiding the effects of PEW, but L2 speakers, with their reduced planning capacity, have a more difficult time doing this, and as a result, they find embedded subject gaps degraded regardless of the presence or absence of that.
This view of the that-trace effect is reminiscent of attempts in the literature to find links between island effects and differences in individual working memory capacity. The idea has been that if island effects ultimately arise from constraints on working memory capacity, then differences across individuals with respect to islands should track measurable differences in their working memory capacity. Attempts to find such correlations have generally not been successful, however (Johnson et al., 2016; Michel, 2014; Sprouse et al., 2012). In this article, we have examined an analysis of the that-trace effect that attributes it to problems of sentence planning, and we have explored what happens in a population whose planning ability we have reason to believe is reduced (i.e. L2 speakers). As we have seen, this population’s behavior with regard to the that-trace effect differs in just the way the analysis predicts. At a very rough level, then, this is the type of correlation that has been sought, but not found, for islands. That is, we have found a correlation between speakers’ ability to have gaps in certain positions and an aspect of their general processing ability (though not the same aspect as in the island studies), and as a result, we have found evidence that processing factors may play a relatively large role in a phenomenon that has commonly been thought to be purely grammatical.
IV Conclusions
In this article, we have examined two analyses of the that-trace phenomenon. In one, the degradation associated with extraction of an embedded subject out of a that-clause is due to the type of movement that this configuration would require, which is not allowed in natural language. In the other, the embedded subject gap in this structure occurs at the beginning of a sentence planning unit, which makes planning of the sentence extremely difficult. In the first analysis, we would expect L2 speakers to show the effect as well, under the assumption that their grammars do not have powers beyond those of L1 grammars. In the second analysis, we would reasonably expect L2 speakers to show an expanded effect, under the assumption that their sentence planning abilities are more limited than those of L1 speakers. Specifically, we would expect that they would have difficulty with embedded subject gaps regardless of the presence or absence of that. As we saw in our experiment, the predictions of this second analysis are confirmed: L2 English speakers, regardless of whether their L1 has wh-movement or not, show significant degradation when the gap of a long-distance wh-dependency is in the subject position of an embedded clause, both with that-clauses and clauses without that.
On the one hand, our study is an examination of L2 English. We have taken a phenomenon that, at a descriptive level, is well understood for L1 speakers, and we have explored how it manifests itself among L2 speakers. Some earlier studies had addressed this question, but none employed the type of factorial design, rigorous counterbalancing, and large number of participants that are now typical of experimental L1 studies. Our experiment confirmed some of the general findings from these earlier studies, but it also uncovered new generalizations that were beyond their scope.
On the other hand, our study can be seen as not only about L2 per se, but also about the way that findings about L2 can be used as evidence to decide between competing analyses of L1. These analyses were not designed with L2 in mind, but as we have seen, they nonetheless lead to reasonable predictions about what we should find in L2. These predictions can be submitted to empirical scrutiny, which is just what we have attempted to do here. Our hope is that there will be many such cases, where findings from the very rich domain of research on L2 can be fruitfully used as a new kind of evidence to help solve longstanding puzzles about language in general.
Footnotes
Acknowledgements
We are grateful to the audience at the 2021 CUNY Conference on Human Sentence Processing, the members of the Experimental Syntax Lab at UC San Diego, and the editors and anonymous reviewers for their many valuable comments on earlier versions of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
