N-is Focalizers as Semi-fixed Constructions: Modeling Variation across World Englishes

Abstract

N-is constructions combine a variable article and a shell noun such as thing, fact, or problem with copula be. As discourse markers at the left periphery, they focalize information that follows. Using data from a large online newspaper corpus, this study is the first to investigate the variable syntactic integration (bare versus that-clause) of focalizers across a broad range of World Englishes. Variability in syntactic integration reflects the relatively recent emergence of this discourse marker. It is also relevant for World Englishes research because it is at the level of semi-idiomatic constructions that nativization in post-colonial varieties is likely to occur. Corpus data show that syntactic integration in N-is focalizers is predicted most strongly by linguistic variables, with regional variety being a much weaker predictor. While no clear-cut regional or variety-type patterns emerge from the data, qualitative analysis reveals some low-frequency patterns as candidates for structural nativization.

Keywords

focalizer constructions, syntactic integration, constructional variation, World Englishes, structural nativization

1. Introduction

In English, so-called “shell” or “signaling” nouns like thing, fact, or problem are regularly used in the left periphery of the sentence with a (variable) definite article and copula be. In these semi-fixed N-is constructions, the noun no longer has referential meaning (e.g., Miller & Weinert 1998; Schmid 1998; Flowerdew & Forest 2015). Instead, N-is constructions have developed a pragmatic function: they are used as discourse markers (DMs) to launch utterances and to focalize the following information. In (1), for instance, having medical insurance is not literally a thing but rather a fact, and, in (2), the DM function of the N-is construction can easily be demonstrated by substituting a pragmatic particle like apparently.

(1) “The thing is all our guys have their own medical insurance, which is a really good system, so they’re not going to be a drain on the New Zealand taxpayer.” (NOW, NZ, February 9, 2016)

(2) Word is they need a lift to get to their house. (NOW, NZ, October 18, 2016)

N-is focalizers are only semi-fixed constructions according to a number of criteria. In addition to variable article use (as in 1 versus 2) and different shell nouns, syntactic integration varies between overt subordination with that and bare subordination: compare (3) with (1) and (2). Moreover, N-is focalizers can be separated by punctuation (as in 4) or not (all other examples). Finally, the shell noun allows for optional pre-modification (see 5 and 6, where sadly could replace the N-is construction as a DM).

(3) Word is that Selena Gomez also did a version of the song. (NOW, US, August 10, 2016)

(4) The fact is, they’re allowed on the hill. (NOW, CA, July 13, 2017)

(5) Sad thing is, anyone with a loose interest in cricket would know not to expect … (NOW, AU, December 20, 2011)

(6) The sad truth is that life in the European countryside can be as basic, boring and as downright exhausting as it was a century ago. (NOW, GB, March 20, 2010)

Constructional variability can, on the one hand, be argued to be an aspect of (ongoing) constructionalization, i.e., the change from a formerly free clause to a DM. This variability is also highly relevant to discussions about nativization in World Englishes (WEs) because ongoing constructionalization can be found to intersect with processes of structural nativization.

The aim of the present paper is twofold: namely, to investigate the degree to which different N-is focalizers have developed towards DMs, and to investigate potential patterns of regional variation in this ongoing process. It uses corpus evidence from a large online corpus of English newspapers and probabilistic modeling (section 3) to study variation in degree of syntactic integration for six shell nouns across twelve varieties of English. Section 4 presents the results of the probabilistic modeling. Beyond the statistically significant distributions, the corpus data provide intriguing evidence of low-frequency patterns that are suggestive of regional differences in constructionalization, explored in section 5. The discussion (section 6) focuses, for the most part, on the question of variation across WEs, particularly the lack of clear patterns with respect to either region or variety type (e.g., first-language versus institutionalized second-language varieties).

2. Background

2.1. Constructionalization and the Emergence of Semi-fixed Form-Meaning Pairings

The present study takes usage-based Construction Grammar as its theoretical framework. Following Goldberg (2013:17), constructions are form-meaning pairings at different levels of abstraction displaying different degrees of idiomaticity, ranging from individual words to abstract patterns (see Table 1). At higher degrees of schematicity, constructions have functional slots that can be partially filled or completely open; the benefactive and transitive constructions are thus highly schematic constructions which are not partially filled.

Table 1.

Constructions on Different Levels of Abstraction and Idiomaticity

Construction (level of abstraction)	Example
Word	calendar, exchange, indeed
Word (partially filled functional slots)	un-V, V-ed
Idiom (filled)	bite the bullet
Idiom (partially filled functional slots)	give/show <someone> the cold shoulder
Idiom (minimally filled)	The Xer the Yer (e.g., The sooner he leaves, the better)
Benefactive construction	She carved him a chuppah
Transitive construction	He opened the door

Constructions thus also include DMs such as indeed or the focalizer constructions that are the focus of the present paper. Both are the result of “constructionalization.” Traugott (1995) traces the development of indeed from a combination of a preposition and a lexical noun (in dede) via its use as an adverbial phrase and sentential adverb to DM function. The study thus outlines the change from a lexical construction to a DM, which fits the definition of constructionalization that Traugott and Trousdale (2013:22) put forward:

Constructionalisation is the creation of form_new-meaning_new (combinations of) signs. It forms new type nodes, which have new syntax or morphology and new coded meaning, in the linguistic network of a population of speakers. It is accompanied by changes in degree of schematicity, productivity, and compositionality. The constructionalization of schemas always results from a succession of micro-steps and is therefore gradual.

Hundt (2022) uses qualitative evidence from the Oxford English Dictionary and Early English Books online to establish the pathway of development of N-is focalizers for a set of five shell nouns, tracing the development from full main clause with overt subordination to a DM at the left periphery. The study shows that constructionalization started with truth is around the middle of the sixteenth century, with other shell nouns (such as fact, thing, and problem) occurring much later. The data show that constructionalization involves the loss of the subordinator, the addition of a premodifier, and the possibility of omitting the definite article (in this order). On this rationale, the most constructionalized variants would be those without an article or that, and (in writing) with comma-separation, as illustrated in (5). Once the constructional template is established, novel shell nouns can enter the construction (e.g., problem is, a recent innovation from the 1940s), allowing the full range of variability from the very beginning (Hundt 2022). Hundt’s (2022) data for the nineteenth and twentieth centuries show that punctuation becomes the most important predictor in written data for bare complementation across time (i.e., it marks a further step along the development towards the left periphery), whereas article omission remains a rare (but salient) characteristic of N-is focalizers.

Previous accounts of N-is constructions differ in their views on whether to treat them as transparent syntactic structures consisting of a main and a subordinate clause (see Brenier & Michaelis 2005; Delahunty 2012) or as semi-fixed patterns (e.g., Biber, Johansson, Leech, Conrad & Finegan 1999; Carter & McCarthy 2006; Aijmer 2007; Keizer 2013, 2016). The development of a pragmatic function presents an argument in favor of the latter interpretation. That these constructions have developed into DMs, which are, moreover, predominantly used in the left periphery of the sentence, is also evidenced by the different terms that are used for them: they are referred to as “introductory constructions” (Curzan 2012), “overtures” (Biber, Johansson, Leech, Conrad & Finegan 1999), “projector/projecting constructions” (Günthner 2008; Curzan 2012; Shibasaki 2014, 2015), “scope operators” (Fiehler, Barden, Elstermann & Kraft 2004), “stance complement” constructions (Quirk, Greenbaum, Leech & Svartvik 1985), “utterance launchers” (Schmid 2000), or “focus formulas (with shell nouns)” (Tuggy 1996; Tárnyiková 2018). The term “focalizer” is the one introduced by Aijmer (2007). Keizer (2013:235) even claims that they are “positionally constrained, always appearing in initial position.” Example (7) is a serendipitous find from my own reading, which indicates that they are also possible at the right periphery (see also Traugott 2015).

(7) “We do good working together, truth is.” (Cornwell 2014:12)

N-is focalizers have been studied from various perspectives. They are often examined in the context of spoken syntax (e.g., Schmid 2001; Miller & Weinert 1998; Flowerdew & Forest 2015). In speech, DMs at the left periphery are typically separated from the main clause by a pause or “comma intonation” (Brinton 2008:8; but see Traugott 2015:13). As far as punctuation in writing is concerned, previous research reveals it as variably present with N-is focalizers, both historically (Hundt 2022) and in current usage. It is not a defining criterion in writing, but we can expect N-is focalizers without overt subordination to also favor orthographic separation from the proposition (i.e., maximum separation at the left periphery).

N-is constructions have a variant form with a double copula, which is typical of spoken discourse, as the instance in (8) from the Corpus of Contemporary American English (COCA) shows.

(8) But the point is is that this is a particular environment. (COCA, 2010, SPOK)

This variant has been treated separately in a series of previous papers (Bolinger 1987; McConvell 1988; Tuggy 1996; Massam 1999; Brenier & Michaelis 2005; Curzan 2012) but is too infrequent in writing to be included in the present study (see section 3.2).

Earlier corpus-based research is often restricted to one shell noun (e.g., Delahunty 2012; Shibasaki 2014, 2015; Keizer 2013, 2016). However, the synchronic data used in Hundt and Oppliger (2022) reveal that individual shell nouns vary significantly with respect to the type of proposition (bare versus that-clause) that they focalize. Similarly, article omission varies across individual nouns. It is therefore important to include a range of shell nouns in a study of constructional variation. Aijmer (2007:33) suggests that use of the complementizer that makes the article obligatory. While example (3) shows that N-is focalizers with that are attested with a zero article, we can expect article omission as well as the individual shell noun to be factors in the attested degree of syntactic integration (bare versus that-clause).

2.2. Constructional Variation and Nativization in World Englishes

As English was spread across the world, new varieties developed in countries where it was the majority language but also where it came to be an institutionalized second-language variety. The emergence of new Englishes is characterized by patterns of structural nativization, which can either be distributional, i.e., represent local uses of patterns available in the feature pool of global English (e.g., Hundt 1998; Schneider 2007; Szmrecsanyi, Grafmiller, Heller & Röthlisberger 2016) or constitute innovations, either due to prolonged separation from the input variety or due to language contact in large-scale adult second-language acquisition. The latter may involve various processes, such as transfer from a substrate language (e.g., the kena passive in Singapore English; Bao 2010), analogical extension (e.g., new ditransitives in Indian English; Mukherjee & Hoffmann 2006), or simplification (e.g., article omission or a more general reduction in complexity in the noun phrase in Kenyan and Singaporean English; Brunner 2014). However, language contact may also lead to more overt marking of grammatical categories as a result of what Mesthrie (2017:187) refers to as the “cognitive principle of hyperclarity,” which avoids ambiguity and is the flip-side of the “principle of economy.” Strictly speaking, hyperclarity refers to patterns where Post-Colonial Englishes use redundant marking (e.g., a combination of although and but in the same clause), but Mesthrie (2017:187) also provides an example of a pattern that is simply more transparent than the one used in the target language (i.e., an aspectual adverbial with a verb phrase that marks aspectual meaning). It is in this latter sense of transparency that the term hyperclarity is adopted here (i.e., more in the sense of Rohdenburg’s [1996] notion of “explicitness” than Schneider’s [2012] concept of “redundancy”). 
With respect to N-is focalizers, speakers of Post-Colonial Englishes might favor dropping the article (and thus be economical) while at the same time preferring to keep the overt subordination (thus avoiding ambiguity). It is important to bear in mind that structural nativization does not necessarily result in statistically significant patterns of usage (see Hundt 2021). Low-frequency idiosyncrasies can nevertheless be highly salient.

As examples (2), (3), and (5) show, articles can be omitted from N-is constructions in varieties of English as a first language. Nonetheless, variability in the determiner slot is likely to be affected by language contact, as omission of the definite article is a pervasive feature in many contact varieties of English. Table 2 shows the ratings from The Electronic World Atlas of Varieties of English (eWAVE) for omission of the definite article from contexts where standard English uses an article (e.g., He saw Ø man in the street.). Listed are those post-colonial varieties included in the present study (see section 3) that emerged from minority emigration of English speakers and the two varieties where English is the first language for the majority of speakers in the countries that eWAVE provides information on with respect to this variable.

Table 2.

Feature Ratings From eWAVE on Article Omission

Variety	Feature rating for zero article
Indian English	A
(Colloquial) Singaporean English	A
Philippine English	B
Nigerian English	A
Ghanaian English	B
Kenyan English	A
New Zealand English, Irish English	D

Note: (A = feature is pervasive or obligatory; B = feature is neither pervasive nor extremely rare; D = attested absence of feature).

Importantly, structural nativization – as defined here – may affect both first- and second-language varieties of English. The various processes may affect the different slots in the N-is focalizer construction differently, however, and may intersect with constructional variation as part of ongoing constructionalization.

Based on the findings of previous research presented here, we can make the following hypothesis for ongoing constructionalization and for N-is focalizers more generally:

(H1) N-is focalizers followed by a bare proposition without complementizer that are less integrated with the proposition and will favor punctuation, i.e., be more likely to be clearly marked as belonging to the left periphery (see Brinton 2008; Keizer 2013; Hundt 2022).

Ongoing constructionalization can intersect with the tendencies outlined in section 2.2, leading us to the following additional hypotheses:

(H2) Individual shell nouns will vary significantly with respect to their preference for bare propositions (Hundt & Oppliger 2022); it is likely that this will result in differences across WEs.

(H3) Varieties that have emerged from large-scale adult second-language acquisition will have an overall higher incidence of propositions with a that complementizer following N-is focalizers (principle of hyperclarity/transparency).

(H4) Differences in the propensity of speakers to omit the article in N-is constructions might lead to regionalization (Asian and African varieties versus others or according to matrilects, i.e., BrE-related versus AmE-related).

3. Material and Methodology

3.1. Material

Ideally, the data for a comparative study of N-is focalizers in WEs would come from carefully compiled corpora representing different registers of both spoken and written language use in a broad range of Englishes. Such corpora are available from the International Corpus of English (ICE) project. However, even for thing, a moderately frequent shell noun in focalizer constructions, the British component of ICE yields fewer than fifty instances; less common shell nouns are attested in ICE-GB with a frequency of just over ten (trouble is) or not at all (word is). Indeed, even a mega corpus such as the Global Web-based English Corpus (GloWbE), at 1.9 billion words, does not provide enough evidence on some shell nouns for all varieties under investigation here. Another limiting factor of ICE as a source of data for this study is that ICE-US is still lacking the spoken component.

Therefore, evidence for this study comes from the Newspapers On the Web (NOW) corpus. NOW is a web-based monitor corpus of newspaper language, i.e., a corpus that keeps growing. Over the period of data extraction (2017-2021),¹ it expanded from around 5 billion words to just under 12 billion words from English news publications around the world, with the earliest texts dating back to 2010. With respect to regional variation, news writing from countries where English is the first language of the majority of speakers is included from both the northern and the southern hemisphere, i.e., the US, Canada, Great Britain, Ireland, Australia, and New Zealand. In addition, data from three Asian (India, Singapore, and the Philippines) and three African (Kenya, Nigeria, and Ghana) varieties are included. The data are thus restricted with respect to register (news writing)² and mode (written) but they provide breadth with respect to the number of shell nouns (see section 3.2) and varieties/regions (see Table 3) included.

Table 3.

Varieties Included in the Present Study

Country label (NOW)	Variety
AU	Australian English (AusE)
CA	Canadian English (CanE)
GB	British English (BrE)
GH	Ghanaian English (GhE)
IE	Irish English (IrE)
IN	Indian English (IndE)
KE	Kenyan English (KenE)
NG	Nigerian English (NigE)
NZ	New Zealand English (NZE)
PH	Philippine English (PhilE)
SG	Singaporean English (SingE)
US	American English (AmE)

With respect to register, the texts in NOW stem from digital news outlets and are not limited to news reports, opinion pieces, and reviews, but also include more diverse types of texts, among them those that are similar to advice columns and letters to the editor (i.e., commentary from readers). Unlike smaller reference corpora, the NOW corpus puts greater emphasis on size than on careful checking of the material that is sampled. It is therefore possible that texts which do not actually originate in the country to which they are assigned in the corpus get included. And, as with other newspaper corpora, it is possible that the materials included were written by people who are speakers of a different variety of English. Great care therefore has to be taken here in manually post-editing and removing instances from material that is problematic (see section 3.2 for details).

3.2. Extracting Data from NOW and Manual Post-editing

Data extraction was based on six shell nouns (fact, problem, thing, truth, trouble, and word) that were attested with a text frequency of at least ten instances in the press section of COCA (see Hundt & Oppliger 2022). The retrieval algorithm made use of the shell noun directly followed by the copula in the present tense. The resulting concordance files were randomized and manually post-edited to exclude false positives.
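The retrieval step can be pictured with a short sketch. The pattern below is not the query that was run on the NOW interface, which is not reproduced here; it is an illustrative regular expression, with hypothetical names (SHELL_NOUNS, N_IS, find_candidates), that finds one of the six shell nouns directly followed by present-tense is.

```python
import re

# The six shell nouns named in the text.
SHELL_NOUNS = ("fact", "problem", "thing", "truth", "trouble", "word")

# Hypothetical pattern (an assumption, not the original corpus query):
# a shell noun as a whole word, directly followed by present-tense "is".
N_IS = re.compile(
    r"\b(?P<noun>" + "|".join(SHELL_NOUNS) + r")\s+is\b",
    re.IGNORECASE,
)


def find_candidates(sentence):
    """Return the shell nouns that are directly followed by 'is' in a sentence."""
    return [match.group("noun").lower() for match in N_IS.finditer(sentence)]
```

A pattern of this kind deliberately over-generates: it would also return strings such as "Word is circulating today", where is is an auxiliary, which is why the concordances still need manual post-editing.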

As indicated above, only the variants with a present tense is were retrieved. Instances with a past tense copula are attested, but they are a lot less frequent than the default option with a present tense form of be. For example, for the thing is/was in the US part of NOW, with an overt article, the ratio of present to past was 2961:260 (retrieval date: June 25, 2018) (see also Tárnyiková 2018:212). Variants with article omission and a past tense copula are even rarer; in the US part of NOW, thing was was attested just eighteen times, nine with premodification and nine without premodification (retrieval date: June 26, 2018). Similarly, NOW yields evidence of the variant with a double copula (as in 9 and 10), but since these were rare (at just over 100 instances in a corpus of almost 10 billion words) and not a variant for all WEs included in the study, they were excluded from the concordances as well.³

(9) I’m only speaking as a mother, I think, often the problem is is that people don’t really understand what those labels are. (NOW, US, August 13, 2020)

(10) The good thing is is that it’s in our hands. (NOW, GB, December 15, 2017)

In order to keep the amount of data to be manually coded for additional predictors within manageable bounds, a total of fifty relevant hits per shell noun and variety were sampled from randomized sets of the shell noun followed by is. For trouble and word, the total numbers for some varieties did not amount to fifty; the total number of hits therefore is 3438 rather than the expected 3600 (see the legend to Figure 1 for total raw frequency per variety). In sampling the material, great care was taken to exclude non-regional material, such as hits from the Tokyo Reporter in the original Singapore concordance. One general strategy in sampling the data was to give preference to regional news outlets (e.g., the Daily Post, a Lagos-based Nigerian newspaper) rather than those that cater to a wider audience (e.g., The News Guru, which, while based in Lagos, self-advertises as “Africa’s number one news portal”). The two types of online publication are generally distinguished by a domain name which is regional and one that ends in .com. This is not to say that there is no local news copy produced for an outlet such as Nirametrics.com.
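The sampling design amounts to a capped random sample per cell, where a cell is one shell noun in one variety. The sketch below illustrates that logic under stated assumptions: the function names and the fixed seed are hypothetical, and the manual checks for relevance and regional origin described above are not modeled.

```python
import random


def sample_cell(hits, cap=50, seed=0):
    """Shuffle the relevant hits for one noun-variety cell and keep at most `cap`.

    If a cell has fewer than `cap` relevant hits, all of them are kept.
    The seed is an assumption added here to make the sketch reproducible.
    """
    shuffled = list(hits)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:cap]


def expected_total(n_nouns=6, n_varieties=12, cap=50):
    """Number of tokens if every cell reached the cap."""
    return n_nouns * n_varieties * cap
```

With six shell nouns, twelve varieties, and a cap of fifty, expected_total() gives the 3600 tokens mentioned in the text; cells that fall short of the cap account for the lower attested total.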

The definition of the variable context used to select the target of fifty tokens per shell noun excludes instances with a determiner other than the definite article, such as (11). Also not relevant to the present study are uses where be is not a copula, as in (12); here it is an auxiliary forming a present progressive.

(11) One vital fact is that it’s the fees that separate Holly from many of its peers. (NOW, US, May 24, 2017)

(12) Word is circulating today that Legend of the Seeker has been cancelled after two seasons. (NOW, US, April 26, 2010)

Negation of be is also not possible in focalizer constructions because the noun in these contexts has referential meaning, as illustrated in (13).

(13) The problem isn’t that the title doesn’t describe what’s in the book. (NOW, US, September 11, 2013)

Finally, instances with non-finite clauses, as in (14), or phrases, as in (15), following N-is were also excluded, as they are complements of the verb phrase.

(14) The important thing is to always try and pay attention to what you are doing (NOW, US, July 28, 2016)

(15) Truth is however poles apart. (NOW, IN, April 14, 2016)

The 3438 concordance entries were annotated for the dependent variable syntactic integration (i.e., bare versus that-clause) and a set of predictors, to which I turn in Section 3.3.

3.3. Predictors

Table 4 gives an overview of the predictors and their levels. Punctuation could theoretically be viewed as an aspect of syntactic integration, i.e., the dependent variable. However, seeing that both instances with that-clause and bare variants can be variably punctuated or not (compare 16 and 17 with 18 and 19), punctuation was coded separately as a predictor, with the level punctuation subsuming variants such as comma, colon, semi-colon, as well as dash-separated clauses.

(16) The trouble is, that these elections are just so damned important. (NOW, NZ, October 8, 2016)

(17) Fact is, there are people on the ground who commit themselves to being as invisible as possible […] (NOW, SG, October 28, 2016)

(18) Word is that if the water becomes too warm, bacteria can develop. (NOW, AU, May 26, 2016)

(19) The simple fact is you are going to have a choice. (NOW, US, June 8, 2017)

Table 4.

Predictors and Their Levels

Predictor	Levels
Variety	AU, CA, GB, GH, IE, IN, KE, NG, NZ, PH, SG, US
Shell noun	fact, problem, thing, truth, trouble, word
Article	article, zero
Punctuation	punctuation, none
Modification	modification, no modification
PreSet	none, pragmatic marker, clause, conjunction, phrase
PostSet	yes, no

Modification is coded as binary, with instances of multiple pre-modification, as in (20), subsumed under modification.

(20) But the cold, hard historical fact is that all of these protests pale in comparison to the grave injustice that has been forced on a particular portion of Americans for so long […] (NOW, US, June 23, 2017)

Finally, N-is focalizers can be preceded by other pragmatic markers, a conjunction, a phrase, or another clause. These were subsumed under the variable label PreSet (i.e., material preceding the N-is construction) and are illustrated in (21)-(24). Similarly, elements can occur between the focalizer construction and the main proposition, as in examples of the variable PostSet in (25) and (26).⁴

(21) Still, the fact is that I’m probably remembered better than a lot of goalies with similar stats […] (NOW, CA, October 21, 2014)

(22) Though the state government is making efforts, the truth is that it has been overwhelmed […] (NOW, NG, July 16, 2017)

(23) […] but word is hatcheries and stocking programs are already feeling the crunch. (NOW, US, March 2, 2017)

(24) As for the Giants, word is that some Giants people have spoken behind the scenes about the possibility […] (NOW, US, July 15, 2017)

(25) The truth is that, until recently, we didn’t know. (NOW, US, May 24, 2011)

(26) The trouble is though that only The Australian’s readers will have seen it. (NOW, AU, July 31, 2011)
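The coding scheme in Table 4 can be pictured as one record per concordance line. The sketch below is only an illustration of that data structure: the field names, the LEVELS dictionary, and the CodedToken class are hypothetical, and the level labels are taken from Table 4 plus the two values of the dependent variable.

```python
from dataclasses import dataclass

# Permitted levels per variable, following Table 4; "integration" is the
# dependent variable (bare versus that-clause). Field names are assumptions.
LEVELS = {
    "variety": {"AU", "CA", "GB", "GH", "IE", "IN", "KE", "NG", "NZ", "PH", "SG", "US"},
    "shell_noun": {"fact", "problem", "thing", "truth", "trouble", "word"},
    "article": {"article", "zero"},
    "punctuation": {"punctuation", "none"},
    "modification": {"modification", "no modification"},
    "preset": {"none", "pragmatic marker", "clause", "conjunction", "phrase"},
    "postset": {"yes", "no"},
    "integration": {"bare", "that"},
}


@dataclass(frozen=True)
class CodedToken:
    variety: str
    shell_noun: str
    article: str
    punctuation: str
    modification: str
    preset: str
    postset: str
    integration: str

    def __post_init__(self):
        # Reject any value that is not a listed level of its variable.
        for name, allowed in LEVELS.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")


# Illustrative coding of example (16), "The trouble is, that these elections ...";
# this is a reading of the printed example, not a line from the study's data files.
example_16 = CodedToken("NZ", "trouble", "article", "punctuation",
                        "no modification", "none", "no", "that")
```

Validating levels at construction time is one simple way to catch coding slips, such as a misspelled level, before the data reach the statistical models.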

3.4. Statistical Modeling

The approach used here follows previous studies such as Tagliamonte and Baayen (2012) in combining random forest analyses with conditional inference trees (ctrees). Random forests are a type of permutation testing which does not assume normal distribution of the data but instead builds the distribution by resampling the observed dataset. The algorithm fits many regression trees to random subsets of the data and estimates the importance of predictors on the combined result of these trees, i.e., the random forest. Unlike traditional regression analysis, random forest analysis can cope even with highly correlated predictors and avoids overfitting (Strobl, Malley & Tutz 2009; Tagliamonte & Baayen 2012). Additionally, while empty cells or categorical responses for individual predictors pose problems for traditional regression modeling, they do not pose a problem for random forests. R’s party package (Strobl, Hothorn & Zeileis 2009) was used to fit the random forest model, with the number of trees set to 500 (mtry = 2; the default value, i.e., the square root of the number of predictors, in this case rounded down). Somers2 was used to test for model fit. According to Tagliamonte and Baayen (2012:156), a C-value > 0.8 represents a good model fit.
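Two of the quantities mentioned in this paragraph can be made concrete with a small sketch. It is written in Python rather than R and does not call the party package or Somers2; the function names are hypothetical. The first function reproduces the arithmetic behind mtry for seven predictors, and the second computes the C index in its usual sense, i.e., the share of pairs (one token of each outcome) in which the token with the outcome coded 1 receives the higher predicted probability.

```python
import math


def default_mtry(n_predictors):
    """Square root of the number of predictors, rounded down."""
    return math.isqrt(n_predictors)


def c_index(probabilities, outcomes):
    """Concordance index for a binary outcome coded 1/0.

    Every pair consisting of one token with outcome 1 and one with outcome 0
    is compared; a pair is concordant if the outcome-1 token has the higher
    predicted probability. Tied probabilities count as half.
    """
    positives = [p for p, y in zip(probabilities, outcomes) if y == 1]
    negatives = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    score = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5
    return score / (len(positives) * len(negatives))
```

For the seven predictors in Table 4, default_mtry(7) returns 2. A C value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the 0.8 threshold is stated.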

While random forests provide a robust approach to modeling variable importance of non-normally distributed data, they do not provide easily accessible information on possible interaction among predictors. For this purpose, single conditional inference trees can be used. Like random forests, single ctrees make use of recursive partitioning, predicting outcomes on binary splits of the data (the nodes in the tree) and sorting them into maximally homogeneous “bins” (the “leaves”).⁵ Conditional inference trees have been implemented in R’s partykit package (Strobl, Malley & Tutz 2009; available through R Development Core Team [2011]), with the maximum number of splits between the “root” and the “leaves” of the tree set to four and the significance level for the splits to p < 0.05.
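The resampling idea behind a significance-tested split can be illustrated with a simple permutation test. This is a minimal sketch, not the procedure implemented in partykit, which uses its own test statistics and adjustments; the function names, the statistic (an absolute difference in proportions), the number of permutations, and the seed are all assumptions made for illustration.

```python
import random


def proportion_gap(predictor, response):
    """Absolute difference in the share of response == 1 between the two predictor groups."""
    group_one = [r for x, r in zip(predictor, response) if x == 1]
    group_zero = [r for x, r in zip(predictor, response) if x == 0]
    return abs(sum(group_one) / len(group_one) - sum(group_zero) / len(group_zero))


def permutation_p_value(predictor, response, n_perm=2000, seed=1):
    """Share of shuffled datasets whose gap is at least as large as the observed one.

    Shuffling the response breaks any link with the predictor, so the shuffled
    gaps show what chance alone produces. One is added to numerator and
    denominator so that the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = proportion_gap(predictor, response)
    shuffled = list(response)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if proportion_gap(predictor, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

In a tree-building setting, a predictor would only be considered for a split if a value of this kind fell below the chosen significance level; with coding such as 1 for punctuation and 1 for bare integration, the function compares the rate of bare integration with and without punctuation.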

4. Results

In the following, the distribution of the variable (syntactic integration) in N-is focalizer constructions is shown by variety, shell noun, article, and punctuation (section 4.1) before presenting the random forest and ctree models for them (section 4.2).⁶

4.1. Summary Statistics

As far as syntactic integration is concerned, there is no obvious grouping into varieties spoken primarily as a first language and those where English is (mostly) acquired as an institutional second-language variety, as seen in Figure 1. Among the varieties with bare integration as the dominant variant, we find the two North American and the two southern hemisphere varieties as well as the two Asian varieties SingE and PhilE. BrE and IrE are the first-language varieties with overt that-integration at just over 50 percent, whereas overt integration is clearly the dominant variant in IndE and the three African Englishes. There is thus no clear regional pattern (e.g., Asian versus African varieties). However, with respect to focalizer use, there is a difference between conservative first-language varieties (BrE and IrE) and more advanced first-language ones (AmE, CanE, AusE, NZE).

Figure 1.

Overt Versus Bare Integration Following N-is Focalizers Across Varieties in NOW. (AmE, BrE, IrE, CanE, AusE, NZE, IndE: N = 300; SingE = 277; PhilE = 269; GhE = 227, NigE = 272, KenE = 287)

As seen in Figure 2, of the six shell nouns, trouble has the highest proportion of bare integration (65.9 percent), followed by fact, thing, and truth, with rates of around 50 percent. Focalizers with problem and word favor a more syntactically integrated proposition at a little below 40 percent.

Figure 2.

Overt Versus Bare Integration Following N-is Focalizers Across Shell Nouns in NOW

With respect to integration and punctuation, focalizers followed by a proposition that is not syntactically integrated are relatively equally divided in terms of punctuation (51.5 percent without it versus 48.5 percent with it), whereas punctuation of syntactically integrated propositions (i.e., with that) is extremely rare, at 0.9 percent overall. The NOW data thus confirm the findings of Hundt (2022) that the more constructionalized variants are more likely to be separated by punctuation. Aijmer (2007:33) claims that articles cannot be omitted if the focalizer is followed by a that-clause. Example (3) shows a counter-example from written language. Such forms occur in NOW as well, where the proportion of article omission with that-clauses is much lower (12 percent) than in focalizers followed by a bare proposition (26 percent).

Finally, it is interesting to consider the combinations that are the most idiomatic and clearly separated from the proposition, i.e., the proportion of bare integration by shell noun combined with zero article (Figure 3) and combined with punctuation (Figure 4). When we look at Figure 3, the question arises whether omitting both article and complementizer is indeed a matter of increased constructionalization. If this were the case, we would expect the shell nouns that are attested most frequently in focalizer constructions (i.e., fact and truth, according to Hundt & Oppliger [2022]) to show the highest proportion of the maximally reduced variant (i.e., omission of both the article and the complementizer that). This is clearly not the case. Instead, we find the two shell nouns with the lowest token frequency in the construction showing the highest proportion of the maximally reduced variant. This suggests that a different factor than (ongoing) constructionalization is likely behind the omission of the article and complementizer. I return to this discussion in Section 5. With respect to separation by punctuation of bare complementation, Figure 4 shows that trouble is again the shell noun with the highest proportion of this highly constructionalized variant. Surprisingly, word shows a low proportion of focalizer use with a bare proposition that is separated by punctuation. Taken together, these results suggest that the predictors article and punctuation play out differently as indicators for the choice of syntactic integration across different shell nouns.

Figure 3.

Proportion of Ø N-is + Bare Proposition Across Shell Nouns in NOW

Figure 4.

Proportion of Punctuation-Separated N-is + Bare Proposition Across Shell Nouns in NOW

4.2. Modeling Syntactic Integration

The outcome of the random forest analysis for syntactic integration (i.e., that-clause versus bare) is given in Figure 5. Testing for model accuracy with Somers2 Dxy returns a prediction accuracy of 0.677 (above the 0.5 threshold) and a C-index value of 0.838, i.e., a good model fit.
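The two accuracy figures are linked: for a binary outcome, Somers' Dxy is a rescaling of the C-index, Dxy = 2(C − 0.5), so that a C of 0.838 corresponds to a Dxy of about 0.676, in line with the reported 0.677 once rounding of C is taken into account. The following Python sketch computes both measures from predicted probabilities. It is an illustration only: the toy outcomes and probabilities are invented and are not taken from the NOW data, and the function names are hypothetical.

```python
from itertools import product


def c_index(outcomes, scores):
    """Concordance index: share of (1, 0) pairs in which the observation
    with outcome 1 receives the higher score; tied scores count as half."""
    pos = [s for y, s in zip(outcomes, scores) if y == 1]
    neg = [s for y, s in zip(outcomes, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


def somers_dxy(c):
    """Rescale C from the 0.5-1 range to the 0-1 range."""
    return 2.0 * (c - 0.5)


# Invented toy data (assumption): 1 = bare proposition, 0 = that-clause.
toy_outcomes = [1, 1, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.7, 0.2, 0.6]

toy_c = c_index(toy_outcomes, toy_scores)
toy_dxy = somers_dxy(toy_c)
print(f"toy C = {toy_c:.3f}, toy Dxy = {toy_dxy:.3f}")

# Consistency check on the value reported for the random forest.
print(f"Dxy implied by C = 0.838: {somers_dxy(0.838):.3f}")
```

A C of 0.5 (Dxy of 0) corresponds to chance-level discrimination, which is why values are read against the 0.5 baseline.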

Figure 5.

Variable Importance Predicting Syntactic Integration in N-is Focalizers in NOW

Figure 5 shows that punctuation is by far the most important predictor, with variety ranking second but at a much lower level. The effect of shell noun on the choice between bare proposition and integration with a that-clause is even less, as is that of article. PreSet and modification rank even lower, while PostSet is excluded from the model (i.e., it was not selected as a significant predictor).

Figure 6 shows how these predictors interact and in which direction they impact the choice between bare proposition and that-clause. Somers2 for the ctree returns slightly lower values at 0.524 (prediction accuracy) and C = 0.762, which still provide a good model fit. The following comments are limited to those instances where p < 0.001.

Figure 6.

ctree Analysis Showing Factor Interaction for Syntactic Integration in N-is Focalizers in NOW

Figure 6 illustrates that N-is focalizers are particularly likely to occur without a that complementizer if they are separated off by punctuation (node 17). What is difficult to see from the ctree is the detailed interaction of punctuation and variety with syntactic integration. A look at the raw frequencies shows that of the few focalizers that are both separated by a punctuation mark and followed by that, most are from Ghana (8 of 16), with two instances each from Nigeria and Kenya; the remaining four come from Canada, India, New Zealand, and the Philippines. In other words, there may be a small tendency for the African Englishes in the dataset to combine punctuation with complementizer that.
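The arithmetic behind this observation can be made explicit. The Python sketch below tallies the 16 tokens by country as reported above and computes the share contributed by the three African varieties. It assumes that the four remaining tokens are distributed one per country; the dictionary layout and variable names are illustrative only.

```python
# Tokens that combine a punctuation mark with complementizer "that",
# by country, as reported in the text. One token each for the last
# four countries is an assumption based on "the remaining four".
punctuated_that = {
    "Ghana": 8,
    "Nigeria": 2,
    "Kenya": 2,
    "Canada": 1,
    "India": 1,
    "New Zealand": 1,
    "Philippines": 1,
}
african = ("Ghana", "Nigeria", "Kenya")

total = sum(punctuated_that.values())
african_total = sum(punctuated_that[c] for c in african)
african_share = african_total / total

print(f"{african_total} of {total} tokens ({african_share:.0%}) are from African varieties")
```

With only 16 tokens overall, the resulting proportion can be read as suggestive at most, which is why the tendency is described as small.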

Figure 6 also reveals an interaction between punctuation and variety for focalizers that are not separated by punctuation (node 2), which splits the varieties into groups that I consider in turn. First, for the two southern hemisphere Englishes and CanE, there is further interaction with the predictor article (node 3) such that focalizers with a zero article predict the use of a bare proposition (i.e., without complementizer that) more than do those with an article (with the exception of word is). Typical instances of this tendency from AusE, CanE, and NZE are given in (27)-(30).

(27) Fact is owner-occupiers tend to be far more careful when it comes to maintaining the building […] (NOW, AU, November 14, 2013)

(28) Problem is those solutions can be costly and take decades to have a positive impact. (NOW, CA, September 15, 2017)

(29) Thing is just about all of the things people are railing against would actually be fine. (NOW, NZ, February 26, 2017)

(30) Truth is the statistics haven’t changed much in the past ten years […] (NOW, NZ, June 28, 2016)

Second, there is a split between varieties that show a strong tendency towards that-complementizer with no punctuation (GhE, IndE, NigE) and those that show further interaction with other predictors (node 10). Intriguingly, GhE, IndE, and NigE are the varieties that have the overall lowest incidence of bare complementation (Figure 1). For the remaining varieties, there is some interaction (albeit at lower p-values) for noun and article that predicts a slightly higher likelihood of bare complementation, i.e., with fact, thing, trouble, truth, and a zero article (node 15); this is similar to the preference in AusE, CanE, and NZE for bare over that-clause complementation with zero article (node 8) for almost the same set of nouns.

5. Qualitative Analyses

With respect to those instances where the N-is focalizer is separated from the proposition, a closer look at the data reveals some relevant qualitative differences. In (31) and (32), from India and New Zealand, the comma occurs where we would expect it, i.e., between the N-is focalizer and the complementizer.

(31) But the alarming truth is, that the might of the Goa government and police are not able to evict them from our land (NOW Corpus, IN, June 3, 2016)

(32) “The trouble is, that these elections are just so damned important.” (NOW Corpus, NZ, October 10, 2016)

In (33)-(38), from Ghana, Nigeria, and Kenya, however, the complementizer is included in the N-is focalizer and separated from the proposition by the comma, suggesting that it may be part of focalizer constructions in these varieties of English. In other words, reanalysis (as part of the constructionalization process) would have been different in these varieties.

(33) The problem is that, a lot of people build their network and leave the source open […] (NOW, GH, January 29, 2017)

(34) […] but the truth is that, the NDC will win and win big time,” he said. (NOW, GH, October 26, 2016)

(35) But the unfortunate thing is that, this action cannot save him […] (NOW, NG, September 24, 2016)

(36) The fact is that, Christians are not violent today in spite what parts of their holy book commands, […] (NOW, NG, April 22, 2016)

(37) But the worst thing is that, there are limited files format supported for transfers with this application, […] (NOW, KE, June 30, 2017)

(38) […] and the word is that, he should be joining the team probably by end of next week […] (NOW, KE, November 14, 2018)

Since data collection had to be limited to a manageable number of instances per shell noun, not all instances with punctuation following the complementizer were retrieved in the initial concordances. A more systematic search for similar instances in the NOW corpus yields further evidence that constructionalization in African Englishes has resulted in a variant that includes that as part of the focalizer. This view is supported in particular by examples from Nigeria and Ghana in which the proposition is a clause that cannot be introduced by that but the focalizer nevertheless includes this (former) complementizer.⁷ Examples are given in (39) to (42).

(39) This notwithstanding, Mr. Dzakpasu reminded that “the important thing is that, let us as a country and stakeholders focus on how best we can get it right […]” (NOW, GH, May 12, 2016)

(40) So, the best thing is that, let me go and try it. (NOW, NG, June 16, 2017)

(41) The next thing is that, where did they come from? (NOW, NG, January 31, 2017)

(42) The real thing is that, make a brand new classification and put all MSMEs in the same basket. (NOW, NG, June 15, 2020)
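The kind of pattern targeted by such a search can be sketched with a regular expression. The Python snippet below is an illustration only: it does not reproduce the query syntax of the NOW interface, it operates on plain strings, and the list of shell nouns is limited to those mentioned in this section (an assumption, not necessarily the full set used in the study). The function and constant names are hypothetical.

```python
import re

# Shell nouns mentioned in this section (assumed, possibly incomplete).
SHELL_NOUNS = ("fact", "problem", "thing", "trouble", "truth", "word")

# Shell noun + "is that" + comma, i.e., "that" grouped with the focalizer.
THAT_COMMA = re.compile(
    r"\b(?P<noun>" + "|".join(SHELL_NOUNS) + r")\s+is\s+that\s*,",
    re.IGNORECASE,
)


def find_that_comma(sentence):
    """Return the shell noun if the sentence contains 'N is that,', else None."""
    match = THAT_COMMA.search(sentence)
    return match.group("noun").lower() if match else None


samples = [
    "The problem is that, a lot of people build their network and leave the source open",
    "So, the best thing is that, let me go and try it.",
    "But the alarming truth is, that the might of the Goa government and police are not able to evict them",
]
for sentence in samples:
    print(find_that_comma(sentence), "<-", sentence)
```

The first two samples, adapted from (33) and (40), are matched, whereas the third, adapted from (31), is not, because the comma precedes that. Hits of this kind would still need manual inspection to decide whether the following clause could be introduced by that.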

Interestingly, NigE and GhE are the varieties with high overall frequencies of that following N-is focalizers (see Figure 1). IndE has a similarly high proportion of N-is followed by that (around 30 percent), and instances in which punctuation follows that and separates the focalizer from the proposition can also be found in the IndE part of NOW (see 43-45), with (45) being an example where the proposition cannot be introduced by that.

(43) The good thing is that, systemically it doesn’t matter. (NOW, IN, October 7, 2018)

(44) In my opinion, the truth is that, I consider Hollywood and Bollywood completely different industries, […] (NOW, IN, August 18, 2011)

(45) Very surprising thing is that, how can Chavan file the charge sheet once he was transferred […] (NOW, IN, October 31, 2019)

The NOW corpus also provides occasional evidence of the same kind of reanalysis from other regions, including Canada and Great Britain, i.e., countries where English is the first language of the majority of the population, as (46)-(48) illustrate.

(46) The trouble is that, employee discipline issues have also increased. (NOW, PH, November 29, 2018)

(47) I said this is too bad […] but the truth is that, it stopped immediately, it was amazing, and then it became really sunny. (NOW, CA, February 24, 2017)

(48) The fundamental thing is that, it would have been better that they left a long time earlier (NOW, GB, August 8, 2018)

The present study takes its data from newspaper writing. In speech, segmentation of focalizer and proposition might be different in varieties of English as a first language. Indeed, there is an example from the spoken part of the BNC that supports the existence of the N-is that chunk as a variant of the focalizer in BrE; this is shown in (49).

(49) And the tremendous thing is that, Nicodemus may not really have been too sure of what his needs were. (BNC1, KN9 58)

Little is known so far, however, about the precise relationship between punctuation of focalizers in writing and comma intonation. More specifically, it is difficult to verify whether what linguists have referred to as comma intonation is also reliably transcribed by punctuation in spoken corpora.

Qualitative evidence from mega-corpora like NOW does, however, allow us to verify whether N-is constructions are positionally as constrained as previous research suggests. Keizer (2013:235) claimed that they always occur in initial position, yet examples such as (7) suggest that N-is constructions may also occur at the right periphery. A systematic search for N-is in sentence-final position (comma-separated) returned the instances in (50) and (51), where truth is could be similar in function to comment clauses such as I think.

(50) Hate is not a very honorable characteristic, truth is. (NOW, US, December 3, 2019)

(51) J L Austin reminded us, importance is not important, truth is. (NOW, IN, July 13, 2018)

However, these are the only examples in a corpus of a little under 12 billion words, and unlike (7), they lend themselves more easily to a contrastive reading (e.g., while importance is unimportant, truth is not) and would best be separated by a semicolon rather than a comma. In other words, N-is constructions remain firmly rooted in the left periphery in the function of focalizers and do not yet appear to have spread to the right periphery.
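The search for comma-separated N-is in sentence-final position described above can be sketched in the same spirit. The Python snippet below is an assumption-laden illustration rather than the query actually run on NOW: it works on plain strings, uses only the shell nouns mentioned in this section, and the names are hypothetical.

```python
import re

# Shell nouns mentioned in this section (assumed, possibly incomplete).
SHELL_NOUNS = ("fact", "problem", "thing", "trouble", "truth", "word")

# Comma + optional article + shell noun + "is" + sentence-final punctuation.
FINAL_N_IS = re.compile(
    r",\s*(?:the\s+)?(?P<noun>" + "|".join(SHELL_NOUNS) + r")\s+is\s*[.!?]\W*$",
    re.IGNORECASE,
)


def find_final_n_is(sentence):
    """Return the shell noun if the sentence ends in ', (the) N is.', else None."""
    match = FINAL_N_IS.search(sentence.strip())
    return match.group("noun").lower() if match else None


print(find_final_n_is("Hate is not a very honorable characteristic, truth is."))
print(find_final_n_is("The truth is, nobody knows."))
```

A surface pattern of this kind cannot distinguish a right-periphery focalizer from a contrastive clause with an elided predicate, so every hit has to be read in context.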

6. Discussion and Conclusion

Research that combines a Construction Grammar approach to syntactic variation and change with a WEs perspective is still relatively rare (cf. Hoffmann 2014, 2020). The present study’s aim was to bring the two strands of research together in order to (i) test whether WEs data would add relevant detail to a case study on ongoing constructionalization and (ii) verify whether patterns of regional variation or nativization could be found. The review of previous research into constructionalization, N-is focalizers, and WEs gave rise to a set of hypotheses that were tested against corpus data from a large, web-based monitor corpus. In this section I discuss the results of the quantitative and qualitative analyses with respect to the hypotheses and consider the contribution that these make to the intersection of constructionalization and WEs research. The present study is able to answer some of the initial questions but also opens up avenues for further research.

With respect to syntactic integration (H1), the data from NOW have shown that punctuation is the most important predictor separating the two variants of integration (bare versus that-clause). Whether punctuation is a cause or symptom of syntactic integration is difficult to tell, however. The predictor resembles frequency as a factor in grammaticalization, which has been said to be both a cause in the development of novel constructions from lexical items and a concomitant of the process (Hilpert 2017). Punctuation could thus be seen as a further indication of constructionalization in N-is constructions. The study also provides evidence that a simple interpretation of variants without article and/or with a bare proposition as more constructionalized is difficult to uphold because it would mean that the shell nouns with lower token frequencies are more constructionalized than the highly frequent shell nouns in the construction.

With respect to constructional variation across WEs, the single ctree (Figure 6) shows that across all varieties, that-clauses are strongly disfavored when punctuation marks are present, thus lending further support to H1. All other predictors turn out to be far less important in the random forest (Figure 5). Moreover, while variety ranks second among the predictors that were selected in the random forest, it does not play out in a way that would have been predictable by existing models of WEs (whether they focus on present status, region, or origin). With respect to the integration of the proposition by a that-clause (see Figure 1), we can distinguish very conservative varieties (IndE and the two West-African varieties) from innovative varieties (the North-American, Southern-Hemisphere, and East Asian varieties), with BrE, IrE, and KenE occupying an intermediate position, but there is no grouping according to variety type or region. Similarly, the single ctree does not reveal any groupings of varieties according to region (e.g., North-American versus Southern Hemisphere versus Asia versus Africa), variety type (e.g., first- versus second-language variety), or matrilect (e.g., British- versus US-related varieties). In other words, (H4) is not confirmed. Where variety plays a role, it does so in interaction with other predictors, but general patterns (e.g., according to shell noun) are difficult to discern, i.e., (H2) is not confirmed, either. In particular, the prediction that language contact would play a role (either in the form of hyperclarity or article omission) is not borne out by the data from NOW (H3). Unlike Hoffmann’s (2014, 2020) research, the present case study does not provide evidence of a straightforward correlation between constructional variation and the developmental stage at which the different WEs are in Schneider’s (2007) dynamic model on the evolution of WEs, or indeed any other model of WEs. The reasons why statistical modeling and conceptual modeling of WEs research are difficult to map onto each other are discussed at great length in Hundt (2021).

While the core grammar of N-is focalizers is largely shared across the WEs investigated here, the qualitative analysis of the data turns out to be interesting from the point of view of nativization. A closer look at the data reveals an intriguing difference in the use of punctuation with N-is focalizers followed by that, i.e., evidence of punctuated focalizers with that. This candidate for structural nativization emerged from further qualitative analysis rather than directly from the ctree. From a Construction Grammar perspective, moreover, punctuated focalizers with that are relevant because they do not occur at the more concrete level of the constructional hierarchy, i.e., with the different lexical fillers that enter a constructional slot, but at a more abstract level that reanalyzes the former subordinating conjunction as being part of focalizer constructions. If that in these variants of the N-is focalizer receives stress in speech, it is likely to have been reanalyzed as a demonstrative pronoun, which would tally well with the pragmatic function of the N-is constructions as utterance launchers at the left periphery of the sentence. In fact, it would make the focalizer function even clearer in these varieties and would thus lend itself to an explanation in terms of hyperclarity/transparency, which, in turn, would be a factor in constructionalization in the second-language varieties. In other words, while (H3) is not supported by the quantitative analysis, it does find support at the qualitative level of analysis.

Further research is needed to substantiate the hypothesis that this variant is particularly frequent in African varieties and to establish whether there might be a difference in constructionalization of N-is focalizers between West- and East-African varieties: only the former yield a number of examples where the proposition takes the form of a clause that could not be introduced by that, i.e., where reanalysis of that as part of the N-is focalizer must have occurred. This would mean that in the two West-African varieties, a more explicit variant of the N-is focalizer construction is emerging. Future research could verify the hypothesis that the variant including that in the construction is more frequent/entrenched in African varieties by devising an online questionnaire making use of forced choice tasks involving punctuation. Such an approach could more directly target the aspect that emerged as interesting from this study for both a Construction Grammar and a WEs perspective. What this case study has already shown is that it is a fruitful endeavor to bring evidence from a broad range of WEs to bear on studies of constructional variation and (ongoing) constructional change. The NOW corpus, though not an ideal source of data in terms of representativeness, has proven a useful tool to identify a candidate for nativization in second-language varieties of English. Finally, in order to fully understand the connection between what is referred to in the literature as comma intonation and constructionalization in N-is focalizers, it will be necessary to return to spoken evidence and/or combine corpus evidence with data from controlled experimentation (Hundt & Dellwo forthcoming).

Acknowledgments

The author gratefully acknowledges help with the initial extraction of the data set provided by Carlos Hartmann. The audience at the ISLE 5 conference in London, where an initial draft was presented, the anonymous reviewers, and the editors of the journal provided very helpful and constructive criticism, which greatly improved the original paper.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Author Biography

Marianne Hundt is Professor of English Linguistics at Zürich University. Her corpus-based research covers grammatical change in contemporary and late Modern English as a first and second language (New Zealand, British, and American English; English in Fiji and South Asia) and language in the Indian Diaspora.

References

The British National Corpus, version 2 (BNC World). 2001. Distributed by Oxford University Computing Services on behalf of the BNC Consortium. URL: http://www.natcorp.ox.ac.uk.

Davies, Mark. 2008. The Corpus of Contemporary American English (COCA). Available online at https://www.english-corpora.org/coca/ (last accessed 5 February, 2021).

Davies, Mark. 2013. Corpus of Global Web-Based English. Available online at https://www.english-corpora.org/glowbe/.

Davies, Mark. 2016. Corpus of News on the Web (NOW). Available online at https://www.english-corpora.org/now/ (last accessed 5 February, 2021).

ICE – International Corpus of English (http://www.ice-corpora.uzh.ch)

R Development Core Team. 2011. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. www.r-project.org (October 2016).

Aijmer, Karin. 2007. The interface between discourse and grammar: The fact is that. In Agnès Celle & Ruth Huart (eds.), Connectives as discourse landmarks, 31-46. Amsterdam: John Benjamins.

Bao, Zhiming. 2010. A usage-based approach to substratum transfer: The case of four unproductive features in Singapore English. Language 86(4). 792-820.

Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999. The Longman grammar of spoken and written English. Harlow: Pearson.

Bolinger, Dwight. 1987. The remarkable double IS. English Today 3(1). 39-40.

Brenier, Jason & Laura A. Michaelis. 2005. Optimization via syntactic amalgam: Syntax-prosody mismatch and copula doubling. Corpus Linguistics and Linguistic Theory 1(1). 45-88.

Brinton, Laurel. 2008. The comment clause in English: Syntactic origins and pragmatic development. Cambridge: Cambridge University Press.

Brunner, Thomas. 2014. Structural nativization, typology and complexity: Noun phrase structures in British, Kenyan and Singaporean English. English Language and Linguistics 18(1). 23-48.

Carter, Ronald & Michael McCarthy. 2006. Cambridge grammar of English: A comprehensive guide. Cambridge: Cambridge University Press.

Cornwell, Patricia. 2014. Flesh and blood. London: Harper Collins.

Curzan, Anne. 2012. Revisiting the reduplicative copula with corpus-based evidence. In Terttu Nevalainen & Elizabeth Closs Traugott (eds.), The Oxford handbook of the history of English, 211-221. Oxford: Oxford University Press.

Delahunty, Gerald P. 2012. An analysis of The thing is that S sentences. Pragmatics 21(1). 41-78.

The Electronic World Atlas of Varieties of English. 2020. eWAVE. Bernd Kortmann, Kerstin Lunkenheimer & Katharina Ehret (eds.). http://ewave-atlas.org (13 July, 2021).

Fiehler, Reinhard, Birgit Barden, Mechthild Elstermann & Barbara Kraft. 2004. Eigenschaften gesprochener Sprache. Tübingen: Narr.

Flowerdew, John & Richard Forest. 2015. Signalling nouns in English: A corpus-based discourse approach. Cambridge: Cambridge University Press.

Goldberg, Adele E. 2013. Constructionist approaches to language. In Thomas Hoffmann & Graeme Trousdale (eds.), Handbook of construction grammar, 15-31. Oxford: Oxford University Press.

Günthner, Susanne. 2008. “die Sache ist…”: Eine Projektor-Konstruktion im gesprochenen Deutsch. Zeitschrift für Sprachwissenschaft 27(1). 39-71.

Hilpert, Martin. 2017. Frequencies in diachronic corpora and knowledge of language. In Marianne Hundt, Sandra Mollin & Simone Pfenninger (eds.), The changing English language: Psycholinguistic perspectives, 49-68. Cambridge: Cambridge University Press.

Hoffmann, Thomas. 2014. The cognitive evolution of Englishes: The role of constructions in the dynamic model. In Sarah Buschfeld, Thomas Hoffmann, Magnus Huber & Alexander Kautzsch (eds.), The evolution of Englishes: The dynamic model and beyond, 160-180. Amsterdam: John Benjamins.

Hoffmann, Thomas. 2020. Marginal argument structure constructions: The [V the N_taboo-word out of NP]-construction in post-colonial Englishes. Linguistic Vanguard 6(1). 1-8.

Hundt, Marianne. 1998. New Zealand English grammar – Fact or fiction? A corpus-based study in morphosyntactic variation. Amsterdam: John Benjamins.

Hundt, Marianne. 2021. On models and modeling. World Englishes 40(3). 298-317.

Hundt, Marianne. 2022. Constructional variation and change in N-is focaliser constructions. In Lotte Sommerer & Evelien Keizer (eds.), English noun phrases from a functional-cognitive perspective: Current issues, 206-233. Amsterdam: John Benjamins.

Hundt, Marianne & Volker Dellwo. Forthcoming. (The) thing is, perception of prosodic punctuation in N-is focalisers is variable.

Hundt, Marianne & Rahel Oppliger. 2022. (The) fact is … /(Die) Tatsache ist … Focaliser constructions in English and German are similar but subject to different constraints. International Journal of Corpus Linguistics.

Hundt, Marianne, Paula Rautionaho & Carolin Strobl. 2020. Progressive or simple? A corpus-based study of aspect in World Englishes. Corpora 15(1). 77-106.

Keizer, Evelien. 2013. The X is (is) construction: An FDG account. In Lachlan Mackenzie & Hella Olbertz (eds.), Casebook in functional discourse grammar, 213-248. Amsterdam: John Benjamins.

Keizer, Evelien. 2016. The (the) fact is (that) construction in English and Dutch. In Gunter Kaltenböck, Evelien Keizer & Arne Lohmann (eds.), Outside the clause: Form and function of extra-clausal constituents, 59-96. Amsterdam: John Benjamins.

Massam, Diane. 1999. Thing is constructions: The thing is, is what’s the right analysis? English Language and Linguistics 3(2). 335-352.

McConvell, Patrick. 1988. To be or double be? Current changes in the English copula. Australian Journal of Linguistics 8(2). 287-305.

Mesthrie, Rajend. 2017. World Englishes and language contact. In Markku Filppula, Juhani Klemola & Devyani Sharma (eds.), The Oxford handbook of World Englishes, 175-193. Oxford: Oxford University Press.

Miller, Jim & Regina Weinert. 1998. Spontaneous spoken language: Syntax and discourse. Oxford: Clarendon Press.

Mukherjee, Joybrato & Sebastian Hoffmann. 2006. Describing verb-complementation profiles of New Englishes: A pilot study of Indian English. English World-Wide 27(2). 147-173.

Philipp, Michel, Achim Zeileis & Carolin Strobl. 2016. A toolkit for stability assessment of tree-based learners. In Ana Colubi, Angela Blanco & Cristian Gatu (eds.), Proceedings of COMPSTAT 2016 – 22nd International Conference on Computational Statistics, 315-325. The International Statistical Institute/International Association for Statistical Computing. Available at https://www.zeileis.org/papers/Philipp+Zeileis+Strobl-2016.pdf (20 January, 2022).

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive grammar of the English language. London: Longman.

Rohdenburg, Günter. 1996. Cognitive complexity and increased grammatical explicitness in English. Cognitive Linguistics 7(2). 149-182.

Schmid, Hans-Jörg. 1998. Constant and ephemeral hypostatization: Thing, problem and other “shell nouns.” In Bernard Caron (ed.), Proceedings of the 16th international congress of linguists. Paris: Elsevier.

Schmid, Hans-Jörg. 2000. English abstract nouns as conceptual shells. From corpus to cognition. Berlin: Mouton de Gruyter.

Schmid, Hans-Jörg. 2001. Presupposition can be a bluff: How abstract nouns can be used as presupposition triggers. Journal of Pragmatics 33(10). 1529-1552.

Schneider, Edgar W. 2007. Postcolonial English: Varieties around the world. Cambridge: Cambridge University Press.

Schneider, Edgar W. 2012. Exploring the interface between World Englishes and second language acquisition – and implications for English as a lingua franca. Journal of English as a Lingua Franca 1(1). 57-91.

Shibasaki, Reijirou. 2014. On the development of The point is and related issues in American English. English Linguistics 31(1). 79-113.

Shibasaki, Reijirou. 2015. On the grammaticalization of the thing is and related issues in the history of American English. In Michael Adams, Laurel Brinton & Robert D. Fulk (eds.), Studies in the history of the English language. Evidence and method in histories of English, 99-121. Berlin: Mouton de Gruyter.

Strobl, Carolin, Torsten Hothorn & Achim Zeileis. 2009. Party on! A new, conditional variable-importance measure for random forests available in the party package. The R Journal 1(2). 14-17.

Strobl, Carolin, James Malley & Gerhard Tutz. 2009. An introduction to recursive partitioning: Rationale, application and characteristics of classification and regression trees, bagging and random forests. Psychological Methods 14(4). 323-348.

Szmrecsanyi, Benedikt, Jason Grafmiller, Benedikt Heller & Melanie Röthlisberger. 2016. Around the world in three alternations: Modeling syntactic variation in varieties of English. English World-Wide 37(2). 109-137.

Tagliamonte, Sali A. & R. Harald Baayen. 2012. Models, forests, and trees of York English: Was/were variation as a case study for statistical practice. Language Variation and Change 24(2). 135-178.

Tárnyiková, Jarmila. 2018. Constructions with shell nouns in English: Their dual role in information packaging. Caplletra 64. 205-225.

Traugott, Elizabeth Closs. 1995. The role of the development of discourse markers in a theory of grammaticalization. Paper presented at ICHL XII, Manchester, England. Available at https://web.stanford.edu/~traugott/papers/discourse.ps.

Traugott, Elizabeth Closs. 2015. Investigating “periphery” from a functionalist perspective. Linguistic Vanguard 1(1). 119-130.

Traugott, Elizabeth C. & Graeme Trousdale. 2013. Constructionalization and constructional changes. Oxford: Oxford University Press.

Tuggy, David H. 1996. The thing is is that people talk that way. The question is is why? In Eugene H. Casad (ed.), Cognitive linguistics in the Redwoods: The expansion of a new paradigm in linguistics, 713-752. Berlin: Mouton de Gruyter.