Coordinating and Subordinating Conjunctions in Five Genres Across the History of English

Abstract

This article analyzes the frequency of three clause-level coordinating conjunctions, and, but, and or, in addition to a wide range of subordinating conjunctions across the Transhistorical Corpus of Written English (TCWE). This resource includes five genres covering seven centuries: sermons and statutes dating from the fifteenth to twenty-first centuries, personal letters dating from the fifteenth to twentieth centuries, and emails and WhatsApp messages dating from the twenty-first century. The study engages with three key developments in the history of English: the increase in the inventory of subordinators between 1500 and 1700, the development of expository and non-expository registers in English between 1650 and 1990, and the increase in colloquialization moving into the Present-Day English period. It finds that there is more substantial genre-conditioned variation in relation to all three developments than has previously been acknowledged. Its results also contribute to knowledge about shifts in the generic conventions of English sermons and statutes over time, and language use within email and instant messaging, most notably a preference for causal as over because in the instant messaging sample.

Keywords

history of English, coordinating and subordinating conjunctions, genre, corpus linguistics, transhistorical, digital language practices

1. Introduction

The connection between coordination and orality on the one hand and subordination and literacy on the other has been well attested. In particular, written texts have been found to be associated with “a larger number of subordinate clauses” (Jahandarie 1999:145) than spoken discourse. Jahandarie (1999:145) provides a list of studies which make this observation in relation to Present-Day English (henceforth PDE), Japanese, German, Russian, Korean, and Somali. Writing is typically more structurally complex than speech, and subordination can be seen as “an index of structural complexity” (Biber 1988:229).¹ Some major corpus-based studies of contemporary English (e.g., Biber, Johansson, Leech, Conrad & Finegan 2021) also point out that certain coordinators, such as clause-level but, are preferred in spoken conversation, while some subordinators are mostly found in written texts.²

The potential link between coordination and orality on one hand and subordination and literacy on the other can also be made on a diachronic basis. Notopoulos (1949) analyzes “oral” Homeric texts and notes how many connectives they use to create conjunctions at junctures where later, “written” texts would use subordinate clauses. In relation to the more recent history of the English language, previous research has shown that there was an increase in subordination and in the inventory of subordinators in the English language during the Early Modern English (henceforth EModE) period, 1500-1700 (Görlach 1991:95, 122; Kortmann 1997), compared to the Middle English (henceforth ME) period, 1150-1500. This development is often seen as a consequence of “the spread of literacy since Late Middle English” (Kohnen 2007:290).

Moving into the Late Modern English (henceforth LModE) period, 1700-1900, and PDE (1900-), Biber (1995, 2001) and Biber and Finegan (2001) use multidimensional analysis to investigate the evolution of six written registers in English from 1650 to 1990. Smitterberg (2021:161) notes how in Biber (1988), “clausal coordination loads as an involved feature on Dimension 1, ‘Involved vs. Informational Production,’ the most powerful dimension in the analysis and one of three that, taken together, represent an oral vs. literate divide” in Biber and Finegan’s (2001) diachronic factor-score analysis.³ Furthermore, “the factor analysis of academic genres in Biber (2003) identifies one single dimension that separates oral from literate discourse,” and on that dimension, clausal coordination loads “as an ‘oral’ feature” (Smitterberg 2021:161).

Biber (1995, 2001) and Biber and Finegan (2001) define expository registers as primarily informational and explanatory in nature, while non-expository registers are viewed as less informational and more interpersonal. They define legal texts as more expository and sermons and personal letters as more non-expository. They find different paths for expository and non-expository registers over time, arguing that expository registers such as legal and medical prose have followed an essentially steady development toward ever more “literate” styles across the four centuries, which corresponds “to the development of a more specialized readership, more specialized purposes, and a fuller exploitation of the written mode” (Biber 1995:299). By comparison, the non-expository registers such as letters, diaries, fiction, drama, and sermons developed toward more “literate” characterizations from 1650 to 1800, before reversing this trend as they shifted toward more “oral” characteristics from 1800 to 1990. This shift is attributed to “the rise of a popular, middle-class literacy for the first time in British history” (Biber 1995:298) in the eighteenth century, which created a “need for widely accessible written prose” (Biber 1995:299). Moving into the PDE period, an increase in colloquialization, defined by Mair (2024:196) as “a stylistic shift away from a written norm which is elaborated to maximal distance from speech and towards a written norm which is closer to spoken usage” has been observed (see also Hundt & Mair 1999; Leech & Smith 2006; Mair 2006; Hilpert & Gries 2009:387).

Previous studies of conjunctions in the history of English have often looked at the period developments and diachronic register variation outlined above in relation to individual conjunctions during discrete time periods. For example, Culpeper and Kytö (2010:158-183), Smitterberg (2021:160-186), and Kytö and Smitterberg (2022) have all investigated coordination by clause-level and in, respectively, the early modern period, the late modern period, and the mid-twentieth century. Alternatively, they have considered sets of subordinating conjunctions during discrete time periods (e.g., Rissanen 1999; Claridge & Walker 2001).

This study fills a gap in research by providing a broader view. It presents a large-scale quantitative analysis of the frequency of a wide range of coordinating and subordinating conjunctions in five genres from the fifteenth to twenty-first centuries, with the aim of shedding new light on the previously observed period developments and diachronic register variation outlined above. It acknowledges that not all texts fall neatly into the Biberian expository and non-expository register categories outlined above. For example, EModE sermons can sometimes be very expository, the classification of persuasive writing also applies to the sermon genre, and not all (historical and modern) legal texts will necessarily be expository.

However, while we acknowledge there may be exceptions to the overview classification, we believe that it does have general relevance to our data. For example, statutes, the only legal texts we investigate, are primarily informational and explanatory in purpose. Furthermore, the distinction between expository and non-expository registers is a useful one to employ, because it allows us to engage with Biber (1995, 2001) and Biber and Finegan’s (2001) observations regarding the development of expository and non-expository registers in English between 1650 and 1990, in relation to new data.

The current study builds on Kohnen (2007:304), which focuses on coordinators and subordinators in sermons and statutes, by including more genres to widen the “stylistic space,” specifically sermons and statutes from the fifteenth to twenty-first centuries, letters from the fifteenth to twentieth centuries, and emails and instant messaging (henceforth IM) from the twenty-first century. The paper uses a mixed-methods analytical approach, combining quantitative analysis with a qualitative perspective that focuses on register, genre-specific convention and, when appropriate, stylistic considerations (for a discussion of the difference between register, genre, and style, see section 3 below). It argues that there is more substantial genre-conditioned variation in relation to the period developments and diachronic register variation under investigation than has previously been acknowledged. The study also contributes to the understanding of the changing generic conventions of English sermons and statutes over time and provides new insights into language use within the IM genre.

The paper is structured as follows. Section 2 outlines the corpus used, section 3 covers data, that is, coordinators and subordinators, and section 4 discusses methodological considerations relating to the quantitative and qualitative analysis. In section 5, the results and discussion section, the historical and digital genres are considered separately. Section 5.1 contains the diachronic analysis of conjunction frequency in the historical genres of sermons, letters, and statutes. It is broken down into 5.1.1, 5.1.2, and 5.1.3. 5.1.1 presents the overall frequencies of subordinators in the three historical genres over time, while 5.1.2 presents overall frequencies of all conjunctions, and of the individual clause-level coordinators and, but, and or, across genre and time. In this section, each genre is considered individually (sermons first, then letters, and then statutes). Section 5.1.3 presents an analysis of causal conjunctions, namely as, by cause/because, by that, for that, for as much as, in that, in as much as, in so much, seeing, and sith/since across genre and time. 5.2 presents a synchronic analysis of coordinators and subordinators in twenty-first century sermons, statutes, email, and WhatsApp messages. It is broken down into 5.2.1, which presents an analysis of overall frequencies of coordinators and subordinators, 5.2.2, which focuses on the frequency of clause-level and, but, and or, and 5.2.3, which focuses on the causal conjunctions present in the data, namely because, as, since, in that, and seeing (that).

2. The Corpus

The Transhistorical Corpus of Written English (henceforth TCWE) compiled by ourselves (Marcus & Maden-Weinberger 2021) is a single corpus that contains textual material dating from the fifteenth century to the present day, specifically 2019.⁴ The current study benefits from this feature of corpus design, because it can utilize one corpus with consistent compilation principles, rather than having to utilize several, as previous studies of conjunctions in the history of English have done (e.g., Rissanen 2005). In relation to register classifications, sermons, letters, and IM are included because they can be classed as non-expository genres in the Biberian sense (e.g., Biber 1995, 2001; Biber & Finegan 2001). Meanwhile, the professional Enron company emails in the TCWE are predominantly used in an informational capacity and can therefore be classified as more expository than sermons, letters, and IM. Statutes, both historically and in the present day, are a specialist, expository register. The sermon samples included are also writing-based but “speech-purposed,” while statutes are both “writing-based and purposed” (Culpeper & Kytö 2010:18).

The TCWE also deliberately includes two twenty-first century digital text types, email and IM, in what is an otherwise historical corpus. These more recent genres are included because it is important to think about the history of the English language, like the history of any language, as running up to, and including, the present day, especially given the massive impact the Internet has had on human communication. Traditionally there has been a conceptualization of computer-mediated communication (CMC) as a mingling of speech and writing, exhibiting characteristics of both the informal, spoken mode and the more formal, written mode. For example, Crystal (2006:52) defines the internet as “a novel medium combining spoken, written, and electronic properties.” Biber and Conrad (2019:301) state that registers such as “email messages, and text messaging are in many respects similar linguistically to conversation, even though they are written.” Furthermore, Soffer (2012) and Cutler, Ahmar, and Bahri (2022) describe writing in online spaces as often exhibiting what they describe as digital orality.

Whilst drawing on the insights of this previous research, our study will attempt to understand the language use contained in the email and IM sub-corpora within the wider context of a complex multimodal communicative environment. In this conceptualization, language use in the digital sphere is not separated from “its materiality and mediation” (Androutsopoulos 2011:9). The location of any language change that goes on within the digital sphere “is not so much in the influence of new media language on other domains of spoken or written usage, but in processes of innovation and change within digital usage” (Androutsopoulos 2011:9) itself.

Furthermore, email and IM are considered on their own terms, with their own register and genre conventions, in the same way that the three historical genres are. Emails from the Enron corpus and IM data from WhatsApp have been selected for this study because they are both predominantly writing-based, as opposed to material from applications such as Snapchat, TikTok, or Instagram, all heavily image-based. They are therefore considered to be good data for this kind of text-based linguistic analysis. Neither genre is speech-purposed in the way that sermons are, for example, in that they are not written to be read out loud.

However, the two genres can also be differentiated. For example, email is more likely to be used in formal contexts than IM. It is also more asynchronous than IM, which can function both synchronously and asynchronously, depending on how long it takes a conversation partner to respond to a message. Our choice of these two genres also came down to availability of data. We included email in the TCWE because the Enron email corpus, a large corpus of emails written in English, is freely available online. The IM data was collected by the first author for the purposes of inclusion in the TCWE. We used the whole corpus, totaling 494,955 words, for our study. The breakdown of the sub-corpora is shown in Table 1.

Table 1.

Overview of Word Counts and the Number of Individual Texts Within Individual Sub-Corpora in the TCWE

Genre	15th century	16th century	17th century	18th century	19th century	20th century	21st century
Letters	19,809 words (39 letters, 18 writers)	19,295 words (39 letters, 17 writers)	20,864 words (45 letters, 15 writers)	20,034 words (117 letters, 39 writers)	20,248 words (42 letters, 6 writers)	20,417 words (102 letters, 84 writers)	0
Sermons	20,742 words (3 sermons, 3 writers)	19,992 words (12 sermons, 12 writers)	20,071 words (10 sermons, 10 writers)	20,322 words (10 sermons, 10 writers)	19,533 words (5 sermons, 5 writers)	19,905 words (6 sermons, 6 writers)	20,083 words (from 13 sermons, 5 writers)
Statutes	20,408 words (from 26 statutes)	20,902 words (from 13 statutes)	20,920 words (from 28 statutes)	20,576 words (from 14 statutes)	20,820 words (from 12 statutes)	20,795 words (from 9 statutes)	20,334 words (from 5 statutes)
Email	0	0	0	0	0	0	45,179 words (199 emails, 146 writers)
Instant Messages	0	0	0	0	0	0	43,706 words (10 chats, including group chats, 24 writers)

While the TCWE covers a long diachrony, it is also quite small, at just under half a million words, compared to corpora such as the British National Corpus, which contains approximately one hundred million words. Another limitation is that some of the sub-corpora, such as fifteenth-century sermons, have very few texts within them, because we used large samples of textual material from the same documents.⁵ The IMs, letters, and emails are all quite short, so represent the language use of a larger number of writers, but the sermons and statute samples are longer and therefore represent fewer language users. The corpus’s potential lack of representativeness is acknowledged and considered in the discussion of results.
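The corpus size reported above can be cross-checked against Table 1. The following is a minimal sketch in Python that simply re-adds the per-century word counts given in the table; the variable names are illustrative only and are not part of the TCWE documentation.

```python
# Word counts per sub-corpus, copied from Table 1 in chronological order.
# Letters stop at the twentieth century; email and IM are twenty-first
# century only.
WORD_COUNTS = {
    "Letters": [19809, 19295, 20864, 20034, 20248, 20417],
    "Sermons": [20742, 19992, 20071, 20322, 19533, 19905, 20083],
    "Statutes": [20408, 20902, 20920, 20576, 20820, 20795, 20334],
    "Email": [45179],
    "Instant Messages": [43706],
}

# Total per genre, then the overall corpus size.
genre_totals = {genre: sum(counts) for genre, counts in WORD_COUNTS.items()}
corpus_total = sum(genre_totals.values())

for genre, total in genre_totals.items():
    share = 100 * total / corpus_total
    print(f"{genre:<17}{total:>8,} words ({share:.1f}% of corpus)")
print(f"{'Total':<17}{corpus_total:>8,} words")
```

The per-genre totals make the imbalance between the three historical genres (six or seven centuries each) and the two digital genres (one century each) easy to see at a glance.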

As can be seen from Table 1 above, we looked at approximately 20,000 words for each text type in each century, and those words were extracted from a variety of individual texts. We took extracts from a wide variety of statutes, written laws known as Acts, passed by the British Parliament, from each century, including the Confirmation of Liberties Statute (1405), the Laws in Wales Act (1535), the City of London Militia Act (1662), the White Herring Fisheries Act (1771), the Cornwall Submarine Mines Act (1858), the Perjury Act (1911), the Obscene Publications Act (1959), and the Stalking Protection Act (2019). They all use a degree of formulaic language, as one would expect from legislative documents, and the enacting formula, “Be it enacted by the Queen/King’s most Excellent Majesty,” is present throughout the centuries. However, certain formulaic phrases, such as “it is ordeyned by the seid auctorite that,” are more common in the fifteenth- and sixteenth-century statutes, whilst more modern phrases, such as “A person commits an offence if,” are often repeated in the twenty-first century statutes. Later statutes also tend to contain more itemized lists, laid out as lists on the page. This feature of their layout potentially reduces opportunities for the use of clausal conjunctions.

The sermon material consists of written texts that function as scripts to be read out loud as orally delivered sermons. Their topics are generally related to religion, scripture, and/or lifestyle, and their purpose is generally persuasive and/or informative. They tend to be fairly lengthy documents, with each sermon attributed to a different writer. The nature of sermon content changes considerably over time, which is likely to be at least partially related to denomination. For example, the late medieval sermons include some Latin, which is likely to be related to their Catholic nature.⁶

The fifteenth- to twentieth-century letters in the TCWE are classed as personal correspondence because they are all from one individual to another and are not therefore public in the way that early royal correspondence could be, for example. The twentieth-century letter sub-corpus consists of business letters from within British Telecom (henceforth BT). Their status as business letters means that they arguably have a slightly more expository register profile than the earlier letters, and this potential difference is taken into account in the interpretation of results. Nevertheless, it should be noted that whilst they are certainly different from the nineteenth-century correspondence sub-corpus in this regard because the latter are a set of letters written to friends and spouses, the fifteenth- to eighteenth-century correspondence is full of business/money-related content, such as reports of mine activity and requests for payment. Furthermore, they, like the twentieth-century letters, are still personal in the sense that they are from one individual to another, and are therefore markedly less expository than statutes, for example.

Letters tend to contain conventional formulae at the beginning and at the end, with less formulaic content in the main body of the letter. There are no twenty-first century letters included in the corpus, because we could not find enough that had already been digitized and made available online to enable us to include 20,000 words from them, and we did not have time to source a large number of private handwritten letters and type them up.

The fifteenth- to seventeenth-century letters are taken from a range of collections which represent the literate writing community in late medieval, Tudor and Stuart England. The upper social ranks are better represented than the lower because literate people tended to be of higher social rank during this period. Those of a higher social rank were also more likely to have their letters preserved. Letters from monarchs are not included, but there are a few from nobles such as Robert Dudley (1532-1588). Most letters from this three-hundred-year time period are from members of the gentry, such as those in the fifteenth-century letter collection of the Stonor family; professionals, such as Tudor courtiers Thomas Wolsey and Francis Walsingham; members of the clergy, such as the Rector of Tawstock in the seventeenth-century Cosin letter collection; and merchants. The eighteenth-century letters are all addressed to Richard Orford and come from a range of correspondents from the gentry, professional, and merchant classes, with some letters from servants as well (discussed in more detail in section 5.1.3). The nineteenth-century letters are from intellectuals, such as Ernest Dowson, Sidney Webb, and Beatrice Potter, and their family and friends. The twentieth-century letters are from the BT corpus. We do not know much about the social status of their writers, but they are all working for or connected to a telecommunications company, so in that sense they are likely to be professional and/or middle class. There are letters in the corpus from both women and men in each correspondence sub-corpus.

The email sub-corpus we used is the Enron corpus, which is publicly available and dates from 2000; it was therefore classed as twenty-first century, although it sits on the cusp of that century. There are some emails addressed to individuals in the sub-corpus, and others addressed to multiple recipients. An individual IM is an online message with no character limitations sent via the Internet, which instantly appears on the recipient’s mobile phone screen. As a genre, IM tends to be used in casual, non-professional contexts. The IM data is all from WhatsApp and was collected by the first author for inclusion in the corpus during 2019. The messages date from 2014 to 2019 and are from a sample of both male and female informants, who range in age from their early twenties to their mid-sixties. The informants come from the North-West and South of England and are predominantly upper working-class, lower middle-class, and upper middle-class individuals. Their messages were included in the corpus after they responded to an incentivized call for WhatsApp contributions to the TCWE.

3. Coordinators and Subordinators

The coordinators and, but, and or, with a “core meaning of addition, contrast and alternative, respectively” (Biber, Johansson, Leech, Conrad & Finegan 2021:81), were investigated because they are the clearest cases of coordinative conjunctions in English (e.g., Quirk, Greenbaum, Leech & Svartvik 1985:920; Biber, Johansson, Leech, Conrad & Finegan 2021:81). We focus exclusively on clause-level cases, omitting all instances of phrase-level and, but, and or. Due to time constraints, we did not attempt to distinguish discourse function of the clause-level coordinators according to sentence position.

The conjunction for is semantically and syntactically complex in the history of English, with some debate about whether it functions as a coordinator or subordinator (e.g., Jucker 1991; Rissanen 1999). Given that it would take time to ascribe coordinative or subordinative function to individual cases, for was excluded completely to keep the analysis within manageable limits. Whilst we acknowledge that we lose part of the picture by not including for as a standalone conjunction (Claridge and Walker 2001 find it to be the most common causal subordinator in their 1640-1740 texts, for example), it was included within the context of two causal expressions, namely for that and for as much as.

The texts in the dataset dating from the fifteenth, sixteenth, and seventeenth centuries contain a large amount of spelling variation, especially the correspondence data. But, for example, is spelt in various ways in the fifteenth-century letters, including bot and butt. In order to control for spelling variation, we put these earlier texts through the spelling regularization program VARD. Spelling regularization programs such as VARD do have limitations. For example, there may have been instances of VARD changing bot (but) to boat. To mitigate the risk of missing data, we read a random 25,000-word sample of the early data to check that the items had been regularized correctly, and they had been. We can therefore say with confidence that the vast majority of the items searched for are accounted for.

The automatic retrieval of all instances of conjunctions was carried out with a freeware concordance program, AntConc 4.0.1 (Anthony 2021). Manual disambiguation was then needed to classify the coordinators as either phrasal or clausal. This process took some time because there were many instances which needed to be differentiated. To take one example, in the sixteenth-century letters, there were 822 cases of and, 454 of which were phrasal and 368 of which were clausal. We included ampersand (&) in the count for clause-level and, but omitted instances of it used in the context of any Latin in the fifteenth- and sixteenth-century sermons.
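The sixteenth-century example can be worked through as a short calculation. The sketch below, in Python, takes the two counts reported for and in the sixteenth-century letters and the word count for that sub-corpus from Table 1. The normalization base of 1,000 words is an assumption made for illustration; this section does not state which base is used for the reported frequencies.

```python
# Counts of "and" in the sixteenth-century letters after manual
# disambiguation, as reported in the text.
and_hits = {"phrasal": 454, "clausal": 368}
total_hits = sum(and_hits.values())

# Size of the sixteenth-century letter sub-corpus (Table 1).
SUBCORPUS_WORDS = 19295

# ASSUMPTION: rates are expressed per 1,000 words for illustration only.
PER_WORDS = 1000

clausal_share = 100 * and_hits["clausal"] / total_hits
clausal_rate = and_hits["clausal"] / SUBCORPUS_WORDS * PER_WORDS

print(f"Total hits for 'and': {total_hits}")
print(f"Clausal share of all hits: {clausal_share:.1f}%")
print(f"Clausal 'and' per {PER_WORDS:,} words: {clausal_rate:.2f}")
```

The calculation shows why the manual step matters: in this sub-corpus more than half of the raw hits are phrasal, so an unfiltered count would substantially overstate the frequency of clause-level and.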

The search for subordinators in the corpus was based on the lists given in Kortmann (1997:292-294) of those found in ME, EModE, and PDE. The PDE list in Kortmann (1997:294) was then cross-checked with Quirk, Greenbaum, Leech, and Svartvik (1985:998). All of the items in these lists were looked for in the data, but the search was amended depending on which century the data was from. For example, Kortmann lists for as much as, sith (that), and whence in the EModE list but not in the PDE list, so these phrases and standalone items were not looked for in the twentieth- and twenty-first-century material. Because Kortmann (1997:292-294) does not provide a specific LModE list, we looked for both the EModE and PDE subordinators in our LModE (eighteenth- and nineteenth-century) material. Thirty-one ME subordinators, twenty-seven EModE subordinators, and five PDE subordinators listed by Kortmann (1997:292-294) were unattested in the TCWE. The seventy which are present, and therefore included in the analysis, are listed below:

after (that); albeit (that); although; an(d) if; as; as far as; as if (that); as long(e) as; as much as; as soon as (that, ever); as though; assuming (that); because; before (that); besides (that); but that; but if (that); by (the time) that; cause (that); (on) condition (that); considering that; coz; cos; except (that); for all (that); for as much as; for fear (that); for that; gra(u)nted (that); given (that); (g)if (that); in as much as; in case (that); in order that; in that; in so much as; lest (that); like as; notwithstanding; now (that); once; only (that); provided (that); rather than; save (that); seeing (that); sin (that); since (that); sith(en) (that); so (that); so far as; so long(e) as; so soon as; such that; thanne (that); that; though (that); til (that); to the intent (that); unless that; until (that); unto (that); when (that); whenever; wher(as) (that); where (that); wher(so)ever; while (as, that); (the) whilst (that); without(e) (that).

As can be seen from the above list, that as a subordinator marking nominal that-clauses was included in addition to adverbial subordinators. Zero complement constructions were excluded from the analysis because they are not considered to be connective links in the same way as other subordinators are by Quirk, Greenbaum, Leech, and Svartvik (1985:1007; see also Kohnen 2007:293). Even though the fifteenth- to seventeenth-century texts were put through VARD, we searched for various spellings and orthographic variants of each word in the texts, such as onless(e) for unless, lyke for like, and yt and þ for that. The items coz, cos, and cause, as variants of because, were looked for in the email, IM, and twenty-first century sermon sub-corpora. The analysis includes subordinators that link both finite and non-finite clauses, specifically participle and infinitive constructions. We included non-finite constructions because we also included coordinators which linked them, and we wanted to maintain similar conditions for assessing the frequency of both coordinators and subordinators. Wh- elements introducing nominal clauses, those linking relative clauses and gerund constructions were omitted from the analysis.
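A search for spelling variants of this kind can be approximated with regular expressions. The sketch below is written in Python rather than AntConc, which is the tool actually used in the study, and is offered only as an illustration. The variant table contains just the spellings mentioned in the text (bot and butt for but, onless(e) for unless, lyke for like, yt and þ for that); the function names and the sample sentence are invented.

```python
import re

# ASSUMPTION: a small illustrative table mapping a modern form to the
# spellings searched for. Only variants named in the article are listed.
VARIANTS = {
    "but": ["but", "bot", "butt"],
    "unless": ["unless", "onless", "onlesse"],
    "like": ["like", "lyke"],
    "that": ["that", "yt", "þ"],
}


def build_pattern(spellings):
    """Compile a case-insensitive pattern matching any spelling as a whole word."""
    alternatives = "|".join(re.escape(s) for s in spellings)
    # Lookarounds instead of \b so that a standalone thorn is handled the
    # same way as the alphabetic spellings.
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)


PATTERNS = {form: build_pattern(spellings) for form, spellings in VARIANTS.items()}


def count_variants(text):
    """Return raw hit counts per modern form; hits still need manual checking."""
    return {form: len(pattern.findall(text)) for form, pattern in PATTERNS.items()}


# Invented sample sentence, not taken from the corpus.
sample = "Bot I pray yow yt ye come, onlesse it be lyke to rayne; but butt."
print(count_variants(sample))
```

Raw hits from a search like this are only candidates. A form such as yt can also stand for other words in early texts, so each hit would still need the manual classification described in the following paragraph.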

Once the relevant items had been collected using AntConc, manual part-of-speech classification was carried out in order to omit instances of these items not functioning grammatically as subordinators. For example, all prepositional instances of after were identified and omitted. The remaining subordinators were then assigned a semantic clause type using a semantic typology based on Kortmann (1997:137-211), Quirk, Greenbaum, Leech, and Svartvik (1985), and Kohnen (2007). They were classified into the following categories: nominal (complementizer that being the only nominal type), manner/comparison, temporal, conditional, concessive, purpose/result, causal, and other clauses, which include clauses of preference, exception, contrast, place, and degree. Some subordinators only mark one kind of semantic clause and are therefore straightforward to classify. An example is although, which functions exclusively as a marker of concessive clauses, as in example (1) below.⁷

(1) It is amongst other things enacted that noe Mine of Tin Copper Iron or Lead shall hereafter be adjudged reputed or taken to bee a Royal Mine although Gold or Silver may be extracted out of the same (T17_016, 1693, Royal Mines Act 1693 Chapter 6 5 and 6 Will and Mar).⁸

Other subordinators, such as so and as, have several potential semantic functions. So functions as a marker of manner/comparison, and as a marker of purpose/result in the form so + that. The lexeme so is also part of other subordinative phrases—namely so far as, so soon as, so long as—all of which have different semantic functions, so all of these cases were disambiguated. To take another example, as on its own can function temporally (see example 2), causally (see example 3), as a marker of purpose/result (see example 4), or as a marker of manner/comparison (see example 5).

(2) ‘She was silent for a long time as we knelt there before GOD’ (S20_004, 1931, Harry A. Ironside).

(3) ‘Not sure I’ll be able to make tmrw as have yoga class in West london’ (I21_002, 2017-2019 Conversation Partners A and C).

(4) ‘And wheras His Royall Highness William then Prince of Orange now King of Ingland’ . . . ‘did call the Estates of this Kingdome to meet the fourteenth of March last In order to such an Establishment as that their Religion lawes and liberties might not be again in danger of being subverted’ (T17_028, 1689, Claim of Right Act 1689 Chapter 28 (Scottish Parliament)).

(5) ‘I must trust you to understand and allow for the position as I see it’ (L20_005, 1911, Olivier Lodge).

When part of other phrases, as can variously function as a temporal marker (as soon as), conditional marker (as long as), or as a marker of degree (as far as, as much as). In complex cases like these, we cross-checked each other’s classificatory decisions. While it did take some time to clean up, sort, and check the data, it is one of the strengths of the study that so many different items are included.

4. Methodological Approach

The paper combines quantitative and qualitative approaches. For the quantitative analysis, the study employs a text-linguistic approach to the frequency data extracted from the corpus, which “focuses on the rate of occurrence of one feature independently of the use of other features” (Smitterberg 2021:98). The dispersion-aware, non-parametric Wilcoxon Rank Sum test, also known as the Mann-Whitney U test, with a significance level of p < .05, is used for significance testing. We then apply qualitative analysis, that is, exemplification, explanation, and interpretation of any patterns found. We adopt the approach advocated by Biber and Conrad (2019:15), who state that “the same texts can be analysed from register, genre and style perspectives.” One or more of these complementary perspectives are drawn upon, depending on what is relevant to the result being discussed.
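To make the significance-testing step concrete, the following is a minimal Python sketch of a two-sided Wilcoxon rank-sum (Mann-Whitney U) test applied to per-text rates of occurrence. It rests on several assumptions that are not taken from the study: the test is implemented here with a normal approximation and tie correction but no continuity correction, the input values are invented per-text frequencies, and the function names are hypothetical. The paper does not state which software was used; for small samples an exact test from a statistics package would be preferable.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_test(sample_a, sample_b):
    """Two-sided rank-sum test; returns (U for sample_a, z, p) via normal approximation."""
    n_a, n_b = len(sample_a), len(sample_b)
    combined = list(sample_a) + list(sample_b)
    ranks = average_ranks(combined)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    n = n_a + n_b
    # Tie correction: each group of t tied values reduces the variance.
    tie_term = sum(t**3 - t for t in Counter(combined).values())
    variance = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every value identical: no evidence of a difference
        return u_a, 0.0, 1.0
    z = (u_a - n_a * n_b / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u_a, z, p


# INVENTED per-text rates (e.g., conjunctions per 1,000 words) for two
# hypothetical sub-corpora; these are not results from the TCWE.
rates_a = [21.4, 18.9, 25.1, 19.7, 23.3]
rates_b = [9.8, 12.5, 11.1, 14.0, 10.6]

u, z, p = rank_sum_test(rates_a, rates_b)
print(f"U = {u}, z = {z:.3f}, p = {p:.4f}, significant at .05: {p < 0.05}")
```

Because the test ranks one rate per text instead of pooling all hits in a sub-corpus, a single text with an unusually high frequency cannot drive the result on its own, which is the sense in which such a test is sensitive to dispersion.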

A register consists of “three major components: the situational context, the linguistic features, and the functional relationships between the first two components” (Biber & Conrad 2019:6). The situational context pertains to factors such as the purpose of the text, whether it was spoken or written, and its audience. When using register for interpretative purposes, the focus should be on how linguistic features “serve important communicative functions in the register” (Biber & Conrad 2019:16). By comparison, the genre perspective focuses on language features that are “conventionally associated with the genre: they conform to the culturally expected way of constructing texts belonging to the variety” (Biber & Conrad 2019:16). Biber and Conrad (2019:16) give the example of letters closing with “some kind of politeness expression.”

This study focuses on each of the five genres it considers individually, rather than extrapolating interpretations based on how much the written language in each genre does or does not reflect features of spoken language. This decision is based on previous studies, for example Claridge and Walker (2001:54) who, in relation to subordinators marking causal clauses in EModE, find it “more helpful to look at the four genres individually.” The genres under investigation all have distinct profiles. For example, English sermons are rooted in a strong discourse tradition and have links to rhetoric, and sermons and statutes are based on well-established communities of practice. Furthermore, they all have their own distinct histories, although the histories of email and IM are much shorter.

There have been calls for more research within English historical corpus linguistics that takes genre-conditioned variation into account (e.g., Walkden 2024). This study therefore attempts, where possible, to relate changes in the frequencies of coordinators and subordinators to shifts in genre conventions over time. Finally, in relation to style, it is worth bearing in mind that “a speaker or author often has attitudes about what constitutes ‘good style’, resulting in the manipulation of language for aesthetic purposes” (Biber & Conrad 2019:18). Therefore, when interpreting results from a stylistic perspective, we focus on language features that “are not directly functional,” but which are “preferred because they are aesthetically valued” (Biber & Conrad 2019:16).

5. Results and Discussion

5.1. Historical Genres

5.1.1. Overall Frequencies of Subordinators in Sermons, Letters, and Statutes Over Time

The only result in Table 2 below that reflects the increase in subordination and in the inventory of subordinators from ME to EModE noted by Görlach (1991:95, 122) and Kortmann (1997) is the increase from 17 subordinators per 1000 words in the fifteenth-century sermon sub-corpus to 24.5 subordinators per 1000 words in the sixteenth-century sermon sub-corpus. Normalized frequencies in the correspondence sub-corpora remain stable, although there is a minor increase from 20.5 per 1000 words in the seventeenth century to 22.4 per 1000 words in the eighteenth century. Meanwhile, normalized frequencies in statutes actually drop from late ME into EModE and LModE, before picking back up again in PDE.⁹

Table 2.

Raw and Normalized Frequencies (per 1000 Words) of Subordinators in Sermons, Letters, and Statutes Over Time¹⁰

Linguistic item	15th century	16th century	17th century	18th century	19th century	20th century	21st century
Subordinators in sermons	353 (17)	489 (24.5)	384 (19)	360 (17.7)	395 (20.2)	438 (22.0)	435 (21.7)
Subordinators in letters	409 (20.7)	397 (20.6)	427 (20.5)	449 (22.4)	370 (18.3)	505 (24.7)	n/a
Subordinators in statutes	387 (19.0)	263 (12.6)	265 (12.7)	212 (10.3)	237 (11.4)	407 (19.6)	356 (17.5)

All the sermon and correspondence sub-corpora (with the exception of the fifteenth-century sermon sub-corpus) contain more subordinators per 1000 words than the statute sub-corpora over time. This finding can be explained by genre-specific factors. Kohnen (2007) also finds higher overall normalized frequencies of subordinators in seventeenth-century sermons compared to seventeenth-century statutes. He attributes the finding to the high level of non-finite constructions in statutes, especially participle constructions: subordinate clauses which are not normally introduced by subordinators (see Kohnen 2007:301). The only way to get a true picture of these overall frequencies would therefore be to account for all subordinate clauses, including those not marked by subordinators, which is outside the scope of the current study.

5.1.2. Overall Frequencies of All Conjunctions and of Individual Coordinators in Sermons, Letters, and Statutes Over Time

Sermons

Scholars such as Gordon (1966) and Blake (1992:533) have noted the additive, “rambling,” “conversational” prose style of Late ME sermons, including many more coordinative than subordinative constructions. Moving into the EModE period, “ornate” and “elaborate” prose styles become popular in the genre (Nevalainen & Raumolin-Brunberg 1993:63, 65-66). Although sermon manuals like Wilkins’ Ecclesiastes (1646) do start advocating for a more “plain” style (see Claridge and Wilson 2002:38; Lutzky 2012:59), “on the whole it seems fair to assume that in the Early Modern English sermons a tendency towards a higher level of literacy and a greater stylistic consciousness prevails” (Kohnen 2007:291). Kohnen (2007:294) finds a decline in coordinators in the sixteenth-century sermon material relative to the fifteenth-century sub-corpus, a result which reflects the shift in the generic conventions outlined above from an additive style to “more structured and elaborate patterns of presentation in Early Modern English.” However, Table 3 below shows that there are more coordinators per 1000 words in the sixteenth-century sub-corpus than in the fifteenth-century one, which does not fit with the outlined shift in generic conventions from ME to EModE.¹¹

Table 3.

Raw and Normalized Frequencies of Coordinators and Subordinators Across Sermons Over Time

Linguistic item	15th century	16th century	17th century	18th century	19th century	20th century	21st century
Coordinators	405 (19.5)	523 (26.2)	478 (23.8)	473 (23.3)	507 (26)	544 (27.3)	492 (24.5)
Subordinators	353 (17)	489 (24.5)	384 (19)	360 (17.7)	395 (20.2)	438 (22.0)	435 (21.7)

Given the fact that there are only three sermons represented in the fifteenth-century sub-corpus, this finding could be due to the influence of personal style. Nevertheless, what does fit with the move toward more ornate, elaborate style is the increase in the frequency of subordinators from the fifteenth- to sixteenth-century material highlighted in Table 3 above. Furthermore, there is a slight decrease in overall coordinator figures from the sixteenth to seventeenth century, which is maintained into the eighteenth century, which could also be a (slightly delayed) reflection of this generic shift.

It is also interesting to view this overall decline in coordinator figures in the seventeenth and eighteenth centuries from a register perspective, and to break it down to the level of individual coordinator. An example of clause-level and from a sixteenth century sermon is provided below.

(6) ‘Praye deuoutlye and worthylye vnto hym for this gyfte of chastyte, and apply thy selfe thervnto, and god wyll gyue it the.’ (TCWE S16_005, 1538, John Longland).

Table 4 and Figure 1 below demonstrate a decrease in the frequency of clause-level and in the seventeenth- and eighteenth-century sermons compared to the sixteenth-century sermons. There is then an increase in the nineteenth- and twentieth-century texts. Clause-level and is considered to be a better indicator of orality in written texts than clause-level but and or (e.g., Culpeper & Kytö 2010; Smitterberg 2021). This pattern can therefore be said to align with the observation of Biber (1995, 2001) and Biber and Finegan (2001) that non-expository registers such as sermons developed toward more “literate” characteristics from 1650 to 1800, before reversing this trend as they shifted toward more “oral” characteristics from 1800 to 1990.

Table 4.

Raw and Normalized Frequencies of Individual Coordinators in the Sermon Sub-Corpora

Linguistic item	15th century	16th century	17th century	18th century	19th century	20th century	21st century
And	345 (16.6)	422 (21.1)	329 (16.4)	332 (16.3)	359 (18.2)	408 (20.5)	357 (17.8)
But	58 (2.8)	81 (4.0)	114 (5.7)	91 (4.5)	110 (5.6)	122 (6.1)	93 (4.6)
Or	2 (0.1)	20 (1)	35 (1.7)	50 (2.5)	38 (2.0)	14 (0.7)	42 (2.1)
Total	405 (19.5)	523 (26.2)	478 (23.8)	473 (23.3)	507 (26)	544 (27.3)	492 (24.5)

Figure 1.

Normalized Frequencies of Individual Coordinators in the Sermon Sub-Corpora

The increase in frequency of clause-level and from the eighteenth- to twentieth-century sermons also fits with the general trend toward colloquialization in nineteenth-century English (see Smitterberg 2021:160-180) and into twentieth-century English (see Mair 2006, 2024). Furthermore, it agrees with Kohnen’s (2007:300) finding that there was “an increase in coordinative and” in the late twentieth-century sermons he looked at, relative to those dating from the seventeenth century. The decrease of clause-level and in the twenty-first-century sermons relative to those dating to the twentieth century shows that this general pattern of increase does not continue right up to the present day, at least not in our 20,083-word twenty-first-century sub-corpus.
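The normalized figures reported in the tables can be reproduced from a raw count and a sub-corpus word total. The sketch below does this for the twenty-first-century sermon column of Table 4, using the 20,083-word total stated above. The helper name `per_thousand` and rounding to one decimal place are assumptions made for illustration; word totals for the other sub-corpora are not given in this section, so they are not included.

```python
def per_thousand(raw_count, word_total):
    """Normalized frequency per 1000 words, rounded to one decimal place."""
    return round(raw_count / word_total * 1000, 1)

# Word total for the twenty-first-century sermon sub-corpus, as stated in the text.
SERMONS_21C_WORDS = 20083

# Raw counts of clause-level coordinators from the 21st-century column of Table 4.
raw_counts = {"and": 357, "but": 93, "or": 42}

normalized = {item: per_thousand(count, SERMONS_21C_WORDS)
              for item, count in raw_counts.items()}
total = per_thousand(sum(raw_counts.values()), SERMONS_21C_WORDS)

for item, rate in normalized.items():
    print(f"{item}: {raw_counts[item]} ({rate})")
print(f"total: {sum(raw_counts.values())} ({total})")
```

Normalizing in this way is what allows sub-corpora of slightly different sizes to be compared directly across centuries and genres.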

Letters

Using multi-dimensional analysis, Biber (2001:210) finds that although eighteenth-century letters are “notably ‘involved’ along Dimension 1” (which looks at fourteen linguistic features including first and second person pronouns, present tense verbs, possibility modals, that deletion, and be as a main verb), they are “marked by the absence of the stereotypically ‘oral’ features defining Dimension 2,” which assesses frequencies of discourse particles, wh- questions, contractions, and type/token ratio. Therefore, while personal letters can generally be considered a non-expository, popular register compared with, for example, legal or medical prose, eighteenth-century letters “tend to be more of a personal exposition/description with a specific addressee, rather than genuinely interactive dialogue posing questions and expecting responses” (Biber 2001:210). They also avoid “the use of overtly speech-based features, such as contractions and discourse particles” (Biber 2001:210). Biber therefore classes eighteenth-century letters written in English as more expository, informational, and argumentative in purpose than PDE letters, which are more personally involved and interactive. Collectively, the findings of Biber’s 2001 study suggest that:

written registers were generally more sharply distinguished from spoken registers in the eighteenth century than they are at present. Then over time, written registers like letters and diaries evolved to take on characteristics of speech (especially conversation) to a greater extent. These characteristics included a more genuinely interactive style in letters, more overt expression of personal stance and involvement in letters and diaries, and less of an informational purpose in both registers (Biber 2001:212-213).

If we, like Biber (1988, 2003) and Biber and Finegan (2001), classify clausal coordination as an “oral” feature and eighteenth-century letters as a more expository register than those dating to earlier and later centuries, we might expect there to be a dip in coordinator frequency in the eighteenth-century correspondence sub-corpus compared to the seventeenth, and then a rise again from the nineteenth century onwards. However, Table 5 and Figure 2 below do not correspond with this overall trend. Instead, they show a genre-specific pattern of a general decline in the normalized frequencies of clause-level and in the correspondence sub-corpora from the fifteenth to the twentieth centuries (with a minor increase from the sixteenth to the seventeenth).¹² The twentieth-century letters fit into this overall pattern of decline, despite the fact that, as business letters, they arguably have a slightly more expository register profile than the earlier letters.

Table 5.

Raw and Normalized Frequencies of Individual Coordinators in the Correspondence Sub-Corpora

Linguistic item	15th century	16th century	17th century	18th century	19th century	20th century
And	599 (30.2)	368 (19.1)	425 (20.4)	354 (17.7)	310 (15.3)	214 (10.5)
But	67 (3.4)	71 (3.7)	146 (7.0)	104 (5.2)	135 (6.7)	59 (2.8)
Or	15 (0.8)	9 (0.5)	20 (1)	26 (1.3)	13 (0.6)	9 (0.4)
Total	681 (34.4)	448 (23.2)	591 (28.3)	484 (24.2)	458 (22.6)	282 (13.8)

Figure 2.

Normalized Frequencies of Individual Coordinators in the Letter Sub-Corpora

Unlike the results relating to the frequency of clause-level and over time in the sermon sub-corpora, these results do not align with Biber (1995, 2001) and Biber and Finegan’s (2001) suggestion that non-expository registers in English drifted toward more oral styles after 1800. They fit better with Culpeper and Kytö’s (2010:168) argument, based on their findings about clause-level and in the Corpus of English Dialogues, containing texts dating from 1560 to 1760, that there is a “shrinkage of clause-level AND in written texts.” They attribute this phenomenon to a “conceptual change from the period, defined aurally and rhetorically, to the sentence, defined visually and syntactically” (Culpeper & Kytö 2010:168) around the mid-late seventeenth century. Our correspondence material supports their suggestion. The fifteenth-century letters contain a lot of additive sentence constructions, featuring clause-level and, as in example (7) below:

(7) apon the whiche communicacion as y seide to yow that y wolde, and as ye seide my part was to spake with my lord Chaunceller, &c. and afterward Maister Rogger Kys and y were before my two seid lordis to knowe of a rule and a departyng home, &c. Whas rule and commaundement as y conceved was this, to make and ensele nywe bondis yn to Candelmasse next comyng, and lenger yf the parties wolde at oure comyng home; and yn the mene tyme to entrete at home to shorte the mater to their hondes; and that we myght not accorde therof, they to make an ende, the whiche hath ever be my will and laboure y take God to wytnesse, and yet shall be. (L15_018, 1447, John Shillingford to unknown recipient).

These long additive constructions become less common over time in the correspondence data set. By the eighteenth century, clause-level and is featured in shorter sentences with tighter grammatical construction, as in example (8):

(8) I should have been down by this Time but am arrested by the Gout, it has not however hitherto been very violent and am in hopes it will prove only a slight Fit (L18_071, 1775, RVA Gwillym to Richard Orford).

Although our results suggest a development in the opposite direction to that suggested by Biber (1995, 2001) and Biber and Finegan (2001), it should be pointed out that we are only focusing on one linguistic item, while Biber and Finegan look at many more as part of their multi-dimensional approach. Furthermore, the nineteenth-century sub-corpus only contains the letters of six writers, so it is less representative than the other sub-corpora. A further diachronic study with more nineteenth-century letter writers would be needed to corroborate the pattern observed in this study.

Statutes

Biber (1995, 2001) and Biber and Finegan (2001) classify medical prose, science prose, and legal opinions as expository registers. They argue that instead of “evolving towards more oral styles” between 1650 and 1990, these registers “have consistently developed towards more literate styles across all periods: greater integration (D1), greater elaboration of reference (D3), and a more frequent use of passive constructions (D5)” (Biber 1995:297). Statutes, whilst not legal opinions, are legal texts, so can also be categorized as an expository register.

Table 6 below shows that clause-level and, by far the most frequent coordinator in the statute sub-corpora over time, and an indicator of involved as opposed to informational production (see Biber & Conrad 2001:23), drops consistently in normalized frequency from the sixteenth century on, from 14.6 per 1000 words in the sixteenth century to 7.2 per 1000 words in the twenty-first century.¹³ This finding is therefore consistent with Biber and Finegan’s argument regarding the evolutionary path of expository registers in English from 1650 to 1990.

Table 6.

Raw and Normalized Frequencies of Individual Coordinators in the Statute Sub-Corpora

Linguistic item	15th century	16th century	17th century	18th century	19th century	20th century	21st century
And	262 (12.8)	305 (14.6)	301 (14.4)	288 (14.0)	255 (12.3)	162 (7.8)	146 (7.2)
But	14 (0.7)	22 (1.1)	14 (0.7)	17 (0.8)	13 (0.6)	11 (0.5)	11 (0.5)
Or	38 (1.9)	57 (2.7)	84 (4.0)	92 (4.5)	84 (4.0)	91 (4.4)	91 (4.5)
Total	314 (15.4)	384 (18.4)	399 (19.1)	397 (19.3)	352 (16.9)	264 (12.7)	248 (12.2)

A diachronic trend unique to the statute sub-corpora is the consistently higher normalized frequency of clause-level or compared to clause-level but over time (see Table 6 above). This result is likely to be due to the expository, informational, writing-based, and writing-purposed nature of statutes as a register. As can be seen in example (9) below, clause-level or is often employed in statutes to cover various alternatives and therefore make sure the law is comprehensively applied in a range of scenarios.

(9) ‘. . . feloniously run away with his or their Shipp or Shipps or any Barge Boate Ordnance Ammunition Goods or Merchandizes or yield them up voluntarily to any Pirate or shall bring any seduceing Messages from any Pirate Enemy or Rebell or consult combine or confederate with or attempt or endeavour to corrupt any Commander Master Officer or Marriner . . .’ (TCWE T17_013, extract from The Piracy Act 1698, Chapter 7, 11, Will 3).

5.1.3. Causal Conjunctions in Sermons, Letters, and Statutes Over Time

Although there is a slight decrease in the eighteenth-century sub-corpus relative to the seventeenth-century one, and again in the nineteenth century relative to the eighteenth, Table 7 below demonstrates an overall increase in the use of conjunctions and expressions marking causal clauses in the sermon material. There is an increase from 0.8 per 1000 words in the fifteenth century sermon sub-corpus to 2.1 per 1000 words in the twenty-first century sermon sub-corpus. This general pattern of increase agrees with what Claridge and Walker (2001:52) find in relation to sermons from 1640 to 1740 and builds on their findings by showing there are also increases from the fifteenth to the sixteenth, sixteenth to the seventeenth, and then again from the nineteenth to the twenty-first centuries.

Table 7.

Total of All Conjunctions and Expressions Marking Causal Clauses in Sermons, Letters, and Statutes Over Time

Total items by genre	15th century	16th century	17th century	18th century	19th century	20th century	21st century
Total items across sermons	16 (0.8)	23 (1.2)	40 (2.0)	28 (1.4)	24 (1.2)	34 (1.7)	42 (2.1)
Total items across letters	13 (0.7)	29 (1.5)	19 (0.9)	83 (4.1)	36 (1.8)	27 (1.2)	n/a
Total items across statutes	11 (0.5)	4 (0.2)	4 (0.2)	0	0	0	0

The picture in relation to causal conjunctions in letters over time does not present much of a pattern, although, as in the sermon sub-corpora, there is a rise in their numbers from the fifteenth- to the sixteenth-century letter sub-corpus. This rise is followed by a dip moving into the seventeenth century, a century earlier than the comparable dip in the sermon material. Cases of subordinators marking causal clauses in the statutes are extremely rare in fifteenth- to seventeenth-century material, and then non-existent in statutes from the eighteenth century on. Their scarcity is most likely explained by the fact that statutes are a definitively authoritative genre. If they gave reasons for why various laws are the way they are, it would probably detract from their authority.

Although causal subordinators are very rare in the fifteenth- to seventeenth-century statutes, there does appear to be a slight decrease in their frequency over these three hundred years, from 0.5 to 0.2 per 1000 words, before they disappear completely. Furthermore, the fact that causal subordinators are missing completely from our eighteenth-century statute sample, and stay absent in later centuries, could be related to a rise in subordinators marking what we classify as “other clauses” in our typology, that is, clauses of preference, exception, contrast, place, and degree, in statutes from the eighteenth to the twenty-first century. Consider an example of unless being used to mark a clause of exception in a statute dating from 1919:

(10) thereupon the council shall sell the holding to the tenant accordingly unless the council obtain the consent of the Board of Agriculture and Fisheries to the requirement of the tenant being refused by the council (T20_003, extract from the Land Settlement (Facilities) Act, 1919).

Subordinators marking these “other clauses” rise from 1.6 per 1000 words in the eighteenth-century statutes to 2.2 per 1000 words in the nineteenth and 5.4 per 1000 words in the twentieth. They then fall back to 2.4 per 1000 words in the twenty-first-century statute sub-corpus, which is still higher than the eighteenth-century figure. These changes could be indicative of changes to formulaic convention within the genre, although further study would be needed to establish what these changes might be.

Tables 8 to 10 below demonstrate that there are certain causal conjunctions and expressions that are more common across the three genres and seven centuries, namely because, as, for that, for as much as, and sith(en)/since. Even so, for that and for as much as disappear in all three genres by the eighteenth century. This finding fits with the observation of Rissanen (1999:306) that for that becomes obsolete in English by the end of the seventeenth century.

Table 8.

Raw and Normalized Frequencies of Clausal Conjunctions Marking Causal Clauses in Sermons Across Time¹⁴

Linguistic item	15th century	16th century	17th century	18th century	19th century	20th century	21st century
As	1 (0.05)	0	7 (0.4)	10 (0.5)	0	0	4 (0.2)
By cause/because	1 (0.05)	19 (1.0)	20 (1.0)	12 (0.6)	22 (1.1)	34 (1.7)	37 (1.8)
For that	0	2 (0.1)	3 (0.2)	0	0	0	0
For as much as	7 (0.3)	1 (0.05)	1 (0.05)	0	0	0	0
In as much as	2 (0.1)	0	1 (0.05)	0	1 (0.05)	0	0
In so much	1 (0.05)	0	0	0	0	0	0
In that	0	0	1 (0.05)	1 (0.05)	0	0	0
Sith/since	4 (0.2)	1 (0.05)	3 (0.2)	5 (0.3)	0	0	1 (0.05)
Seeing (that)	0	0	4 (0.2)	0	1 (0.05)	0	0
Total items marking causal clauses	16 (0.8)	23 (1.2)	40 (2.0)	28 (1.4)	24 (1.2)	34 (1.7)	42 (2.1)

Table 8 above shows that there is an overall increase in the use of because in the sermon sub-corpora from 0.05 per 1000 words in the fifteenth-century sub-corpus to 1.8 per 1000 words in the twenty-first-century material.¹⁵ Because-clauses are more common in sermons than statutes in our corpus in the sixteenth and seventeenth centuries (see Table 9 below for their frequencies in the statute material).¹⁶ This finding agrees with Claridge and Walker (2001:51), who note, when looking at the 1640-1740 period, that “sermons prefer because to for in four of the last six decades, and thus seem to lead the way in the promotion of because” in comparison to religious treatises and trials, which “show a consistent preference for for.”

Table 9.

Raw and Normalized Frequencies of Clausal Conjunctions Marking Causal Clauses in Statutes Across Time

Linguistic item	15th century	16th century	17th century
By cause/because	2 (0.1)	1 (0.05)	1 (0.05)
By that	1 (0.05)	0	0
For that	0	2 (0.1)	0
For as much as	8 (0.4)	0	3 (0.1)
In that	0	1 (0.05)	0
Total items marking causal clauses	11 (0.5)	4 (0.2)	4 (0.2)

Because-clauses are not very common in any of the correspondence sub-corpora, as Table 10 shows.

Table 10.

Raw and Normalized Frequencies of Clausal Conjunctions Marking Causal Clauses in Letters Across Time¹⁷

Linguistic item	15th century	16th century	17th century	18th century	19th century	20th century
As	4 (0.2)	6 (0.3)	9 (0.4)	80 (4)	22 (1.1)	16 (0.8)
By cause/because	3 (0.2)	13 (0.7)	12 (0.6)	2 (0.1)	13 (0.6)	9 (0.4)
For that	3 (0.2)	4 (0.2)	2 (0.1)	0	0	0
For as much as	3 (0.2)	2 (0.1)	0	0	0	0
In so much	0	3 (0.2)	0	0	0	0
In that	0	1 (0.05)	0	0	0	0
Sith/since	0	0	0	1 (0.05)	1 (0.05)	2 (0.1)
Total items marking causal clauses	13 (0.7)	29 (1.5)	19 (0.9)	83 (4.1)	36 (1.8)	27 (1.2)

It is also worth focusing on the conjunction as. Rissanen (1999:307) notes that “few occurrences with a clearly causal meaning” of as were found in the Early Modern English part of the Helsinki Corpus. This observation is reflected by our findings. For instance, in ME and EModE sermons, as is used much more often to mark clauses of manner or comparison. This use of as is rhetorical; it is employed to formulaically signal provenance, such as which book of the Bible the story being relayed comes from (see example (11) below in relation to the fourth Book of the Kings), or which religious figure said or did a certain thing (see example (12) below in relation to Saint Paul).

(11) Many also hidden Iewels & tresours were in that temple born awaye by Nabugodonosor and Nabuzardan his capytayn / as it apereth in the fourth boke of the kynges. (S15_002, 1495, Richard Fitzjames).

(12) ‘Saynte Paule sayth, that the holye gooste hath ordeyned all byshops to fede their flocke, as saynte Peter was bydden do.’ (S16_006, 1539, Cuthbert Tunstall)

Kohnen (2007:297), who also finds as being used in this rhetorical way in the ME and EModE sermons he investigates, argues that “this pattern may even be called a set phrase because it usually employs the same typical collocations (as + name of author + seith, writeth etc).” In our data, the numbers of as being used to mark clauses of manner/comparison steadily reduce from 5.6 per 1000 words in the fifteenth-century to 1.4 per 1000 words in the twenty-first-century sermon sub-corpus. There is a decrease in as being used as a marker of manner/comparison in the correspondence data set as well, from 5 per 1000 words in the fifteenth-century to 2.1 per 1000 words in the twentieth-century letter sub-corpus.

It is possible to see a concomitant increase in as marking causal clauses in sermons and in letters in the TCWE. From the eighteenth century on, as appears to be the preferred causal conjunction in our correspondence data. In particular, Table 10 above shows a sharp increase in instances of causal as from 0.4 per 1000 words in the seventeenth century to 4 per 1000 words in the eighteenth century.¹⁸ An example of causal as from an eighteenth-century letter can be seen below.

(13) ‘the Ketchup is in pint bottels as it will be less liable to Spoil by being long open’ (L18_096, 1790, John Barrow to Richard Orford)

How are these eighty instances of causal as dispersed among the 117 eighteenth-century letters? Fifty-six of the 117 letters (forty-eight percent) in the eighteenth-century sub-corpus contain instances of causal as. All 117 letters are addressed to the same person, Richard Orford, but they are from a variety of senders. Twenty-five of the thirty-nine writers in the eighteenth-century sub-corpus (sixty-four percent) use it, although some of these twenty-five writers employ it more than others. The writers who use it in more than one letter are: Shaw Allanson (twelve letters, in six of which he uses causal as several times); John Amson (ten letters, in two of which it is used three times); Philip and Martha Barrow (seven letters: three collaborative, two from Philip, and two from Martha); and Edward Barker (three letters). There are then four writers (Edward Ackers, Robert Ashley, James Ashsworth, John Bellases) who each use causal as in two letters.

Shaw Allanson, the heaviest user of causal as in the sub-corpus, was probably Richard Orford’s servant. His letters contain information that he is passing on to Orford, and requests about how to proceed with work based on Orford’s orders. He also uses subscriptions such as “from yours at command” (L18_010, 1784) and “from Your Humble Servant” (abbreviations expanded) (L18_008, 1783). John Amson, the second heaviest user, appears to be of a higher social rank, because although he signs “truly your Obliged Servant” (abbreviations expanded; L18_021, 1779) on one occasion, he also signs one letter with “your Obliged Friend and Servant” (abbreviations expanded; L18_022, 1782) and also talks about his “tenant” in L18_027, dating to 1785. Philip and Martha Barrow appear to be merchants, because they write about their fruit-selling shop on several occasions.

Overall, Allanson, Amson, and the Barrows do not appear to be from the same social rank, so it is not possible to suggest that causal as is a feature preferred by individuals from a particular socio-economic group. However, it is possible to say that there are four heavy users of it as a linguistic feature. Therefore, the high frequency of causal as in the eighteenth-century sub-corpus relative to the other five correspondence sub-corpora is likely to be due to the stylistic preferences of these four individual language users, rather than evidence of either a genre-specific feature or a larger, more general trend in the language.¹⁹
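The dispersion figures for causal as in the eighteenth-century letters can be checked with simple arithmetic. The sketch below recomputes the two percentages and tallies the letters attributed to the named writers. The helper `share` is hypothetical, and counting Philip and Martha Barrow as two separate writers is an assumption made here, not something the tallies themselves confirm.

```python
def share(part, whole):
    """Percentage of `whole` accounted for by `part`, rounded to a whole number."""
    return round(100 * part / whole)

letters_total, letters_with_as = 117, 56
writers_total, writers_with_as = 39, 25

pct_letters = share(letters_with_as, letters_total)
pct_writers = share(writers_with_as, writers_total)

# Letters containing causal as, by the writers named as using it more than once.
named_letters = {
    "Shaw Allanson": 12,
    "John Amson": 10,
    "Philip and Martha Barrow": 7,   # 3 collaborative + 2 + 2
    "Edward Barker": 3,
    "four writers with two letters each": 4 * 2,
}
named_letter_total = sum(named_letters.values())

# Assumption: the Barrows count as two writers, giving 1 + 1 + 2 + 1 + 4 named writers.
named_writer_total = 1 + 1 + 2 + 1 + 4

remaining_letters = letters_with_as - named_letter_total
remaining_writers = writers_with_as - named_writer_total

print(pct_letters, pct_writers)
print(named_letter_total, remaining_letters, remaining_writers)
```

Under that assumption, the number of remaining letters equals the number of remaining writers, which is consistent with every other user of causal as contributing a single letter.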

5.2. Coordinators and Subordinators in Twenty-First Century Sermons, Statutes, Email, and IM

5.2.1. Overall Frequencies of Coordinators and Subordinators

There are more coordinators than subordinators per 1000 words in the twenty-first-century sermon and IM sub-corpora, and more subordinators than coordinators per 1000 words in the twenty-first-century email and statute sub-corpora (see Table 11 below). Table 11 shows that there are also more clausal coordinators in sermons and IMs than in emails and statutes, with twenty-first-century statutes having the lowest normalized frequency of all four genres.²⁰ If we view clausal coordination as an “involved,” “oral” feature (e.g., Biber 2003), this finding is explained by the various register profiles of the genres under consideration. Sermons and IM are both non-expository genres. IM is also a genre that is arguably more likely to contain informal, colloquial written language than email and statutes; and colloquial language, according to Mair’s (2024:196) definition, is more likely to be “closer to spoken usage.” Meanwhile, the professional Enron company emails under investigation are predominantly used in an informational capacity and so could be classified as more expository than both sermons and IM.²¹ Statutes, both historically and in the present day, are a specialist, expository register.

Table 11.

Raw and Normalized Frequencies of Coordinators and Subordinators in Four Twenty-First Century Genres

Linguistic item	Sermons	Instant messages	Email	Statutes
Coordinators	492 (24.5)	891 (20.4)	729 (16)	248 (12.2)
Subordinators	435 (21.7)	765 (17.5)	1028 (22.8)	356 (17.5)

5.2.2. Frequency of Individual Coordinators

Table 12 and Figure 3 below demonstrate that clause-level and is most frequent in the twenty-first-century sermons, followed by emails, and then IM, although the frequencies in email and IM are extremely similar (11 per 1000 words in IM and 11.7 per 1000 words in email). Clause-level and is least frequent in the twenty-first-century statutes.²²

Table 12.

Raw and Normalized Frequencies of Clause-Level and, but, and or in Four Twenty-First Century Genres

Linguistic item	Sermons	Email	Instant messages	Statutes
And	357 (17.8)	529 (11.7)	480 (11)	146 (7.2)
But	93 (4.6)	136 (3)	345 (7.9)	11 (0.5)
Or	42 (2.1)	64 (1.4)	66 (1.5)	91 (4.5)
Total	492 (24.5)	729 (16)	891 (20.4)	248 (12.2)

Figure 3.

Normalized Frequency Comparison of Clause-Level and, but, and or in Twenty-First Century Sermons, IMs, Emails, and Statutes

The higher frequency of clause-level and in the sermon data relative to the three other genres could be attributed to the fact that whilst the sermon samples included in the study are writing-based (they are written texts and not transcripts of recorded speech), they are speech-purposed as a genre. Sermons are written to be read out loud, whereas IM and email are both writing-based and writing-purposed. Along with clause-level and, Biber, Johansson, Leech, Conrad, and Finegan (2021:182–183) find clause-level but to be characteristic of PDE spoken conversation. In contrast, they note that high frequencies of clause-level or are characteristic of academic prose (2021:182–183). Clause-level but is most frequent in IM, as in example (14) below:

(14) I didn’t get everything right but could see where I’d gone wrong! [24/09/2018, 14:56:00] (TCWE, I21_009, IM conversation thread 9).

Figure 3 above shows that, after IM, clause-level but occurs next most frequently in sermons, then in email, and is least frequent in statutes.²³ It could be that there are fewer instances of clause-level but in sermons than in IM because sermons present an authoritative argument and so may be less likely to present contrastive viewpoints. The clause-level but results are also explained by the IM sample’s status as a more non-expository register than both the email and statute samples included in the study. Finally, clause-level or is more frequent in statutes than in the other text types, which again fits their register profile as more specialist and expository.

5.2.3. Frequency of Causal Conjunctions

Table 13 below demonstrates that causal as is more common per 1000 words in the IM sub-corpus than in the email sub-corpus. It is also more common per 1000 words than because in the IM sub-corpus.²⁴

Table 13.

Raw and Normalized Frequencies of Clausal Conjunctions Marking Causal Clauses in Twenty-First-Century Sermons, Emails, and IMs

Linguistic item	Sermons	Email	IMs
Because	37 (1.8)	40 (0.9)	39 (0.9)
As	4 (0.2)	27 (0.6)	73 (1.7)
Since	1 (0.05)	28 (0.6)	0
In that	0	3 (0.07)	0
Seeing (that)	0	0	1 (0.02)
Total	42 (2.1)	98 (2.2)	113 (2.6)
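
As a reader-side check on Table 13, the following minimal Python sketch recomputes the column totals from the raw counts and expresses causal as and because as shares of all causal conjunctions in each genre. The raw counts are copied from Table 13; the share calculation and the names `causal_counts` and `share_of` are illustrative assumptions and are not measures reported in the study, which works with frequencies per 1000 words.

```python
# Raw counts copied from Table 13 (causal conjunctions by genre).
causal_counts = {
    "Sermons": {"because": 37, "as": 4, "since": 1, "in that": 0, "seeing (that)": 0},
    "Email": {"because": 40, "as": 27, "since": 28, "in that": 3, "seeing (that)": 0},
    "IMs": {"because": 39, "as": 73, "since": 0, "in that": 0, "seeing (that)": 1},
}


def share_of(item, counts):
    """Return the percentage of all causal conjunctions accounted for by `item`."""
    total = sum(counts.values())
    return 100 * counts[item] / total


for genre, counts in causal_counts.items():
    total = sum(counts.values())  # should match the "Total" row of Table 13
    print(
        f"{genre}: total={total}, "
        f"as={share_of('as', counts):.1f}%, "
        f"because={share_of('because', counts):.1f}%"
    )
```

On these counts, causal as accounts for roughly 65 percent of the causal conjunctions in the IM sample, against roughly 28 percent in email and 10 percent in sermons, which is consistent with the preference for causal as in IM described above.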

An example of causal as from the IM data is shown below:

(15) ‘Hello! Had the sparkling cider you gave me last night as it felt suitably summery - really tasty!’ (TCWE I21_003, IM conversation thread 3).

In total, the IM sub-corpus contains the messages of twenty-four individual writers. I21_001 to I21_008 are all individual chats between two conversation partners, together representing eleven writers; I21_010 is a group chat containing the messages of four people; and I21_009 is a group chat involving nine conversation partners. Table 14 below shows how the seventy-three instances of causal as are distributed across the IM sub-corpus.

Table 14.

Raw Numbers of Causal as in Individual Chats in the IM Sub-Corpus

Chat number	Causal as raw freq.
I21_001	2
I21_002	1
I21_003	12
I21_004	1
I21_005	10
I21_006	6
I21_007	6
I21_008	0
I21_009	29
I21_010	6

Table 14 shows that, overall, the instances of causal as are fairly widely dispersed across the sub-corpus, with all the chats except I21_008 containing at least one. Although I21_009 contains forty percent of the total instances of causal as in the IM sub-corpus, it is a group chat containing the messages of nine individuals, thirty-eight percent of the total number of twenty-four writers. Therefore, it is unlikely that the high frequency of causal as in I21_009 is explained by the stylistic preference of one or two individuals.
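
The dispersion argument rests on two simple proportions, which can be recomputed as in the sketch below. The per-chat counts are copied from Table 14 and the writer numbers from the running text; the variable names are illustrative assumptions, and the sketch makes no claim about how the authors carried out the calculation.

```python
# Raw counts of causal "as" per chat, copied from Table 14.
causal_as = {
    "I21_001": 2, "I21_002": 1, "I21_003": 12, "I21_004": 1, "I21_005": 10,
    "I21_006": 6, "I21_007": 6, "I21_008": 0, "I21_009": 29, "I21_010": 6,
}
TOTAL_WRITERS = 24      # total writers in the IM sub-corpus, as stated in the text
WRITERS_IN_I21_009 = 9  # conversation partners in group chat I21_009, as stated

total_as = sum(causal_as.values())
chats_with_as = sum(1 for n in causal_as.values() if n > 0)

# Share of all causal "as" instances found in I21_009, and the share of all
# writers who take part in that chat.
share_of_instances = 100 * causal_as["I21_009"] / total_as
share_of_writers = 100 * WRITERS_IN_I21_009 / TOTAL_WRITERS

print(f"{total_as} instances across {chats_with_as} of {len(causal_as)} chats")
print(f"I21_009: {share_of_instances:.1f}% of instances, "
      f"{share_of_writers:.1f}% of writers")
```

The two shares come out at about 39.7 and 37.5 percent, which correspond to the rounded figures of forty and thirty-eight percent given above.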

Altenberg (1984:41) and Tottie (1986) find causal as to be more common in PDE writing than in PDE speech. If IM is considered a more non-expository, interpersonal register than the email sample included in the study, and one more likely to contain colloquial language, and given Mair’s (2024:196) definition of colloquial language as closer to speech than non-colloquial language, it might be expected that there would be more instances of because than causal as in the IM data, especially as we included shorter variants of because in our count, namely cause, coz, and cos. It might also be expected that the email sub-corpus would contain more instances of causal as than the IM sub-corpus.

However, more recent corpus-based research about PDE by Biber, Johansson, Leech, Conrad, and Finegan (2021:2744) finds the proportion of causal as “consistently higher in BrE (British English) than AmE (American English).” This observation helps to explain the higher frequency of causal as in our IM sub-corpus relative to our email sub-corpus, because the Enron emails are written in American English (Enron being a North American company) and the IMs are written exclusively in British English (henceforth BrE).

Biber, Johansson, Leech, Conrad, and Finegan (2021:2744) also observe that “both BrE conversation and academic prose use as for reason proportionally more than other registers,” the other registers being fiction and news. The fact that causal as is more frequent in BrE conversation than in fiction and news supports our classification of IM as a non-expository register more likely to contain colloquial language, because in this respect it resembles present-day BrE conversation.

6. Summary and Conclusion

Only the sermon sub-corpora demonstrate an increase in the normalized number of subordinators from the fifteenth to sixteenth centuries. This result suggests that it is important to take genre-conditioned variation into account when discussing the observed increase in the inventory of subordinators from ME to EModE (see Görlach 1991:95, 122; Kortmann 1997). The paper also engages with the Biberian argument that expository registers such as legal prose followed an essentially steady development toward ever more “literate” styles from 1650 to 1990, whereas the non-expository registers developed toward more “literate” characterizations from 1650 to 1800, before reversing this trend as they shifted toward more “oral” characteristics from 1800 to 1990 (e.g., Biber 1995, 2001; Biber & Finegan 2001). The results relating to frequencies of clause-level coordinators in the statute and sermon material support this argument.

However, the pattern observed in our study in relation to clause-level coordinators in the correspondence material, especially the decline of clause-level and, provides evidence that challenges it. This evidence instead provides support for Culpeper and Kytö’s (2010:168) argument about the development of the modern English written sentence. This study’s results demonstrate how the diachronic register variation present in the TCWE appears to be affected by genre, at least in relation to coordinators and subordinators.

Its results also contribute to our knowledge about shifts in generic conventions over time. The previously observed move toward more ornate, elaborate style in EModE sermons is arguably reflected in the increase in subordinator frequency from the fifteenth- to sixteenth-century sermon material. Meanwhile, the observed increase in causal conjunctions in sermons over time agrees with what Claridge and Walker (2001:52) find in relation to sermons dating from 1640 to 1730. It also builds on their findings by showing there are also increases from the fifteenth to the sixteenth, sixteenth to the seventeenth, and then again from the nineteenth to the twenty-first centuries.

The study finds that subordinators marking clauses of preference, exception, contrast, place, and degree increase from the eighteenth- to twenty-first-century statutes, which could be indicative of changes to formulaic convention within the genre. Finally, our findings in relation to as, from its use as a rhetorical device marking clauses of manner/comparison in late ME and EModE sermons to the concomitant increase in causal as in both sermons and letters, agree with Rissanen (1999), Claridge and Walker (2001), and Kohnen (2007). Our study also demonstrates that stylistic considerations are important, for instance in relation to the high frequency of causal as in the eighteenth-century letters sub-corpus relative to the five other correspondence sub-corpora.

The results relating to conjunction frequency in twenty-first-century sermons, statutes, email, and IM are largely explained by their register profiles. The finding that there is a preference for causal as in the IM sub-corpus relative to because and its variants, and relative to the email sub-corpus, is usefully contextualized with reference to Biber, Johansson, Leech, Conrad, and Finegan’s (2021:2744) findings relating to causal as in their PDE data.

It is essential for corpus studies to consider each genre on its own terms, as an individual entity, with its own conventions and history. This consideration applies both to historical genres like EModE letters and to more modern genres like PDE IM. Although a further study would benefit from using a larger and therefore more representative corpus, our study does gain from its use of a single corpus that includes texts written in English from the fifteenth century to the present day. It also benefits from the mixed-method approach it employs, which combines quantitative and qualitative analysis. In particular, it demonstrates that when using qualitative analysis to interpret quantitative results, utilizing the complementary perspectives of register, genre, and style can be a fruitful way to highlight nuance.

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research reported in this article was funded by the British Academy and the Leverhulme Trust.

Author Biographies

Imogen Marcus is a Senior Lecturer in English Language at Edge Hill University. She researches variation and change in the English language, present and past, and the relationship between language and technology. She is currently using a lexical sociolinguistic approach to investigate everyday vocabulary in English depositions dating from 1560-1760.

Ursula Maden-Weinberger is an Independent Research Assistant specialising in corpus linguistics, media representation and discourse analysis. Her recent work has focused on longitudinal corpus-based studies, including portrayals of autism and blindness in the British press, examining language patterns, societal attitudes, and shifts in public narratives over time.

References

Anthony, Laurence. 2021. AntConc [Computer Software]. Version 4.0.1. Tokyo, Japan: Waseda University. https://www.laurenceanthony.net/software/AntConc (5 December, 2025).

Marcus, Imogen & Ursula Maden-Weinberger. 2021. Transhistorical corpus of written English. https://www.sketchengine.eu/transhistorical-corpus-of-written-english/. Also available via WebCorpLearn (https://www.webcorp.org.uk/tcwe).

VARD2 software application. https://ucrel.lancs.ac.uk/vard/about/.

Altenberg, Bengt. 1984. Causal linking in spoken and written English. Studia Linguistica 38(1). 20-69.

Androutsopoulos, Jannis. 2011. Language change and digital media: A review of conceptions and evidence. In Nikolas Coupland (ed.), Standard languages and language standards in a changing Europe 1, 145-159. Oslo: Novus Press.

Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.

Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge University Press.

Biber, Douglas. 2001. Dimensions of variation among 18th-century speech-based and written registers. In Hans-Jürgen Diller & Manfred Görlach (eds.), Towards a history of English as a history of genres, 89-109. Heidelberg: Universitätsverlag C. Winter.

Biber, Douglas. 2003. Variation among university spoken and written registers: A new multi-dimensional analysis. In Pepi Leistyna & Charles F. Meyer (eds.), Corpus analysis: Language structure and language use, 47-70. Amsterdam and New York, NY: Rodopi.

Biber, Douglas & Susan Conrad. 2001. Multi-dimensional methodology and the dimensions of register variation in English. In Susan Conrad & Douglas Biber (eds.), Variation in English: Multi-dimensional studies, 13-42. Oxon: Taylor and Francis.

Biber, Douglas & Susan Conrad. 2019. Register, genre and style. 2nd edn. Cambridge: Cambridge University Press.

Biber, Douglas & Edward Finegan. 2001. Diachronic relations among speech-based and written registers in English. In Susan Conrad & Douglas Biber (eds.), Variation in English: Multi-dimensional studies, 66-83. Harlow: Longman.

Biber, Douglas & Bethany Gray. 2016. Grammatical complexity in academic English: Linguistic change in writing. Cambridge: Cambridge University Press.

Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 2021. Grammar of spoken and written English. Amsterdam and Philadelphia, PA: John Benjamins.

Blake, Norman. 1992. The literary language. In Norman Blake (ed.), The Cambridge history of the English language. Vol. II: 1066–1476, 500-541. Cambridge: Cambridge University Press.

Claridge, Claudia & Terry Walker. 2001. Causal clauses in written and speech-related genres in Early Modern English. ICAME Journal 25. 31-63.

Claridge, Claudia & Andrew Wilson. 2002. Style evolution in the English sermon. In Teresa Fanego, Belén Méndez-Naya & Elena Seoane (eds.), Sounds, words, texts and change, 25-44. Amsterdam and Philadelphia, PA: John Benjamins.

Crystal, David. 2006. Language and the Internet. 2nd edn. Cambridge: Cambridge University Press.

Culpeper, Jonathan & Merja Kytö. 2010. Early Modern English dialogues: Spoken interaction as writing. Cambridge: Cambridge University Press.

Cutler, Cecelia, May Ahmar & Soubeika Bahri (eds.). 2022. Digital orality: Vernacular writing in online spaces. Cham: Springer Nature.

Gordon, Ian A. 1966. The movement of English prose. London: Longman.

Görlach, Manfred. 1991. Introduction to Early Modern English. Cambridge: Cambridge University Press.

Hilpert, Martin & Stefan Gries. 2009. Assessing frequency changes in multistage diachronic corpora: Applications for historical corpus linguistics and the study of language acquisition. Literary and Linguistic Computing 24(4). 385-401.

Hundt, Marianne & Christian Mair. 1999. “Agile” and “Uptight” genres: The corpus-based approach to language change in progress. International Journal of Corpus Linguistics 4(2). 221-242.

Jahandarie, Khosrow. 1999. Spoken and written discourse: A multidisciplinary perspective. Stamford: Ablex.

Jucker, Andreas. 1991. Between hypotaxis and parataxis: Clauses of reason in Ancrene Wisse. In Dieter Kastovsky (ed.), Historical English syntax (Topics in English Linguistics, 2), 203-220. Berlin: Mouton de Gruyter.

Kohnen, Thomas. 2007. ‘Connective profiles’ in the history of English texts: Aspects of orality and literacy. In Ursula Lenker & Anneli Meurman-Solin (eds.), Connectives in the history of English (Vol. 283), 289-308. Amsterdam and Philadelphia, PA: John Benjamins.

Kortmann, Bernd. 1997. Adverbial subordination: A typology and history of adverbial subordinators based on European languages. Berlin: Mouton de Gruyter.

Kytö, Merja & Erik Smitterberg. 2022. Clausal and phrasal coordination in recent American English. Corpus Linguistics and Linguistic Theory 19(1). 23-46.

Leech, Geoffrey & Nicholas Smith. 2006. Recent grammatical change in written English 1961–1992: Some preliminary findings of a comparison of American with British English. In Antoinette Renouf & Andrew Kehoe (eds.), The changing face of corpus linguistics, 185-204. Amsterdam and New York, NY: Rodopi.

Lutzky, Ursula. 2012. Discourse markers in Early Modern English. Amsterdam and Philadelphia, PA: John Benjamins.

Mair, Christian. 2006. Twentieth-century English: History, variation and standardization. Cambridge: Cambridge University Press.

Mair, Christian. 2024. Colloquialisation: Twenty-five years on. Journal of Historical Pragmatics 25(2). 193-214.

Nevalainen, Terttu & Helena Raumolin-Brunberg. 1993. Early modern British English. In Matti Rissanen, Merja Kytö & Minna Palander-Collin (eds.), Early English in the computer age: Explorations through the Helsinki Corpus (Topics in English Linguistics, 11), 53-73. Berlin: Mouton de Gruyter.

Notopoulos, James A. 1949. Parataxis in Homer: A new approach to Homeric literary criticism. Transactions and Proceedings of the American Philological Association 80. 1-23.

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive grammar of the English language. Essex: Longman Group.

Rissanen, Matti. 1999. Syntax. In Roger Lass (ed.), Cambridge history of the English language. Vol. III: 1476–1776, 187-331. Cambridge: Cambridge University Press.

Rissanen, Matti. 2005. The development of till and until in English. In Jacek Fisiak & Hye-Kyung Kang (eds.), Recent trends in Medieval English language and literature in honour of Young-Bae Park, 75-92. Seoul: Thaehaksa Publishing Company.

Smitterberg, Erik. 2021. Syntactic change in late modern English. Cambridge: Cambridge University Press.

Soffer, Oren. 2012. Liquid language? On the personalization of discourse in the digital era. New Media & Society 14(7). 1092-1110.

Tottie, Gunnel. 1986. The importance of being adverbial: Adverbials of focusing and contingency in spoken and written English. In Gunnel Tottie & Ingegerd Bäcklund (eds.), English in speech and writing: A symposium, 93-118. Stockholm: Almqvist and Wiksell.

Walkden, George. 2024. Parataxis and hypotaxis in the history of English. In Luisella Caon, Moragh Gordon & Thijs Porck (eds.), Keys to the history of English: Diachronic linguistic change, morpho-syntax and lexicography, 10-33. Amsterdam and Philadelphia, PA: John Benjamins.