
Abstract

The higher frequency of auxiliary do in poetry than in prose in Middle English (1150-1500) is one of the puzzles of the history of this construction. Previous studies have argued that the role of auxiliary do in poems was to place the infinitive at the end of the verse to make rhyme easier. The aim of this article is to examine to what extent auxiliary do was used for rhyme purposes and, furthermore, to determine whether it had other functions. On the basis of a conditional inference tree and random forests, this paper shows that auxiliary do was indeed used as a metrical tool to place the infinitive at the end of the verse to facilitate rhyme, although the degree to which poets used auxiliary do varied from dialect to dialect. The statistical analysis reveals that the auxiliary construction also served other functions, particularly in the Eastern Midlands and Northern dialects, where do favored the integration of verbs of foreign origin and ensured the metricality of the verse by maintaining a regular distribution of the beats in the line.

Keywords

Middle English poetry, conditional inference tree, random forests, usage-based linguistics

1. Introduction

Several languages inside and outside the Germanic family exhibit a semantically empty form of the verb do (Jäger 2006). In English, this construction, which I will call auxiliary do, developed during the course of the thirteenth century in non-emphatic affirmative sentences and then spread to “NICE” contexts, that is, negation, inversion, code, and emphasis (Huddleston & Pullum 2002), between the sixteenth and eighteenth centuries (Ellegård 1953; Budts 2020). This paper is concerned only with the first stage of the history of auxiliary do, that is, before the spread to the NICE environments, when do is attested with an infinitive verb in affirmative sentences, as illustrated in (1) and (2).¹ In this context, do has grammatical meaning, that is, it expresses tense, but no semantic content, and the whole do + infinitive construction can be considered equivalent to a construction with a finite verb only (Ellegård 1953; Erb 2001).²

(1) Ase ore louerdes wille was þare-aftur it dude bifalle

As our lord will was there-after it did happen

‘As was our lord’s will it happened thereafter’ (PLAEME: South English Legendary: 51.99)

(2) Wyn and ale deden he fete, and maden hem glade and

Wine and ale did he get, and made them glad and

bliþe

happy

‘He got wine and ale and made them glad and happy’ (PCMEP: Havelok: 38.1244.656)

Auxiliary do has been the center of considerable attention among scholars for over a century (e.g., Dietze 1895; Royster 1922; Engblom 1938; Ellegård 1953; Visser 1963-1973; Denison 1985; Stein 1990; 1998; Ecay 2015; Budts 2020; Moretti 2021). An aspect of the history of this construction that has puzzled linguists is that its first attestations occur in poetical compositions of the thirteenth century, while its appearance in prose texts is dated only two centuries later. Since the period in which auxiliary do is first attested corresponds with a substantial gap in the transmission of prose texts between 1250 and 1350, it may be argued that the earlier occurrence of the auxiliary construction in poems is due to the lack of prose data. This possibility, however, seems rather unlikely. If we look at the use of auxiliary do in later (1350-1420) prose texts, for which we have a larger amount of data, it appears that the construction is virtually absent and becomes frequent only towards the end of the fifteenth century (Ellegård 1953:44).

The presence of auxiliary do in Middle English poems raises the question as to what function(s) the construction had in these texts. A possible explanation is provided by Ellegård (1953), who suggested that the presence of auxiliary do in poetry is closely related to the emergence of rhyme, a new versification mode that developed under the influence of French poetry and that characterizes most of the Middle English poetical production (Pearsall 1977). In particular, Ellegård (1953:146) argues that auxiliary do was a metrical device used to place the infinitive in the final position of the verse in order to facilitate the realization of a rhyme. Intuitively, the suggestion proposed by Ellegård (1953) makes sense, as there are some cases in which the infinitive in auxiliary do—infinitive constructions is involved in the rhyme. However, this hypothesis is not supported by quantitative evidence, since Ellegård (1953) did not include in his study any statistics regarding the proportion of the cases in which the infinitive that combines with auxiliary do occurs at the end of the verse and those in which it does not. Furthermore, Ellegård (1953) did not consider that rhyme is not the only feature that defines Middle English poems. In fact, besides the use of rhyme, the Middle English poetical tradition is characterized by other metrical practices like syllable-counting, a precise number of beats and the use of iambs, which became popular in the period that follows the Norman Conquest in 1066 (Pearsall 1977). Thus, a natural question that arises is the following: if poets used auxiliary do to facilitate rhyme, did they also use it to adhere to other versification modes? Ellegård (1953)—and other scholars that addressed the early history of auxiliary do (e.g., Denison 1985; Garrett 1998)—left these issues unsolved.

The present study aims to fill these gaps and uncover the functions of auxiliary do in Middle English poetry. On the one hand, it will test Ellegård’s (1953) claim and assess whether auxiliary do was used as a metrical device to favor end-verse rhyme. On the other hand, it sets out to determine whether the presence of auxiliary do in poetry was influenced by other factors that have not been considered by Ellegård (1953). To this end, I will use two types of multifactorial analyses that have recently become frequent in linguistic studies, namely conditional inference trees and random forests (Tagliamonte & Baayen 2012; Hansen & Schneider 2013; Szmrecsanyi, Grafmiller, Heller & Röthlisberger 2016; Hundt 2018; Tomaschek, Hendrix & Baayen 2018; Fonteyn & Nini 2020). By comparing the distribution of auxiliary do with causative do, which shares the same syntactic structure with auxiliary do, the results of the quantitative investigation provide statistical support for the claim that auxiliary do was used as a metrical device in Middle English poems, although the degree to which poets employed it varied across different dialects. Importantly, the statistical analysis shows that end-verse rhyme was not the only function fulfilled by auxiliary do. I present quantitative evidence that, particularly in the Eastern Midlands and in the Northern dialects, auxiliary do served to facilitate the adoption of verbs of foreign origin and as a metrical filler to ensure the metricality of the verse by allowing the metrical stress to fall on the linguistic stress.

The structure of the paper is as follows. Section 2 reviews previous studies that have addressed the issue of auxiliary do in poetry. Section 3 describes the data collection process and introduces the methodological apparatus of this investigation. Section 4 is concerned with the results of the statistical analysis, which are then discussed in section 5. Section 6 offers some final remarks.

2. Background

The higher frequency of auxiliary do in poems than in prose in Middle English has attracted a good deal of attention among scholars from at least the end of the nineteenth century (e.g., Dietze 1895; Sweet 1898; Engblom 1938; Ellegård 1953; Dahl 1956). These studies are consistent in suggesting that auxiliary do was a “metrical device” that had the purpose of making the rhyme easier. This hypothesis has been clearly formulated, for instance, by Ellegård (1953:208), who noticed that the construction auxiliary do + infinitive is typically found when the infinitive verb is the last element of the verse. The metrical device proposal has become widely accepted among scholars, and it is now commonly argued that the function of auxiliary do in poetry was to place the infinitive at the end of the verse in order to facilitate rhyme (e.g., Ogura 2018). In more recent studies, the functions of do in Middle English poetry have been overlooked, as scholars have tended to focus on the origin of the construction (e.g., Denison 1985; Garrett 1998), or on its spread to the NICE contexts (e.g., Kroch 1989; Budts 2020). The only exceptions are contributions that focus on the use of the auxiliary construction in the works of single poets. Smyser (1967), for instance, analyzes the use of do and another empty auxiliary, gan, in Chaucer. The results of his investigation are in line with previous studies, since Smyser (1967) argues that auxiliary do was largely used as a metrical device to favor end-verse rhyme. However, Smyser (1967) also claims that Chaucer often employed causative constructions involving do, make, and let to place the infinitive in final position. This means that placing the infinitive in rhyme position was not a function exclusive to auxiliary do but involved other periphrastic constructions as well. This possibility is briefly hinted at by Ellegård (1953), who notes that, like auxiliary do, let + infinitive constructions are more frequent in poetry than in prose. 
The explanation provided by Ellegård (1953:60) for the presence of these constructions in poetry is that “obviously [. . .] those phrases were very useful for placing an infinitive in rhyme position.” Recent corpus studies, however, have shown that metrical needs are unlikely to influence the use of periphrastic constructions which do not feature an empty auxiliary. In particular, Moretti (2021) analyzes the position of the infinitive in the verse in modal + infinitive constructions in Middle English poems and observes that their presence in final position is not significantly more frequent than in other positions. Similar results are also presented in this paper (section 4), where it is shown that causative do + infinitive is not found more than expected in final position, but its distribution is due to random variation.

The suggestion that auxiliary do was a metrical device used for rhyme purposes, as formulated by Ellegård (1953), has two implications. First, it assumes that Middle English poets might want to prevent certain verbal endings from appearing in the final position of the verse, perhaps because they were problematic to rhyme with. This point is further expanded in section 3.4. Second, it implies that infinitive endings were particularly suitable for rhyme. In this regard, it is worth pointing out that the infinitival ending was subject to a considerable amount of variation during the Middle English period. The Old English ending -an went through a process of phonological weakening in Middle English in which the verbal suffix was first reduced to -(e)n and then lost altogether (Lass 2006:80). From a metrical perspective, variation between the -en, -e, and zero endings provided Middle English poets with a greater array of rhyme possibilities, as shown in the examples in (3), (4), and (5).

(3) Ac his armure was so strong

the spere nolde him Afong

‘But his armor was so strong that the spear would not reach him’ (PCMEP: Alisaunder: 45.972.[Part_1].[Chap_6].580)

(4) Hou ihesu crist herowede helle

of harde gates ich wille telle

‘Of strong gates I will tell how Jesus Christ harrowed hell’ (PCMEP: HarrowHell: 2.2.1)

(5) He sholen hire clothen washen and wringen

and to hondes water bringen

‘They should wash and wring her clothes and bring water to her’ (PCMEP: Havelok: 38.1234.649)

The main weakness of previous studies, including Ellegård’s (1953), is that the metrical device hypothesis is not supported by an empirical analysis that shows whether, and to what degree, auxiliary do was used for metrical purposes. In particular, what is lacking are figures concerning the ratio of auxiliary do constructions with and without the infinitive at the end of the verse. Furthermore, as mentioned in section 1, Ellegård (1953) and other previous studies did not take into account other aspects of Middle English poetry that may have influenced the presence of auxiliary do in poems. In this regard, there are some studies that have shown that periphrastic constructions involving an empty auxiliary were used for metrical reasons other than rhyme (e.g., the study of Putter & Stokes [2000] on the functions of gan + infinitive in poetry). Therefore, while it is possible that auxiliary do served to place the infinitive in rhyme position, it cannot be excluded that it was also, and perhaps even more significantly, used for metrical needs that do not involve rhyme.

3. Data and Methodology

3.1. Corpus Description and Collection Process

The corpus data discussed in this paper stems from three syntactically annotated corpora that contain Middle English poetic data. The first corpus is the Parsed Corpus of Middle English Poetry (PCMEP; Zimmermann 2015), which constitutes an ideal source for Middle English poems that range from 1150 to 1420, with forty-nine texts that account for a total of 215,917 words. PCMEP is syntactically annotated according to the Penn treebank format established for historical English and employed in its sister corpora that include prose texts. Further data has been collected from another recent corpus, A Parsed Linguistic Atlas of Early Middle English, 1250-1325 (PLAEME; Truswell, Alcorn, Donaldson & Wallenberg 2018). From this source the following poems have been included: Cursor Mundi, Genesis and Exodus, Infancy of Christ, The Northern Homily Collection, and South English Legendary (though not included in its entirety, see note 4), for a total of 106,551 words. PLAEME is a valuable tool for the present investigation because it is designed to complement other existing Middle English corpora: it consists of texts that have more than a hundred words and are not included in other corpora. The last corpus consulted is the Penn-Parsed Corpus of Middle English, 2nd edition (PPCME2; Kroch & Taylor 2000), which includes an important piece of poetry, the Ormulum (50,579 words). Relevant for the present study is that every corpus examined is annotated for parts-of-speech (POS), and a specific label is dedicated to the lemma do, which facilitated the collection process. These corpora are compatible with the software CorpusSearch (Randall 2000) and every form of do (DO*) complemented by an infinitive (IP-INF and VB) was searched for.

Furthermore, I included three more early Middle English poems that are of particular interest for this study, namely Layamon’s Brut, King Horn, and the remaining parts of the South English Legendary.³ The decision to include these additional texts was made to ensure a better representation of Middle English dialects and, furthermore, to make the size of the final data set, which is 681,531 words, more suitable for a quantitative investigation. The extraction procedure, however, proved to be more difficult for these texts since they are not tagged for parts of speech and contain raw data only. The collection process involved two steps. First, I extracted every instance of do with the help of the software AntConc 3.5.9 (Anthony 2020). Then, I isolated only the instances in which do took an infinitive complement. I acknowledge that this procedure may have introduced noise into the data through human error. However, the data has been carefully checked against the bibliographical entries provided by Ellegård (1953:213-255), who listed all the do + infinitive constructions he found in the Middle English texts he consulted, which include Brut, King Horn, and the South English Legendary. This additional step was carried out to keep to a minimum the manual errors that could compromise the results of the quantitative investigation.
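The first of the two extraction steps for the untagged texts can be illustrated outside AntConc. The following Python sketch is not the procedure used in this study; it only shows what a keyword-in-context search over raw text might look like. The spelling list `DO_FORMS` and the function name `concordance` are assumptions introduced for illustration, based on the forms of do that occur in the examples quoted in this paper, and a real search would need a much fuller inventory of Middle English spellings.

```python
import re

# Hypothetical, incomplete list of spellings of "do", taken from the
# examples quoted in this paper, not from the study's actual search.
DO_FORMS = ["do", "dude", "dede", "deden"]

# Match any listed form as a whole word, ignoring case.
DO_PATTERN = re.compile(r"\b(" + "|".join(DO_FORMS) + r")\b", re.IGNORECASE)


def concordance(text, width=3):
    """Return (left context, hit, right context) for each candidate form of do."""
    tokens = text.split()
    hits = []
    for i, token in enumerate(tokens):
        # Strip surrounding punctuation before testing the token.
        if DO_PATTERN.fullmatch(token.strip(".,;:!?/")):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


sample = "Wyn and ale deden he fete, and maden hem glade and bliþe"
for left, hit, right in concordance(sample):
    print(f"{left:>25} [{hit}] {right}")
```

A search of this kind only produces candidates. Deciding whether do takes an infinitive complement, the second step, still requires reading each hit in context.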

3.2. Auxiliary and Causative Do-constructions: Issues of Classification

Auxiliary do constructions in Middle English are characterized by the presence of a finite form of the verb do and an infinitive verb. The semantics of the construction is determined by the infinitive verb, while auxiliary do is lexically empty and only contributes to the meaning of the construction with grammatical information (Anderson 2006). The identification of auxiliary do in Middle English, however, is not always an easy task, since the construction do + infinitive could also express a causative event. The typical pattern in which causative do occurs is formed by two noun phrases (NP) and an infinitive complement (INF), that is, NP1-do-NP2-INF. NP1 is the causer (the initiator of the causative event), NP2 is the causee (the performer of the action brought about by the causer), and the infinitive complement describes the action caused. In (6), for instance, þe king is the causer, þe mayden is the causee, and arise is the infinitive complement.

(6) þe king dede þe mayden arise / and þe erl hire

the king did the maiden arise / and the earl her

bitaucte

bestowed

‘the king made the maiden arise and bestowed her to the earl’ (PCMEP: Havelok: 7.208.95)

Less frequent are constructions in which the subject of the infinitive is absent, as in (7). In this example, Ptolemy is talking to the king and is asking him to save a Persian knight; in this particular excerpt, ye refers to the king and him to the Persian knight. It is rather unlikely that the actions of burning and hanging the knight were performed by the king himself, while it seems more realistic to assume that the king made someone else burn and hang him.

(7) Ye mowe wel him do brenne and honge

You may well him do burn and hang

‘You may well make someone burn and hang him’ (PCMEP: Alisaunder: 165.4022.2379)

In other cases, the meaning of the construction may be ambiguous, since it is not always clear who the performer of the caused situation is: the subject of the infinitive or the subject of do. On the one hand, if one assumes the implied presence of an external entity non-coreferential with the subject of do, the interpretation of the construction is causative. If, on the other hand, it is understood that the subject of do is the agentive subject of the infinitive verb, do can be interpreted as non-causative, that is, as an auxiliary. In principle, the two readings are both possible. There are no structural criteria available to distinguish between auxiliary and causative interpretations and only contextual clues, in some cases, allow us to disambiguate the meaning of the construction. However, there are several instances in which it is not possible to assign a causative or an auxiliary reading and, therefore, such examples have been classified as ambiguous.

In this study, every example has been carefully analyzed in context. In order to make the interpretation as transparent as possible, I relied on the following guidelines. An auxiliary reading has been assigned only to constructions in which the general context allowed such an interpretation. In all the cases when the context did not disambiguate between causative and auxiliary readings, the construction has been analyzed as ambiguous. This is exemplified in (8) and (9): do in (8) is interpreted as an auxiliary, while in (9) it is considered ambiguous.

(8) Hwan he hauede eten, and was fed, Grim dede maken a

When he had eaten, and was fed, Grim did make a

ful fayr bed and dede him therinne, and seyde,

full beautiful bed and did him therein, and said,

“Slep, sone, with muchel winne! Slep wel faste

“Sleep, son, with much joy! Sleep well fast

and dred thee nouth - from sorwe to joie art thu brouth”

and fear you nothing - from sorrow to joy are you brought”

‘When he had eaten and was fed, Grim made a beautiful bed, undressed him, put him in the bed and said: “Sleep son, and with much joy! Fall asleep quickly and do not fear anything – you have been brought from sorrow to joy”.’ (PCMEP: Havelok: 21.658.318)

(9) Ac ich am þerof glad and bliþe þat þou art nomen in

But I am thereof glad and happy that you are taken in

clene live þi soule-cnul ich wille do ringe, and masse

clean life your death-knell I will do ring, and mass

for þine soule singe

for your soul sing

‘But I am happy and glad that you die pure in heart and I will make someone ring/ring the death knell and make someone sing/sing mass for you’ (PCMEP: Fox and Wolf: 196.277.252)

In (8), he refers to Havelok, and Grim is his servant. These lines describe a situation in which Grim is taking care of Havelok, who had not eaten in three days. Once Havelok is fed, Grim prepares his bed for him, undresses him, and puts him to bed. All these three actions are performed by Grim; in the text there is no other agent who could have plausibly done these actions. Thus, I interpret dede as an auxiliary and the string dede maken a ful fayr bed “made a very beautiful bed” as an auxiliary construction, which is semantically equivalent to unclothede him “undressed him” and dede him therinne “put him in there.” It should be noted that, despite sharing this reading, Ellegård (1953) interpreted this example as ambiguous: “Thus Grim dede maken a ful fayr bed (Havelok /79/, 658) has not been accepted as periphrastic, even though it is altogether likely that Grim, the servant, made the bed himself” (Ellegård 1953:37). Conversely, it is not clear in (9) who the performer of the action of ringing the death knell is, as the context and the lexical meaning of the infinitive do not provide any indication and both a causative and an auxiliary reading are possible.

The marking schemes of PCMEP and PLAEME have been crucial in the analysis of the data, since they do not allow for the presence of ambiguous cases; each instance has been coded either as a causative or as an auxiliary example. Causative cases are annotated with the tag IP-INF, since the verb phrase headed by the infinitive is a complement of do. Auxiliary constructions, by contrast, are those where the infinitive is tagged just as VB, which indicates that the infinitive is not considered a complement of do. The strategy adopted by the compilers of the corpora was to assign an auxiliary interpretation only when a causative reading is excluded, while the examples in which do is ambiguous are annotated with the label IP-INF along with causative instances. Ultimately, my understanding of the data agrees with the compilers of PCMEP and PLAEME in all but one instance, cited in (10), where the construction do + infinitive is tagged as causative or ambiguous in PCMEP, while I assigned an auxiliary interpretation.

(10) After mete, anon ryghtis, he dude noumbre his gode

After lunch, immediately right, he did count his good

knyghtis and sent fifteen thousand and hundredis seven al

knights and sent fifteen thousand and hundreds seven all

of Grece ybore by heven

in Greece born to heaven

‘Immediately after lunch, he counted his good knights and sent fifteen thousand and seven hundred, all born in Greece, to heaven’ (PCMEP: Alisaunder: 62:1396-1398)

At the end of the analysis, the distribution of auxiliary do, causative do, and constructions in which the interpretation of do was unclear (ambiguous do) was as shown in Table 1.

Table 1.

Frequency of Auxiliary, Causative, and Ambiguous Do in Middle English Poetical Texts

Construction	N
Auxiliary do	113
Causative do	174
Ambiguous do	77
Total	364
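
The counts in Table 1 can also be read as shares of the 364 do + infinitive constructions. The short Python sketch below derives these percentages from the table; they are not reported in the text, and the variable names are illustrative only.

```python
# Counts copied from Table 1.
counts = {"Auxiliary do": 113, "Causative do": 174, "Ambiguous do": 77}

total = sum(counts.values())  # should match the Total row of Table 1

# Share of each construction, as a percentage rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

On these figures, auxiliary do accounts for roughly 31 percent of the constructions, causative do for about 48 percent, and ambiguous cases for about 21 percent.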

3.3. Conditional Inference Trees and Random Forests

The quantitative methods used to investigate the data are conditional inference trees and random forests (Tagliamonte & Baayen 2012; Deshors & Gries 2016; Szmrecsanyi, Grafmiller, Heller & Röthlisberger 2016; Hundt 2018; Tomaschek, Hendrix & Baayen 2018; Levshina 2020; Gries 2021). Conditional inference trees are a statistical technique based on recursive splitting of the data. Initially, the algorithm determines whether any of the independent variables (i.e., predictors) included in the model is associated with the dependent variable (i.e., the linguistic variable of interest). Then, the model evaluates which independent variable has the strongest effect on the dependent variable and divides the data set into two subsets. This procedure is repeated until there is no independent variable that can be associated with the response of the dependent variable at a statistically significant level, which in this study is the canonical 0.05. Random forests are an extension of conditional inference trees, since the algorithm first creates multiple inference trees and then merges the results of each tree, thus returning an outcome that is more accurate than a single tree. The number of trees grown in the forest can be determined by the researcher; in the present study, it has been set to 2000. Random forests are an ideal complement to conditional inference trees in that every conditional inference tree grown in the forest has a random sample of the independent variables included in the model. This is a form of bootstrapping that provides a reliable measure of the importance of every predictor. The software used to perform these computations is R (R Development Core Team 2020), and the package used for the conditional inference tree and the random forest is partykit (Hothorn & Zeileis 2015), a more recent version of the party package employed by Tagliamonte and Baayen (2012).⁴
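
The recursive logic described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the algorithm implemented in partykit: it handles only binary predictors, uses a plain chi-square test with one degree of freedom as the association test, and applies no correction for multiple testing, whereas the partykit implementation relies on permutation-based tests and handles predictors with more than two levels. The function names (`chi_square_p`, `best_split`, `grow`, `toy_rows`) and the data are invented for illustration and do not reproduce the counts of this study.

```python
import math


def chi_square_p(rows, predictor, response):
    """P-value of a chi-square test (1 df) on a 2x2 predictor-by-response table."""
    p_levels = sorted({r[predictor] for r in rows})
    r_levels = sorted({r[response] for r in rows})
    if len(p_levels) != 2 or len(r_levels) != 2:
        return 1.0  # no variation left, so nothing to test
    table = [[sum(1 for r in rows if r[predictor] == p and r[response] == y)
              for y in r_levels] for p in p_levels]
    n = len(rows)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = sum(table[i]) * (table[0][j] + table[1][j]) / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


def best_split(rows, predictors, response, alpha=0.05):
    """Return the predictor with the smallest p-value, or None if none is below alpha."""
    p_values = {p: chi_square_p(rows, p, response) for p in predictors}
    best = min(p_values, key=p_values.get)
    return best if p_values[best] < alpha else None


def grow(rows, predictors, response, alpha=0.05):
    """Split recursively until no predictor is significantly associated."""
    split = best_split(rows, predictors, response, alpha)
    if split is None:
        return {"n": len(rows)}  # terminal node
    children = {}
    for level in sorted({r[split] for r in rows}):
        subset = [r for r in rows if r[split] == level]
        children[level] = grow(subset, predictors, response, alpha)
    return {"split": split, "children": children}


def toy_rows():
    """Invented data, for illustration only; not the counts from this study."""
    cells = [("End-verse", "Auxiliary", 15), ("End-verse", "Causative", 5),
             ("Other", "Auxiliary", 5), ("Other", "Causative", 15)]
    rows = []
    for position, construction, n in cells:
        for i in range(n):
            rows.append({"position": position,
                         "ending": "-th" if i % 2 == 0 else "Other",
                         "construction": construction})
    return rows


print(grow(toy_rows(), ["position", "ending"], "construction"))
```

In the invented data, only the position of the infinitive is associated with the construction, so the sketch splits once on that predictor and then stops. A random forest would repeat this procedure many times, each time giving the tree a random sample of the predictors, and then aggregate the results.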

3.4. Variables Included in the Statistical Models

The data set containing every do + infinitive construction was coded for six independent variables. The first predictor examines whether there is a significant difference between do + infinitive constructions with respect to the position the infinitive occupies in the verse. This variable considers two positions: end of the verse and non-end of the verse. By end of the verse is meant those instances in which the infinitive is the last element of the line and do is either adjacent to the infinitive or is separated from the infinitive by intervening words. An example of auxiliary do constructions not at the end and at the end of the verse is given in (11) and (12) respectively. The end of the verse is indicated by “/.”

(11) He was aferd sore of harme / Anon he dude caste his

He was afraid great of harm / Immediately he did cast his

Charme/

spell /

‘He was very afraid of being harmed and immediately cast his spell’ (PCMEP: Alisaunder: 9.104.45)

(12) And ouercomen hem with maistrie / The king onon dude

And overcame them with power / The king immediately did

crye / that non misdone hem ne sholde /

shout / that nobody hurt them not should /

‘And overcame them with power. The king immediately shouted that nobody should hurt them’ (PCMEP: Alisaunder: 221.5335.3159)

This predictor aims to examine whether auxiliary do was used as a metrical device to place the infinitive at the end of the verse, as previously suggested by several scholars (see section 2).
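
The coding of this predictor can be made explicit with a small Python sketch. It assumes that each verse line is available as a separate string and that the infinitive has already been identified; the function name `infinitive_position` is hypothetical, and the coding in this study was not necessarily carried out in this way.

```python
def infinitive_position(verse_line, infinitive):
    """Code the position variable: 'End-verse' if the infinitive is the
    last word of the line, 'Other' otherwise."""
    words = [w.strip(".,;:!?/").lower() for w in verse_line.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    if infinitive.lower() not in words:
        raise ValueError("infinitive not found in verse line")
    return "End-verse" if words[-1] == infinitive.lower() else "Other"


# The relevant verse lines of examples (11) and (12).
print(infinitive_position("Anon he dude caste his Charme", "caste"))
print(infinitive_position("The king onon dude crye", "crye"))
```

Because only the last word of the line is checked, the result does not depend on whether do is adjacent to the infinitive or separated from it by intervening words, in line with the definition given above.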

The second variable investigates the possibility that auxiliary do was used to prevent the occurrence of particular phonotactic clusters (Dahl 1956; Fischer & van der Wurff 2006). Given the emphasis that Middle English poets put on rhyme, auxiliary do could be used as a facilitator device which served to modify “hard-to-rhyme” forms into forms that offered more rhyme opportunities. In fact, while verse ending syllables formed by a group of consonants were not an issue in the Old English alliterative line, they could represent a challenging task for Middle English poets. Among the different verbal endings, I have shown that verbal endings in -th are the ones that are less likely to appear in final position of the verse (Moretti 2021).⁵ The hypothesis that will be tested is that, since poets paid particular attention to rhyme, they could resort to auxiliary do to avoid the presence of the ending -th in the final position of the verse.

The third variable is related to the introduction of foreign verbs, particularly of French origin, after the Norman Conquest in 1066, when the Germanic core of the Middle English vocabulary was enriched with numerous borrowings. The influence of Norman French on Middle English has been widely acknowledged (Jespersen 1905; Dekeyser 1986; Culpeper 2005; Mugglestone 2006; Brinton & Arnovick 2011; Baugh & Cable 2013), and there have been some attempts to quantify the number of words that entered the language. In this regard, Kastovsky (2006:250) argues that a total of 10,000 words was incorporated during the Middle English period. The hypothesis that will be tested is that auxiliary do was used to avoid integrating a foreign word morpho-phonologically into the English inflectional system. In other words, since the addition of a native ending to a foreign stem could create phonological or morphological difficulties, poets could use auxiliary do to simplify the use of such verbs. In this way, do would express tense, while the borrowed verb would be in the infinitival form. In order to determine the origin of the infinitive verbs, I consulted the OED. Any verb that was adopted during the Middle English period has been labeled as “Borrowed,” while the verbs that were already in the language are referred to as “Germanic.”⁶ This means that verbs of foreign origin that entered English in Old English, mainly from Latin, have been grouped under the label Germanic. Furthermore, verbs of Germanic origin which came through other Germanic varieties, mostly from Scandinavian languages, were labeled as borrowings, although there is only one such occurrence in my data set. The idea that the large influx of French items played a role in the use of auxiliary do has been suggested by various scholars. 
Fischer and van der Wurff (2006:155) mention it as a possible factor that may have helped the rise of auxiliary do, while Shaw and De Smet (2022) have shown that French-origin infinitives favor the use of auxiliary do in Early Modern English, particularly in the period 1500-1570 (see section 5 for more details).
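
The labeling rule for verb origin can be summarized as a small decision function. The Python sketch below is only an illustration of that rule: the function name `origin_label`, its two inputs, and the use of 1150 as a cut-off year (the start of the Middle English period as given in the abstract) are assumptions, and in practice the decision rests on the information given in each OED entry rather than on a single year.

```python
ME_START = 1150  # assumed cut-off: start of the Middle English period


def origin_label(is_loan, adoption_year=None):
    """Return 'Borrowed' only for loans adopted in Middle English.

    Native verbs and loans that were already in the language in Old
    English are both labeled 'Germanic', following the rule in the text.
    """
    if is_loan and adoption_year is not None and adoption_year >= ME_START:
        return "Borrowed"
    return "Germanic"


# Hypothetical inputs; the years are invented for illustration.
print(origin_label(is_loan=True, adoption_year=1300))  # loan adopted in Middle English
print(origin_label(is_loan=True, adoption_year=900))   # loan adopted in Old English
print(origin_label(is_loan=False))                     # native verb
```

Note that the labels track the time of adoption rather than the ultimate etymology, which is why a Scandinavian loan adopted in Middle English counts as “Borrowed” while a Latin loan adopted in Old English counts as “Germanic.”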

A further influencing factor included in the statistical investigation is the dialect of composition of each poem. Ellegård (1953:40-42) conducted an accurate analysis of the dialectal distribution of do + infinitive constructions in Middle English and found that causative do occurs more frequently in what he refers to as the Eastern dialect, while auxiliary do is more common in the Western dialect, suggesting thus a western origin for auxiliary do. Although Ellegård (1953) has shown that dialect is an important factor in the use of auxiliary do, the operationalization of a dialectal variable is particularly complicated when it comes to data of the Middle Ages. This is due to several factors, such as scribal copying, dialect classification, and other reasons, as discussed by Zimmermann (2020:9). Nevertheless, if supported by a sound methodology, dialectal features can be included in quantitative investigations that study this type of data. The main issue with the present data set is that Middle English dialects were not sharply distinct, which means that there are some texts that show a mix of dialectal features, and, in a few cases, it is impossible to assign a specific dialect. In such cases, I used the following approach. If a text presents predominant features of a specific dialect, I assigned the text to that dialectal area. Texts for which the dialect is classified as unknown by the compilers of PCMEP and PLAEME have not been included in the investigation.⁷

A crucial difference between this study and the one carried out by Ellegård (1953) is that here the attribution of a specific dialect to each text is based on the detailed information provided in PCMEP and PLAEME, which in turn follow the standard dating system of the Helsinki Corpus (Kytö 1996), where the following dialectal areas are distinguished: Southern, Northern, Eastern Midlands, and Western Midlands. Conversely, the map of dialects used by Ellegård is based on Oakden (1930), who distinguished the following areas: South-Eastern, Central, North-Western, Western, Eastern, Northern. In his study, Ellegård (1953:42) further divided the Western area into West Midlands and South-Western when necessary, with the dividing line being the estuary of the river Severn. The Northern dialect of the HC overlaps with Ellegård’s (1953) Northern dialect, while the Eastern Midlands dialect roughly matches with the Eastern dialect plus the eastern part of the Central dialect in Ellegård (1953). The Western Midlands dialect corresponds to Ellegård’s (1953) North-Western, part of the Western and the Central dialects. Finally, the Southern dialect coincides with Ellegård’s (1953) lower Western dialect and his South-Western dialect.

The predictor verse considers the type of verse of each text. The texts have been divided into rhyming and non-rhyming, that is, alliterative, poems. Some texts do not show uniform metrical features but present some sections written in rhyme and others in alliteration, such as The Proverbs of Alfred and the Bestiary. In order to avoid adding a further level to this variable for mixed texts, each occurrence of do in these poems has been assigned to either a rhyming or a non-rhyming section. This variable is designed to determine whether auxiliary do was exclusive to rhyming poems, or whether it was also used in alliterative compositions.

Lastly, time is also considered as a factor. In the present study, I divided the texts into three periods, which correspond to the canonical sub-periods M1 (1150-1250), M2 (1250-1350), and M3 (1350-1420) that the Middle English corpora consulted use in the classification of the texts. The inclusion of a variable investigating the effect of time on the use of do allows us to determine whether (i) the use of auxiliary do changes across different periods, and (ii) the time variable interacts with the other variables included in the model. The attribution of a period to an individual text can be problematic, since there are cases in which the date of the surviving manuscript differs from the date of composition of the original text. In the current study, I chose to consider the date of the surviving manuscript, as it cannot be excluded that the original version of the text was altered by the scribes who copied it in later periods.

The final design of the statistical models is the following:

A dependent variable called construction: Auxiliary do versus Causative do.

An independent binary variable named position: End-verse versus Other, which indicates whether the infinitive is found at the end of the verse.

An independent binary variable called ending: -th ending versus Other, indicating whether the finite verb ends in -th.

An independent binary variable named inf_origin: Germanic versus Borrowed, which indicates the origin of the infinitive occurring with do.

A categorical independent variable named dialect: Eastern Midlands (EM) versus Western Midlands (WM) versus Northern versus Southern, representing the dialect of the surviving text.

A categorical independent variable named period: M1 versus M2 versus M3, indicating the period to which the surviving text is dated.

A categorical independent variable named verse: Rhymed versus Non-rhymed, which refers to the type of verse used in each text.
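The design listed above can be made concrete with a short sketch. The following Python fragment is not part of the original analysis, which was carried out in R; it simply encodes the seven variables and their levels as listed, and rejects any value outside them. The class name `DoToken`, the dictionary `LEVELS`, the abbreviated level labels, and the coded example token are all assumptions introduced for illustration.

```python
from dataclasses import dataclass

# Levels of each variable, taken from the design list above
# (labels are shortened here; the exact coding used in the study is not known).
LEVELS = {
    "construction": ("Auxiliary", "Causative"),  # dependent variable
    "position": ("End-verse", "Other"),
    "ending": ("-th", "Other"),
    "inf_origin": ("Germanic", "Borrowed"),
    "dialect": ("EM", "WM", "Northern", "Southern"),
    "period": ("M1", "M2", "M3"),
    "verse": ("Rhymed", "Non-rhymed"),
}


@dataclass(frozen=True)
class DoToken:
    """One do + infinitive occurrence; field names mirror the variable names."""
    construction: str
    position: str
    ending: str
    inf_origin: str
    dialect: str
    period: str
    verse: str

    def __post_init__(self):
        # Reject any value that is not a declared level of its variable.
        for name, allowed in LEVELS.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {allowed}")


# Hypothetical coding of a single token, for illustration only.
example = DoToken("Auxiliary", "End-verse", "Other", "Borrowed", "EM", "M2", "Rhymed")
print(example)
```

Validating the levels at construction time is a design choice of this sketch: it catches coding slips (for instance, a dialect label outside the four areas) before the data reach a model.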

4. Results

4.1. Distribution of Auxiliary Do by Predictor Variable

The search for auxiliary do in the data set used for this study yielded a total of 113 occurrences. A first, preliminary inspection of the data reveals that not all the predictors contribute to the use of auxiliary do in Middle English poems. The first predictor that stands out is the type of verse in which the text was composed. It appears that auxiliary do was exclusively used in rhyming texts, while it is completely absent in texts, or text sections, which do not feature rhyme: all the 113 auxiliary examples occur in rhyming poems, as shown in Table 2.

Table 2.

Distribution of Auxiliary Do and Causative Do in Rhyming and Non-rhyming Texts

Construction	Rhyming texts	Non-rhyming texts
Auxiliary do	113 (100%)	0 (0%)
Causative do	110 (63.2%)	64 (36.8%)

Second, there seems to be a correlation between auxiliary do and the position of the infinitive in the verse. Table 3 shows that the vast majority of the auxiliary constructions are found when the infinitive occupies the last position of the line, which lends support to the suggestion that auxiliary do was used in Middle English poetry as a metrical device. The distribution of causative do, on the other hand, appears to be due to random variation.

Table 3.

Distribution of Auxiliary Do and Causative Do by Position of the Infinitive in the Verse

Construction	No end-verse	End-verse
Auxiliary do	15 (13.3%)	98 (86.7%)
Causative do	89 (51.1%)	85 (48.9%)
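The percentages in Table 3 are row percentages, that is, each count is divided by the total for its construction. As a minimal check, and assuming nothing beyond the counts printed in the table, the figures can be recomputed as follows (the function name `row_percentages` is introduced here for illustration).

```python
# Counts copied from Table 3: (infinitive not at end of verse, infinitive at end of verse).
table3 = {
    "Auxiliary do": (15, 98),
    "Causative do": (89, 85),
}


def row_percentages(counts):
    """Return each cell as a percentage of its row total, rounded to one decimal."""
    total = sum(counts)
    return tuple(round(100 * c / total, 1) for c in counts)


for construction, counts in table3.items():
    print(construction, row_percentages(counts))
```

The recomputed values agree with those given in the table: 13.3 and 86.7 percent for auxiliary do, 51.1 and 48.9 percent for causative do.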

A third factor influencing the presence of auxiliary do seems to be the dialect of the text. Table 4 shows that auxiliary do is used more frequently in Eastern Midlands texts, while it is less well attested in Southern and West Midlands compositions. These results go against Ellegård (1953), since he recorded a larger number of auxiliary do constructions in the Western dialect in the thirteenth and fourteenth centuries (see in particular Ellegård 1953:44-46, Tables 1-4). The difference between the results presented in Table 4 and the numbers provided by Ellegård (1953) lies in the fact that he interpreted a large number of do + infinitive constructions in the Eastern dialect as ambiguous. In fact, while he found a larger number of do + infinitive constructions in the Eastern dialect than in any other dialect, the vast majority of the examples (164/237, 69.2 percent) are classified by Ellegård (1953) as ambiguous, and only 15 (6.3 percent) as auxiliary ones.

Table 4.

Distribution of Auxiliary Do and Causative Do Across Different Dialects (Normalized Frequencies ×10,000 Words Within Parentheses)

Construction	East Midlands	Northern	Southern	West Midlands
Auxiliary do	50 (3.1)	12 (1.64)	4 (1.84)	47 (1.14)

Fourth, as far as the ending variable is concerned, Table 5 shows that auxiliary do is frequently attested when the ending of the lexical verb would be -th. However, the proportions of causative do in the same context seem to suggest that the ending was not a factor that determined the use of auxiliary do.

Table 5.

Distribution of Auxiliary Do and Causative Do by Ending

Construction	-th ending	Other endings
Auxiliary do	87 (77%)	26 (23%)
Causative do	119 (68.4%)	55 (31.6%)

Lastly, if we look at the origin of the infinitives that occur in the do + infinitive construction, the total number of foreign infinitives is rather limited, as there are only twenty-eight instances in the data set. However, there are several cases in which a verb of foreign origin is attested in combination with auxiliary do, as shown in Table 6. If we compare the distribution of auxiliary do with that of causative do, it is likely that this variable will turn out to be a significant factor in the use of auxiliary do.

Table 6.

Distribution of Auxiliary Do and Causative Do by Origin of the Infinitive

Construction	Borrowed infinitives	Germanic infinitives
Auxiliary do	19 (16.8%)	94 (83.2%)
Causative do	9 (5.2%)	165 (94.8%)

4.2. Statistical Analysis

We can now determine the relative importance of the variables described in the previous sections in the use of auxiliary do. In a first attempt to explore the data, I built a model with the following characteristics: a dependent variable with three levels, that is, auxiliary do, causative do, and ambiguous do, with position, ending, inf_origin, dialect, period, and verse as independent variables. The performance of this model, however, was far from encouraging. The accuracy level, that is, the proportion of correctly predicted observations, was rather low: 0.62 for the conditional inference tree and 0.67 for the random forests. These values are well below the 0.8 level recommended in other studies (e.g., Tagliamonte & Baayen 2012). In a second attempt, aimed at improving the accuracy of the model, I removed all the ambiguous examples and ran an analysis in which the dependent variable consisted of two levels, that is, auxiliary do and causative do, while the independent variables were as before. Model validation returned an accuracy level of 0.78 for the conditional inference tree and 0.81 for random forests, which represents a substantial improvement over the scores of the first model. Thus, I opted for the model with the higher accuracy, which features auxiliary and causative do as levels of the dependent variable.
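The accuracy measure used above, the proportion of correctly predicted observations, can be illustrated with a minimal Python sketch. The function name `accuracy` and the two short label lists are assumptions made for illustration; the labels are invented and are not the study's data.

```python
def accuracy(observed, predicted):
    """Proportion of observations whose predicted label equals the observed label."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)


# Hypothetical labels, for illustration only (not taken from the corpus).
observed = ["aux", "aux", "caus", "caus", "caus"]
predicted = ["aux", "caus", "caus", "caus", "aux"]

# Three of the five predictions match the observed labels.
print(accuracy(observed, predicted))
```

On this definition, a score of 0.81 means that roughly four out of five do-constructions are assigned to the correct category by the model.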

The results of the conditional inference tree are illustrated in Figure 1. The ovals contain the independent variable selected by the algorithm to get the best split in terms of classification accuracy when predicting the dependent variable, and its p-value. The levels of the independent variable are described by the branches, while the bar plots represent the leaves and indicate the proportions of auxiliary and causative do in each end node, called “bin,” which includes all the observations for every level of the dependent variable. The total number of the observations in each bin is given in the parentheses above the boxes.

Figure 1.

Conditional Inference Tree of Auxiliary and Causative Do

The first conspicuous result is that the independent variable ending is not part of the tree, which means that it is not statistically significant in this data set. Among the significant predictors, the first split at the top of the figure (Node 1) divides the data set into texts that feature rhyme and others in which rhyme is not used. The left side of the tree is concerned with non-rhyming texts and involves only Node 1 and Node 2, which are directly connected. Node 2 contains only causative constructions, which means that auxiliary do is completely absent in texts that are not composed in rhyme. Moving on to the right side of the tree, which is characterized by rhyming texts only, we see that the next split (Node 3) divides the data set according to the position that the infinitive occupies in the verse. Do-constructions whose infinitive does not appear at the end of the verse are further split according to the dialect of the texts (Node 4). In texts composed in the Eastern Midlands and Northern dialects, the majority of the constructions involve causative do, although there are also some instances of auxiliary do (Node 5). In texts of the Western Midlands and Southern dialects, on the other hand, there are no instances of auxiliary do in non-final position (Node 6). The portion of the data in which do + infinitive constructions occur at the end of the verse is divided according to the period in which the texts were written (Node 7). Node 8 shows that in M1, which refers to the period 1150-1250, the majority of the few do-constructions attested involve do used as a causative verb. In later periods, auxiliary do is more frequent. The link that connects Node 9 to Node 13 indicates that a large number of do + infinitive constructions in final position in the Southern and in the Western Midlands involve auxiliary do. In the Eastern Midlands and in the Northern dialects, the presence of the auxiliary construction was influenced by the origin of the infinitive it occurred with (Node 10). As it appears, auxiliary do was likely to occur when the infinitive was a borrowed item. By contrast, when the infinitive was a native verb, we observe an almost equal proportion of causative and auxiliary do constructions.

Let us now calculate the importance of each predictor by using a random forest (ntree = 2000, mtry = 2). The outcome is shown in Figure 2. The most powerful predictor is the position of the infinitive in the verse, followed by the period in which the text was written. The predictors concerning the dialect, the metrical features of the texts, that is, rhyming versus non-rhyming texts, and the origin of the infinitive are less powerful. The predictor ending has no explanatory power in this model, as also indicated by its absence from the conditional inference tree.

Figure 2.

Dot Chart Evaluating the Conditional Variable Importance of Each Predictor

5. Discussion

The first important finding of the statistical analysis is that auxiliary do was a feature specific to rhyming poems. While this result is not surprising, it provides quantitative evidence for the assumption, often found in the literature, that auxiliary do is a construction typical of Middle English rhyming poetry (e.g., Engblom 1938). The second conspicuous result concerns the suggestion that auxiliary do was employed as a metrical tool to place the infinitive at the end of the verse. The conditional inference tree and the random forest lend quantitative support to this hypothesis, as the presence of auxiliary do is strongly influenced by the position the infinitive occupies in the verse. It is important to stress, however, that there is a considerable degree of variation across different dialects. On the one hand, the situation in the Southern and in the Western Midlands dialects, which is summarized by Nodes 6 and 13 in Figure 1, is rather straightforward: auxiliary do was used exclusively to place the infinitive at the end of the verse. In the Eastern Midlands and in the Northern dialects, on the other hand, the predictor position is not the only factor that determines the presence of auxiliary do. In these dialects, auxiliary do was used more than expected when the infinitive that was meant to occupy the final position of the verse was a borrowing. Therefore, it seems that auxiliary do was employed as a strategy to facilitate rhyme particularly when the verb was an item borrowed from another language. These findings raise the question as to whether auxiliary do was used as a “facilitator” device to integrate foreign verbs regardless of the position the infinitive had in the verse. Let us therefore isolate all the auxiliary and causative do constructions in the Eastern Midlands and in the Northern dialects and analyze the observed frequency of borrowed and native verbs. The results are illustrated in Table 7. Auxiliary do is observed fifty-eight times: in fourteen instances the infinitive is a borrowing and in forty-four it is a native verb. Causative do is found 129 times: in eight cases the infinitive is a borrowing and in 121 it is a native verb. These data were submitted to a chi-squared test of independence (with Yates’ continuity correction), which reveals that the difference between auxiliary do and causative do with infinitives of foreign origin is statistically significant (χ² = 10.73, df = 1, p < .01).

Table 7.

Distribution of Auxiliary Do and Causative Do by Origin of the Infinitive in Eastern Midlands and Northern Dialects

Construction	Borrowed infinitives	Germanic infinitives
Auxiliary do	14 (24.1%)	44 (75.9%)
Causative do	8 (6.2%)	121 (93.8%)
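The reported statistic can be checked directly from the counts in Table 7. The sketch below is an assumption about the procedure rather than the author's own script: it computes the Pearson chi-squared statistic for a 2×2 table, with and without Yates' continuity correction, using only the Python standard library. The function names `chi2_2x2` and `p_value_df1` are introduced here for illustration.

```python
import math


def chi2_2x2(table, yates=True):
    """Pearson chi-squared statistic for a 2x2 table of counts.

    With yates=True, 0.5 is subtracted from each |observed - expected|
    before squaring (Yates' continuity correction).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail probability of a chi-squared variable with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Counts from Table 7: rows are auxiliary and causative do,
# columns are borrowed and Germanic infinitives.
table7 = ((14, 44), (8, 121))

corrected = chi2_2x2(table7)
print(round(corrected, 2), p_value_df1(corrected) < 0.01)
print(round(chi2_2x2(table7, yates=False), 2))
```

With the continuity correction the statistic rounds to 10.73, which matches the value reported above; without the correction it is somewhat higher, and the result remains significant at the .01 level in either case.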

On the basis of these findings, I propose that one of the functions of auxiliary do in the Eastern Midlands and in the Northern dialects was to facilitate the introduction of borrowed items. The influence of Norman French on the English language in the period after the Norman Conquest was dramatic and affected in particular the vocabulary, which grew at a very rapid rate. As mentioned in section 3.4, Kastovsky (2006) claims that a large number of linguistic items of Romance origin entered English in the post-Conquest period. In a similar vein, Minkova (2004:13) argues that “[a]n unprecedented bulk of about 10,000 Romance words was added to English before the middle of the fifteenth century” (see also Dalton-Puffer 1996). In this scenario, the presence in the grammatical system of a semantically light verb like auxiliary do served to facilitate the adoption of new vocabulary items, as illustrated in (13), where dede serves as an accommodation strategy that allows the use of the foreign infinitive sayse “take possession of” (from Norman French saisir/seisir). In this way, auxiliary do bears the inflection, while the borrowed verb remains uninflected and carries only semantic information.

(13) þat he ne dede al engelond sone sayse intil his hond

that he ne did all England soon take in his hand

‘that he did not take into possession all England’ (PCMEP: Havelok: 8.251.113)

These findings fit in well with a recent study by De Smet and Shaw (forthcoming), which shows that French verbs that entered English during the late fourteenth century are preferentially used in non-finite contexts. More specifically, De Smet and Shaw (forthcoming) argue that French-origin verbs are subject to what they call “accommodation biases,” that is, morphosyntactic constraints: in their study of the fifteen most frequent verbs of French origin during the Middle English period, the share of finite forms is lower than expected in nearly every case. Furthermore, Shaw and De Smet (2022) have shown that, in the period 1500-1570, French-origin verbs favor auxiliary do over the finite alternative more strongly than native verbs do. In addition to these studies, it is also worth stressing that the practice of inserting a semantically light verb to help the adoption of foreign words is not uncommon cross-linguistically. Wohlgemuth (2009:103), for instance, argues that the introduction of a light verb, that is, a verb with broad referential scope (see Jespersen 1954:VI, 117-118), is typologically the second most frequent strategy, after what he calls “direct insertion.” Furthermore, Wohlgemuth (2009) argues that verbs meaning “to do” are the light verbs most frequently used across different languages. Using the light verb strategy allows the borrowed verb to remain uninflected and bear only the semantic load, while the light verb carries the grammatical information. An example of this strategy is found in Moroccan Arabic-Dutch dert-hum ontmoeten “I met them” (literally “I did them meet”) (Versteegh 2010:647), where dert, past tense of the verb dar “to do,” facilitates the use of the Dutch-origin verb ontmoeten “to meet” (for more examples in other languages, see Wohlgemuth 2009:107-109).

The last finding to discuss is perhaps the most unexpected result of the statistical analysis carried out in section 4.2. There is a moderate number of auxiliary do constructions in Eastern Midlands and Northern rhyming texts in which the infinitive does not appear at the end of the verse. More specifically, 32.6 percent of the forty-three cases in Node 5 (14 occurrences) are auxiliary constructions in which do is not used to place the infinitive at the end of the verse. The question that arises is, therefore, what function auxiliary do could have had if it was not used for rhyme purposes. The conditional inference tree did not produce any further split in that portion of the data, which suggests that none of the variables considered in this study is statistically significant there. A definitive answer, not surprisingly, remains elusive. However, there is a possible explanation that is worth considering. Middle English poetry is characterized by several features that set it apart from Old English. While the Old English poetical production is remarkably uniform and rests on the principles of alliteration (Fulk 1992), Middle English poems hinge on rhyme, regularity of beats, syllable-counting, and, particularly in late Middle English compositions, the use of iambs (Pearsall 1977; Lester 1996; Putter forthcoming). The results of the statistical analysis presented in section 4.2 illustrate that auxiliary do was indeed used for metrical purposes, especially to place the infinitive at the end of the verse. At the same time, it is possible that this was not the only metrical function of auxiliary do. That is, poets may have used auxiliary do to further manipulate the structure of the verse in order to adhere to the new versification modes that developed during the Middle English period.

Let us investigate this possibility in greater detail by looking at the occurrences of auxiliary do in non-final position in the poem Havelok, which contains seven out of the fourteen total examples. The metrical features of the Havelok have been carefully described by Skeat (1868:xliv), who claims that the poem rests on the fundamental rule that every verse has four beats, which can be afforded by lines that contain between seven and nine syllables. Consider example (14). Following the metrical indications provided by Skeat (1868), the four beats, which are marked by accents in this and in the following examples, are distributed in the following way.

(14) He déde máken and fúl wel hólden

(PCMEP: Havelok: 2.29.20)

Let us now assume that the author did not use auxiliary do. In such a case, the poet would have had to inflect the verb make for the past tense, giving made. The resulting verse, which is shown in example (15), contains seven syllables and, following Skeat’s (1868) analysis, the four beats would fall as follows.

(15) *Hé madé and fúl wel hólden

As can be seen, example (15) presents a problem in the distribution of the beats. In particular, the main issue concerns the second beat, which does not correspond with the linguistic stress of the verb make. Unlike (14), in which the second beat falls where it should from a linguistic point of view, that is, on the root of the verb, the second beat in (15) falls on the unstressed past tense ending. This creates a mismatch between the metrical and the linguistic stress, since the latter in English typically falls on the root syllable. There is broad agreement among scholars that meter is patterned on the linguistic characteristics of the language (e.g., Kiparsky 1975; Russom 2017). This implies that metrical features are determined by the metrical structure of the words that form the verse. Thus, notions like metrical feet and metrical position are abstracted from the words and the syllables of the language (Russom 2017), which means that strong metrical positions have to match with linguistic stress (Minkova 2016).
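The contrast between (14) and (15) can be expressed as a simple check on whether each beat coincides with the lexically stressed syllable of its word. The following Python sketch is only an illustration of that reasoning, not a scansion algorithm: the syllable indices are annotated by hand from the accents printed in the two examples, root-initial stress is assumed for every word, and the names `Word`, `beat_count`, and `mismatches` are introduced here for the purpose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    form: str
    stressed_syllable: int        # 1-based index of the lexically stressed syllable
    beat_syllable: Optional[int]  # 1-based index of the syllable bearing a beat, or None


def beat_count(line):
    """Number of metrical beats in the line."""
    return sum(w.beat_syllable is not None for w in line)


def mismatches(line):
    """Words whose beat falls on a syllable other than the lexically stressed one."""
    return [w.form for w in line
            if w.beat_syllable is not None and w.beat_syllable != w.stressed_syllable]


# (14) He déde máken and fúl wel hólden
line_14 = [Word("He", 1, None), Word("dede", 1, 1), Word("maken", 1, 1),
           Word("and", 1, None), Word("ful", 1, 1), Word("wel", 1, None),
           Word("holden", 1, 1)]

# (15) *Hé madé and fúl wel hólden (beat on the past tense ending of made)
line_15 = [Word("He", 1, 1), Word("made", 1, 2), Word("and", 1, None),
           Word("ful", 1, 1), Word("wel", 1, None), Word("holden", 1, 1)]

for label, line in (("(14)", line_14), ("(15)", line_15)):
    print(label, beat_count(line), mismatches(line))
```

Both lines carry four beats, but only (15) yields a mismatch, on made, where the beat falls on the unstressed ending rather than on the root.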

The hypothesis I suggest is that in these cases auxiliary do had the purpose of allowing the metrical stress to fall on the syllable that carries the linguistic stress. The role of auxiliary do, in other words, was to ensure the metricality of the verse, which would be violated if the corresponding simple form of the lexical verb was used. This proposal is supported by other instances in Havelok, since it appears that the situation described above is not an isolated case. In (16), for instance, auxiliary do is used exactly as in (14), since its presence permits the metrical stress to match the linguistic stress of the verb callen. The absence of do, on the other hand, would cause the metrical stress to be on the unstressed past ending -ed, as shown in (17). Note that another metrical scansion, such as placing the first beat on he rather than on þat, is not possible for the verse in (17). Although the placement of the first beat on he would allow the second beat to correctly fall on the root of called, the verse would still be unmetrical, as it would contain three beats only.

(16) þat hé ne díde him cállen ók

(PCMEP: Havelok: 82.2899.1311)

(17) *þát he né calléd him ók

The same process is illustrated in (18) and (19). As we can see, the four beats are well-distributed in the verse described in (18). Example (19), on the other hand, is unmetrical. The only way whereby the verse in (19) would have four beats is by placing the first beat on Grim. This, however, would cause the second beat to fall on the unstressed ending of made, with the result that linguistic and metrical stress would not match.

(18) Grim déde máken a fúl fayr béd

(PCMEP: Havelok: 21.658.318)

(19) *Grím madé a fúl fayr béd

In the other four instances found in the Havelok, which are not provided here for reasons of space, auxiliary do is used as a metrical support exactly as in the examples illustrated above.⁸ It seems plausible, therefore, to suggest that one of the functions of auxiliary do in Middle English poems was to manipulate the structure of the verse in order to ensure the metricality of the verse, and in particular to allow the metrical stress to match the linguistic one. It is interesting to note, in addition, that this particular function of auxiliary do links up well with previous studies that have shown that the addition of a semantically empty auxiliary to serve metrical needs other than rhyme is not unusual in the Middle English poetical tradition. Putter and Stokes (2000), in fact, have convincingly demonstrated that the construction gan + infinitive verb was used to create an iambic foot, which is a sequence of an unstressed syllable followed by a stressed one, in the work of the Gawain-poet, and in particular in the poem Pearl. In this case, Putter and Stokes (2000:79) argue that this construction “yields an instant iambic foot” in which gan is the non-stressed syllable and the first syllable of the infinitive is the stressed syllable.

Lastly, the results of the conditional inference tree and random forests concerning causative do show that it is particularly frequent in the Western Midlands and in the Southern dialects, while in the other dialectal areas it is rarely attested (for similar results, see Ecay 2015; Ogura 2018). The important observation that can be drawn from the statistical analysis, however, is that causative do was not used for metrical purposes, but only for linguistic reasons. This contradicts the suggestions put forward by Ellegård (1953) and Smyser (1967), who proposed that causative do (and other causative verbs like let and make; see section 2) were employed to place the infinitive at the end of the verse.

To conclude, the results of the statistical analysis concerning the presence of auxiliary do in Middle English poetry returned a more complex picture than is usually assumed in the literature. While it is true that do was used to place the infinitive at the end of the verse, there is also evidence that this was not its only function. In the Eastern Midlands and in the Northern dialects, auxiliary do also served as a device to help the integration of verbs of foreign origin and as an alternative to the simple form of the lexical verb to ensure the metricality of the verse by allowing the metrical stress to fall on the syllables that carry the linguistic stress.

6. Conclusion

This study has investigated the functions of auxiliary do in Middle English poetical compositions. It has been argued in previous studies (e.g., Ellegård 1953) that the auxiliary construction served as a metrical device to place the infinitive in the final position of the verse to favor end-verse rhyme. However, these claims are not supported by empirical analyses. This article explicitly adopted a quantitative perspective and, by means of a conditional inference tree and random forests, has shown that the claim that auxiliary do functioned only as a device to facilitate rhyme is an oversimplification. Firstly, the quantitative investigation has clearly shown that auxiliary do is a feature of the Middle English rhyming tradition. This result should not be overlooked, since the Middle English poetical production is characterized by a great degree of metrical variation. Moreover, the statistical analysis has confirmed that auxiliary do was mainly used as a metrical tool to place the infinitive at the end of the verse. However, this use was not uniform across different dialects. Evidence that auxiliary do was exclusively a rhyming tool has been provided in particular for the Southern and the Western Midlands dialects. As for the Eastern Midlands and the Northern dialects, the quantitative analysis has shown that auxiliary do covered a wider range of functions. Besides end-verse rhyme, the presence of auxiliary do in these dialects was determined by the type of infinitive it occurred with, since it favored French-origin verbs. Additionally, I argue that auxiliary do is found in these dialects as a metrical filler used to manipulate the order of the beats in the verse and ensure that the metrical stress corresponded with the linguistic stress. Causative do, on the other hand, appears to be used for linguistic reasons only, contrary to what has been suggested by Ellegård (1953) and Smyser (1967).

Overall, from a methodological point of view, this paper highlights, once again, the benefits of combining qualitative and quantitative approaches in investigating the features of linguistic constructions. In addition, it points to the validity and the strengths of tree-based methods, which have only recently made their way into linguistic studies. I hope to have shown that conditional inference trees and random forests are more than a mere alternative to regression-based models and that they can be a valuable tool in tracking the development of a specific construction over time.

Footnotes

Acknowledgements

I would like to thank Tine Breban, Kersti Börjars, Graeme Trousdale, Ad Putter, and Donka Minkova for their comments on earlier versions of this paper, as well as two anonymous reviewers for helpful feedback.

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research for this paper was funded by an Arts & Humanities Research Council UK (AHRC) doctoral scholarship.

ORCID iD

Lorenzo Moretti

Corpora

Kroch, Anthony & Ann Taylor. 2000. Penn parsed corpus of Middle English. 2nd edn (PPCME2). Philadelphia, PA: Department of Linguistics, University of Pennsylvania. (7 February, 2020).

Truswell, Robert, Rhona Alcorn, James Donaldson & Joel Wallenberg. 2018. A parsed linguistic atlas of Early Middle English, 1250-1325 (PLAEME). Edinburgh: Edinburgh University Press. (29 May, 2020).

Zimmermann, Richard. 2015. The parsed corpus of Middle English poetry (PCMEP). Geneva: University of Geneva. (19 May, 2020).

Software

Anthony, Lawrence. 2020. AntConc (3.5.9). Tokyo, Japan: Waseda University. (8 January, 2020).

R Development Core Team. 2020. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. (25 July, 2020).

Randall, Beth. 2000. CorpusSearch: A Java program for searching syntactically annotated corpora. Philadelphia, PA: Department of Linguistics, University of Pennsylvania.

Author Biography

Lorenzo Moretti is a postgraduate student in the Department of Linguistics and English Language at the University of Manchester. His research interests are historical linguistics, corpus linguistics, and cognitive linguistics.

References

Anderson

Gregory D. S.

2006. Auxiliary verb constructions. Oxford: Oxford University Press.

Baugh

Albert C.

& Thomas Cable. 2013. A history of the English language. 6th edn. Boston, MA: Pearson.

Brinton

Laurel J.

Arnovick

Leslie K.

2011. The English language: A linguistic history. Oxford: Oxford University Press.

Budts

Sara

. 2020. On periphrastic do and the modal auxiliaries: A connectionist approach to language change. Antwerpen, Belgium: Universiteit Antwerpen PhD dissertation.

Culpeper

Jonathan

. 2005. History of English. London: Taylor & Francis.

Dahl

Torsten

. 1956. Linguistic studies in some Elizabethan writings: The auxiliary do. Aarhus: Universitetsforlaget.

Dalton-Puffer

Christiane

. 1996. The French influence on Middle English morphology: A corpus-based study of derivation. Berlin: Mouton de Gruyter.

De Smet

Hendrik

Shaw

Marlieke

. Forthcoming. Missing link: Code-switches, borrowings and accommodation biases. Linguistics Vanguard.

Dekeyser

Xavier

. 1986. Romance loans in Middle English: A re-assessment. In Kastovsky

Dieter

Swedek

Aleksander

(eds.), Linguistics across historical and geographical boundaries, 253-265. Berlin: Mouton de Gruyter.

10.

Denison

David

. 1985. The origins of periphrastic do: Ellegård and Visser reconsidered. In Eaton

Roger

Fischer

Olga

Koopmand

Willem F.

van der Leek

Frederike

(eds.), Papers from the 4th international conference on English historical linguistics, 45-60. Amsterdam: John Benjamins.

11.

Deshors

Sandra C.

Gries

Stefan Th.

2016. Profiling verb complementation constructions across New Englishes: A two-step random forests analysis of ing vs. to complements. International Journal of Corpus Linguistics 21(2). 192-218.

Dietze, Hugo. 1895. Das umschreibende ‘do’ in der Neuenglischen prosa. Jena: Frommannsche Hof-Buchdruckerei.

Ecay, Aaron. 2015. A multi-step analysis of the evolution of English do-support. Philadelphia, PA: University of Pennsylvania PhD dissertation.

Ellegård, Alvar. 1953. The auxiliary do: The establishment and regulation of its use in English. Stockholm: Almqvist & Wiksell.

Engblom, Victor. 1938. On the origin and early development of the auxiliary do. Lund, Sweden: Gleerup.

Erb, Marie C. 2001. Finite auxiliaries in German. Tilburg, Netherlands: University of Tilburg PhD dissertation.

Fischer, Olga & Wim van der Wurff. 2006. Syntax. In Richard Hogg & David Denison (eds.), A history of the English language, 109-198. Cambridge: Cambridge University Press.

Fonteyn, Lauren & Andrea Nini. 2020. Individuality in syntactic variation: An investigation of the seventeenth-century gerund alternation. Cognitive Linguistics 31(2). 279-308.

Fulk, Robert D. 1992. A history of the Old English meter. Philadelphia, PA: University of Pennsylvania Press.

Garrett, Andrew. 1998. On the origin of auxiliary do. English Language and Linguistics 2(2). 283-330.

Gries, Stefan Th. 2021. Statistics for linguists with R: A practical introduction. Berlin: Mouton de Gruyter.

Hansen, Sandra & Roman Schneider. 2013. Decision tree-based evaluation of genitive classification: An empirical study on CMC and text corpora. In Iryna Gurevych, Chris Biemann & Torsten Zesch (eds.), Language processing and knowledge in the web, 83-88. Berlin: Springer.

Hilpert, Martin & Stefan Th. Gries. 2010. Modeling diachronic change in the third person singular: A multifactorial, verb- and author-specific exploratory approach. English Language and Linguistics 14(3). 293-320.

Horstmann, Carl. 1851. The early South English Legendary or lives of saints. London: Early English Text Society.

Hothorn, Torsten & Achim Zeileis. 2015. partykit: A modular toolkit for recursive partytioning in R. Journal of Machine Learning Research 16(118). 3905-3909.

Huddleston, Rodney & Geoffrey K. Pullum. 2002. The Cambridge grammar of the English language. Cambridge: Cambridge University Press.

Hundt, Marianne. 2018. It is time that this (should) be studied across a broader range of Englishes: A global trip around mandative subjunctives. In Sandra C. Deshors (ed.), Modeling World Englishes: Assessing the interplay of emancipation and globalization of ESL varieties, 217-244. Amsterdam: John Benjamins.

Jäger, Andreas. 2006. Typology of periphrastic ‘do’-constructions. Bochum, Germany: Brockmeyer.

Jenset, Gard B. & Barbara McGillivray. 2017. Quantitative historical linguistics: A corpus framework. Oxford: Oxford University Press.

Jespersen, Otto. 1905. Growth and structure of the English language. Leipzig, Germany: Teubner.

Jespersen, Otto. 1954. A Modern English grammar on historical principles. London: Allen & Unwin.

Kastovsky, Dieter. 2006. Vocabulary. In Richard Hogg & David Denison (eds.), A history of the English language, 199-311. Cambridge: Cambridge University Press.

Kiparsky, Paul. 1975. Stress, syntax, and meter. Language 51(3). 576-616.

Kroch, Anthony. 1989. Reflexes of grammar in patterns of language change. Language Variation and Change 1(3). 199-244.

Kytö, Merja. 1996. Manual to the diachronic part of the Helsinki corpus of English texts: Coding conventions and lists of source texts. Helsinki, Finland: University of Helsinki.

Lass, Roger. 1992. Phonology and morphology. In Norman Blake (ed.), The Cambridge history of the English language, vol. 2, 1066-1476. Cambridge: Cambridge University Press.

Lass, Roger. 2006. Phonology and morphology. In Richard Hogg & David Denison (eds.), A history of the English language, 43-108. Cambridge: Cambridge University Press.

Lester, Godfrey Allen. 1996. The language of Old and Middle English poetry. New York: St. Martin’s Press.

Levshina, Natalia. 2020. Conditional inference trees and random forests. In Magali Paquot & Stefan Th. Gries (eds.), A practical handbook of corpus linguistics, 611-643. Berlin: Springer.

Minkova, Donka. 2004. Alliteration and sound change in early English. Cambridge: Cambridge University Press.

Minkova, Donka. 2016. Prosody-meter correspondences in Late Old English and Poema morale. In Leonard Neidorf, Rafael J. Pascual & Tom Shippey (eds.), Old English philology: Studies in honour of R.D. Fulk, 122-143. Cambridge: D. S. Brewer.

Moretti, Lorenzo. 2021. On multiple constructions and multiple factors in language change: The origin of auxiliary do. Manchester: University of Manchester PhD dissertation.

Mugglestone, Lynda (ed.). 2006. The Oxford history of English. Oxford: Oxford University Press.

Nevalainen, Terttu. 2006. Mapping change in Tudor English. In Lynda Mugglestone (ed.), The Oxford history of English, 178-211. Oxford: Oxford University Press.

Oakden, James P. 1930. Alliterative poetry in Middle English: The dialectal and metrical survey. Manchester: Manchester University Press.

Ogura, Michiko. 2018. Periphrases in medieval English. Frankfurt am Main: Peter Lang.

Pearsall, Derek. 1977. Old English and Middle English poetry. London: Routledge & Kegan Paul.

Putter, Ad. Forthcoming. Verse forms. In Helen Cooper & Robert Edwards (eds.), Oxford history of poetry in English. Oxford: Oxford University Press.

Putter, Ad & Myra Stokes. 2000. Spelling, grammar, and metre in the works of the Gawain-poet. Parergon 18(1). 77-95.

Royster, James Finch. 1922. Old English causative verbs. Studies in Philology 19(3). 328-356.

Russom, Geoffrey. 2017. The evolution of verse structure in Old and Middle English poetry: From the earliest alliterative poems to iambic pentameter. Cambridge: Cambridge University Press.

Shaw, Marlieke & Hendrik De Smet. 2022. Loan word accommodation biases: Markedness and finiteness. Transactions of the Philological Society 120(2). 1-17.

Skeat, Walter W. 1868. The lay of Havelok the Dane: Composed in the reign of Edward I. about A.D. 1280. London: N. Trübner & Co.

Smyser, H. M. 1967. Chaucer’s use of gin and do. Speculum 42(1). 68-83.

Stein, Dieter. 1990. The semantics of syntactic change: Aspects of the evolution of do in English. Berlin: Mouton de Gruyter.

Sweet, Henry. 1898. A new English grammar: Logical and historical, vol. 2, Syntax. Oxford: Clarendon Press.

Szmrecsanyi, Benedikt, Jason Grafmiller, Benedikt Heller & Melanie Röthlisberger. 2016. Around the world in three alternations: Modeling syntactic variation in varieties of English. English World-Wide 37(2). 109-137.

Tagliamonte, Sali & R. Harald Baayen. 2012. Models, forests, and trees of York English: Was/were variation as a case study for statistical practice. Language Variation and Change 24(2). 135-178.

Tomaschek, Fabian, Peter Hendrix & R. Harald Baayen. 2018. Strategies for addressing collinearity in multivariate linguistic data. Journal of Phonetics 71. 249-267.

Versteegh, Kees. 2010. Contact and the development of Arabic. In Raymond Hickey (ed.), The handbook of language contact, 634-651. Oxford: Wiley Blackwell.

Visser, F. Th. 1963-1973. An historical syntax of the English language. Leiden, The Netherlands: Brill.

Wohlgemuth, Jan. 2009. A typology of verbal borrowings. Berlin: Mouton de Gruyter.

Zimmermann, Richard. 2020. Testing causal associations in language change: The replacement of subordinating then with when in Middle English. Journal of Historical Syntax 4(4). 1-59.

The Functions of Auxiliary Do in Middle English Poetry: A Quantitative Study
