Abstract
In this paper, we investigate variable past marking in Australian Aboriginal English as spoken on Croker Island, Northern Territory. Employing data from twenty speakers and both mixed-effects regression and random forests, we show that despite a high degree of individual variability the occurrence or non-occurrence of a past-marked verb is subject to conditioning factors that are known from other varieties of English, most notably lexical and grammatical aspect and marker persistence. Moreover, the constraints governing the preverbal marker bin relate in systematic ways to those governing inflection. Our results suggest that the specifics of contact influence may be less relevant to explaining variable linguistic processes such as past marking than more general discourse-pragmatic and cognitive principles of language variation and change. This has implications for the debate about the uniqueness of creole languages, which have often been considered a language type like no other.
1. Introduction
Past marking is one of the best-studied morphosyntactic variables in English. Studies have examined the use of inflected versus unmarked past-temporal reference verbs in second-language (L2) acquisition (e.g., Wolfram & Hatfield 1984; Bayley 1996; Hawkins & Liszka 2003), English as a lingua franca (e.g., Kirkpatrick & Subhan 2014), New Englishes (e.g., Gut 2009; Biewer 2015; Bohmann & Babalola 2023; Hackert, Laliberté & Wengler 2024), and traditional (e.g., Jankowski & Tagliamonte 2022) and high-contact (e.g., Schreier 2005) first-language (L1) varieties.
The focus, however, has been on non-obligatory past inflection in African American Vernacular English (AAVE) and Caribbean English-lexifier creoles (CECs), with at least part of this interest probably owed to the variable’s central position in the once heated debate about the origins of AAVE. Whereas the absence of morphological marking on consonant-final regular verbs (e.g.,
This controversy already highlights one of the major problems involved in analyses of non-obligatory past inflection in English: the variable’s location “at the intersection” of different structural processes (Patrick 1991:171), with extralinguistic factors like age, gender, and social background also entering into the equation. Analyses “have gained in sophistication and breadth” (Poplack & Tagliamonte 2001:103); they now generally make fine distinctions with respect to these factors and processes. However, irrespective of variety, some recurrent constraints have emerged, including structural factors such as morpho-phonological verb class and lexical and grammatical aspect as well as cognitive ones such as priming. There is mounting evidence that, contrary to earlier assumptions, at least some of these factors are not at all specific to creoles and related varieties such as AAVE and are thus unconnected to potential creole universals, particular substrates, or the existence of a specific historical contact situation.
In this paper, we analyze patterns of variable past marking in English on Croker Island, Northern Territory, Australia. English on Croker Island is interesting for at least two reasons. First, it cannot be seen as a homogeneous variety or even a linearly ordered continuum of features. This situation raises significant questions about functioning communication in the community and the acquisition of English. It also sheds further light on the process of indigenization, that is, the adoption of English as a distinct variety and means to express Indigenous identity. Although English is acquired as an L1 by all speakers on Croker Island, it has not undergone other steps that are usually associated with indigenization, especially the reduction of variation and the adoption of English as the main community language and a means to express Indigenous identity (cf. Schneider 2007). Second, English on Croker Island has been shaped by language contact but in a highly complex way due to its multilingual ecology and longstanding and variable contact situation. Consequently, expected linguistic outcomes of language contact cannot always be diagnosed with sufficient certainty (cf. Mailhammer 2021:165-169). Our main research question is what constraints govern past marking in English on Croker Island and whether these constraints are specific to the variety or resemble those found in other world regions and variety types.
We coded and analyzed 1274 tokens of past marking on lexical verbs using a multivariate approach. We tested the standard constraints described in the literature: morpho-phonological verb class, grammatical aspect, lexical aspect, and temporal disambiguation by means of adverbials. For consonant-final regular verbs, we investigated following phonological environment. We considered age and gender as social factors and linguistic task as a stylistic factor. We also coded for individual speaker and lexical item, as these constraints have repeatedly been shown to strongly affect variable linguistic processes of the kind investigated here. The effects of the factors we investigated align with what has been found in other studies of non-obligatory past marking in varieties of English, including CECs, New Englishes, and L1 varieties of English in North America and elsewhere. This has significant ramifications for our understanding of language contact and variation, especially in the context of postcolonial Englishes and creole studies.
In what follows, we first outline the research context in which our study is embedded (Section 2) and then give details concerning our data and method (Section 3). Section 4.1 presents the results of our statistical analysis of variable past inflection in the community at large. Section 4.2 turns to preverbal bin, a minor yet important past-marking strategy employed primarily by one elderly speaker, and investigates in what way this strategy aligns with the majority pattern. Section 5 discusses our results in the context of previous research; Section 6 offers concluding remarks.
2. Research Context
2.1. English on Croker Island
Croker Island is located off the coast of Northwestern Arnhem Land, Northern Territory, Australia, as shown in Figure 1.

Figure 1. Croker Island, Northern Territory, Australia
The island measures about 45 km from north to south and at most 15 km from east to west. Minjilang, Croker Island’s only settlement, is located at Mission Bay on the east coast. The community facilities include a primary school, a shop, a preschool, a community center, and the local government administration. According to the 2021 census, 88 percent of Minjilang’s population of 265 are Aboriginal people. The linguistic situation is complex. Most Indigenous community members speak at least one Indigenous language, mostly Iwaidja, Mawng, and Kunwinjku, or have at least a passive understanding of Iwaidja, which, since the early twentieth century, has been considered Croker Island’s primary language. In addition, everyone speaks English as an L1. However, input, domains, and patterns of usage are highly variable. While some community members use English very frequently or exclusively, others only use it in domains that are associated with “white” contexts or in the absence of a common Indigenous language. Indigenous languages occupy almost all community domains except school and government-related communication. For official community announcements, the default language is Iwaidja, and there is no translation into other languages. For English on Croker Island, this means that the local languages, and especially Iwaidja, Mawng, and Kunwinjku, functioned not only as substrates historically but are also concurrently spoken languages that exert ongoing influence on English and are influenced by it. Linguistic commonalities between local Englishes and Kriol, an English-lexifier creole spoken in the area (cf. Schultze-Berndt, Meakins & Angelo 2013), suggest that Kriol might have also formed part of the English “feature pool” (Mufwene 2001) available to Indigenous people, even though it was never used as an official medium of instruction.
This linguistic variability is closely connected to Croker Island’s settlement history. Via British military posts that were established in the early nineteenth century, Aboriginal people in Northwestern Arnhem Land were first exposed to non-standard dialects as well as institutionalized forms of English. They initially used the language as a means to communicate with soldiers, explorers, and then settlers with whom they interacted. But with increasing dominance of the colonizers, English became increasingly forced on them, especially through Christian missions that were set up from the early twentieth century onwards. The mission on Croker Island was established in 1941. It was reserved for children and teenagers of mixed ancestry, who were forcibly brought there from all over northern Australia; almost all Aboriginal residents had to leave, which led to a break in settlement continuity. From 1958 onwards, Croker Island was re-settled, partly by original inhabitants but also by many newcomers, which increased the Aboriginal community’s heterogeneity. While white people resided at the mission community, which was located where Minjilang is today, Indigenous people lived at a beachside camp just outside the mission settlement. As a result, Aboriginal people used English at the mission and Indigenous languages at the camp; children would sometimes speak English among themselves. This situation lasted until the early 1970s, after the end of the mission, when the Indigenous population, for many of whom this was their ancestral land and home, moved into the mission area with the subsequent construction of new housing. It was only then that Minjilang’s mixed population formed a community. The persistent heterogeneity of English on the island is probably rooted in differences in acquisition and usage that remain considerable. A more detailed account of the history and current sociolinguistic setting of English on Croker Island is provided by Mailhammer (2021:40-54).
2.2. Previous Research on Variable Past Inflection in English
As noted in Section 1, variable past inflection has received extensive sociolinguistic attention and sophisticated statistical treatment, beginning with the earliest quantitative studies of AAVE (Labov, Cohen, Robins & Lewis 1968; Wolfram 1969). United in their rejection of the verbal “deficit” theory, these studies aimed at demonstrating the variety’s systematicity and status as a legitimate dialect of English by uncovering the patterned constraints on variable features such as the copula or past inflection. This included an emphasis on the identity of structural constraints across stigmatized and mainstream varieties of the language. Accordingly, the frequent occurrence of unmarked consonant-final regular verbs such as
Arguing from a comparative sociolinguistic perspective (cf. Tagliamonte 2013), Poplack and Tagliamonte (2001) present multivariate analyses of variable past inflection in earlier African American English. Phonological conditioning accounts for most of the variation among regular verbs. The behavior of individual lexical items also has a strong effect, particularly among irregular verbs. Further predictors include “morphological priming” (2001:129) and “discourse preferences” (2001:141). The Bickertonian factors of anteriority and stativity are inconclusive at best. Poplack and Tagliamonte conclude from these findings that “the grammar of AAVE originated largely from the regional and nonstandard Englishes to which [. . .] early African Americans were exposed, and not from any widely-spoken creole” (Poplack 2000:2).
Other studies have confirmed the effects identified by Poplack and Tagliamonte (2001) and contributed further pertinent findings. In a comparative study of AAVE and Trinidadian Creole, Winford (1992:335) demonstrates that, with regard to verbal aspect, “[t]he proper distinction is between verbs that refer to specific past situations and those that refer to habitual/characteristic situations” rather than between punctual and non-punctual ones. In a detailed examination of over eight thousand past-reference verbs from urban Bahamian Creole, Hackert (2004:161-166) finds that the marking propensities of individual lexical items explain not only the behavior of particular morpho-phonological verb classes but also the apparent stativity effect, which largely disappears when a single, variably stative and non-stative verb, that is,
More recent research on varieties from outside the Atlantic area and with different sociohistorical origins has uncovered similar patterns of variation. Biewer’s (2015) study of variable past inflection in South Pacific Englishes is closely modeled on Poplack and Tagliamonte (2001) and Hackert (2004) and therefore offers a particularly interesting comparative perspective. Importantly, it demonstrates in an empirically sound and statistically robust fashion that the depressing effect on past marking exerted by habituality is not restricted to creoles and related varieties (cf. also Bohmann & Babalola 2023:31). Biewer also fails to find a significant effect for stativity (2015:259), further supporting the idea that the effect of this factor posited in earlier studies may actually be an artifact of lexical and discourse-pragmatic constraints on past marking. Hackert, Laliberté, and Wengler’s (2024) comparative study of conversational data from the Hong Kong, India, Jamaica, and Philippine subcomponents of the International Corpus of English corroborates this view and also highlights the influence of marker persistence and verb frequency.
As noted above, Poplack and Tagliamonte (2001) had concluded that patterns of variation in earlier AAVE were more consistent with an English history than with a creole background. Extrapolating from this conclusion in the absence of creole as a factor in the context of Croker Island, we propose that the recurring appearance of constraints on variable linguistic processes such as past-tense marking across all varieties, including factors that have been specifically linked to creoles, makes it likely that we are dealing with pan-English or even crosslinguistic constraints. This is the underlying hypothesis of the present study.
Past marking in Aboriginal English (AbE) has been described as generally reduced (Rodriguez Louro & Collard 2021:6). However, the only study that has empirically investigated this is Malcolm (1996), who recorded seven school children between five and ten years old (two female; mean age: 8) in interviews with teachers in the early 1980s. The two key points to note about this study are, first, that all children employ the base form of the verb about 20 percent of the time and, second, that the range of variants includes not just inflected and unmarked forms but also the preverbal particle bin followed by the base form of the verb, with use of this construction ranging from 2 percent to more than 20 percent among the children (Malcolm 1996:153).
3. Data and Method
3.1. The Sample
The data used in this study were collected between 2013 and 2018 in the community of Minjilang on Croker Island, as part of a larger project investigating the dynamics of language contact between English and Australian Aboriginal languages (Mailhammer 2021). They come from twenty speakers (ten female, 11-78 years, mean age = 45.5), who performed a variety of linguistic tasks including semi-structured interviews and elicitation sessions (2021:13-14). Elicitation included the so-called “pear story” stimulus, a short, silent video clip about children stealing fruit commonly used in typological fieldwork (cf. Chafe 1980), as well as another (self-designed) video elicitation task aimed primarily at depicting aspectual contrasts. Other tasks, such as a card game, did not enter the corpus underlying the present study, as they did not contain past-reference verb situations in sufficient numbers. This excluded eight speakers, primarily children, of Mailhammer’s original community sample of 28. As in Mailhammer (2021:62), speakers are referred to by their initials in this paper.
3.2. Circumscribing the Envelope of Variation
As pointed out by Poplack and Tagliamonte (2001:114), accurate variable extraction and coding is “particularly important in the past temporal reference sector,” as “there is little isomorphy between form and function in this sector in either creoles or English: both marked and unmarked forms can encode a number of tense/aspect functions.” This situation is illustrated for Croker Island English in Examples (1) and (2). Forms included in the following statistical analyses are marked in bold, whereas those omitted are underlined.
(1) you know one year . . . you know small we (2) CM I Researcher 1 yeah CM they CM and that old man at Arrarru we CM all over and we Researcher 1 right, ok CM and we CM that wire, he CM for that light Researcher 1 what was the wire for? oh right Researcher 2 it’s a big light, must have been a big cable CM CM big light - we Researcher 1 yes, kerosene. CM kayirrk you Researcher 2 . . . so the lights over there at Cape Don CM that’s all morning time . . . off afternoon it Researcher 1 and how old were you then when you did that job? CM Researcher 1 what like Justin or – CM before that but I
Like most studies of variable past marking in AAVE, CECs, and World Englishes, we focus on the interplay between unmarked and past-inflected lexical verbs with unambiguous past-time reference. This excluded all tokens with non-temporal semantics, such as counterfactuals and conditionals, as well as contexts that permitted both a past and a present-time interpretation. The term inflection refers to three different morphological processes: suffixation, as in chopped it up (Example 2), vowel change, as in my father spoke to him (Example 1), and suppletion, as in we went down under the track (Example 1). We counted only tokens of past inflection on lexical verbs but additionally included two of the “primary” ones (Quirk, Greenbaum, Leech & Svartvik 1985:96), that is,
In addition to variably inflected past-reference lexical verbs and auxiliary structures, Example (2) also features the preverbal particle bin. This marker is common not only in CECs but also in the Australian creole Kriol, which is spoken across the Northern Territory (Schultze-Berndt, Meakins & Angelo 2013:246). The speaker in (2) uses bin not only before lexical verbs in their base form (bin V), as in we bin dig it, but also before other tense-aspect markers, as in we bin used to see light here, as well as before noun phrases indicating location or direction, as in I bin Darwin first, and adjectives, as in bin little bit young. Finally, he also employs it in a standard English perfect progressive construction: I’ve bin working there to that lighthouse. In this versatility, bin in Croker Island English closely resembles what is found in mesolectal CECs such as Bahamian Creole (cf. Hackert 2004:115). Whereas all bin constructions involving auxiliaries or non-verbal elements were automatically excluded, closer inspection of bin V revealed that it appears to have the same temporal-aspectual semantics as variably inflected lexical verbs in Croker Island English. As only CM used the construction to any significant extent, we treat this speaker’s data separately in Section 4.2.
3.3. Predictor Variables
As outlined in Sections 1 and 2.2, a frequent explanation for variable past inflection in AAVE, CECs, and World Englishes is phonological and maintains that unmarked verbs do not represent an underlying zero form but are the result of
irregular verbs;
syllabic regular verbs, whose stem ends in /t/ or /d/, with /ɪd/ as their past-tense suffix (e.g.,
regular verbs ending in a vowel and taking /d/ as the dental suffix (e.g.,
regular verbs ending in a consonant other than /t/ or /d/, hence forming a coda cluster in the past tense (e.g.,
The class of irregular verbs includes both verbs undergoing only vowel-change (e.g.,
We next coded for aspectual semantics. The term aspect refers to two different yet closely intertwined dimensions of temporal information. There is, first, the classification of events according to the dimensions of stative/dynamic, punctual/durative, and telic/atelic. Smith (1997) uses the term situation aspect; other terms are aktionsart and lexical aspect, the last of which we use here. Second, there is the internal temporal viewpoint taken on events as either bounded or unbounded, referred to as viewpoint or grammatical aspect (cf. Comrie 1976; Smith 1997); we use the latter term. Lexical and grammatical aspect are independent theoretical notions, but there are important interactions between them. For example, the use of the progressive aspect with stative verbs is generally marked (Smith 1997:77). We coded for lexical and grammatical aspect separately and retained them as separate predictors in the statistical analysis.
To code for
Apart from the stative-dynamic dimension, lexical aspect is not usually investigated in analyses of variable past marking in AAVE, CECs, or World Englishes. It is, however, in studies of language acquisition, and we took inspiration from this field to take a closer look at the non-stative situation types of accomplishment, achievement, activity, and semelfactive. According to the Aspect Hypothesis, learners “will initially be influenced by the inherent semantic aspect of verbs or predicates in the acquisition of tense and aspect markers associated with or affixed to these verbs” (Andersen & Shirai 1994:133). More specifically, past marking has been found to spread from the telic situation types of achievements and accomplishments to the atelic ones of activities and states. While numerous studies have confirmed this for both L1 and L2 learners, especially among L2 speakers there seems to be considerable variation, which is often explained by confounding factors such as individual variation, transfer effects, and task variation (Bardovi-Harlig & Comajoan-Colomé 2020). Semelfactives are “single-stage events with no result or outcome” (e.g.,
For
Another factor we included is “
Another controversial constraint on past marking in CECs and related varieties is temporal disambiguation. It is a popular assumption that creole tense-mood-aspect systems rely heavily on the surrounding discourse context, including conjunctions and adverbials, for a verb’s temporal characterization (cf. Bickerton 1975:150-160), but Tagliamonte and Poplack (1993:189-190), Hackert (2004:174, 178), Biewer (2015:259), and Bohmann and Babalola (2023:31) find no significant effects if the category of temporal
adverbial indicating a point of time (e.g., yesterday, in 1960, after we had had lunch);
adverbial indicating duration (e.g., for two months, until they left school);
adverbial indicating frequency (e.g., ever; every summer);
adverbial indicating relationship: already, still, yet;
then.
Even though non-temporal uses of then were omitted, we still assigned a separate category to this adverb because of its sheer frequency (N = 110).
Regarding social factors, we only tested for speaker
3.4. Statistical Analysis
As outlined in Section 3.3, variable past inflection in English is subject to numerous linguistic and social constraints, whose impact can be captured with a multivariate statistical analysis. Our analysis consists of two parts. First, we investigate the alternation between inflected and unmarked past forms at the community level (Section 4.1). Second, we separately consider the data from an individual, CM, whom we excluded from the community analysis due to his idiosyncratic use of an additional third variant, bin (Section 4.2). Each analysis consists of two steps: (I) determining each predictor’s strength and direction of effects, and (II) assessing each predictor’s overall impact on the alternation. Due to the different data structures, step I involves different multivariate techniques: we employ generalized mixed-effects regression modeling for the community analysis and a random forest-based technique suggested by Gries et al. (e.g., Heller, Bernaisch & Gries 2017) for the analysis of CM’s data.
Generalized mixed-effects regressions can handle datasets which comprise non-independent, that is, grouped, data points. In sociolinguistic data, such grouped data structures occur if speakers contribute more than a single observation or if lexical items occur more than once. By adding grouping factors as random effects, fixed effects like
As more complex random- and fixed-effects structures caused convergence problems, we only added varying intercepts for the two random effects and did not consider any interaction terms between the fixed effects in the full model. The final model achieved a C-value of 0.86, which indicates a good fit (Tagliamonte & Baayen 2012:156). For all factors, variance inflation factors (VIFs) were computed using the R-package “performance” (Lüdecke, Ben-Shachar, Patil, Waggoner & Makowski 2021). VIFs measure the magnitude of multi-collinearity between factors and were below 3 in our final model (
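The C-value reported for the model is the index of concordance: the proportion of pairs consisting of one inflected and one unmarked token in which the model assigns the higher predicted probability to the inflected token, with ties counted as one half. The following minimal Python sketch illustrates the computation. The probabilities are invented for illustration and are not data from this study, and the function name `c_index` is a hypothetical helper rather than part of the R workflow described here.

```python
def c_index(probs, outcomes):
    """Index of concordance (C) for a binary outcome.

    probs    -- predicted probabilities of the 'inflected' variant
    outcomes -- observed variants, 1 = inflected, 0 = unmarked
    """
    inflected = [p for p, y in zip(probs, outcomes) if y == 1]
    unmarked = [p for p, y in zip(probs, outcomes) if y == 0]
    if not inflected or not unmarked:
        raise ValueError("need at least one token of each variant")
    concordant = 0.0
    for p in inflected:
        for q in unmarked:
            if p > q:
                concordant += 1.0   # pair ranked in the right order
            elif p == q:
                concordant += 0.5   # tie counts as half
    return concordant / (len(inflected) * len(unmarked))


# Invented predictions for six tokens (illustration only).
toy_probs = [0.9, 0.8, 0.6, 0.55, 0.4, 0.2]
toy_outcomes = [1, 1, 0, 1, 0, 0]
toy_c = c_index(toy_probs, toy_outcomes)
```

A value of 0.5 corresponds to chance-level discrimination and a value of 1 to perfect separation of the two variants.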
CM’s analysis did not lend itself to parametric regression modeling owing to the “small n large p” problem (cf. Tagliamonte & Baayen 2012:24). Also facing multicollinearity and empty cells, we employed random-forest (RF) modeling, which constitutes a non-parametric regression approach that is based on recursive binary partitioning of the data such that each split results in increasingly homogeneous subgroups. RFs have been found to be robust against overfitting, resulting in reliable outcomes (e.g., Tomaschek, Hendrix & Baayen 2018:250). The RF was based on an ensemble of 1000 inference trees; the parameter specifying the number of randomly selected predictors at each split (mtry) was set to 3. To visualize the direction of effects for each predictor, we created effects plots based on group averages of the RF’s predicted probabilities (cf. Heller, Bernaisch & Gries 2017:123-124). Even though the resulting effects “do not control for the effects of all other predictors at the same time, [. . . a number of] studies have used this [method] successfully and comparisons of such plots with corresponding effects plots of regressions have been very encouraging” (Heller, Bernaisch & Gries 2017:124). Plotting was done in “ggplot2” (Wickham 2016).
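The computation behind such effects plots is simple: the forest's predicted probabilities are averaged within each level of a predictor. A minimal Python sketch follows. The level labels and probabilities are invented for illustration, and `group_means` is a hypothetical helper, not a function from the workflow used in this study.

```python
from collections import defaultdict


def group_means(levels, predicted_probs):
    """Average predicted probabilities within each predictor level."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, prob in zip(levels, predicted_probs):
        sums[level] += prob
        counts[level] += 1
    return {level: sums[level] / counts[level] for level in sums}


# Invented example: predicted probability of one variant for five tokens,
# grouped by a two-level predictor (illustration only).
toy_levels = ["perfective", "perfective", "habitual", "habitual", "habitual"]
toy_predictions = [0.8, 0.6, 0.3, 0.2, 0.4]
toy_means = group_means(toy_levels, toy_predictions)
```

As the caveat quoted from Heller, Bernaisch, and Gries indicates, averages of this kind do not hold the other predictors constant.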
To assess the overall impact of each predictor on the past tense alternation (step II), we also used RF modeling. This means that for the analysis of the community at large, we fitted an RF including the same factors that featured in the final step I model (ntree = 1000, mtry = 3). For CM’s analysis, we used the RF created during step I. Based on these RFs, all predictors’ impacts were calculated using Strobl, Boulesteix, Kneib, Augustin, and Zeileis’s (2008) conditional variable importance estimation, an algorithm that has been found to be unaffected by multicollinearity.
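Strobl et al.'s (2008) measure permutes a predictor only within groups of observations that are alike on the predictors it is correlated with, so that a predictor gets no credit for information it merely shares with others. The Python sketch below illustrates that idea in a deliberately simplified form and is not the published algorithm: it conditions on a single predictor, whereas the original derives the conditioning grid from the trees' split points, and it uses a fixed prediction function in place of a fitted forest. All names and data are hypothetical.

```python
import random


def accuracy(predict, rows, outcomes):
    """Share of rows for which the prediction matches the observed outcome."""
    hits = sum(1 for row, y in zip(rows, outcomes) if predict(row) == y)
    return hits / len(rows)


def conditional_importance(predict, rows, outcomes, target, given,
                           n_rep=50, seed=1):
    """Mean drop in accuracy when `target` is shuffled within strata of `given`."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, outcomes)
    # Group row indices by the value of the conditioning predictor.
    strata = {}
    for i, row in enumerate(rows):
        strata.setdefault(row[given], []).append(i)
    drops = []
    for _ in range(n_rep):
        permuted = [dict(row) for row in rows]
        for indices in strata.values():
            values = [rows[i][target] for i in indices]
            rng.shuffle(values)  # permute only inside the stratum
            for i, value in zip(indices, values):
                permuted[i][target] = value
        drops.append(baseline - accuracy(predict, permuted, outcomes))
    return sum(drops) / n_rep


def toy_predict(row):
    """Stand-in for a fitted model: uses x1 and ignores x2 (assumption)."""
    return row["x1"]


# Invented data: the outcome depends on x1 only.
toy_rows = [{"x1": x1, "x2": x2}
            for x1 in (0, 1) for x2 in ("u", "v") for _ in range(4)]
toy_y = [row["x1"] for row in toy_rows]

imp_x1 = conditional_importance(toy_predict, toy_rows, toy_y, "x1", "x2")
imp_x2 = conditional_importance(toy_predict, toy_rows, toy_y, "x2", "x1")
```

In this toy setting the predictor that the stand-in model ignores receives an importance of zero, while shuffling the predictor it relies on lowers accuracy.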
4. Results: Patterns of Past Marking in Croker Island English
Figure 2 summarizes the distribution of past markers by speaker in terms of absolute numbers and percentages. Of the overall number of 1274 tokens, 65 percent are inflected. Frequencies of inflection range between 56 percent and 96 percent (mean = 77%, median = 78%), with the exception of CM, who has only about 12 percent past-inflected verbs. In fact, for this speaker, the major competitor for unmarked verbs is not the standard English past inflection but bin V. No other speaker in the sample makes anywhere near as much use of this construction as CM, and it appears as if this speaker’s past-tense system operates on principles quite different from those of the other speakers, based as it is on a tripartite instead of on a bipartite structure.

Figure 2. Distribution of Past Markers by Speaker in Terms of Absolute Numbers and Percentages
In brief, CM must be considered a “sociolinguistic outlier,” that is, an individual “whose linguistic behavior for some reason falls well outside that of the wider speech community” (Britain 2003:191). Such outliers, as noted by Britain (2003:191), have attracted the attention of researchers since “the early days of variation studies,” but in almost all cases, “a fuller exploration of the social background of the individual has shed light on their anomalous linguistic behaviour.” This fuller exploration constitutes the subject not just of a separate statistical analysis but also a separate section (4.2) of this paper. What follows in Section 4.1 is restricted to the data from the remaining nineteen speakers and the variation between unmarked and past-inflected verbs (N = 933). All these speakers’ bin constructions (N = 16) were discarded. Tables 1 and 2 present an overview of tokens and marking rates by predictors and predictor levels for the community sample and CM’s sample.
Table 1. Overview of Predictors and Predictor Levels, Community Sample
Table 2. Overview of Predictors and Predictor Levels, CM’s Sample
4.1. The Community at Large
Table 3 shows the results from a type II ANOVA based on our final model. During model selection,
Table 3. Analysis of Deviance for Past Inflection, Community Sample
Note: Signif. codes: 0 ‘***’ .001 ‘**’ .01 ‘*’ .05.
Figure 3 displays the effects of the linguistic predictors.

Figure 3. Linguistic Predictors’ Effects on Past Inflection, Community Sample (Error Bars for 95% Confidence Intervals): (a) verb class, (b) lexical aspect, (c) persistence, (d) grammatical aspect, and (e) adverbial
Figure 3a shows considerable differences between the most and least marked verb classes in English on Croker Island. However, none of these differences are statistically significant, except for those between consonant-final regular verbs (“cons.fin.reg”) and
Figure 4 presents absolute frequencies and percentages of past inflection for consonant-final regular verbs by following phonological environment. We subjected this verb class to a separate analysis to investigate syllable-final consonant cluster reduction, that is,

Figure 4. Frequencies and Percentages of Marked and Unmarked Consonant-Final Regular Verbs by Following Phonological Environment
As outlined in Section 2.2, stativity has generally been found to favor past marking in both AAVE and CECs, and Figure 3b shows this effect in English on Croker Island, too. In fact, the difference between statives and other lexical aspect types is statistically significant (accomplishments: est = 1.25, p < .05, activities: est = 1.36, p < .05) except for achievements (est = 0.35, p = .88). A closer inspection of the phenomenon in Bahamian Creole (Hackert 2004:164-166) revealed, however, that the marking propensity of statives is at least in part an artifact of lexical idiosyncrasies. The class of statives is made up to a large extent of the high-frequency verbs
Our findings concerning the Aspect Hypothesis are somewhat mixed. While achievements fully conform to expectations, being substantially more marked than both other dynamic situation types (Figure 3b), the overall distinction seems to be between punctual (achievements) and durative (accomplishments, activities) situation types rather than between telic (accomplishments, achievements) and atelic (activities) ones. Only the contrast between achievements and accomplishments is statistically significant (est = 1.17, p < .05).
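The estimates reported for the lexical aspect contrasts can be read as odds ratios by exponentiating them. This assumes that they are differences on the log-odds scale, as is usual for a binomial mixed-effects regression with a logit link; the link function is not stated explicitly here, so the sketch below rests on that assumption. The direction of each contrast follows the reporting in the text.

```python
import math


def odds_ratio(log_odds_estimate):
    """Convert a log-odds difference into an odds ratio (logit link assumed)."""
    return math.exp(log_odds_estimate)


# Estimates as reported in the text for the lexical aspect contrasts.
reported_estimates = {
    "statives / accomplishments": 1.25,
    "statives / activities": 1.36,
    "statives / achievements": 0.35,
    "achievements / accomplishments": 1.17,
}

odds_ratios = {contrast: round(odds_ratio(est), 2)
               for contrast, est in reported_estimates.items()}
```

On this reading, an estimate of 1.17 corresponds to odds of past inflection that are roughly three times higher for the first member of the contrast than for the second.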
Grammatical aspect (Figure 3d) has a statistically significant effect in the expected direction, with perfectives clearly more frequently past-inflected than habituals (est = 1.09, p < .01). The persistence principle operates in our data, too (Figure 3c): the likelihood of finding an inflected or unmarked verb increases when this verb follows a verb of like marking status, resulting in a highly significant difference between unmarked and marked forms in the preceding slot (est = −1.17, p < .001). Thus, speakers of Croker Island English conform to a more general tendency (cf. Szmrecsanyi 2006): they tend to reuse morphosyntactic patterns they have produced or heard before. Interestingly, “N/A” contexts also show high probabilities of marking, which plausibly reflects the fact that such verb situations carry the functional load of framing the following stretch of discourse within past temporal reference.
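A persistence predictor of this kind can be derived mechanically from the sequence of coded tokens. The Python sketch below shows one way of doing so. It assumes that “N/A” is assigned to the first past-reference verb in a stretch of discourse, which is an illustrative simplification and not necessarily the coding procedure used in this study. The function names and the example sequence are hypothetical.

```python
def code_persistence(marking_sequence):
    """Return, for each token, the marking status of the preceding token.

    marking_sequence -- 'marked' / 'unmarked' labels for consecutive
                        past-reference verbs in one stretch of discourse
    The first token has no predecessor and is coded 'N/A' (assumption).
    """
    coded = []
    previous = "N/A"
    for status in marking_sequence:
        coded.append(previous)
        previous = status
    return coded


def same_status_rate(marking_sequence):
    """Share of tokens whose status matches that of the preceding token."""
    preceding = code_persistence(marking_sequence)
    pairs = [(p, s) for p, s in zip(preceding, marking_sequence) if p != "N/A"]
    if not pairs:
        return None
    return sum(1 for p, s in pairs if p == s) / len(pairs)


# Invented sequence of five tokens (illustration only).
toy_sequence = ["marked", "marked", "unmarked", "unmarked", "marked"]
toy_coded = code_persistence(toy_sequence)
toy_rate = same_status_rate(toy_sequence)
```

A persistence effect would show up as tokens matching their predecessor more often than the overall marking rates alone would predict.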
As seen in Figure 3e, in English on Croker Island only a single adverbial type, that is, then, noticeably affects rates of past inflection, with statistically significant contrasts found only between then and adverbials indicating a point of time (est = −1.40, p < .05) on the one hand, and then and no adverbial at all (est = −1.07, p < .05) on the other. Importantly, as hinted at in Section 3.3, then is not only a very frequent adverb but also often functions as a discourse marker indicating progression not through reference time but through discourse time, as in Example (3).
(3) I didn’t talk Mawng, only Iwaidja and like Kunwinjku. And I could go out and we met together, and old man Snowy
There are also other non-temporal uses, such as (but) then in the sense of ‘on the other hand’:
(4) . . . only the Kunwinjku kids, they speak to each other in Kunwinjku,
Even though we did not count such uses, they might well have had a depressing effect on past marking rates in clauses containing then.
Figure 5 shows the effects of the only statistically significant non-linguistic predictor in our mixed model, that is,

The Effect of
Figure 6 presents the variable importance ranking for past inflection in the community at large. As in the mixed model, we included only the statistically significant predictors. The random forest achieved a C value of 0.96.
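For readers unfamiliar with the C value, the following minimal sketch shows how such a statistic can be computed. It assumes that C here is the concordance index, which for a binary outcome equals the area under the ROC curve: the share of (marked, unmarked) token pairs in which the marked token receives the higher predicted probability. The function name and the toy data are hypothetical and are not taken from the study.

```python
def concordance_index(probs, outcomes):
    """Return the share of (marked, unmarked) pairs ranked correctly.

    probs: predicted probabilities of marking, one per token.
    outcomes: 1 for a marked token, 0 for an unmarked one.
    Tied predictions count as half a correct pair.
    """
    marked = [p for p, y in zip(probs, outcomes) if y == 1]
    unmarked = [p for p, y in zip(probs, outcomes) if y == 0]
    if not marked or not unmarked:
        raise ValueError("need at least one token of each outcome")
    score = 0.0
    for m in marked:
        for u in unmarked:
            if m > u:
                score += 1.0
            elif m == u:
                score += 0.5
    return score / (len(marked) * len(unmarked))


# Hypothetical toy data: six tokens with invented predictions.
toy_probs = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_outcomes = [1, 1, 1, 0, 0, 0]
print(round(concordance_index(toy_probs, toy_outcomes), 2))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of marked and unmarked tokens.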

Figure 6. Predictor Importance Ranking for Past Inflection, Community Sample
As seen in Figure 6, by far the largest amount of variation in the past inflection of lexical verbs in Croker Island English is explained by individual
4.2. CM’s System of Past Marking
CM’s system of past marking lexical verbs has a tripartite structure, with the preverbal marker bin accounting for over half (N = 158) of this speaker’s tokens (N = 308; cf. Figure 2). CM produced both interview and video task speech, but the latter contained so few tokens in several important contexts, in particular statives (N = 3) and habituals (N = 6), that the following statistical analysis is restricted to his interview speech (N = 242).
Figure 7 presents the variable importance ranking for CM’s interviews. The random forest predicted 80 percent of all tokens correctly, which is substantially better than chance prediction based on the most frequent of the three alternating forms, that is, bin (51 percent). Individual

Figure 7. Predictor Importance Ranking for Past Marking, CM’s Sample
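The comparison between the forest's accuracy and chance prediction reported for CM's interviews can be illustrated with a short sketch. It assumes that chance prediction means always guessing the most frequent of the three forms. The function names and the ten toy tokens are hypothetical; the token-level data are not reproduced here.

```python
from collections import Counter


def majority_baseline(observed):
    """Accuracy obtained by always predicting the most frequent form."""
    if not observed:
        raise ValueError("observed must not be empty")
    counts = Counter(observed)
    return max(counts.values()) / len(observed)


def accuracy(predicted, observed):
    """Share of tokens whose predicted form matches the observed form."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(observed)


# Hypothetical toy tokens for a three-way alternation (not study data).
toy_observed = ["bin", "bin", "bin", "inflected", "unmarked",
                "bin", "inflected", "bin", "unmarked", "inflected"]
toy_predicted = ["bin", "bin", "bin", "inflected", "bin",
                 "bin", "inflected", "bin", "unmarked", "bin"]

print(majority_baseline(toy_observed), accuracy(toy_predicted, toy_observed))
```

A model is informative to the extent that its accuracy exceeds this baseline, which is the logic behind comparing 80 percent with 51 percent.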
Figure 8 zooms in on the effects of the linguistic predictors. The pattern found for morpho-phonological

Figure 8. Linguistic Predictors’ Effects on Past Marking, CM’s Sample: (a) Verb Class, (b) Lexical Aspect, (c) Grammatical Aspect, (d) Persistence, and (e) Adverbial
When it comes to
It is plausible that bin in English on Croker Island originally derives from New South Wales Pidgin, where it was the standard way to express past reference (Troy 1994:250). This variety spread throughout Australia in the nineteenth century and gave input to restructured varieties and creoles, including Kriol (Munro & Mushin 2016:83). It is likely that bin V was the past tense for many speakers of Aboriginal varieties of English before the start of formal schooling. This is consistent with the data from CM, who acquired English largely unsupervised in the 1940s, when formal schooling had not yet become standard for Aboriginal people in Northern Arnhem Land. Since then, English has formed part of the linguistic input for people growing up in Minjilang, either directly through the local speech community or through contact with Kriol and other varieties of AbE.
5. Discussion
The most significant finding of this study is that past inflection in English on Croker Island is subject to the same recurrent structural constraints as in other varieties of English. This is interesting from at least two perspectives. First, because of the specific history and variability of English on Croker Island, we expected relevant constraints to be difficult to identify. Individual variation is indeed a strong factor, but this situation is by no means unique to Croker Island, and it does not mean that variability in the speech community cannot be accounted for rigorously. It may well be that further studies on other variables show a similar picture, which would be interesting for the relationship of grammar and variation and for the question of how phenotypically homogeneous grammatical systems must be. Second, the fact that variable past inflection is governed by constraints that are found elsewhere, irrespective of variety type and history (for example, high-contact, low-contact, creole, Caribbean, Asian, etc.), suggests that exactly those two characteristics, type of variety and history, may not be as relevant to explaining the phenomenon as previously thought and that other factors are much more important. We return to this point below, after we contextualize our findings with respect to English on Croker Island and AbE.
The overall level of past inflection across the sample is comparable to that found in the much smaller sample of Western Australian Aboriginal children from the early 1980s (Malcolm 1996:152). That said, rates of occurrence of a feature are not actually decisive in such comparisons, as they may vary according to a wide variety of extralinguistic factors. What matters is whether varieties or lects “share an underlying grammar, and to what extent” (Tagliamonte 2013:161), which necessitates attention to more abstract, deeper-level patterns of variation as evident in the direction and strength of the structural constraints operating on the variable in question. Malcolm’s study provides much more coarse-grained data on the types of past tense employed, but some details are worth comparing.
Malcolm’s (1996:153) data suggest a trend toward standard marking correlating with age that is not evident in our data. As noted in Section 4.1,
Unfortunately, Malcolm’s analysis does not go beyond identifying different types of marking for individual speakers in his sample. Consequently, it is impossible to compare this sample of AbE with that of Harkins (1994), who claims that past marking in her Central Australian sample is largely identical with the standard system and that deviations are phonologically conditioned (Malcolm 1996:153). Not unexpectedly, phonological conditioning also plays a role in our sample. Section 4.1 showed that consonant-final regular verbs evidence the lowest inflection rates and that this trend is even more pronounced before consonants and pauses, which is consistent with other high-contact varieties of English. The general explanation for elevated rates of
It is remarkable that Croker Island English at first glance follows the presumed creole pattern of favoring past marking on stative verb situations. However, upon closer inspection, this apparent stativity effect is as much an artifact of lexical idiosyncrasies as it is in mesolectal CECs such as Bahamian Creole. Our data partly confirmed the Aspect Hypothesis, which says that telicity favors past marking in language acquisition more generally. At least for Iwaidja speakers, there may also be the supporting factor that Iwaidja has an aspectually underspecified past tense, an anterior, which is contrasted with a past imperfective (Caudal & Mailhammer 2022:5). In the anterior, it is the telicity of the verb that determines its aspectuality: if the verb is telic, the default reading is perfective; if it is atelic, the default reading is imperfective. Consequently, speakers of English on Croker Island who also know Iwaidja (probably a significant number; cf. Mailhammer 2021:143) would have access to a tense category that would support the use of the Aspect Hypothesis in the acquisition process. As for the other substrate languages, they have been described as having an aspectual contrast in the past tense, that is, normally between a perfective and an imperfective past tense (for Kunwinjku cf. Evans 2003; for Mawng cf. Capell & Hinch 1970). However, it is uncertain what the respective interactions with lexical aspect look like, and whether they would support the Aspect Hypothesis as an acquisition strategy.
Finally, the gender effect we observed is in line with the more general trend in Western societies that women tend to use more formal or prestigious speech forms than men. This is not the norm in postcolonial or non-Western societies (e.g., Bakir 1986), especially if the interaction between gender and other social variables such as social status and education is considered (e.g., Hackert 2004), and even in Western contexts female speakers sometimes show more non-standard speech than male speakers (e.g., Eckert 2000). Nevertheless, the same observation has been made in other postcolonial contexts (e.g., Leimgruber, Lim, Gonzales & Hiramoto 2021). On Croker Island, women have often assumed leadership roles in the community, showing high levels of functionality in Indigenous and Western contexts combined with multilingualism and high levels of education. However, as Indigenous women, their status in the community is likely not reflected in their position in monolingual mainstream society. Hence, it is plausible that Croker Island women move toward prestige norms in mainstream contexts (including interview settings), as has often been theorized for women elsewhere, and for similar reasons related to power and sociolinguistic awareness. Our results indicate that it would be worthwhile to further examine the relationship between variation and community-specific ways of doing gender on Croker Island.
Returning to the issue of how patterns of past marking in high-contact varieties of English are best explained, our results confirm the general picture that has been emerging in the last twenty years or so: the variable is governed largely by the same set of lexical, structural, discourse-pragmatic, and cognitive factors irrespective of whether we are dealing with creoles, high-contact, or L1 or L2 varieties. None of these factors are specific to creoles. Consequently, they cannot be explained as stemming from a “creole prototype.” Similarly, they are not germane to varieties with contact histories and therefore cannot be explained as substrate effects, either. Instead, we think that they constitute general linguistic principles that affect the acquisition and use of past marking in English more generally.
6. Conclusion
In this paper, we analyzed past marking in English on Croker Island, Northern Territory, Australia, using data from twenty speakers supplying 1,274 tokens in total. Our analysis showed that the most predictive factors apart from
Footnotes
Appendix
Summary of Mixed-Effects Model, Community Sample (Predictions for Marking)
| Predictor | Estimate | SE | z-Value | p |
|---|---|---|---|---|
| (Intercept) | 0.14 | 1.56 | 0.09 | .929 |
| lex_aspectachievement | 1.18 | 0.40 | 2.98 | .003 |
| lex_aspectactivity | 0.17 | 0.45 | 0.38 | .703 |
| lex_aspectstative | 1.52 | 0.53 | 2.88 | .004 |
| verb_classv.fin.reg | 1.60 | 0.75 | 2.13 | .033 |
| verb_classsyll.reg | 0.72 | 0.55 | 1.30 | .194 |
| verb_classirregular | 1.09 | 0.41 | 2.68 | .007 |
| verb_classd-g-h-m-s | 2.45 | 0.65 | 3.75 | <.001 |
| gramm_aspectperfective | 1.09 | 0.35 | 3.12 | .002 |
| persistencemarked | 0.12 | 0.56 | 0.22 | .825 |
| persistenceunmarked | −1.04 | 0.60 | −1.74 | .081 |
| adverbialduration | −2.02 | 1.61 | −1.26 | .209 |
| adverbialfrequency | −1.36 | 1.64 | −0.83 | .407 |
| adverbialpoint of time | −1.13 | 1.44 | −0.78 | .436 |
| adverbialthen | −2.54 | 1.44 | −1.77 | .077 |
| adverbialnone | −1.47 | 1.41 | −1.04 | .297 |
| genderwomen | 0.76 | 0.28 | 2.70 | .007 |
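
The columns of the table are related in a simple way, which the following minimal sketch makes explicit. It assumes that the model is a logistic mixed-effects regression, so that estimates are on the log-odds scale, and that the reported z-values are Wald statistics (estimate divided by standard error) with two-sided p-values from the standard normal distribution. The function names are hypothetical; the three coefficient pairs are copied from the table.

```python
import math

# (estimate, standard error) pairs copied from the appendix table.
COEFS = {
    "lex_aspectachievement": (1.18, 0.40),
    "gramm_aspectperfective": (1.09, 0.35),
    "genderwomen": (0.76, 0.28),
}


def wald_z(estimate, se):
    """Wald z statistic: the estimate in units of its standard error."""
    return estimate / se


def two_sided_p(z):
    """Two-sided p-value for z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def odds_ratio(estimate):
    """Convert a log-odds estimate into an odds ratio (assumes a logit link)."""
    return math.exp(estimate)


for name, (est, se) in COEFS.items():
    z = wald_z(est, se)
    print(f"{name}: z = {z:.2f}, p = {two_sided_p(z):.3f}, "
          f"odds ratio = {odds_ratio(est):.2f}")
```

Because the estimates and standard errors in the table are rounded to two decimals, z-values recomputed this way can differ from the printed ones in the second decimal place.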
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by the Australian Research Council, Discovery Grant DP130103935, CI Robert Mailhammer.
