Abstract
Aims and Objectives:
This study investigates the use of definite noun phrases involving demonstratives in adolescent and adult monolingually raised and heritage speakers of Greek, Russian, and Turkish with the following research questions: (1) Do heritage speakers of Greek, Russian, and Turkish align with monolingually raised speakers regarding the production of demonstratives? and (2) Do mode and register affect the use of demonstratives?
Methodology:
We conducted a corpus study on production data of heritage speakers of Greek, Russian, and Turkish residing in the United States and Germany and the respective monolingually raised speakers in Greece, Russia, and Turkey. The majority languages of the heritage speakers were English or German, respectively. Data were elicited in two distinct registers (formal vs informal) and in two distinct modes (spoken vs written). Participants were asked to narrate what happened in a short video showing a fictional minor car accident.
Analysis:
Oral and written narrations were annotated and analyzed using generalized linear mixed-effects regression modeling on the use of demonstratives by heritage and monolingually raised speakers accounting for individual variation, country of elicitation, mode, and register.
Findings:
The results show that heritage and monolingually raised speakers of Greek, Russian, and Turkish converge in their use of demonstratives. Also, mode and/or register significantly affect the production of definite noun phrases with demonstratives across all speaker groups.
Originality:
This is the first cross-linguistically comparable large-scale corpus study with ecologically valid production data of definite noun phrases with demonstratives in bilinguals.
Significance:
The study contributes to understanding the use of demonstratives in heritage and monolingual Greek, Russian, and Turkish. It provides insights into the use of demonstratives in languages with different determiner systems and into the impact of mode and/or register, which appears to be similarly pronounced across these languages.
Introduction
This paper aims to model how heritage speakers of Greek, Russian, and Turkish produce demonstratives in definite noun phrases (NPs). Specifically, we investigate the distribution of demonstratives in three typologically different languages in a way that makes our analyses comparable. We take into account several factors, such as mode (spoken vs written) and register (formal vs informal), as well as the typological differences and the different majority languages (English and German) that these heritage languages are in contact with. Research into heritage languages and their speakers is a relatively young sub-discipline of bilingualism research. Heritage speakers can be defined as speakers of both a minority language, which is usually spoken among the members of their core family, and a majority language spoken by the larger society (Polinsky, 2018; Rothman, 2009). These speakers acquire two languages in a naturalistic setting, meaning that, unlike L2 learners, they belong to the nativeness continuum (Rothman & Treffers-Daller, 2014; Wiese et al., 2022). The speakers’ competence usually varies between these two languages, and their dominance changes across the lifespan (Kim & Puigdelliura, 2020; Papastefanou et al., 2019). Heritage languages are typically acquired at home and instantiate the vernacular, the informal oral variety. Thus, speakers are more familiar with the spoken mode than with the written one. Studies focusing on register variation in heritage speakers claim that, as long as heritage languages are acquired only within the family setting, heritage speakers usually lack formal instruction in their heritage language and most do not fully master formal registers (for a discussion, see Rothman, 2007; Rothman et al., 2023; Rothman & Treffers-Daller, 2014; Tsehaye et al., 2021; Wiese et al., 2022). These languages are characterized by a conversational and casual style limited to everyday topics, leading to register narrowing (Chevalier, 2004; Dressler, 1991).
Thus, heritage speakers are usually not exposed to register variation but only to the informal register, leading to register leveling, as evidenced by the data in Wiese et al. (2022) for different groups of heritage speakers. From a sociolinguistic perspective, Greek, Russian, and Turkish migrant communities in the United States and Germany share common features in their language-related behaviors and experiences in the host countries. For instance, these migrant groups tend to form close-knit communities serving as important hubs for cultural and linguistic preservation and contributing to language maintenance efforts. However, despite the efforts to preserve and transmit the linguistic heritage to younger generations, the heritage language of second-generation migrants undergoes significant changes in lexicon and grammar (Benmamoun et al., 2013; Montrul, 2015; Polinsky, 2018, among many others). This study addresses the question of how heritage Greek, Russian, and Turkish in contact with English and German express definiteness using demonstratives and whether cross-linguistic interference from the majority languages affects this particular phenomenon. To answer this question, we investigate semi-spontaneous speech of adolescent and adult heritage and monolingual speakers of Greek, Russian, and Turkish using a single experimental design, as described in the section “Methodology.”
Marking definiteness by means of demonstratives is a common feature in Greek, Russian, Turkish, English, and German, even if there are some notable typological differences, as shown in the following section “Definiteness marking.” In cross-linguistic definiteness marking, inherently definite demonstratives are used in deixis, that is, they represent unique referents identifiable by interlocutors through contextual information. Semantically, unlike other definite referents, demonstrative reference contrasts the actual and potential referents (Lyons, 1999). According to Wolter (2005), demonstrative descriptions are sensitive to salience and speaker demonstrations and their scope varies depending on the NP. Diachronically, they usually serve as the origin of definite articles in languages that have developed them (Diessel, 1999).
This paper is structured as follows: in the section “Definiteness marking,” we discuss definiteness marking in the languages involved in the study, namely the majority ones, German and English, and the heritage ones, Greek, Russian, and Turkish. Subsections “Definiteness marking in Greek” to “Definiteness marking in Turkish” present previous studies on definiteness realization in Greek, Russian, and Turkish, while the subsection “Definiteness marking in bilingual speakers” deals with studies in bilingual populations. In the section “Research questions and hypotheses,” we introduce our research questions and hypotheses. The section “Methodology” describes the methodology in detail, including the experimental design (in subsection “Experimental design”), the participant pool (in subsection “Participants”), the corpus compilation (in subsection “Corpus annotation and queries”), and finally the statistical analysis (in subsection “Statistical analysis”). The section “Results” presents the results of demonstrative use in the different heritage speaker groups. In the subsection “Descriptive results grouped by language,” the results are presented descriptively, and the subsection “Mixed-effects regression” provides the output of the mixed-effects regressions, followed by interim results in the subsection “Interim summary.” Finally, the section “Discussion” discusses our findings.
Definiteness marking
The means of definiteness marking vary considerably cross-linguistically. For instance, in German, English, and Greek, (in)definiteness is expressed via determiners, articles, and demonstratives, as shown below for German:
1. ein/das Buch
   a-SG.NOM/the-N.SG.NOM book-N.SG.NOM
   ‘A/The book.’
2. dieses Buch
   this-N.SG.NOM book-N.SG.NOM
   ‘This book.’
Similarly to German, definiteness in English is expressed via the definite article the and demonstratives, which show singular and plural forms, for example, this, that, these, and those:
3. the book/books
   the book-SG/books-PL
4. the/this/that book
   the/this/that-SG book-SG
5. these/those books
   these/those-PL books-PL
As is well known, German articles and demonstratives inflect for gender, case, and number, while English does not exhibit agreement features on D-elements as can be seen in the examples above.
The languages involved in this investigation, namely English, German, Greek, Russian, and Turkish, differ with respect to how they mark definiteness: Turkish and Russian lack definite articles. However, they all have one common feature, namely, they use demonstratives, as summarized in Table 1. Moreover, as we have shown above, the majority languages German and English express definiteness by means of demonstratives in similar ways. The following subsections “Definiteness marking in Greek,” “Definiteness marking in Russian,” and “Definiteness marking in Turkish” provide an overview of definiteness marking in heritage and monolingual varieties of Greek, Russian, and Turkish, focusing on demonstratives as a common anchor. Subsequently, the subsection “Definiteness marking in bilingual speakers” presents a summary of existing research on (in)definiteness marking in bilingual populations.
Table 1. Definiteness marking across languages.
Definiteness marking in Greek
Greek has both definite articles such as o, i, to and indefinite articles enas, mia, and ena, which mark the [+/–definite] feature of NPs, respectively. Furthermore, Greek in certain contexts can realize multiple definite determiners, a phenomenon labeled determiner spreading (synonymous terms: spreading of definiteness, polydefiniteness, double-definiteness, definiteness agreement/concord; Alexiadou, 2003, 2014; Kolliakou, 2004; Lazaridou-Chatzigoga, 2009; Lekakou & Szendrői, 2012). This is the case in the context of demonstratives, which co-occur with the definite article obligatorily. Thus, in this paper, we use the term double-definiteness for structures that involve demonstratives, determiners, and the nominal head.
As seen in Examples (6) and (7), demonstratives appear either pre- or post-nominally, co-occurring with determiners within the NP:
6. afti tin imera
   this-F.SG.ACC the-F.SG.ACC day-F.SG.ACC
   ‘This day.’
7. tin imera afti
   the-F.SG.ACC day-F.SG.ACC this-F.SG.ACC
   ‘This day.’
Many authors (Alexiadou, 2014; Grohmann & Panagiotidis, 2004; Manolessou, 2000; Manolessou & Panagiotidis, 1999; Panagiotidis, 2000) have claimed that when the demonstrative appears post-nominally and post-adjectivally, it lacks a deictic force and hence is considered to be anaphoric (7). Marinis (2003) claims that demonstratives are in complementary distribution with the indefinite article but not with a definite one.
Several studies have focused on double-definiteness in Greek, particularly in L1 acquisition. The most comprehensive study was done by Marinis (2003), who found a developmental sequence for NPs with determiner spreading. Even before the age of 24 months, children’s utterances include specific definite references with demonstratives (Marinis, 2003).
Definiteness marking in Russian
Russian is an article-less language (Cho & Slabakova, 2014; Erteschik-Shir, 2014; Seres & Borik, 2018). Definiteness can be expressed by syntactic, morphological, or lexical means as well as by combinations of those. Syntactic means, namely word order, play a crucial role in the marking of (in)definiteness (a detailed analysis can be found in the study by Seres & Borik, 2018). For instance, preverbal subjects are usually perceived as definite, while postverbal ones are perceived as indefinite (Chvany, 1973; Pospelov, 1970, among others):
8. Kniga ležit na stole.
   book-NOM lies on table-LOC
   ‘The book is on the table.’
9. Na stole ležit kniga.
   on table-LOC lies book-NOM
   ‘There is a book on the table.’
(Seres & Borik, 2018, p. vi)
Aspect can also trigger (in)definite interpretations in certain contexts, such as with verbs of consumption and their direct objects. Perfective aspect is typically used in definite contexts, while imperfective aspect is used in indefinite contexts, as noted by various sources (Apresjan & Košelev, 1995; Filip, 1993; Schoorlemmer, 1995; Seres & Borik, 2018, among others):
10. Vasja s”el jabloki.
    Vasja ate-PFV apples-ACC
    ‘Vasja ate the apples.’
11. Vasja el jabloki.
    Vasja ate-IMPF apples-ACC
    ‘Vasja ate / was eating (the) apples.’
(Seres & Borik, 2018, p. v)
(In)definiteness in Russian can also be marked via the genitive–accusative opposition (Apresjan & Košelev, 1995; Seres & Borik, 2018). Changing the direct object’s case marking from accusative to genitive can lead to a partitive interpretation, which also triggers an indefinite interpretation even with a perfective verb:
12. Vasja s”el jablok.
    Vasja ate-PFV apples-GEN
    ‘Vasja ate some apples.’
(Seres & Borik, 2018, p. v)
However, this mechanism is limited to inanimate plural and mass objects and cannot be considered a general pattern for (in)definiteness marking in Russian (Seres & Borik, 2018).
Finally, similar to the other languages involved in this work, definiteness in Russian can also be realized by demonstratives (Padučeva, 1985; Seres & Borik, 2018):
13. Tot rebënok pel kakuju-to pesnju.
    that child-NOM sang some song
    ‘That child sang some song.’
(Seres & Borik, 2018, p. iii)
Definiteness marking in Turkish
Turkish lacks a definite article and employs accusative case to mark specificity. It is often claimed that NPs containing the accusative marker not accompanied by the indefinite article bir “one” receive a definite interpretation (von Heusinger et al., 2019; von Heusinger & Kornfilt, 2005):
14. Kitab-ı oku-du-m
    book-ACC read-PRF.PST.1SG
    ‘I read the book.’
However, this interpretation is mainly driven by specificity and less so by definiteness as the insertion of the indefinite article bir would make the NP indefinite but specific. This relates to differential object marking (DOM) in Turkish (von Heusinger & Kornfilt, 2005).
Turkish may use demonstratives to signal a definite interpretation, as illustrated in the following examples (van Schaaik, 2020, p. 102):
15. Bu kitab-ı imzala-ma-dı-nız!
    this book-ACC sign-NEG.PST.2PL
    ‘You haven’t signed this book!’
16. Banka bir süre sonra o para-yı da öde-di.
    bank a while later that money-ACC too pay-PST
    ‘A while later the bank paid out that money.’
The two NPs that follow the Turkish demonstratives bu and o are interpreted as definite. In addition, NPs containing demonstratives are also specific, as signaled by the fact that they bear accusative. Partly, the specificity of these NPs is given by the demonstratives. Removing the demonstrative makes the NPs nonspecific and ungrammatical without the accusative case.
In addition to these lexical means of marking (in)definiteness, there are certain word order restrictions regarding (in)definite NPs as outlined by Erguvanli-Taylan (1987). For example, a sentence with a non-verbal predicate cannot have an indefinite at the beginning:
17. *Bir kadın o evde mutlu.
    a woman that house-LOC happy
    ‘A woman in that house is happy.’
18. O evde bir kadın mutlu.
    that house-LOC a woman happy
    ‘A woman in that house is happy.’
Furthermore, she observes that indefinite direct objects will be prioritized in being placed in the immediately preverbal position compared to all other indefinite NPs (Erguvanli-Taylan, 1987). Some other findings of Erguvanli-Taylan (1987) point to a three-way relation of word order, animacy, and indefiniteness, which is beyond the scope of this article.
Definiteness marking in bilingual speakers
Previous research has investigated the expression of (in)definiteness in language contact situations with different language pairs, that is, languages with and without articles (see e.g., Aalberse et al., 2017; Backus et al., 2011 for an overview). It has been pointed out that when languages without articles are in contact with languages possessing articles, they tend to show cross-linguistic interference and contact-induced grammaticalization (see Aalberse et al., 2017; Heine & Kuteva, 2006 for more details). Other important factors often reported are the age of onset (AoO) of the majority language and the quantity of input in the heritage language (Aalberse et al., 2017; Coşkun Kunduz & Montrul, 2022; Topaj, 2020, among many others).
There are no systematic studies on double-definiteness marking in bilingual adult Greek speakers except for Zombolou (2011), who makes a remark about Greek heritage speakers in Argentina. She observes that a group of heritage speakers who incompletely acquired Greek omit the definite article in NPs with demonstratives (Zombolou, 2011). This cannot be explained by language transfer since Argentinean Spanish, similar to Greek, does not allow article omission with demonstratives. This pattern is argued to align with L1 acquisition, as Marinis (2003) reports that definite articles are often omitted by children acquiring L1 Greek in the context of demonstratives. See also the high omission rate of the definite article observed by Chondrogianni (2007) and Chondrogianni et al. (2015) in Greek–Turkish bilingual children.
As for Russian and Turkish, different scholars report that heritage speakers use demonstratives more frequently than their monolingual peers. For Russian, Polinsky (2006) first observed that demonstrative pronouns in NPs are used more widely by American Russian speakers than by monolingual Russians. Topaj (2020) investigated production data elicited via narration tasks from Russian children residing in Germany and claimed that they use optional linguistic means for marking definiteness, especially demonstratives, more frequently. Both researchers primarily attribute this increased use of demonstratives to cross-linguistic influences (Polinsky, 2006; Topaj, 2020). Specifically, in an article-less language like Russian under language contact with article languages like English and German, demonstratives are used as compensation for definite articles, as an extension of an already available feature in monolingual Russian.
Both researchers note that Russian heritage speakers’ productions are more explicit, avoiding both ambiguity and cognitive load during reference tracking (Polinsky, 2006; Topaj, 2020). Topaj (2020) states that bilingual children pattern similarly to monolingual peers in introducing, maintaining, and reintroducing referents only after the age of 6 years. These findings imply that bilingual children may require more time to acquire reference-tracking skills than their monolingual peers and, as a result, may tend to be overly explicit in order to avoid ambiguity.
For Turkish, studies about definiteness with case marking have shown that heritage speakers can distinguish different semantic contexts and behave monolingual-like in comprehension tasks (Yılmaz & Sauermann, 2023). Studies on definiteness marking in Turkish heritage speakers residing in Germany show divergent findings. On the one hand, Turkish heritage speakers tested in offline tasks overuse definite NPs, which indicates a reduced sensitivity to pragmatic cues (Felser & Arslan, 2019). On the other hand, Kupisch et al. (2017) used acceptability judgment tasks with simultaneous and sequential Turkish heritage speakers tested on existential constructions and found their grammars intact. Overall, these mixed results indicate dynamicity in the use of definiteness-marking strategies by heritage speakers of Turkish (Backus et al., 2011; Coşkun Kunduz & Montrul, 2022; Felser & Arslan, 2019; E. Krause & Roberts, 2020; Kupisch et al., 2017; Şahin, 2015; Yılmaz, 2019; Yılmaz & Sauermann, 2023). In some studies, however, heritage speakers show the expected sensitivity to factors that interact with definiteness and even rely on them to a higher degree than monolinguals. For example, E. Krause and Roberts (2020) report that heritage speakers of Turkish in Germany distribute DOM in a way that is oversensitive to the animacy level of the objects.
Research questions and hypotheses
Having thoroughly considered the literature on definiteness marking by different groups of heritage speakers, we aim to cross-linguistically investigate whether different groups of heritage speakers align with monolingually raised speakers regarding the production of demonstratives (research question 1, RQ1). More specifically, we expect Greek heritage speakers in contact with German and English to produce fewer double-definiteness NPs and even omit the definite article compared to their monolingually raised peers (Chondrogianni, 2007; Chondrogianni et al., 2015; Zombolou, 2011). Based on previous literature for Russian (Polinsky, 2006; Topaj, 2020) and Turkish (Felser & Arslan, 2019; Kupisch et al., 2017), we expect heritage speakers in contact with German and English to produce more demonstratives as markers of definiteness. The broader use of demonstratives can be due either to transfer from the majority languages German and English or to overexplicitness, that is, an attempt to avoid ambiguity in reference marking. As the systems of expressing definiteness by means of demonstratives in English and German are similar, we do not expect distinct effects of the two majority languages.
Moreover, we include the mode and the register variation as part of our methodology, aiming to explore whether those factors affect the distribution of definite NPs with demonstratives (RQ2). Heritage speakers are exposed mainly to everyday communication settings making use of the vernacular, which is the spoken variety (Chevalier, 2004; Dressler, 1991). Furthermore, Rothman (2007) claims that heritage speakers lack features that are transmitted via formal education. Given the fact that heritage speakers tend to overgeneralize the informal oral setting in all instances of discourse (Wiese et al., 2022), we expect monolingually raised and heritage speakers to pattern differently in the four communication settings. Specifically, we predict monolingually raised speakers to adjust the marking of definiteness according to the mode, using demonstratives more frequently in the spoken mode due to the deictic nature of demonstratives, which is associated with speech (Lyons, 1999; Wolter, 2005). In contrast, we expect heritage speakers in contact with German and English not to discriminate their use of definiteness marking with demonstratives with respect to mode and register settings due to the tendency to overgeneralize spoken and informal settings.
Methodology
In the following, we provide an overview of the experimental setup, the composition of the participants’ sample, corpus annotation layers as well as the statistical approach used in this study.
Experimental design
The methodology applied in this study is an adaptation of Wiese’s (2020) set-up, the ‘Language situations’ paradigm, allowing researchers to elicit semi-spontaneous ecologically valid data. Detailed information on this method, including the experimental set-up, guidelines and all the materials, is available online on the Open Science Framework (OSF). This methodology provides comparable naturalistic data in both oral and written modes and in formal and informal registers. During the elicitation, the participants were shown a short video of a minor fictional car accident, and their task was to narrate what happened, imagining that they witnessed the incident, to either a close friend or a police officer. In that way, the participants took part in four different communication settings within one experimental session. Heritage speakers took part in two sessions at least 3 days apart, one in their majority and one in their heritage language, while monolingually raised participants took part only in one session in the majority language of the respective country of elicitation.
To test how formality and mode affect narrations, we simulated a formal spoken, a formal written, an informal spoken, and an informal written setting. The elicitation of the formal part took place in an office where the elicitor and the participant were sitting opposite each other. The elicitor of this register was dressed in a suit, used the standardized language and honorifics, and always contacted the participant via e-mail to set an appointment for the session. The spoken task was to leave a voice mail on the answering machine of the police department, while the written one was to type a witness report on the police laptop. The informal part was elicited by another elicitor, who was casually dressed, very talkative, and used the vernacular. A short chat preceded the main session in order to familiarize the participants with the elicitor and to ensure that the participants adapted to the informal communication situations. After this, the elicitor asked the participant to narrate the contents of the video in a voice message on WhatsApp to a close friend. The written task was to send a text message about the accident shown in the video to the same close friend, again on WhatsApp. The whole session was recorded for transparency reasons and the data files have been pseudonymized. Elicitation orders and the languages in which the experiment took place were balanced.
After the data were transcribed and the basic annotation layers were added, they were published as the RUEG Corpus (Wiese et al., 2020), created within the Research Unit Emerging Grammars in Language Contact Situations: a Comparative Approach (RUEG). The multilevel annotated RUEG corpus is available in the ANNIS interface (T. Krause & Zeldes, 2014) and contains audio for the spoken data as well as visualization options. It consists of six sub-corpora for English, German, Greek, Russian, Turkish, and Kurmanji, and several additional sub-corpora with special annotation such as referent tracking and syntax. The data for this study were drawn from version 0.4.0 of the sub-corpora of the RUEG corpus (Wiese et al., 2020) and contain narrations of 548 speakers in total. This number comprises monolingually raised and heritage speakers of Greek, Russian, and Turkish. Table 2 provides the numbers of the speakers, tokens in each sub-corpus, and the number of the NPs, which is taken as the upper limit for the normalization of the data.
Table 2. Participant information.
Note. NP: noun phrase; AoO: age of onset.
Participants
The overall information about the number of participants grouped by country of elicitation, the number of tokens and NPs, and the mean AoO of the majority language can be found in Table 2. Our participants were recruited in urban areas via calls in mailing lists, social media, educational institutions such as schools, universities, and language courses, and public organizations such as libraries, youth and sports clubs, shopping centers, and consulting offices. All bilingual participants in the study acquired the heritage language from birth. The participants in the United States were tested from September 2018 until March 2019 in the greater Washington area, New Jersey, Chicago, and New York City. In Germany, the elicitation took place from September 2018 until January 2021 in Berlin and Potsdam. The participants were invited to participate if they spoke the relevant heritage language regularly with some members of their family. Before the experiment, the participants were informed about their rights and the procedure of the experiment and asked to sign the consent form. In the case of minors, one of their parents or legal guardians was asked to sign the consent form. All participants reported no speech disorder and normal or corrected-to-normal hearing and vision.
The adolescent and adult groups comprise the age ranges 14–18 and 22–35 years, respectively. An important criterion in the recruitment of adolescents was school attendance, that is, the adolescent participants had to still be attending school or to have only just finished school. This ensured that they were in constant contact with their school peers at the time of data elicitation. A further criterion was the amount of education in the heritage language: bilingual candidates with a high amount of formal education in the heritage language were not admitted to the experiment. Specifically, the participants might have attended bilingual primary schools or Saturday schools before but must have attended monolingual high schools with or without heritage language classes after that. The majority of the participants also took some private classes in their heritage language at some point during their education. All participants were either born in the host country (the United States or Germany) or arrived there as young children with their families. The AoO of the majority language (English or German) does not exceed 48 months. Thus, the sample consists of simultaneous and sequential bilingual heritage speakers.
Monolingually raised participants from Greece, Russia, and Turkey were recruited in a similar way to the bilingual participants in Germany and the United States, on the assumption that they spoke only the country’s majority language at home and in daily life. The age range of the monolingually raised speakers was the same as for the heritage speakers. The monolingual data for Greek were collected in Athens in March 2019, while the Russian data were elicited in St. Petersburg from November to December 2018. Monolingually raised speakers of Turkish were recruited in the cities of İzmir and Eskişehir in September and October 2018. All participants signed a consent form in their respective majority language, had normal or corrected-to-normal hearing and vision, and had no speech disorder.
Corpus annotation and queries
The data for this study were drawn from the open-access RUEG corpus. The annotation in the sub-corpora for Greek, Russian, and Turkish includes, among other things, part-of-speech (POS) and lemma annotation and was carried out semi-automatically according to the Universal Dependencies (UD) scheme for POS tags, lemmas, and features (De Marneffe et al., 2014; Zeman, 2008). POS annotation followed the UD POS scheme, according to which demonstratives belong to the class of determiners. Another annotation level is the segmentation into communication units (CUs; Loban, 1976). A CU corresponds to a simple independent clause, whereas a dependent clause never constitutes a separate CU (cf. Topaj, 2020, p. 114). The annotation was carried out by native to near-native speakers of the respective languages according to the annotation guidelines, as specified in the RUEG corpus documentation (https://korpling.german.hu-berlin.de/rueg-docs/v0.4/).
We extracted all CUs containing demonstratives that precede nouns, separately for the Greek, Russian, and Turkish sub-corpora. Since this study focuses on the relevant demonstratives in the three languages of interest, we adjusted the queries to search only for these forms on the CU, lemma, and POS tiers. Specifically, for Greek, we searched for all instances of aftos and ekinos followed by determiners and nouns. For Russian, the query searched for all instances of the demonstratives ėtot and tot preceding nouns. The query for Turkish involved the demonstratives bu, şu, and o in contexts where they precede nouns and serve an attributive function. In every query for every language, we added the relevant metadata such as country of elicitation (the United States and Germany for the heritage groups; Greece, Russia, and Turkey for the respective monolingual groups) and the parameters of register (formal and informal) and mode (spoken and written).
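The logic of these queries can be illustrated with a minimal sketch. It is not the ANNIS query used for the corpus; it assumes a hypothetical token representation with form, lemma, and UD POS tag, matches strictly adjacent tokens only, and uses transliterated lemmas (including "o" for the Greek definite article) purely for illustration.

```python
from typing import List, NamedTuple, Set


class Token(NamedTuple):
    form: str
    lemma: str
    upos: str  # Universal Dependencies POS tag, e.g. "DET", "NOUN"


def find_demonstrative_nps(
    tokens: List[Token],
    demonstratives: Set[str],
    require_article: bool = False,
) -> List[int]:
    """Return start indices of demonstrative (+ article) + noun sequences.

    Simplifying assumption: the elements are strictly adjacent, so
    intervening adjectives or other modifiers are not handled.
    """
    hits = []
    for i, tok in enumerate(tokens):
        # Demonstratives are tagged as determiners in the UD POS scheme.
        if tok.upos != "DET" or tok.lemma not in demonstratives:
            continue
        j = i + 1
        if require_article:
            # Greek pattern: demonstrative + definite article + noun.
            if (
                j >= len(tokens)
                or tokens[j].upos != "DET"
                or tokens[j].lemma in demonstratives
            ):
                continue
            j += 1
        if j < len(tokens) and tokens[j].upos == "NOUN":
            hits.append(i)
    return hits


# Hypothetical annotated CUs based on Examples (13) and (6).
russian_cu = [
    Token("Tot", "tot", "DET"),
    Token("rebënok", "rebënok", "NOUN"),
    Token("pel", "pet'", "VERB"),
    Token("kakuju-to", "kakoj-to", "DET"),
    Token("pesnju", "pesnja", "NOUN"),
]
greek_cu = [
    Token("afti", "aftos", "DET"),
    Token("tin", "o", "DET"),
    Token("imera", "imera", "NOUN"),
]

print(find_demonstrative_nps(russian_cu, {"ėtot", "tot"}))  # [0]
print(find_demonstrative_nps(greek_cu, {"aftos", "ekinos"}, require_article=True))  # [0]
```

Restricting the match to a fixed set of lemmas is what keeps other determiners, such as the indefinite kakuju-to in the Russian example, out of the results.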
Statistical analysis
We built three binomial generalized linear mixed-effects models with sum-coded contrasts for the categorical predictors Country, Mode, and Register in R (R Core Team, 2020) using the lme4, huxtable, and jtools packages (Bates et al., 2015; Hugh-Jones, 2021; Long, 2020). Sum-coding is particularly relevant for the Country variable, as it represents our conceptual idea of language varieties. In sum-coding, none of the levels serves as a baseline. Instead, the grand mean of all levels is taken as the baseline. This allows us to compare the trends and distributions in the data on equal grounds for each variety (monolingual, heritage in Germany, heritage in the United States). The binary dependent variable encodes whether a given NP is preceded by a demonstrative (1) or not (0). Random intercepts for participants were included. For data pre-processing and visualization, we used the tidyverse library (Wickham et al., 2019). Code and data that reproduce all analyses and visualizations in this article can be found at OSF: https://osf.io/aqymt/?view_only=04b5d54aca434b5589fccd864c6923cb.
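To make the role of sum-coding concrete, the following sketch builds a sum-contrast coding for a three-level Country factor and shows that the intercept then corresponds to the grand mean of the level means rather than to a reference level. It is an illustration in plain Python, not the authors' R code (which is available at the OSF link); the function names and the log-odds values are hypothetical, and the level labels are only shorthand for the three varieties.

```python
import math


def sum_contrasts(levels):
    """Sum (deviation) coding: k levels map to k-1 columns; the last level is -1 in every column."""
    k = len(levels)
    coding = {}
    for i, level in enumerate(levels):
        if i < k - 1:
            coding[level] = [1 if j == i else 0 for j in range(k - 1)]
        else:
            coding[level] = [-1] * (k - 1)
    return coding


def sum_coded_coefficients(level_means):
    """Intercept = grand mean of the level means; each coefficient = that level's deviation from it."""
    levels = list(level_means)
    grand_mean = sum(level_means.values()) / len(levels)
    deviations = [level_means[level] - grand_mean for level in levels[:-1]]
    return grand_mean, deviations


def predict(level, intercept, coefficients, coding):
    """Linear predictor for one level under the given coding."""
    return intercept + sum(b * x for b, x in zip(coefficients, coding[level]))


# Hypothetical log-odds of using a demonstrative in each variety (not results from the study).
log_odds = {"monolingual": -2.0, "heritage_Germany": -1.7, "heritage_US": -1.8}

coding = sum_contrasts(list(log_odds))
intercept, coefficients = sum_coded_coefficients(log_odds)
recovered_us = predict("heritage_US", intercept, coefficients, coding)
```

Under this coding, the value for the last level is recovered as the intercept minus the sum of the other coefficients, so every variety is expressed as a deviation from the same grand mean.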
Results
This section presents the descriptive results for monolingual and heritage Greek, Russian, and Turkish, grouped by language. Subsequently, it discusses the outcome of a comparative approach based on three binomial generalized linear mixed-effects regression models. To ensure that the numerical data are comparable cross-linguistically, the frequencies of the definiteness structures under investigation are normalized by the total number of NPs for the visualizations.
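The normalization mentioned above amounts to expressing the number of NPs with demonstratives as a share of all NPs. A minimal sketch, assuming per-participant counts are available; the function name and the counts are hypothetical and are not taken from the corpus.

```python
def demonstrative_share(demonstrative_nps, total_nps):
    """Percentage of NPs that are preceded by a demonstrative."""
    if total_nps <= 0:
        raise ValueError("total_nps must be positive")
    if not 0 <= demonstrative_nps <= total_nps:
        raise ValueError("demonstrative_nps must lie between 0 and total_nps")
    return 100 * demonstrative_nps / total_nps


# Hypothetical counts per participant: (NPs with demonstrative, all NPs).
counts = {"speaker_A": (3, 60), "speaker_B": (5, 40)}
shares = {speaker: demonstrative_share(d, t) for speaker, (d, t) in counts.items()}
```

Dividing by each participant's own NP total keeps long and short narrations comparable.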
Descriptive results grouped by language
Greek demonstratives/double-definiteness
The results for heritage Greek indicate that the double-definiteness pattern, involving both demonstratives and determiners, is preserved and used productively by both groups, although heritage speakers in our sample might have difficulties in establishing agreement patterns, as has been reported in previous studies (Alexiadou et al., 2020; Paspali & Marinis, 2020).
Figure 1 shows the distribution of definite NPs with demonstratives and the individual variance within the different groups of Greek speakers. In all figures in this subsection, the x-axis represents the speaker groups according to the country of elicitation, and the y-axis shows the percentage of definite NPs with demonstratives produced by each participant. The boxplots divide the data into quartiles, each containing 25% of the data points, with the mean marked by a red dot and the median marked by a black horizontal line. Each black dot represents one individual speaker. The empty red circles mark outliers relative to the distribution underlying the boxplots. As can be seen, the in-group variance in the distribution of definite NPs with demonstratives in Greek is quite high across the three groups.

Figure 1. Distribution of definite NPs with demonstratives across different groups of Greek speakers.
Figure 2 shows the distribution of double-definite NPs across four communicative situations varying by register (formal registers in the left part of the graph vs informal registers in the right part) and mode (spoken mode in the upper part vs written mode in the lower part). The vertical lines represent standard errors around the mean, shown in the middle, for each condition and country of elicitation. We can observe an effect of register, as the frequency of definite NPs with demonstratives increases in informal communicative situations across all groups of Greek speakers, while no effect of mode can be observed.

Figure 2. Distribution of definite NPs with demonstratives per group across communicative situations.
Russian demonstratives
Similar to the results for Greek presented in the section “Greek demonstratives/double-definiteness,” the variance in the use of demonstratives in Russian is rather high across the groups, especially among the heritage speakers in contact with English compared with those in contact with German and with the monolingual speakers, as shown in Figure 3.

Figure 3. Distribution of definite NPs with demonstratives across different groups of Russian speakers.
Figure 4 shows the distribution of definite NPs with demonstratives across the communicative situations for the different groups of Russian speakers. A slight effect of register is visible in the Russian sample. In particular, the frequency of demonstratives is higher in informal spoken communicative situations (right part of the graph) and lower in formal written situations (left part of the graph), regardless of the country of elicitation. Moreover, mode seems to have an impact on the use of demonstratives in Russian. Specifically, more demonstratives are used in the spoken mode across all groups, as can be seen in the upper part of the graph.

Figure 4. Distribution of definite NPs with demonstratives per group across communicative situations.
Turkish demonstratives
Figure 5 depicts the distribution of definite NPs with demonstratives and the individual variation in each group of Turkish speakers. Again, the variation within the heritage speaker groups is higher than that within the monolingual group.

Figure 5. Distribution of definite NPs with demonstratives across different groups of Turkish speakers.
Figure 6 shows the distribution of definite NPs with demonstratives across the communicative situations for the different groups of Turkish speakers. In this graph, a moderate effect of mode can be observed. Specifically, demonstratives are used more frequently in the spoken mode (upper part of the graph) than in the written mode (lower part of the graph). No effect of register is visible in the graph.

Figure 6. Distribution of definite NPs with demonstratives per group across communicative situations.
Mixed-effects regression
As described in the “Statistical analysis” subsection, we ran three separate binomial generalized linear mixed-effects models, one for each language. Since all models have the same structure, we compare the effects for each factor in turn. Table 3 provides the outputs of the three models. In each cell, the first number is the estimate and the second number, in brackets, is the standard error.
Table 3. Regression table for results of three binomial GLMMs.
Note. GLMM: generalized linear mixed-effects model.
Asterisks indicate significance levels: * p < .05, ** p < .01, *** p < .001.
Country
We do not find any significant effect of the country of elicitation in any of the models or between any of the varieties. The data thus provide no evidence for the hypothesis that predicts differences between the monolingual and heritage varieties, as formulated in the section “Research questions and hypotheses.” In particular, there does not seem to be an overuse of demonstratives in the heritage varieties of any of the three languages involved.
Mode
We observe a significant effect of mode in the Russian and Turkish models. The models show moderate effect sizes, with estimates of −0.24 and −0.30. Given the two levels of the mode variable, spoken and written, this indicates a significantly lower tendency to use demonstratives in the written mode or, conversely, a significantly higher tendency to use them in the spoken mode. We do not observe any significant effect of mode in the model for the Greek data.
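Because the models are binomial, the estimates above are on the log-odds scale. The sketch below converts such an estimate into an odds ratio between the two levels of a factor. It rests on an assumption that is not stated in the text, namely that the two levels are coded +1 and −1, in which case the levels differ by twice the estimate in log-odds. The function name is hypothetical.

```python
import math


def odds_ratio_between_levels(estimate):
    """Odds ratio between the two levels of a two-level factor.

    Assumption: the levels are sum-coded as +1 and -1, so their
    log-odds differ by 2 * estimate.
    """
    return math.exp(2 * estimate)


# The two mode estimates reported in the text.
ratio_a = odds_ratio_between_levels(-0.24)
ratio_b = odds_ratio_between_levels(-0.30)
```

Under this assumption, the odds of using a demonstrative in the less favorable mode come out at roughly 0.62 and 0.55 times the odds in the other mode. With a different coding scheme the factor of two would not apply.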
Register
We find a significant effect of register in the models for Greek and Russian, but we do not find any significant effect of register for Turkish. The two significant effects are of moderate effect size. Given the two levels, formal and informal, that this variable encodes, the positive estimate values indicate that there is a higher probability of using demonstratives in informal registers compared to formal ones.
Interim summary
The results of the generalized linear mixed-effects models show that heritage speakers do not extend the use of demonstratives compared to the relevant monolingual groups. In other words, the heritage speakers of Greek, Russian, and Turkish analyzed in this study performed similarly to monolingually raised speakers.
The elicitation of different communication settings allowed us to explore the factors of register and mode in the production of definite NPs with demonstratives. The statistical models showed a moderate effect of register for Greek and Russian, indicating that speakers tend to use more NPs with demonstratives in informal settings. This result does not hold for Turkish speakers, since the model for Turkish revealed no impact of register on the distribution of definite NPs with demonstratives. In addition, a significant effect of mode is observed for Russian- and Turkish-speaking participants. Specifically, participants produced significantly more demonstratives in the spoken mode than in their written texts. No impact of mode was found in the Greek sample.
Discussion
In this paper, we aimed to gain insights into the use of demonstratives in marking definiteness in NPs in heritage Greek, Russian, and Turkish in contact with English and German as well as in their monolingual varieties in the respective countries. We first discussed the existing research on definite NPs with demonstratives in the majority languages English and German as well as in Greek, Russian, and Turkish. Subsequently, we provided an overview of the most relevant studies dealing with definiteness marking in bilingual speakers. Based on the literature and our theoretical assumptions, we formulated the following research questions (see the section “Research questions and hypotheses”), repeated here:
RQ1: Do different groups of heritage speakers align with monolingually raised speakers regarding the production of demonstratives?
RQ2: Do mode and register affect the distribution of demonstratives across different groups?
To answer these questions, we investigated the distribution of demonstratives in different groups of heritage and monolingually raised speakers by conducting a corpus study and applying three binomial generalized linear mixed-effects models. We will now address each research question in turn.
As for RQ1, our results show no significant effect of country of elicitation, which provides no support for hypotheses predicting divergent behavior of heritage and monolingually raised speakers. Specifically, heritage speakers in our sample use demonstratives to mark definiteness in a similar way to monolingually raised speakers. Thus, our results do not provide support for transfer from the majority languages English and German. However, as mentioned in the “Results” section, all heritage speaker groups showed high in-group variance in the use of demonstratives, indicating, among other things, that some heritage speakers tend toward a wider use of demonstratives. This wider use is, however, not statistically significant. These findings tentatively point to a non-significant tendency toward higher explicitness in narrations by heritage speakers of Greek, Russian, and Turkish, which is in line with some previous studies on definiteness (Polinsky, 2006; Topaj, 2020; Yılmaz & Sauermann, 2023). Crucially, this tendency cannot be attributed to transfer, since, as stated earlier, all heritage speaker groups behave alike. As specified in the section “Research questions and hypotheses,” if transfer were at play, heritage speakers of Greek should have produced fewer definite NPs, while heritage speakers of Russian and Turkish should have produced more definite NPs than the respective monolingually raised speakers. Furthermore, this finding makes clear that the participant samples used in this research are not homogeneous. On the one hand, this is a limitation of this study. On the other hand, the non-homogeneity of heritage speakers as a whole appears natural and ecologically valid, since it mirrors the non-homogeneity of the native language continuum they belong to (Rothman & Treffers-Daller, 2014; Wiese et al., 2022).
Regarding RQ2, the results reveal that register and mode affect the production of demonstratives in heritage and monolingually raised speakers in a similar way: a wider use of demonstratives was observed in informal and/or spoken communicative situations. With respect to mode, the predictions were borne out for Russian and Turkish speakers of all groups, but not for Greek speakers. Formality, however, proved to be a predictor for Greek and, in addition to mode, for Russian. Specifically, across all groups of Greek speakers in our sample, more double-definite patterns were found in informal communicative situations, while the model showed no effect of mode. In all groups of Russian speakers, more demonstratives were produced in the spoken mode and in informal registers. Finally, in all groups of Turkish speakers, more demonstratives were used in the spoken mode, while no effect of register was observed. Taken together, informal and/or spoken communicative situations seem to trigger a wider production of demonstratives to mark definiteness across languages.
In contrast to our prediction, our results show that heritage speakers, similarly to monolingually raised speakers, are able to adjust their use of demonstratives as markers of definiteness according to mode and register. Hence, our findings do not support the claim that heritage speakers tend to overgeneralize spoken and informal settings, suggesting that register and mode leveling does not extend to all linguistic domains and/or all heritage speaker populations. Thus, definite NPs with demonstratives do not seem to belong to a domain vulnerable to mode and register leveling under language contact situations (contrary to verbal aspect [Alexiadou & Rizou, 2023] and the plural indefinite determiner in Greek [Alexiadou, 2014], which were found to be prone to register leveling). One possible explanation for this might lie in the properties of the phenomenon under investigation, for instance, its link to explicitness marking. This relates to claims in Polinsky (2018) that the D layer, where demonstratives are arguably located, is a resilient domain in heritage languages: D is very high in the functional sequence, and as such it is resilient to change and restructuring. Since deixis is generally highly sensitive to context and discourse, demonstratives are expected to be used predominantly in spoken settings cross-linguistically, a more or less universal mechanism shared by both heritage and majority language speakers. Thus, heritage speakers in our sample might take advantage of linguistic knowledge shared by the languages they speak and transfer strategies that appear salient to them from their majority language (which is often more dominant) into the heritage one.
Specifically, in the case of demonstratives, heritage speakers might rely on their majority language competence and use demonstratives in their heritage languages Greek, Russian, and Turkish according to the patterns of their majority languages English and German, which roughly coincide with the patterns of the monolingual varieties of Greek, Russian, and Turkish, respectively. However, since this analysis did not assess the appropriateness of the use of demonstratives, we cannot rule out that heritage speakers in our sample used demonstratives infelicitously. This aspect remains beyond the scope of this paper and must be left for further research.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Deutsche Forschungsgemeinschaft, as part of the research unit Emerging grammars in language contact situations: a comparative approach (FOR 2537) in project P10 (project no. 313607803, GZ AL 554/15-1, SZ 263/6-1, GA 1424/10-1).
