Abstract
Aims and objectives/purpose/research questions:
This study investigated three key issues in heritage language research. Previous research shows heritage language speakers have an advantage on tasks of oral production compared to L2 speakers who instead perform better on written tasks requiring use of metalinguistic skills. Furthermore, both L2 and heritage speakers are claimed to have a yes-bias towards retaining ungrammaticality in grammaticality judgement tests (GJTs). Finally, the morphological domain has been shown to be as problematic for heritage language speakers as L2 speakers, but research in lesser-known languages is needed.
Design/methodology/approach:
Adult L1, L2, and heritage language speakers of Italian were compared on an oral priming task and timed GJT. Both accuracy and response times were elicited from the latter test. The forms investigated were object and si-passive pronouns which lack corresponding forms in Swedish, the dominant language of the bilingual groups.
Data and analysis:
Mixed-effects regression models were fitted to accuracy data from the priming task and the GJT and to response time data from the GJT. In addition, a d-prime analysis was used to measure the degree of sensitivity to grammaticality and bias towards correct and incorrect answers in the GJT.
Findings/conclusions:
Overall, the two bilingual groups performed quite similarly across the measures tested. All three groups show high sensitivity to grammaticality and a very similar bias for yes-answers on both grammatical and ungrammatical items.
Originality:
The study is the first to employ a d-prime analysis to explore in greater detail the differences in knowledge of grammaticality between heritage, L2, and monolingual populations. It also presents a brief review of the limited existing research in heritage Italian.
Significance/implications:
Any advantages by task for either bilingual group level out by the time high proficiency levels are reached but may be associated with literacy levels when metalinguistic skills are measured. The yes-bias is likely a characteristic intrinsic to GJTs rather than a peculiarity of bilingual speaker knowledge. Morphology is problematic also for heritage speakers of lesser-known languages.
Introduction
Comparing heritage language speakers (HLSs) and L2 learners (L2Ss) has proven insightful in recent decades, as it has allowed researchers to draw conclusions on the effects of early exposure to a target language and the impact that qualitatively different input can have on acquisition (see Montrul, 2016; Polinsky, 2018, for a state-of-the-art review). HLSs are simultaneous or sequential bilinguals whose weaker language is a minority language of their society, while their stronger language is dominant in that society. Oftentimes, the context in which the weaker language develops is so similar to that of monolinguals that HLSs have come to constitute a special type of native-speaking population. Like monolingual children, heritage children are exposed to the native variety from birth and in highly communicative contexts, where aural input and spoken production abound, at the expense of the written input and production that typify traditional adult L2 classroom learning. At the same time, HLSs, often children who grow up as simultaneous bilinguals of a majority, dominant language and a minority, family language, show remarkable similarities to L2Ss. It is widely known, for instance, that properties which involve the production, comprehension, and grammatical judgement of morphological forms are considerably difficult for both HLSs and L2Ss to acquire fully (Romano, 2020; Montrul, 2010; Montrul et al., 2008; Montrul & Bowles, 2009; Montrul & Perpiñán, 2011; Polinsky, 2008a, 2008b; Potowski et al., 2009; Silva-Corvalán, 1994, 2014). Furthermore, HLSs and L2Ss show similar biases towards accepting ungrammatical sentences in grammaticality judgement tests (GJTs), which Polinsky (2018) labels the yes-bias. No one, though, has yet examined sensitivity to grammaticality and the yes-bias in any detail by comparing HLSs and L2Ss.
Thus, starting from the general claim of a yes-bias, the first contribution of this study is to describe heritage and L2 knowledge of ungrammaticality in a more specific manner.
One area where HLSs and L2Ss are believed to differ is task effects. It has been claimed that HLSs present advantages on tasks that require spoken production or listening comprehension, putatively owing to significant opportunity to speak and exposure to the family language from very early ages (Bowles, 2011; Montrul, 2016; Montrul et al., 2008). On the flip side, L2Ss fare better on tasks of written production and reading comprehension, as these types of tasks activate metalinguistic skills developed over years of explicit language instruction (Bowles, 2011; Correa, 2011; Montrul, 2016; Montrul et al., 2008). Metalinguistic skills, here, are intended as those that tap into knowledge which requires a high degree of awareness, attention to grammatical form, and use of explicit knowledge, and which relies on grammatical terminology and rules (Bowles, 2011).
Most of the studies available to date have been conducted on mainstream target languages, which calls for attention to research focused on lesser-known heritage languages (Polinsky, 2018). To this effect, the objective of this study is to put the key findings above to the test by comparing heritage, L2, and L1 speakers (L1Ss) of a lesser-known language, Italian, for task effects and the yes-bias in relation to knowledge of morphology.
Task-related issues in heritage language acquisition
Scholars in the field of bilingualism have extensively debated how to adapt experimental tasks to populations such as HLSs (Blom & Unsworth, 2010; Crain & Thornton, 1998; Ionin, 2013; Ionin & Zyzik, 2014; Mackey & Gass, 2012; McDonough & Trofimovich, 2008; Sekerina et al., 2008; VanPatten & Jegerski, 2010, inter alia). A basic yet important distinction in the HL literature is between offline and online measures (Montrul, 2016). Offline measures elicit comprehension, production, and judgement data from tasks such as GJTs and written comprehension/production. In contrast, online measures aim to describe language processing as it unfolds by recording response times under pressure, eye gaze, or physiological responses during brain activity. Oral production is typically assumed to belong to this category of measures because production unfolds over time, opening a window onto the processes involved in the composition of sentences.
A second key characteristic of HLSs in relation to task type is reduced metalinguistic knowledge (Bowles, 2011; Correa, 2011; Montrul, 2016; Montrul et al., 2008). In particular, Montrul et al. (2008) maintain an advantage for L2Ss on tasks that involve writing and reading but a disadvantage for those that involve comprehension and speaking, as these tap less into metalinguistic skills. I return to this study in more detail in the next section. Further evidence of differential performance on metalinguistic tasks between HLSs and proficiency-matched L2Ss was found in Bowles (2011). In this study, HLSs of Spanish who had received no more than 2 years of explicit classroom instruction but were exposed to the family language and consistently used it from birth were compared to two groups: (1) proficiency-matched L2Ss with an average of 6 years of classroom instruction who had been exposed to Spanish after puberty; and (2) L1Ss from various Spanish-speaking countries. The differences between the HL and L2 groups, therefore, amounted to total number of years of classroom instruction and earlier age of onset for the HL group. Participants completed two online tests, namely, an oral imitation and an oral narrative test, along with three offline tests, a timed and an untimed GJT and a metalinguistic knowledge test. It was found that the key differences amounted to higher scores for the L2Ss than the HLSs on the offline tasks but lower scores on the online tasks, which Bowles attributes to differences in amount of instruction and age of onset. However, on the basis of her design, the effects of age of onset cannot be disentangled from the amount of formal instruction. In other words, it is still unclear whether L2Ss perform better than HLSs on offline tasks and HLSs better than L2Ss on online tasks because of greater amounts of formal instruction or because of later age of onset.
Moreover, later age of onset is typically associated with other variables such as greater L1 transfer in L2Ss, which this study did not control for. Yet another caveat of this study was the lack of an independent measure of proficiency at the time of study. Because no measure of general proficiency was available for the three groups, it is unclear whether differences in general proficiency between the groups could have accounted for the main findings. This study addresses the above caveats by testing HL and L2 groups matched for proficiency at the time of study that have similarly received more than 3 years of formal language instruction.
Metalinguistic skills are also considered by Polinsky (2018) who claims assessment and rejection of linguistic data to be challenging for HLSs (p. 97). She also stresses the need for data elicited by tests that show participants’ awareness of what is not allowed by a language (i.e., ungrammaticality). As native, monolingual grammatical intuitions represent the benchmark for judging what is possible in a language, L1 groups are an important methodological ingredient for better understanding HL knowledge.
Baseline
When performance on certain task types is measured, it is important to establish how these function with a monolingual baseline (i.e., native monolinguals) (Domínguez et al., 2019; Montrul, 2016). According to Domínguez et al. (2019), for instance, this should be achieved by comparison to populations that display far less variability and respond to the same instruments in a more uniform manner 1 as this allows falsification of hypotheses stemming from theory, testing the effects of linguistic and non-linguistic variables, and testing the validity of a test or task, among others.
A reliable baseline is also regarded as a crucial element in sample selection (Polinsky, 2018). Because many HLSs are raised exposed to a variety of heritage language that is no longer spoken in the homeland, as in the case of lesser-known or endangered languages, or are exposed to input from native parents whose language has been subject to attrition, it is crucial that the sample selected for research is homogeneous. Thus, Polinsky adopts the term ‘baseline’ to indicate the variety of input to which the heritage speakers are exposed during language development. For the purpose of conducting research on heritage languages, it is thus paramount that the variety to which speakers are exposed from childhood be consistent for all participants.
Yes-bias
Specific task effects in HLSs are also believed to affect responses in GJTs, which have been widely adopted in the generative L2 research arena as a tool for establishing nativeness of a learner’s grammar across several morphosyntactic phenomena (Polinsky, 2018, p. 95). While L2Ss have been found to judge in both native- and non-native-like manner, early HLSs perform better on GJTs than early L2Ss, yet still provide non-native judgements at more advanced levels in comparison to monolinguals (Benmamoun et al., 2013; Scontras et al., 2015, inter alia). A large body of research highlights a tendency for ungrammatical structures to be retained by L2Ss (Ellis, 1991; Flege et al., 1999; Gass, 1983; Johnson et al., 1996; Juffs & Harrington, 1996; Murphy, 1997; Orfitelli & Grüter, 2013; White, 1985, 1986). Although L2Ss tend to accept grammatical sentences at fairly native-like rates, ungrammatical structures are likely to be incorrectly retained as grammatical. This pattern, termed the yes-bias, is rooted in ‘uncertainty about language’ and is shared by HLSs and L2Ss (Polinsky, 2018). In this study, the yes-bias is tested empirically not only to establish the validity of Polinsky’s claims but also to shed light on differences in the knowledge ultimately attainable by early and late bilinguals compared to monolingual native speakers.
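To make the notions of sensitivity and bias concrete, the following is a minimal sketch of how d-prime and a response criterion are conventionally derived from yes/no judgements in signal detection terms. It is not the analysis script used in this study: the function name, the example counts, and the log-linear correction for extreme rates are all illustrative assumptions.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) for one participant's yes/no judgements.

    hits               = 'yes' responses to grammatical items
    misses             = 'no' responses to grammatical items
    false_alarms       = 'yes' responses to ungrammatical items
    correct_rejections = 'no' responses to ungrammatical items
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    # Log-linear correction (an assumption of this sketch) keeps the
    # rates away from 0 and 1, where the z-transform is undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity to grammaticality
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # negative = bias towards 'yes'
    return d_prime, criterion


# Hypothetical counts for a single participant (not data from the study)
d, c = sdt_measures(hits=22, misses=2, false_alarms=8, correct_rejections=16)
print(f"d' = {d:.2f}, c = {c:.2f}")
```

Under this convention, a participant can combine high sensitivity (a large positive d-prime) with a negative criterion, which is the pattern that a yes-bias would produce.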
Knowledge of morphosyntax in HLSs
Despite often receiving exposure to language in comparable measure to monolingual children, HLSs develop grammatical knowledge that resembles L2Ss more than monolinguals in domains such as morphosyntax. Divergence between HL and L1 performance is well-attested for properties like gender assignment (Montrul et al., 2008; Polinsky, 2008a, 2008b), use of accusative clitics (Romano, 2020; Montrul, 2010), mood morphology (Montrul & Perpiñán, 2011; Potowski et al., 2009; Silva-Corvalán, 1994, 2014), and differential object marking (Montrul & Bowles, 2009). 2 To exemplify, Montrul et al. (2008) investigated gender marking in Spanish comparing L2Ss and HLSs ranging from low to advanced proficiency level to baseline controls. The L2Ss received input in both instructed and naturalistic settings after puberty, whereas the HLSs were exposed to Spanish from birth. The tasks employed were two offline tasks, a written picture identification and gender recognition task, along with an online task, an oral picture description. The tests elicited agreement for gender and number, which is marked on determiners, nouns, and adjectives, as illustrated in (1) and (2).
(1) La casa roj-a
    the.DET.F house.N (F) red.ADJ-F
    ‘the red house’
(2) El auto roj-o
    the.DET.M car.N (M) red.ADJ-M
    ‘the red car’
While gender marking on nouns is intrinsically lexical, denoted by the parenthesised attribute for gender (F)/(M) in the examples, determiners and adjectives encode a syntactic gender feature F/M which they inherit via agreement with the noun. Montrul et al. (2008) found that HLSs made significantly more errors than the L2Ss on the offline tasks but were more accurate on the online measure. On this basis, the authors concluded that HLSs have an advantage over L2Ss on tests that elicit oral production but a disadvantage for tasks that tap into more metalinguistic knowledge (p. 542). It should be noted, however, that the effect size for the L2 advantage was low (d = 0.40) compared to the HL advantage (d = 1.1), which calls into question the likelihood of replicating the former trend in future studies.
Another core study conducted by Montrul (2010) compared low-intermediate L2 and heritage speakers dominant in English to L1 Spanish speakers on knowledge of Spanish accusative and dative clitics. An example with accusative clitics is shown in (3).
(3) a. Juan lo mira todos los días
       Juan it.cl.M.O watch every day
       ‘Juan watches it every day’ (Montrul, 2010, pp. 170–171)
An offline written acceptability judgement task, an online oral production task, and a speeded comprehension task eliciting both accuracy and response times were employed. Results from the first task did not show any disadvantages for HLSs as differences between groups depended on the complexity of the clitic structures tested. In no structure, however, was the performance of L2Ss more target-like than that of the HLSs. In the speeded comprehension task’s offline measure, namely the accuracy scores, there were no significant differences between HLSs and L2Ss, although the HLSs showed more monolingual-like response latencies than L2Ss on the online measure (i.e., the response times). Finally, in the oral production task, HLSs again showed more L1-like performance than L2Ss.
A final study of interest is Håkansson (1995), who compared five HLSs to six L2Ss of Swedish on knowledge of nominal gender agreement. The linguistic profiles of the HLSs were varied, as three were raised and educated in the United States, while the remaining two were raised in France and Sweden but educated in French. Crucially, all five HLSs received input in Swedish from their native-speaking mother from birth. The background of the L2 group was also varied, with L1s ranging from Swahili to Persian. Production data were collected via a battery of offline written tests eliciting use of plural and gender agreement, among other properties. Leaving aside issues of generalisability, this study found a numerical advantage for L2Ss on noun gender agreement compared to HLSs in written production.
The findings thus far are consistent with the idea that HL grammars are more L2-like than target-like in the morphosyntactic domain. However, according to Polinsky (2018), ‘the main challenges that remain to study this population are the general paucity of research on less-commonly studied languages’ (p. 207). To this effect, we turn to Italian for investigating similarities and differences between heritage, L2, and L1 knowledge of morphology.
Italian as an HL
Studies of Italian as an HL are sparse compared to other languages and were largely published under the umbrella field of bilingualism rather than heritage language acquisition (Romano, 2020; Bardel, 2000, 2004; Bernardini, 2003; Bernardini & Schlyter, 2004; Bernardini & Timofte, 2014; Bianchi, 2013; Bonfatti-Sabbioni, 2018; Kupisch, 2012, 2014; Serratrice et al., 2011; Torregrossa & Bongartz, 2018; Wiberg, 1996). Most of these studies are only marginally relevant and were focused on comparing different types of bilingual groups by looking at cross-linguistic influence, typological relatedness, multilingualism, and language dominance. The studies most germane to this study are Kupisch (2012) and Bianchi (2013).
Kupisch (2012) compared two groups of German-Italian intermediate to highly proficient HLSs dominant in either German or Italian to German L2Ss of Italian. 3 Participants were tested on knowledge of articles in specific and generic contexts by means of (1) an acceptability judgement task in which participants heard and read sentences that they had to repeat aloud, if grammatical, or correct, if ungrammatical; (2) an offline truth-value judgement task in which they selected one of two potential readings, generic and specific, for sentences that they heard and read. Examples from the latter task are in (6a–6b):
(6) a. Picture showing tailless kangaroos with ties, test condition
       I canguri hanno la coda
       the kangaroos have the tail
       (False response = specific reading, True response = generic reading)
       ‘The kangaroos have tails./Kangaroos have tails’.
    b. Picture showing blue sunflowers, test condition
       I girasoli sono blu
       the sunflowers are blue
       (False response = generic reading, True response = specific reading)
       ‘The sunflowers are blue./Sunflowers are blue’.
    (Kupisch, 2012, p. 748)
Both tasks yielded no significant differences in the scores of HLSs and L2Ss which runs counter to the claim that HLSs have an advantage over L2Ss on production tasks, while the latter have an advantage for tasks which require metalinguistic skills. Given the acceptability judgement task was a production task, the HLSs were expected to have an advantage, whereas for the truth-value judgement task which tested written comprehension skills, the opposite pattern was expected. The biographical information available, however, indicates that there was a wide range of proficiency among both the HL and L2 groups as established by a cloze test which, in principle, can account for similar performance on both tasks. Moreover, the author points out considerable within-group variation in the results, which is also potentially implicated in accounting for lack of advantages for one or the other group. This variation, additionally, underscores the need to record key background information to control for within-group differences that can be associated with more target-like performance.
Like Kupisch (2012), Bianchi (2013) is a study comparing adult HLSs of Italian vis-à-vis L2Ss of Italian with L1 German. The study investigated gender assignment and agreement between determiners and nouns by means of two production tasks, an acceptability judgement task similar to the one employed in Kupisch (2012) and elicited production. Testing of the two properties in the acceptability judgement task is exemplified in (7a–7b):
(7) a. Ungrammatical prompt sentence
       * Ho usato la pettine verde e l’ ho rimessa nel cassetto
       have.I used the.F comb (M) green and it have.I put.PSTPRT.F.again in-the drawer
       ‘I used the green comb and I put it in the drawer again’.
    b. Expected correction
       Ho usato il pettine verde e l’ ho rimesso nel cassetto
       have.I used the.M comb (M) green and it have.I put.PSTPRT.M.again in-the drawer
       ‘I used the green comb and I put it in the drawer again’.
As can be seen in (7a), gender assignment and agreement in Italian resemble those of Spanish (recall (1)–(2)), insofar as nouns are lexically marked for gender while determiners and past participles inherit gender features syntactically from nouns. Thus, the expected response in the acceptability judgement task was that respondents correct (7a), where gender misagreement occurs between the noun and the determiner or past participle, by repeating the sentence out loud as (7b), where the noun, article, and past participle agree. Similar to Kupisch (2012), a large correspondence between the two groups for knowledge of both gender properties was obtained. One caveat of this study, however, is the lack of information regarding the two groups’ general proficiency level and years of instruction, which may have been a determining factor in explaining the results. Bianchi and Kupisch, nevertheless, remain to date some of the best efforts comparing heritage and L2 Italian.
Research questions and hypotheses
This study was motivated by three key findings in the literature on HL acquisition. Despite earlier and qualitatively native-like exposure to the family language, HLSs acquire knowledge of morphosyntax that resembles L2Ss more than L1Ss. Moreover, L2Ss and HLSs display a similar yes-bias towards retaining ungrammatical structures as grammatical in GJTs (Polinsky, 2018). However, a separate strand of research shows HLSs to have advantages over proficiency-matched L2Ss on online oral tasks, in contrast to disadvantages on offline written tasks that tap into metalinguistic knowledge (Bowles, 2011; Montrul, 2016). The following research questions were, thus, addressed:
RQ1. What are the differences in task effects between HLSs and proficiency-matched L2Ss?
RQ2. Do HLSs display the bias for retaining ungrammaticality as attested widely for L2Ss?
RQ3. Does earlier age of exposure lead to more L2-like performance in the morphosyntactic domain also for lesser-known languages like Italian?
Method
Participants
Heritage (n = 14) and L2 (n = 13) speakers of Italian dominant in Swedish, together with L1 speakers of Italian (n = 19), participated in the study. Several factors, including the baseline and overall proficiency whose importance was highlighted above, were controlled via a background questionnaire and a cloze test tailored to HLSs of Italian, both borrowed from previous studies (Kupisch et al., 2012). The main goal during sampling was to recruit HL and L2 groups that were as homogeneous as possible by reducing differences between them to quality of input received and age of first exposure. The background information collected is summarised in Table 1.
Table 1. Participant information.
Note. AFE = age of first exposure; EE = estimated exposure to Italian; HLSs = heritage language speakers; L2Ss = L2 learners; L1Ss = L1 speakers.
All HLSs were first exposed to Italian from birth (age 0) except one participant whose age of onset was 6 years, but whose results did not deviate significantly from those of the other HLSs. All HLSs had two native-speaking parents except one who had a native-speaking mother only. All were born in Sweden to first-generation immigrants and self-assessed their language skills on a scale from 0 (= no proficiency) to 5 (= native-level proficiency) with respect to their ability in both languages. The scores for Italian were 3.75 in reading, 3.42 in speaking, 3.00 in writing, and 3.75 in listening, while for Swedish these were 5.00 in reading, 5.00 in speaking, 4.92 in writing, and 4.92 in listening. The self-assessed proficiency scores in reading, listening, speaking, and writing skills were highly correlated between the HL and L2 groups (Spearman’s rank-order correlations: reading-speaking, r = .92, p < .001; reading-writing, r = .78, p < .001; reading-listening, r = .99, p < .001; speaking-writing, r = .82, p < .001; speaking-listening, r = .92, p < .001; writing-listening, r = .78, p < .001) and internally consistent (Cronbach α = .94, n = 12), which attests to the strong similarity in proficiency between the two groups. Half of the sample received some instruction in the mother tongue as children, while the whole sample had received more than 3 years of formal instruction in Italian and had paid periodic visits to their home country in their lifetime. In turn, the L2 group were all born and raised in Sweden. Their starting age of first exposure was 13 years and average scores for self-assessment in the two languages were 3.08 in reading, 2.75 in speaking, 2.75 in writing, and 3.25 in listening for Italian, in contrast to 4.83 in reading, 4.92 in speaking, 4.83 in writing, and 4.92 in listening for Swedish. All the L2Ss except one periodically visited Italy and received more than 3 years of formal instruction in Italian.
Four participants even completed a bachelor’s degree in Italy. The L1Ss of Italian were all born and raised in the Italian region of Veneto and recruited among third- and fourth-year undergraduate students at the Ca’ Foscari University of Venice.
The overall proficiency test was a cloze test used previously and standardised with HLSs of Italian in Bianchi (2013) and Kupisch (2012). 4 The test is a standard cloze test comprising a text (in Italian) with instructions, also in Italian, where various parts of speech were replaced by a gap for a total of 44 gaps. Each participant’s overall proficiency score was obtained by dichotomous scoring (i.e., correct = 1 point, incorrect = 0 points) of each response for each gap. I refer the reader to the original studies for more details of this test. As the project to which the study was connected focused on advanced language users and advanced HLSs have been shown to have non-native-like judgements in GJTs (Polinsky, 2018), the heritage and L2 groups were recruited to be as proficient as possible in Italian. Scores for the three groups were out of 100. An independent-samples t-test indicated that the HL and L2 groups did not differ significantly for proficiency, two-tailed, t(21) = 2.07, p = .43, although both groups were significantly less proficient than the monolinguals, L2 vs L1: two-tailed, t(19) = −3.64, p < .01; HL vs L1: two-tailed, t(14) = −4.02, p < .01.
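As an illustration of dichotomous scoring, the sketch below awards one point per gap when the response matches an accepted answer and zero otherwise. The function name, the toy answer key, and the conversion of raw points to a score out of 100 as a simple percentage are assumptions made for the example; the original studies should be consulted for the actual key and scaling.

```python
def score_cloze(responses, answer_key):
    """Score a cloze test dichotomously: 1 point per correct gap, 0 otherwise.

    responses  : list of strings, one per gap
    answer_key : list of sets of accepted answers (lowercase), one per gap
    Returns (raw_points, score_out_of_100).
    """
    if len(responses) != len(answer_key):
        raise ValueError("one response is required for every gap")
    points = sum(
        1
        for given, accepted in zip(responses, answer_key)
        if given.strip().lower() in accepted
    )
    # Assumption: the score out of 100 is the percentage of correct gaps.
    return points, 100 * points / len(answer_key)


# Toy key with 4 gaps (the test described above has 44)
key = [{"il"}, {"casa"}, {"e", "ed"}, {"di"}]
print(score_cloze(["il", "Casa ", "e", ""], key))
```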
Structures
Five grammatical properties were investigated. These were three syntactic structures, namely, si-passive, transitive, and clitic-left dislocation (CLLD), and two morphemes, direct object pronouns and the passivising pronoun si. As the syntactic structures are not the object of this study, they will not be discussed further. Instead, we turn to a description of the two morphemes direct object pronouns and passivising si. Because these forms are absent from Swedish, they do not bias results in favour of one or the other bilingual group as both are prone to negative transfer effects.
While si-passive pronouns can be elicited in a number of simple clauses, clitic pronouns require more complex structures that subtend referentiality and agreement dependencies, as shown in (7) previously. One such structure is CLLD, in which a focalised object surfaces left of a co-referring clitic, establishing both referentiality and agreement dependency between the head noun and the clitic (8):
(8) I pesci, Pietro li cucina all’aperto
    the fish.DP.O.FOC Pietro.DP.S them.DP.cl.O cooks.V in-the-outdoors.PP
    ‘the fish, Pietro is cooking them outdoors’
Thus, the object i pesci ‘the fish’ is coreferenced to the object clitic li ‘them’ with which it must also agree in gender and number. Compared to Spanish, plural marking in Italian offers more transparent cues to gender agreement. Whereas in Spanish the plural is largely marked by addition of the -s morpheme, conflating gender (M.PL > -s, F.PL > -s), in Italian plural marking is contingent upon gender (M.PL > -i, F.PL > -e), exhibiting more transparent form-function mapping. The closest equivalent in Swedish to (8) is shown in (9a–9b):
(9) a. *fiskarna, Pietro dem lagar till utomhus
       the fish.OBJ Pietro.SUBJ them.PRO.OBJ cooks.Vfin in-the-outdoors
    b. Pietro tillagar dem utomhus
       Pietro.SUBJ cooks.Vfin them.PRO.OBJ in-the-outdoors
       ‘Pietro is cooking them outdoors’
As shown in (9a), a direct translation of (8) is ungrammatical in Swedish, which instead requires a different structure where the object dem ‘them’ must surface postverbally as in (9b). The postverbal position, however, is also a consequence of more general differences between Italian accusative-marked pronouns (i.e., clitics) and Swedish accusative-marked pronouns (i.e., strong pronouns) (Cardinaletti & Starke, 1999). While strong pronouns are nominal and pattern with full DPs for placement (i.e., postverbal, phonologically strong, and prosodically stress-bearing), clitics are, foremost, phonologically weak, non-stress-bearing, and verbal insofar as they attach to a verb host with which they form a single syntactic constituent (Belletti, 1999; Kayne, 1975; Roberts, 2010, chap. 3; Spencer & Luís, 2015; Sportiche, 1996). Another key difference between Italian (8) and Swedish (9) is that the intonation necessary in Italian to realise the CLLD structure cannot be realised in Swedish at all. Hence, (9b) excludes the initial left-dislocated fiskarna ‘the fish’. All items testing direct object clitics had the same DP.O DP.S DP.cl.O VP PP structure and contained seven words except those with modal and causative verbs, which contained one extra (see Supplemental Material Appendices A and B). Object referents were equally split between singular and plural and masculine and feminine.
Si-passive pronouns were elicited via passivising verb structures that, together with declarative transitive verb structures, functioned as fillers to the object pronoun items in another study (Romano, 2021). Sample items testing the passivising pronoun si and declarative transitive verb structures are exemplified in (10) and (11), respectively:
(10) La mela si pela col coltello
     the apple.S it.cl.PASS peeled.V with.the knife.PP
     ‘Apples are peeled with a knife’
(11) La mamma frigge l’uovo in padella
     the mother.S fries.V the egg.O in pan.PP
     ‘the mother fries the egg in a pan’
In (10), the si clitic realises passive meaning in a way semantically equivalent to the well-known venire passive. I refer the reader to Trifone and Palermo (2019, p. 188) and Sobrero (2008, p. 222) for more details of the use of the clitic si with passive meaning.
As in the case of direct object pronouns in (8)–(9) above, no equivalent to Italian passivising constructions exists in Swedish. The closest equivalent to (10) is given in (12):
(12) Äpplen skalas med kniv
apples.S peel.V-s.PASS by knife.PP
‘Apples are peeled with a knife’
Compared to (10), (12) lacks a clitic realising the passive meaning of si in Italian, and passivisation is instead encoded by the verb which bears the passive morpheme -s. All items took the form DP.S DP.refl.cl VP PP for si-passive and DP.S VP DP.O PP for transitive structures, respectively. Both passive and transitive structure items were exactly seven words. In the analyses, only the direct object and si pronouns will be considered.
Priming task
The first task, a variant of traditional oral structural priming tasks following the confederate scripting technique (Branigan et al., 2000), was an online measure of oral production. A plethora of experimental research utilising structural priming in L1 and L2 research has developed in recent years (Bernolet et al., 2013; Branigan & Pickering, 2017; McDonough & Trofimovich, 2008; Pickering & Ferreira, 2008; see Jackson, 2017, for the state of the art). Structural priming is the well-known tendency of speakers to repeat and hearers to re-use a structure previously processed in the input for purposes of production or comprehension relative to one or more structures with the same meaning (Bock, 1986). Due to the high focus on meaning during priming, the processes underlying access to grammatical knowledge in the production of sentences are largely believed to be unconscious (Branigan & Messenger, 2016; Chang et al., 2006; Dell & Chang, 2014; Kaschak et al., 2014). As the purpose of structural priming tasks is the elicitation of competing syntactic permutations, the elicitation of morpheme use in said permutations is well concealed and largely devoid of explicit attention.
Participants first saw a picture containing a prime CCLD, si-passive, or transitive sentence to be read out loud (8-second time-out). In this way, the structure was primed visually, by reading the sentence, and aurally, by repeating it aloud which neutralises task advantages for either bilingual group. There followed a fixation point on the screen for 500 ms and a new blank screen containing a true or false comprehension question related to the picture (3-second time-out). Subsequently, a new slide containing text prompts for the target sentence appeared above a new, matching picture. CCLD, si-passive, and transitive targets contained 4, 3, and 3 prompts, respectively, while the morphemes under study never figured among the prompts. Participants were then instructed to use the prompts to form a complete sentence describing the picture which had to be spoken aloud before the trial timed out (10 seconds) and the next trial started. In total, there were 24 CCLD, 12 si-passive, and 12 transitive prime/target pairs which alternated so that every CCLD target would be followed by either filler type. The order of presentation of trials was automatically randomised for each participant in three Latin-squared parallel versions of the task. See Supplemental Material Appendix A for one version.
Timed GJT
The second task, a timed GJT (TGJT), afforded both an offline and an online measure. By eliciting accuracy in judgement of clitic form, it provided a measurement similar to the offline tasks employed in previous studies. However, the timed aspect of the test also allowed the elicitation of response times, which is an online measure. The task presented participants with grammatical and ungrammatical versions of the CCLD, si-passive, and transitive structures:
A. Direct objects with CCLD
la birra Marco la vuole provare al bar
the beer.F.O Marco it.F.cl.O wants try at-the bar
‘The beer, Marco wants to try at the bar’ (grammatical version)
*la birra Marco lo vuole provare al bar
the beer.F.O Marco it.M.cl.O wants try at-the bar
‘The beer, Marco wants to try at the bar’ (ungrammatical version)
B. si-passive pronouns in si-passive constructions
la nota si archivia nei cassetti piccoli
the note itself.cl.REFL files in-the drawers small
‘the note needs to be filed in the small drawers’ (grammatical version)
*la nota archivia nei cassetti piccoli
the note Ø files in-the drawers small
‘the note needs to be filed in the small drawers’ (ungrammatical version)
C. Transitive SVO with no pronoun
Alice archivia la nota in ufficio
Alice.3SG files.PRES.3SG the note in the office
‘Alice files the note in the office’ (grammatical version)
*Alice archivio la nota in ufficio
Alice.3SG file.PRES.1SG the note in the office
‘Alice files the note in the office’ (ungrammatical version)
In type A sentences, the error was one of gender agreement and had to be established by identifying the pronoun and its object referent, retrieving the person, number, and gender features of the left-dislocated object, and checking these features against those encoded by the pronoun. The error in sentence type B consisted of omission of the si-passive pronoun, while in sentence type C, it consisted of a subject-verb agreement error. Each sentence had to be judged as correct, incorrect, or not sure. Response times for all items were recorded, and items timed out after 5 seconds. The task consisted of the same 48 critical and filler sentences as the priming task, but in a grammatical and an ungrammatical condition, totalling 96 items (48 × 2). The targets alternated between two parallel versions of the task so that no participant saw the same item in both grammatical and ungrammatical conditions in any one version (see Supplemental Material Appendix B). The order of presentation was automatically randomised for each participant so that items testing object pronouns were interspersed with the other two types. Sentences appeared all at once. After a response was recorded or a sentence timed out, there followed a fixation point on the screen for 500 ms and a screen containing a new sentence.
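The counterbalancing of grammatical and ungrammatical conditions across the two parallel versions can be sketched as follows. This is only an illustration under stated assumptions: the item identifiers are placeholders rather than the actual sentences, each version is assumed to hold 48 items, and the even/odd assignment rule is hypothetical and not taken from the E-Prime implementation.

```python
# Hypothetical sketch of two counterbalanced TGJT versions.
# Assumption: sentences are identified by an index 0..47; the real
# sentences are listed in the supplemental appendices.
N_SENTENCES = 48

def build_versions(n_sentences):
    """Return two lists in which every sentence occurs exactly once,
    grammatical in one list and ungrammatical in the other."""
    version_a, version_b = [], []
    for i in range(n_sentences):
        gram = (i, "grammatical")
        ungram = (i, "ungrammatical")
        # Assumed rule: even-numbered sentences are grammatical in
        # version A, odd-numbered sentences are grammatical in version B.
        if i % 2 == 0:
            version_a.append(gram)
            version_b.append(ungram)
        else:
            version_a.append(ungram)
            version_b.append(gram)
    return version_a, version_b

version_a, version_b = build_versions(N_SENTENCES)
print(len(version_a), len(version_b))
```

Under this reading, the two versions together cover all 96 sentence-by-condition combinations, while no single version contains the same sentence in both conditions.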
Procedure
Experiments were run on two Lenovo ThinkPad 4173DC9 laptops in designated lab spaces. Both tasks were designed and run in E-Prime 2.0 experimental software. Participants’ oral responses in the structural priming task were recorded via the laptops’ in-built microphone and transcribed by the author, a native speaker of Italian. Participants completed the linguistic background questionnaire and placement test online prior to arriving at the testing lab for administration of the priming task and TGJT, always in that order. The priming task took 33 minutes, while the TGJT lasted on average 5 minutes depending on individual participant speed.
Results
The accuracy rates by group in the two tests are reported in Table 2. Any responses in the TGJT faster than 1,000 ms and one L1 participant’s scores in the priming task were removed as they constituted outliers.
Accuracy in the two tasks by group.
Note. HLSs = heritage language speakers; L2Ss = L2 learners; L1Ss = L1 learners; TGJT = timed grammaticality judgement test; max = maximum number of responses including missing; missing responses were 4%, 11%, and 15% from the L1, L2, and HL priming task data and 14%, 16%, and 26% from the L1, L2, and HL TGJT data.
The rate of correct counts signalled by the percent column in the table shows that the L1 group outperformed the bilingual groups. To explore this effect better in each task, a separate mixed-effects logistic regression model was fit to the data for each task with the lme4 package in R version 3.6 (R Development Core Team, 2013). The dependent variable was coded as a binary response representing log odds of a correct/incorrect response. The model included a fixed effect of group, a factor with three levels (L1, L2, HL), with random effects for participants and items and a random slope for group by item. An analysis of variance (ANOVA) type III for the priming task finds a main effect of group, χ2(2) = 21.11, p < .001. The effect was scrutinised via the simr package in R by running a power analysis (Green & MacLeod, 2016), which tests the statistical power of the main effect. This is achieved by applying the model parameters to 1,000 simulated new datasets and obtaining a percentage power. The analysis finds a 95% power to reproduce the effect, which warrants scrutiny of pairwise comparisons. The data and R script for this and subsequent analyses are available open access at https://osf.io/xz8ev/. Pairwise comparisons between the groups, obtained by effect coding the group factor and changing the reference level, find a significant probability of the L1 group being more accurate than the HL group (β [log odds] = 1.90, SE = 0.46, z = 4.109, p < .001) and the L2 group (β [log odds] = 1.82, SE = 0.47, z = 3.835, p < .001), but no significant difference between the HL and L2 groups (β [log odds] = −0.06, SE = 0.47, z = −0.142, p > .05).
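Because the coefficients above are expressed in log odds, it may help to see how they translate into odds ratios. The short sketch below exponentiates the published estimates and recomputes the Wald z statistic and an approximate 95% confidence interval. The function names are illustrative, and since the inputs are the rounded values from the text, the recomputed z values differ slightly from those reported.

```python
import math

def odds_ratio(beta):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(beta)

def wald_z(beta, se):
    """Wald z statistic: the estimate divided by its standard error."""
    return beta / se

def wald_ci(beta, se, z_crit=1.96):
    """Approximate 95% confidence interval on the odds-ratio scale."""
    return math.exp(beta - z_crit * se), math.exp(beta + z_crit * se)

# Rounded estimates for the priming task as given in the text.
comparisons = [("L1 vs HL", 1.90, 0.46), ("L1 vs L2", 1.82, 0.47)]

for label, beta, se in comparisons:
    low, high = wald_ci(beta, se)
    print(f"{label}: OR = {odds_ratio(beta):.2f}, "
          f"z = {wald_z(beta, se):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

On this scale, a log-odds estimate of 1.90 corresponds to the odds of a correct response being roughly six to seven times higher in the L1 group than in the HL group.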
Turning to the TGJT accuracy scores, an ANOVA type III finds a main effect of group, χ2(2) = 10.21, p < .01. A subsequent power analysis confirms that the probability of reproducing the effect is 86% after simulating 1,000 datasets based on the model parameters. Pairwise comparisons revealed similar results to the priming task: a significant difference between the L1 and HL groups (β [log odds] = 1.18, SE = 0.41, z = 2.881, p < .01) and the L1 and L2 groups (β [log odds] = 1.10, SE = 0.42, z = 2.605, p < .01), but no significant difference between the HL and L2 groups (β [log odds] = −0.08, SE = 0.43, z = −0.194, p > .05).
To test the yes-bias, correct and incorrect responses to grammatical items were tallied as hits and misses, respectively, while correct and incorrect responses to ungrammatical items were coded as correct rejections and false alarms, following signal detection theory. In this way, sensitivity (d′) and bias scores (C) were obtained from a d-prime (d′) analysis. Results are summarised in Table 3.
D′ analysis and responses by grammaticality in the TGJT.
Note. d′ = z(total hits) − z(total fa); C = −0.5 × [z(total hits) − z(total fa)]; fa = false alarms; HLSs = heritage language speakers; L2Ss = L2 learners; L1Ss = L1 learners; TGJT = timed grammaticality judgement test.
The d′ coefficient signals the group’s sensitivity to grammaticality, where scores of 0.8 and higher are commonly assumed to reflect good sensitivity. Thus, the scores for all three groups reflect high sensitivity to grammatical and ungrammatical items, with the L1 group showing an advantage. By contrast, C signals the degree of bias towards ‘yes’ and ‘no’ answers, and correct and incorrect responses, respectively, where positive values indicate a bias towards yes-responses (hit and false alarm in Table 3) and negative values indicate a bias towards no-responses (miss and correct rejection in Table 3). Given that C is positive for all three groups, an overall bias towards yes (i.e., correct) responses is found in the data. In addition, the similarity in the three values reflects a similar bias between the monolinguals and bilinguals.
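As an illustration of how sensitivity and bias scores of this kind are derived from the four response categories, the sketch below computes d′ and a criterion score from raw counts. The counts are invented for the example and are not the study's data. The criterion follows the textbook signal-detection definition, c = −0.5 × [z(H) + z(FA)], under which negative values indicate a yes-leaning criterion; this is an assumption and may differ in sign or form from the C reported in Table 3.

```python
from statistics import NormalDist

def z(p):
    """Inverse of the standard normal CDF (probit transform).
    Raises an error if p is exactly 0 or 1."""
    return NormalDist().inv_cdf(p)

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) from the four response counts."""
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)
    # Textbook criterion (assumed convention): negative = yes-leaning.
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion

# Hypothetical counts for one group: 100 grammatical, 100 ungrammatical items.
d_prime, criterion = sdt_measures(hits=90, misses=10,
                                  false_alarms=30, correct_rejections=70)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

With these invented counts, the group both discriminates well between grammatical and ungrammatical items and leans towards accepting items, which is the pattern the text describes as a yes-bias.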
Finally, the TGJT elicited response times as an online measure. A visual plot summarising these results is given in Figure 1.

Response times for correct responses on the TGJT.
The distribution of responses is very similar between the three groups insofar as the highest counts occur at around the 2,200-ms peak and the means are also close to each other at 2,341, 2,543, and 2,677 ms, respectively. A linear mixed-effects regression model with response times as the outcome, group as a fixed effect factor, random effects for participants and items, and a slope for group by items was fit to the data. The ANOVA type III finds a main effect which approaches significance, χ2(2) = 5.91, p = .051. A subsequent power analysis, however, reveals that the probability of reproducing the effect is low (60%) after simulating 1,000 datasets based on the model parameters. As the power lies below 80%, the effect is too weak to be retained, given the current dataset.
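The logic of a simulation-based power estimate can be sketched in a few lines. The example below is a deliberate simplification and not the procedure used in the study: it swaps the mixed-effects model for a plain two-group z-test on simulated response times, and the group size, mean difference, and standard deviation are hypothetical values chosen only for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

def simulate_power(n_per_group, mean_diff, sd, n_sims=1000, alpha=0.05, seed=1):
    """Estimate power as the share of simulated datasets in which a
    two-sided z-test on the group means is significant at alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    significant = 0
    for _ in range(n_sims):
        # Draw one simulated dataset from the assumed population values.
        group_a = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(mean_diff, sd) for _ in range(n_per_group)]
        se = (stdev(group_a) ** 2 / n_per_group
              + stdev(group_b) ** 2 / n_per_group) ** 0.5
        if abs(mean(group_b) - mean(group_a)) / se > z_crit:
            significant += 1
    return significant / n_sims

# Hypothetical scenario: 15 participants per group, a 300-ms difference,
# and a standard deviation of 500 ms.
power = simulate_power(n_per_group=15, mean_diff=300, sd=500)
print(f"Estimated power: {power:.0%}")
```

An estimate below the conventional 80% threshold would be read in the same way as in the text: the effect cannot be reliably reproduced with a sample of that size.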
Discussion
This study set out to contribute to the field of heritage language bilingualism by addressing three research questions framed by three issues drawn from the state-of-the-art. The first question asked whether HLSs perform more similarly to L2Ss or L1Ss when tested on knowledge of morphology via offline and online measures. This was achieved by controlling for factors such as the baseline, general proficiency, and amount of instruction. The expectation based on claims in Montrul (2016), Montrul et al. (2008), and Bowles (2011) was that HLSs would perform more similarly to native speakers than their L2 counterparts on online measures, while the latter would show more native-like responses on offline tests which require more metalinguistic knowledge. Three analyses of a main effect for group were conducted to test these predictions: an analysis of accuracy in the oral priming task, which constituted an online measure; an analysis of accuracy in the TGJT, which constituted an offline measure; and an analysis of response time data in the TGJT, also an online measure. The first analysis yielded a significant difference between the HLSs and L1Ss as well as the L2Ss and L1Ss but not between the bilingual groups. Although this result does not provide direct evidence to confute the claim that HLSs have advantages on online tasks, it does show that at advanced proficiency levels any differences in the effect of age and quality of exposure are levelled out with L2Ss. In other words, although HLSs may have advantages at lower proficiency levels as shown in Montrul (2008) and Bowles (2011), these are expected to fade by ultimate attainment. The second analysis conducted on the offline measure yielded the same statistical outcome insofar as the only group difference found was between the bilingual groups and the L1Ss. 
Thus, this result can be accounted for in the same way as the priming task but for the L2Ss: any advantages L2Ss may have for metalinguistic knowledge at lower levels of proficiency are levelled out once HLSs reach high proficiency levels. This is consistent with the background of the HLSs in this study who were all highly literate in Italian, had more than 3 years of formal instruction, and were in part Italian language teachers. Together, these traits suggest HLSs likely possessed metalinguistic awareness which aided them in completion of the offline measure. A reviewer points out the interesting possibility that the statutory heritage language instruction classes that the Swedish government makes available to HLSs from the age of 6 years, also known as the ‘hemspråk’ programme, may be implicated in the high level of metalinguistic knowledge that the HLSs in this study developed. Yet another possibility suggested by the reviewer is that the current study failed to find the effect predicted by Montrul (2008) and Bowles (2011) because the participants in those studies, unlike those in this study, had not completed higher education. Consequently, the results of this study point to a directly proportional relationship between amount of literacy and metalinguistic knowledge in HLSs, which warrants further investigation. The final analysis, namely, response times in the TGJT, did not yield conclusive evidence as the group effect only approached significance. The lack of an effect can be attributed to a real absence of the effect in the population or to an underpowered participant or item sample. Future studies with a participant sample larger than 45 or with more than 36 critical items are advisable. All in all, the three analyses comparing scores on online and offline measures showed the bilingual groups perform quite similarly to each other but differ from the L1 group across the three measures elicited.
This then raises the question of what the similarity between HLSs and L2Ss, in contrast to the L1Ss, with respect to the production, intuition, and processing of morphemes in the structures studied means. We return to this below.
A performance more similar to the L1Ss, instead, was found in the analysis which addressed the second question asking whether HLSs have a similar bias to L2Ss on retaining ungrammaticality (Polinsky, 2018). A d-prime analysis revealed that all three groups had high sensitivity to grammaticality, reflected by d′ scores above 0.8. However, the analysis also showed the three groups possess a very similar bias for yes-answers as their C scores were nearly identical. Although the results confirm Polinsky’s claim, the yes-bias does not constitute a peculiarity of bilingual linguistic knowledge. Rather, it is conceivably a characteristic of GJTs administered at high proficiency levels, regardless of the number of languages spoken by participants. Methodologically speaking, d-prime analyses also enabled us to look at differences and similarities in metalinguistic knowledge between the HLSs and the other two groups in a more sophisticated fashion. These analyses overcome the common pitfall of dichotomous-response testing, where participants automatically score 50% by selecting a single response for the whole test. Therefore, the d-prime analysis is a welcome addition to future studies attempting to define knowledge on GJTs.
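The pitfall mentioned above can be made concrete with a small worked example. The sketch below compares a hypothetical participant who answers ‘yes’ to every item with one who discriminates between grammatical and ungrammatical items. The item counts are invented (24 per condition), and a log-linear correction, which adds 0.5 to each count and 1 to each total, is assumed so that rates of exactly 0 or 1 remain computable; the study itself does not state which correction, if any, was applied.

```python
from statistics import NormalDist

def accuracy_and_dprime(responses):
    """responses: list of (is_grammatical, said_yes) pairs.
    Returns (proportion correct, d-prime)."""
    n_gram = sum(1 for gram, _ in responses if gram)
    n_ungram = len(responses) - n_gram
    hits = sum(1 for gram, yes in responses if gram and yes)
    false_alarms = sum(1 for gram, yes in responses if not gram and yes)
    correct = hits + (n_ungram - false_alarms)
    # Log-linear correction (assumed): keeps rates away from 0 and 1.
    hit_rate = (hits + 0.5) / (n_gram + 1)
    fa_rate = (false_alarms + 0.5) / (n_ungram + 1)
    z = NormalDist().inv_cdf
    return correct / len(responses), z(hit_rate) - z(fa_rate)

# Hypothetical participant who presses 'yes' on all 48 items.
all_yes = [(True, True)] * 24 + [(False, True)] * 24
# Hypothetical participant with 20 hits and 4 false alarms.
discriminating = ([(True, True)] * 20 + [(True, False)] * 4
                  + [(False, True)] * 4 + [(False, False)] * 20)

for label, data in [("all-yes", all_yes), ("discriminating", discriminating)]:
    acc, d_prime = accuracy_and_dprime(data)
    print(f"{label}: accuracy = {acc:.2f}, d' = {d_prime:.2f}")
```

The all-yes participant reaches 50% accuracy without showing any sensitivity to grammaticality, which d′ exposes by returning zero, whereas the discriminating participant obtains a clearly positive d′.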
Finally, via the third question, the study helped shed light on whether earlier age of exposure leads to more target-like knowledge in vulnerable domains such as inflectional morphology. This was investigated by comparing the ultimate attainment of direct object and si-passive morphemes in heritage, L2, and L1Ss. Moreover, a lesser-known language was selected given a paucity of research in this domain (Polinsky, 2018, p. 207). Results across both the priming task and the TGJT for the two morphemes, the accusative clitic inflected for number and gender and the si-passive morpheme, showed that HLSs and L2Ss are more similar to each other than to monolinguals. These results are, thus, consistent with previous research, pointing to morphosyntax as a domain that remains particularly difficult to master even for simultaneous bilinguals. However, because this study tested a language with highly transparent morphological marking and productive morphological rules, namely Italian, it makes a unique contribution to the field by strengthening the existing body of research showing L2-like performance by HLSs in other morphologically rich heritage languages such as Spanish (Montrul, 2016). In other words, the divergence between HLSs and L1Ss of Spanish for gender assignment, use of accusative clitics, mood morphology, and differential object marking documented in the literature review is not haphazard, as divergence is also attested in the morphosyntactic domain within the confines of this study, which dealt with another highly inflectional language. A reviewer suggests that ‘transparence’ might be interpreted more along the lines of how transparent the L2/HL morphology is in relation to the L1 or dominant language (i.e., the speakers’ previous linguistic knowledge). However, this definition of transparence is more in line with the issue of transfer or cross-linguistic influence, which is addressed in full in Romano (2021).
Conclusion
In the spirit of hypothesis testing or data-based research whose aim is to confirm or refute a research-informed theory in the field for the purpose of improving understanding of reality (i.e., epistemology) and/or the current practices followed in research, this study sought to improve the latter. Methodologically, this study has shown that any differences in the production and intuition of inflectional morphemes between HLSs and L2Ss are bound to disappear by the time advanced levels of proficiency are reached. Nevertheless, the literacy level of HLSs may play a role in explaining the metalinguistic knowledge they eventually attain in comparison with L2Ss with substantial formal instruction. This effect is unmitigated by age of onset, consistent with previous research. In turn, although this study did not yield conclusive evidence for effects of grammaticality in bilingual and monolingual knowledge, the ability to appropriately reject ungrammatical forms and retain grammatical ones is one aspect in which bilingual competence appears to resemble a monolingual’s more closely. It is hoped future studies employing d-prime analyses more extensively will be able to shed light on this finding.
Supplemental Material
sj-docx-1-ijb-10.1177_13670069211052770 and sj-docx-2-ijb-10.1177_13670069211052770: Supplemental material for Task effects and the yes-bias in heritage language bilingualism by Francesco Romano in International Journal of Bilingualism
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
Notes
Author biography
Francesco Romano is an associate professor in English Linguistics at Halmstad University. His current research includes object drop in L2 Spanish and partitive ne in L2 Italian, the processing of V- and wh-movement in English/Swedish bilinguals, and the production of accusative clitics in restructuring contexts in Spanish/Italian bilinguals.
References
