Abstract
Aims and Objectives/Purpose/Research Questions:
The paper examined whether Uyghur-Chinese early successive bilinguals fully acquire motion event construal characteristic of their L2 Chinese, and how factors such as structural overlap between their two languages (i.e., verb-framing) and path type (i.e., presence or absence of boundary crossing) shape the acquisition process.
Design/Methodology/Approach:
Adopting a developmental approach, we included both child and adult bilinguals within a single design. Participants narrated video clips depicting both boundary-crossing and non-boundary-crossing events (i.e., across vs. up/down).
Data and Analysis:
The database comprised motion event descriptions of bilinguals representing three age groups (4–6-year-olds [AG1, n = 48], 8–10-year-olds [AG2, n = 48], and adult bilinguals [AG3, n = 30]), and Chinese monolinguals (n = 12). Data were analysed in terms of both ‘structural’ (linguistic devices used, and how they were arranged syntactically) and ‘pragmatic’ dimensions of motion expression (how frequently speakers profiled different event components).
Findings/Conclusions:
Analyses revealed that bilinguals fully established the ‘structural’ aspects of the target system at AG2, but the ‘pragmatic’ aspect became target-like only at AG3. Structural overlap led to crosslinguistic influence for non-boundary-crossing events, while more general effects of path type manifested only at AG1.
Originality:
The paper enriches current research by featuring a non-Western bilingual community. By focusing on a language pair that is distant genealogically (Turkic vs. Sino-Tibetan) and typologically (agglutinative vs. analytic) but shares key structures in expressing motion, it contributes to a better understanding of the role of structural factors in bilingual event construal. Its developmental approach sheds light on both the process and the product of bilingual language acquisition.
Significance/Implications:
Our findings confirm that early successive bilinguals can eventually acquire L2-specific patterns of motion construal, but the developmental asymmetry observed between the ‘structural’ and ‘pragmatic’ aspects of motion expression underlines the importance of examining both these aspects for an accurate understanding of bilingual construal of motion events.
Introduction
The expression of motion events has been an important domain for exploring the relationship between bilingual cognition and language use (cf. Daller et al., 2011; Filipović, 2022). We extend this line of inquiry to the context of early successive bilingual acquisition of a Turkic language, that is, Uyghur, and a Sino-Tibetan language, that is, Mandarin Chinese (henceforth Chinese). Although the two languages are typologically distinct (agglutinative vs. analytic), they share lexicalisation patterns in expressing motion, and we examine whether and how this structural overlap shapes bilinguals’ construal of motion events in their L2. Specifically, drawing on insights from child language research that certain aspects of motion expression develop over time (cf. Hendriks et al., 2022), we adopt a developmental approach by incorporating both child and adult bilinguals within a single design, and importantly, unlike much previous research that focused on Western immigration contexts, we study a non-Western bilingual community in a very different socio-political milieu. Overall, while our main goal is to shed light on the various factors that shape bilingual event construal over time, we hope that our findings will serve to highlight how bilingual language acquisition and use may be informed by the unique sociological realities of each bilingual community.
Motion expressions across languages
Motion events typically involve a figure moving along a path with reference to a ground in a particular manner. Talmy (2000) proposed that within this macro-event, path is the framing event and manner the co-event, and depending on whether path is expressed in the verb or in satellites (e.g., particles and verbal prefixes), he categorised the world’s languages into satellite-framed (S-languages, e.g., Germanic) and verb-framed languages (V-languages, e.g., Romance). English is an S-language because path is expressed in a satellite and manner in the main verb, as in (1); Spanish is a V-language since path is typically expressed in the verb and manner in an optional subordinate clause, as in (2). Subsequent research noted that Talmy’s typology applies primarily to the encoding of events that involve the crossing of a spatial boundary (known as ‘the boundary-crossing constraint’), and that V-languages also license satellite-framed constructions in the absence of boundary crossing, as in (3) (e.g., Aske, 1989; Slobin & Hoiting, 1994).
Implications of this typology for language use have been extensively studied in relation to Slobin’s (1996) thinking-for-speaking hypothesis, where a key finding is that S-language speakers typically produce semantically denser motion descriptions than V-language speakers (e.g., Hendriks et al., 2022; Slobin, 2004), which is explained in structural terms. In S-languages, both event components can be compacted in a monoclausal structure (cf. Example 1), but to do the same in V-languages, a syntactically complex structure is typically required (cf. Example 2), and since these structures incur greater processing load (cf. Özçalışkan, 2015), speakers usually express the framing event and omit the co-event. This understanding of what can be expressed relatively effortlessly given the linguistic repertoire available, together with speakers’ sustained experience of the relevant structures within a language community, leads to the formation of specific cognitive processing routines, or thinking-for-speaking patterns, an important part of which is the knowledge of how frequently speakers of one’s own community profile specific aspects of motion (cf. Gerwien & von Stutterheim, 2021). It is these mechanisms that are said to underlie differences in semantic density between S- and V-language speakers.
Motion expressions in Uyghur and Chinese
Uyghur is a Turkic language of the south-eastern branch. In its overall typological profile, it is an agglutinative language with an SOV constituent order and a rich case marking system. In (4), path is expressed in the main verb, with the goal of motion encoded in a dative case marker, and manner in the converb, the functional equivalent of gerunds in European languages. Recent studies showed that Uyghur is a typical V-language with respect to all the above-mentioned parameters (e.g., Tusun, 2022; Tusun & Hendriks, 2019), and interestingly, it has also been observed that, compared to speakers of other V-languages (e.g., French), Uyghur speakers frequently make use of case markers to offer additional path information (e.g., source and goal).
The Chinese equivalent of (4) is shown in (5) where manner and path are respectively expressed in the verbal elements (marked as V1 vs. V2) of a resultative verb compound (RVC). Another possible rendition of this event would be (6) where a third verbal element (V3) is added to the RVC to encode the deictic dimension of path (although this pattern does not allow a ground NP to follow the RVC). The status of Chinese in Talmy’s (2000) typology has been a topic of much debate. Talmy originally categorised it as an S-language by considering the V2 element of the RVC, as in (6), as satellites. He maintained that, like path particles in Germanic languages, morphemes occurring in the V2 slot of an RVC belong to a closed set, and they express semantic categories such as aspect and state change. However, others (e.g., Slobin, 2004) contended that the absence of morphological marking in Chinese makes it hard to ascertain the main verb status of the various verbal elements in an RVC. This is further complicated by the fact that the V2 and V3 elements, unlike Germanic satellites, can function as full verbs (cf. Example 7). It is thus proposed that Chinese be considered an equipollently framed language (E-language) since event components are expressed in devices sharing equal grammatical status. Having analysed Chinese based on an extensive list of semantic and syntactic criteria for establishing main verb status, Talmy (2016) also conceded that RVC constructions in which the V2 element functions as the main verb are indeed equipollently framed. But beyond the RVC constructions, many usage-based studies revealed that Chinese displays clear verb-framing tendencies in that speakers use verb-framed constructions such as (7), in which path is expressed in the verb and manner in a subordinate clause, as natural means of encoding motion (e.g., Shi et al., 2018; Wen & Shan, 2021).
Based on the above, we consider Chinese an E-language with verb-framing tendencies and assume that verb-framed constructions present an area of structural overlap between Uyghur and Chinese in motion expression.
Motion expressions in (bilingual) language acquisition and use
One central question in relevant child language research has been how children acquire language-specific motion event construal (e.g., Harr, 2012; Hendriks et al., 2022), and the general understanding is that children are highly sensitive to the lexicalisation patterns of their language from the earliest stages of language acquisition, but their ability to produce motion descriptions of adult-like semantic density develops over time. Nonetheless, such developmental studies also showed that children acquiring S-languages tend to produce semantically denser motion descriptions than their V-language counterparts, although those acquiring E-languages like Chinese have been found to outperform even their S-language counterparts thanks to the facilitative effects of language-specific devices (e.g., RVC). For example, Ji et al. (2011) examined the motion expressions of 3-, 4-, 5-, 6-, 8-, and 10-year-old Chinese children and their age-matched English counterparts. They found that, across age groups, Chinese children’s motion descriptions were semantically denser than those of their English counterparts, and most striking for the Chinese data was that there were no differences between children and adults in any of the measures performed. That is, Chinese children were fully adult-like from ages 3–4 onwards (see also Ji, 2014). Such findings highlight the impact of language-specific factors, but certain language-universal tendencies have also been documented. For instance, young children seem to experience greater difficulty encoding events that involve crossing a spatial boundary (e.g., ACROSS/INTO) than those that do not (e.g., UP/DOWN), which researchers interpret in terms of the relative conceptual complexity inherent in the two event types, and its implications for the acquisition of the relevant form-meaning mapping.
Specifically, boundary-crossing (BC) events denote a categorical change of location whereas non-boundary-crossing (NBC) events a gradual change of location, and the former type of events are argued to entail a more complex process of form-meaning mapping than the latter type (e.g., Hendriks et al., 2022; Hickmann et al., 2018).
Relevant research in the context of bilingualism has concerned the extent to which bilinguals’ event construal is language-specific, and whether there is crosslinguistic influence (CLI), defined as the overuse of motion constructions in one of the bilinguals’ languages under the influence of the other. Studies have investigated both child and adult bilinguals, typically speaking a V-language (e.g., Spanish, Turkish) and an S-language (e.g., German, English). A common finding has been that bilinguals’ event construal is largely language-specific, but they also exhibit CLI such that they use more path verbs in their S-language and more manner verbs in their V-language (e.g., Aktan-Erciyes et al., 2020; Aveledo & Athanasopoulos, 2016; Daller et al., 2011; Hohenstein et al., 2006). Regarding factors that contribute to CLI, structural overlap across languages is important (cf. Filipović, 2022), but it is modulated by, inter alia, (shift in societal) language dominance typically associated with the amount of exposure to a given language. To illustrate, Aktan-Erciyes et al. (2020) studied 5- and 7-year-old Turkish-English bilingual children’s motion expression and found that the directionality of CLI changed with children’s language dominance profiles operationalised as amount of language input. Younger bilinguals who attended English immersion kindergartens showed greater L2 to L1 influence, but the older children attending Turkish-medium schools, where quantity of English input was much reduced, displayed L1 to L2 influence. Aveledo and Athanasopoulos (2016) reported CLI in older Spanish-English bilingual children that was absent in younger children due to the former’s greater quantity of L2 input. Moreover, consistent with findings of a recent meta-analysis that societal language dominance is a significant predictor of CLI (van Dijk et al., 2022), Daller et al. (2011) found that Turkish-German bilinguals residing in Germany displayed L2 to L1 influence, whereas bilingual returnees in Turkey showed more L1 to L2 influence (see also Hohenstein et al., 2006). Finally, there is some evidence that CLI is modulated by the specific path configurations of the denoted events. Thus, when asked to verbalise motion events with or without boundary crossing, both child (Engemann, 2022) and adult bilinguals (Alonso, 2016, 2020; Goschler et al., 2020; Hendriks & Hickmann, 2011, 2015; Larrañaga et al., 2011; Laws et al., 2022; Treffers-Daller & Calude, 2015) speaking a wide range of language combinations displayed CLI either more strongly or exclusively with boundary-crossing events. As to why this may be so – recall that it is the boundary-crossing events that impose the strictest constraint on V-languages; that is, non-boundary-crossing events give bilinguals more leeway in pushing boundaries across languages without necessarily being target-deviant in their verbalisations; in contrast, when describing boundary-crossing events, bilinguals must confront the boundary-crossing constraint and restructure their system to be target-like, and it seems that this constraint is especially difficult to (un)learn for bilinguals. Recall further that boundary-crossing events have been noted to pose a greater challenge in L1 acquisition, where the proposed explanation is that such events entail greater conceptual complexity, that is, categorical change of location (cf. Hendriks et al., 2022). That bilinguals display stronger CLI with such events may also be related to this factor and indicates that their tendency to use cross-linguistically congruent structures becomes clearer with motion events requiring a more complex process of form-meaning mapping (cf. Filipović, 2022; Tusun, 2022).
The case for Uyghur-Chinese early successive bilinguals’ motion construal
Studying Uyghur-Chinese early successive bilinguals’ motion construal contributes to current research in several ways. Most generally, Uyghur and Chinese are genetically and typologically distinct, but share verb-framing in expressing motion. Therefore, in addition to diversifying the current landscape that predominantly features European languages, this language combination helps us to better understand how crosslinguistic differences in overall typological profiles (agglutinative vs. analytic, equipollent-framing vs. verb-framing) and the similarities within a functional domain, that is, motion, (verb-framing) shape bilingual language acquisition and use. Moreover, most research has focused on bilingual situations in Western immigration contexts where the L2 is almost always societally dominant, and the L1 the heritage language, while little is known about non-Western bilingual communities where such dichotomies do not easily apply. For instance, Uyghur and Chinese are co-official in Xinjiang, and Uyghur is also a lingua franca among other ethnicities there. In addition, Uyghurs constitute nearly half of the local population, and attach great importance to maintaining their language (cf. Zang, 2015). The distinction of societally dominant versus non-dominant language is less sharp in this context. Significantly, the education system in Xinjiang is such that Uyghur children typically attend Chinese immersion programmes from kindergarten onwards (cf. Ma, 2012), thereby offering a scenario of naturalistic early successive bilingualism where quantity of L2 exposure (8 hours daily) remains constant throughout childhood. 
This contrasts with bilingual situations investigated in previous studies, and in view of recent findings that quantity of L2 input can serve as a proxy for language dominance (Unsworth, 2016; Unsworth et al., 2018), Uyghur-Chinese bilingualism in Xinjiang may present a situation that is conducive to more balanced bilingual language development, and more information on bilingual language acquisition and use from such settings will benefit the field at large. However, the present study complements previous research in more concrete ways. For example, previous studies have mostly looked at bilinguals’ use of manner versus path verbs with little regard for whether and to what extent bilinguals also develop target-like patterns of language use in terms of syntactically organising such lexical resources, and of the semantic density of their motion descriptions. Furthermore, existing studies included either child or adult bilinguals but never both within the same design, and as such, we lack an ‘end-state’ perspective that can help us to delineate what patterns are developmental and what are not in bilingual language acquisition and use. By way of illustration, whether CLI is a developmental phenomenon (cf. Hulk, 2017) or part and parcel of being bilingual (cf. van Dijk et al., 2022) has been debated in the context of simultaneous bilingualism, primarily with respect to the acquisition of various morphosyntactic properties. A greater understanding of this issue is arguably as important in (naturalistic) early successive bilingualism, and fresh data on the acquisition of a conceptual/semantic domain will further enrich this debate. This study asks the following two questions:
1) Do Uyghur-Chinese early successive bilinguals fully establish the L2 equipollent system for motion event construal, and if so, at what point on the developmental path?
2) How does crosslinguistic influence shape the acquisition process from childhood to adulthood?
In terms of predictions, we know from Ji et al. (2011), who used the same elicitation materials and analytic framework as the present study, that L1 Chinese children are fully adult-like in their motion encoding already at age 3. Furthermore, Ji (2022) showed that L2 learners do not have much difficulty acquiring target-like voluntary motion expressions in Chinese. Based on these and given our bilinguals’ early age of onset (i.e., circa age 3), their systematic daily exposure to their L2 Chinese (i.e., about 8 hours per day), and their relatively balanced linguistic profile (see below), one possibility was that they would fully establish the target equipollent system very early on with no effects of age on any of our measures. Specifically, bilinguals regardless of age would use equipollently-framed constructions (i.e., manner + path in an RVC) and verb-framed constructions (i.e., path in verb and manner in a subordinate clause) as frequently as Chinese monolinguals; no difference therefore would be expected between bilinguals and monolinguals with respect to how semantic components are syntactically packaged and to semantic density. In other words, there would be no CLI, a possibility consistent with recent findings that societal language dominance is a predictor of CLI (e.g., Daller et al., 2011; van Dijk et al., 2022): there would be no CLI because Uyghur and Chinese share similar societal dominance in Xinjiang. However, given the structural overlap between Uyghur and Chinese, that is, verb-framed constructions, and that structural overlap has been found to predict CLI (e.g., Filipović, 2022), bilinguals could capitalise on this shared pattern and use it significantly more frequently than monolinguals, giving rise to CLI. As to how this CLI would play out over time, it might be developmental (cf. Hulk, 2017) in that there would be an effect of age on the use of verb-framed constructions: younger bilinguals would use such constructions more frequently than older bilinguals and monolinguals. Alternatively, CLI could remain stable across the developmental span (van Dijk et al., 2022) such that no such age effects would be observed: both younger and older bilinguals would consistently use the verb-framed constructions more frequently than monolinguals. In either scenario, and in view of the special status of boundary-crossing events in child language acquisition (cf. Hendriks et al., 2022) and bilingual language acquisition contexts (cf. Alonso, 2020; Engemann, 2022), it was hypothesised that younger bilinguals would experience greater difficulty expressing BC events, and that CLI would be stronger with BC events than NBC events.
The study
Participants
Three groups of speakers participated in the study: 1) 96 Uyghur-Chinese early successive bilingual children comprising four age groups with 24 children per group: 4-year-olds (age range 3:11–4:7; mean age 4:6), 6-year-olds (age range 5:9–6:6; mean age 6:5), 8-year-olds (age range 7:9–8:4; mean age 8:4), and 10-year-olds (age range 9:8–10:7; mean age 10:6); 2) 30 Uyghur-Chinese early successive bilingual adults (age range 19–21); and 3) 12 Chinese monolingual adults. To test for effects of age, we collapsed 4- and 6-year-olds into one age group (AG1), and 8- and 10-year-olds into one group (AG2), with bilingual adults constituting AG3. Bilingual children were recruited from Chinese immersion kindergartens and primary schools in Ürümchi in Xinjiang. The recruitment process began with an initial teachers’ screening that identified those who were raised in Uyghur families, and for the 4-year-olds, those perceived as highly proficient in Chinese. Once the appropriate children were identified, their parents completed a questionnaire on family language practice, literacy activities and parents’ ratings of children’s proficiency (on a scale of 1–10) in Uyghur and Chinese. A similar questionnaire was administered to bilingual adults, who were first-year university students that had also attended Chinese immersion schools. Based on the questionnaire, we selected only those who had been exclusively exposed to and used Uyghur outside school and at home, thereby balancing out their 8 hours of daily immersion in Chinese at school, and those whose proficiency ratings for both languages were 8 or above. They were thus relatively balanced bilinguals. Chinese monolingual adults were university students in Beijing. 1
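The grouping arithmetic described above can be checked with a short sketch. It is offered only as an illustration and is not part of the study's materials: the dictionary keys and the function name `collapse_cohorts` are hypothetical labels, and `CAD` is assumed to be the label that the Results section uses for the Chinese monolingual adults.

```python
from collections import Counter

# Cohort sizes as reported in the Participants section.
COHORT_SIZES = {
    "4-year-olds": 24,
    "6-year-olds": 24,
    "8-year-olds": 24,
    "10-year-olds": 24,
    "bilingual adults": 30,
    "monolingual adults": 12,
}

# Mapping from recruitment cohort to analysis group.
# The keys are illustrative labels; "CAD" is assumed to denote the
# Chinese monolingual adults, as in the Results section.
ANALYSIS_GROUP = {
    "4-year-olds": "AG1",
    "6-year-olds": "AG1",
    "8-year-olds": "AG2",
    "10-year-olds": "AG2",
    "bilingual adults": "AG3",
    "monolingual adults": "CAD",
}


def collapse_cohorts(cohort_sizes, mapping=ANALYSIS_GROUP):
    """Sum cohort sizes within each analysis group."""
    totals = Counter()
    for cohort, n in cohort_sizes.items():
        totals[mapping[cohort]] += n
    return dict(totals)


group_sizes = collapse_cohorts(COHORT_SIZES)
print(group_sizes)
```

The resulting group sizes match those given in the abstract (48, 48, 30 and 12).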
Experimental stimuli and procedures
Experimental stimuli consisted of 18 short video clips featuring agents moving up (6 items), down (6 items) and across (6 items) various landmarks in various manners (see Appendix 1 for the full list). The ‘up’ and ‘down’ items represented the NBC condition while the ‘across’ items represented the BC condition. The clips were arranged into six randomised test orders, which were randomly assigned to participants. Participants were met in a quiet room, and the videos were presented on a computer screen. To ensure maximal reliance on linguistic means rather than gestures, older and adult speakers were asked to describe the clips to an imaginary audience who had no access to the clips, but who would have to reconstruct the scene based on their descriptions. The youngest bilinguals described the clips to an adult who sat opposite them, and so could not see the clips. Each session started with a training item where participants were probed when necessary, so that they would minimally notice the manner and path components. No such probes were made for the experimental items, and great care was taken to induce a maximally monolingual mode.
Coding
Responses were first segmented into clauses, defined as units containing a verb and its arguments. Responses like (8-11) needed no segmentation as they were monoclausal, whereas responses like (12-13), where manner and path were distributed across two clauses either through subordination (12) or co-ordination (13), were segmented into two clauses, as indicated by angle brackets. Note that with responses like (13) where there were potentially two descriptions for the same item, namely, c1 vs. c2, we coded clause 2, which expressed path information, as the ‘target response’, based on the primacy of path in motion event construal (Talmy, 2000). In this case, c1 was coded as the ‘potential target response’. Each target response was then coded with respect to 1) the semantic information expressed in various linguistic devices (information locus), 2) the total number of semantic components expressed (semantic density), and 3) how semantic components were syntactically organised (syntactic packaging). For information locus, two loci were differentiated, i.e., main verb (VERB locus) vs. satellite (OTH locus). Following Hendriks et al. (2022), all linguistic devices other than the verb were considered satellites (e.g., prepositions, adverbials and gerundive forms). Thus, in the coding for the VERB locus, responses fell into three categories: manner (8), path (10), and manner + path (7); and for the OTH locus, into four categories: manner (8), path (10), manner + path (9), and a residual category of zero where no satellites were used (8). For semantic density, responses were coded as SD1 if only one component (either manner or path) was expressed (8 and 13), and SD2 when both components were expressed (9-11). Double expression of the same component was counted as expressing it only once (11).
To capture the full range of syntactic constructions used, for syntactic packaging, both the ‘target response’ and the ‘potential target response’ were included, which gave rise to three main strategies: tight-simple if the description was monoclausal (8-11), tight-complex if it comprised a matrix clause and a subordinate clause (12), and loose-simple if it consisted of several coordinated clauses (13).
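The coding scheme can be made concrete with a minimal sketch. It assumes that each coded response has already been reduced by hand to the set of components found in the verb, the set found in other devices, the number of clauses, and the type of clause linkage. All function and parameter names are hypothetical, and the sketch does not represent the authors' actual coding procedure.

```python
COMPONENTS = {"manner", "path"}


def locus_category(expressed):
    """Label the information found in one locus (VERB or OTH).

    The 'zero' label corresponds to the residual OTH category where
    no satellites were used; the VERB locus has no such category.
    """
    found = set(expressed) & COMPONENTS
    if not found:
        return "zero"
    if found == COMPONENTS:
        return "manner + path"
    return next(iter(found))


def semantic_density(verb, other):
    """Return SD1 if one component is expressed overall, SD2 if both are.

    Taking the union of the two loci means that a component expressed
    twice (e.g., path in both verb and satellite) is counted only once.
    """
    found = (set(verb) | set(other)) & COMPONENTS
    if not found:
        raise ValueError("no manner or path information expressed")
    return "SD2" if found == COMPONENTS else "SD1"


def syntactic_packaging(n_clauses, linkage=None):
    """Classify a description by clause count and linkage type."""
    if n_clauses == 1:
        return "tight-simple"
    if linkage == "subordination":
        return "tight-complex"
    if linkage == "coordination":
        return "loose-simple"
    raise ValueError("multi-clause responses need a linkage type")


# A monoclausal response with manner in the verb and path in a satellite.
print(locus_category({"manner"}), locus_category({"path"}))
print(semantic_density({"manner"}, {"path"}))
print(syntactic_packaging(1))
```

Representing each locus as a set keeps the rule about double expression implicit in the set union, so no separate de-duplication step is needed.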
Analysis
For statistical analyses, our independent variables were age and condition, whereas dependent variables were the mean occurrence of path verbs, manner verbs, path satellites, manner satellites, SD1 and SD2 responses, as well as tight-simple, tight-complex and loose-simple responses. The count data were analysed by fitting generalised linear mixed-effects models with a Poisson distribution, using the glmer() function of the lme4 package in R (R Core Team, 2017). For each fixed effect of interest, we fitted a full model containing that effect and a reduced model without it to the same data. We then compared the relative goodness of fit of the two models (expressed as log likelihood) using a likelihood ratio test via the anova() command, which tests the statistical significance of the fixed effect removed in the reduced model. For all models fitted, random intercepts for participants and items were included. Tukey tests were applied for post hoc comparisons.
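The arithmetic behind such a likelihood ratio test can be sketched as follows. This is not the authors' R code: it is a standard-library Python illustration in which the function names are hypothetical and the log-likelihood values are invented purely for demonstration. The test statistic is twice the difference in log likelihood between the full and reduced models, referred to a chi-square distribution whose degrees of freedom equal the number of parameters removed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    for the regularised upper incomplete gamma function, with a = df / 2
    and z = x / 2, starting from Q(1/2, z) = erfc(sqrt(z)) for odd df
    or Q(1, z) = exp(-z) for even df.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(loglik_full, loglik_reduced, df_diff):
    """Return the chi-square statistic and p-value for nested models."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df_diff)


# Invented log likelihoods, for illustration only (not from the study).
statistic, p_value = likelihood_ratio_test(-100.0, -103.0, df_diff=2)
print(f"chi2(2) = {statistic:.2f}, p = {p_value:.4f}")
```

In practice a statistics library would supply the chi-square tail probability; it is written out here only to keep the sketch self-contained.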
Results
Information in the VERB locus
Figure 1 displays information expressed in the VERB locus across age groups by condition. The model revealed a significant effect of condition (χ2(17) = 761.18, p < .001), which was qualified by a further interaction between the VERB locus and age (NBC: χ2(6) = 350.78, p < .001; BC: χ2(6) = 100.87, p < .001). In the NBC condition, while the drop of path between AG1 and AG2 was significant (βAG1-AG2 = –0.76, SE = 0.15, Wald z = –4.95, p < .001), all bilingual groups expressed path more frequently than monolinguals (βAG1-CAD = –1.80, SE = 0.27, Wald z = –6.45, p < .001; βAG2-CAD = –1.03, SE = 0.28, Wald z = –3.60, p = .001; βAG3-CAD = –0.93, SE = 0.30, Wald z = –3.06, p = .010); further, compared to all other groups, AG1 expressed manner more frequently (βAG1-AG2 = –1.73, SE = 0.40, Wald z = –4.32, p < .001; βAG1-AG3 = –3.28, SE = 0.72, Wald z = –4.55, p < .001; βAG1-CAD = –2.60, SE = 0.70, Wald z = –3.69, p < .001), but manner + path less frequently (βAG1-AG2 = 0.70, SE = 0.08, Wald z = 8.00, p < .001; βAG1-AG3 = –0.80, SE = 0.09, Wald z = –8.48, p < .001; βAG1-CAD = 0.75, SE = 0.10, Wald z = 7.27, p < .001). Under the BC condition, while no age effect was found for path, AG1 expressed manner more frequently than all other groups (βAG1-AG2 = –0.85, SE = 0.14, Wald z = –5.88, p < .001; βAG1-AG3 = –0.91, SE = 0.18, Wald z = –5.37, p < .001; βAG1-CAD = –1.19, SE = 0.29, Wald z = –3.97, p < .001) but manner + path less frequently than all other groups (βAG1-AG2 = 0.75, SE = 0.003, Wald z = 193.46, p < .001; βAG1-AG3 = 0.79, SE = 0.003, Wald z = 204.06, p < .001; βAG1-CAD = 0.79, SE = 0.003, Wald z = 204.54, p < .001).

Figure 1. Proportion of semantic information encoded in VERB locus by group and condition.
Information in the OTH locus
Figure 2 shows information expressed in the OTH locus across age groups by condition. The model revealed a main effect of condition (χ2(24) = 321.07, p < .001), which was qualified by an interaction between OTH locus and age for the BC condition (χ2(9) = 122.31, p < .001): compared to all other groups, AG1 less frequently expressed path (βAG1-AG2 = 1.56, SE = 0.30, Wald z = 5.09, p < .001; βAG1-AG3 = 1.17, SE = 0.34, Wald z = 3.42, p = .003; βAG1-CAD = 1.50, SE = 0.40, Wald z = 3.71, p = .001), manner (βAG1-AG2 = 0.84, SE = 0.29, Wald z = 2.86, p = .021; βAG1-AG3 = 1.13, SE = 0.30, Wald z = 3.68, p = .001; βAG1-CAD = 1.39, SE = 0.36, Wald z = 3.84, p < .001), and manner + path (βAG1-AG2 = 1.60, SE = 0.54, Wald z = 2.93, p = .016; βAG1-CAD = 2.07, SE = 0.61, Wald z = 3.39, p = .003), but zero more frequently (βAG1-AG2 = –0.51, SE = 0.10, Wald z = –4.95, p < .001; βAG1-AG3 = –0.40, SE = 0.11, Wald z = –3.44, p = .002; βAG1-CAD = –0.78, SE = 0.19, Wald z = –3.92, p < .001).

Figure 2. Proportion of semantic information encoded in OTH locus by group and condition.
Semantic density
Figure 3 displays semantic density across age groups by condition. The model revealed a main effect of condition (χ2(10) = 474.13, p < .001), which was qualified by an interaction between semantic density and age (NBC: χ2(3) = 300.02, p < .001; BC: χ2(3) = 121.27, p < .001). In the NBC condition, the decline of SD1 descriptions from AG1 to AG2 was significant (β= –0.93, SE = 0.13, Wald z = –6.79, p < .001), while both these groups produced such descriptions more frequently than monolinguals (βAG1-CAD = –2.04, SE = 0.32, Wald z = –6.22, p < .001; βAG2-CAD = –1.10, SE = 0.33, Wald z = –3.30, p = .004); AG1 produced SD2 descriptions less frequently than all other groups (βAG1-AG2 = 0.68, SE = 0.84, Wald z = 8.10, p < .001; βAG1-AG3 = 0.81, SE = 0.09, Wald z = 9.01, p < .001; βAG1-CAD = 0.73, SE = 0.11, Wald z = 6.28, p < .001). In the BC condition, the decrease of SD1 descriptions between AG1 and AG2 was significant (β= –0.50, SE = 0.11, Wald z = –4.54, p < .001), while both these groups produced such descriptions more frequently than monolinguals (βAG1-CAD = –1.34, SE = 0.27, Wald z = –4.86, p < .001; βAG2-CAD = –0.83, SE = 0.28, Wald z = –2.96, p = .014); correspondingly, although the increase of SD2 descriptions between AG1 and AG2 was significant (β = 0.80, SE = 0.14, Wald z = 5.59, p < .001), both these groups produced such descriptions less frequently than monolinguals (βAG1-CAD = 1.21, SE = 0.17, Wald z = 6.80, p < .001; βAG2-CAD = 0.40, SE = 0.15, Wald z = 2.62, p = .042).

Figure 3. Proportion of SD1 vs. SD2 descriptions by group and condition.
Syntactic packaging
Figure 4 illustrates the use of syntactic packaging strategies across age groups by condition. The model revealed a main effect of condition (χ2(17) = 541.79, p < .001), which was qualified by an interaction between syntactic packaging strategy and age for the BC condition (χ2(6) = 61.57, p < .001): AG1 produced tight-simple descriptions more frequently than AG2 (β = –0.27, SE = 0.09, Wald z = –2.76, p = .027), while AG1 produced tight-complex descriptions less frequently than all other groups (βAG1-AG2 = 1.38, SE = 0.31, Wald z = 4.47, p < .001; βAG1-AG3 = 1.46, SE = 0.32, Wald z = 4.49, p < .001; βAG1-CAD = 1.91, SE = 0.34, Wald z = 5.46, p < .001). No age effect was found for the loose-simple strategy.

Figure 4. Proportion of the three syntactic packaging strategies by group and condition.
The results can be summarised as follows. In terms of the VERB locus, AG1 expressed manner more frequently but manner + path less frequently than all others across the two conditions. However, expression of path differed across conditions: while all bilingual groups encoded path more frequently than monolinguals for the NBC condition, they matched the monolingual pattern for the BC condition from AG2 onwards. That is, bilinguals fully converged on the equipollent system from AG2 but CLI (i.e., more frequent expression of path) persisted into adulthood with the NBC condition. As to the OTH locus, the predominant pattern across speaker groups was expressing zero spatial information (via bare-verb constructions), thereby demonstrating the absence of CLI in the form of expressing additional path information in satellite devices. Nonetheless, there was a general decrease of the zero category and an increase of encoding some sort of spatial information (i.e., path, manner, and manner + path), although this pattern was more salient under the BC condition than the NBC condition. Irrespective of condition, bilinguals fully converged on the monolingual pattern for encoding spatial information in the OTH locus from AG2 onwards. With respect to semantic density, the decrease of SD1 descriptions from AG1 to AG2 was significant for both conditions, but across the two conditions, these two groups produced SD1 descriptions more frequently than AG3 and CAD. That is, bilinguals’ production of SD1 descriptions dropped to the monolingual level at AG3. For SD2 descriptions, bilinguals reached the monolingual pattern earlier in the NBC (at AG2) than the BC condition (at AG3). In terms of syntactic packaging, AG1 already followed the monolingual pattern under the NBC condition by exclusively using the tight-simple strategy. 
Under the BC condition, no between-group differences were found for the loose-simple strategy, and while AG1 used the tight-simple strategy more frequently and the tight-complex strategy less frequently than all others, they matched the monolingual patterns at AG2.
Discussion
We set out to establish whether and at what age Uyghur-Chinese early successive bilinguals acquire motion event construal in their L2 Chinese. Chinese is predominantly equipollently framed with verb-framing tendencies, whereas Uyghur is verb-framed. We were interested in whether this structural overlap would give rise to CLI, and whether bilinguals could eventually shake off this CLI and converge on the target system. To these ends, we engaged three groups of bilinguals, AG1 (4–6 years), AG2 (8–10 years), and AG3 (adult bilinguals), as well as Chinese monolinguals (CAD), in a cartoon narration task, and systematically analysed their descriptions in terms of the linguistic devices used to express event components (information locus), the number of event components expressed (semantic density), and how the components were organised syntactically (syntactic packaging). Given previous findings that Chinese children are adult-like from age 3 (Ji et al., 2011), and based on bilinguals’ early age of onset (circa age 3), their systematic exposure to L2 (8 hours daily) in an immersion setting, and the fact that Uyghur and Chinese enjoy comparable societal dominance in Xinjiang, we entertained the possibility that bilinguals would, like their monolingual counterparts, fully acquire the Chinese system from AG1. As such, there would be no effects of age on any of the three measures. However, in view of the structural overlap between Uyghur and Chinese, and of previous assertions that it is a predictor of CLI (e.g., Filipović, 2022; Serratrice, 2013), it was alternatively predicted that there would be an L1 to L2 influence. As to how CLI would unfold from childhood to adulthood, we entertained two possibilities: CLI would disappear over time once the target system is fully established (cf. Hulk, 2017), or it would persist as it is part and parcel of being bilingual (cf. van Dijk et al., 2022).
The first possibility would manifest in an effect of age on the use of verb-framed constructions (path in the verb and manner in other devices) and the second possibility in the lack of such effects. We also predicted that CLI would be more pronounced in BC events than in NBC events, and that younger bilinguals would experience greater difficulty describing BC events than NBC events.
The prediction that bilinguals would fully establish the Chinese system from AG1 was disconfirmed. Our results showed that this happened much later and depended on the specific aspect of their motion descriptions. In terms of the more ‘structural’ aspects, that is, lexicalisation in the VERB vs. OTH loci and syntactic packaging, differences between bilinguals and monolinguals disappeared from AG2 onwards, and in this sense, one could conclude that bilinguals established the target system at AG2. However, results on semantic density showed that, with the exception of SD2 descriptions for the NBC events where the general AG2 pattern applied, bilinguals reached the adult level only at AG3. That is, although there was no development in lexicalisation patterns and syntactic packaging from AG2 to AG3, bilinguals’ semantic density continued to develop from childhood to adulthood. These findings show that, despite their early and systematic exposure to and use of the L2, bilinguals took many years to develop the same capacity that Chinese children possess at the tender age of 3. They also underscore the distinct developmental trajectory that early successive bilinguals followed in acquiring their L2.
Overall, bilinguals’ establishment of the Chinese equipollent system was a gradual process, and both CLI and other developmental factors made it so. As predicted, L1 to L2 CLI manifested in bilinguals’ more frequent use of path verbs than monolinguals when describing NBC events, thereby corroborating previous findings that structural overlap is an important factor in the occurrence of CLI (Filipović, 2022; Serratrice, 2016). But the results also show that, when viewed from childhood to adulthood, CLI was both a developmental phenomenon and a stable bilingual trait. That encoding path in the verb decreased from AG1 to AG2 supports the claim that it is developmental, but that bilinguals at AG3 still did this more frequently than monolinguals suggests that CLI persisted until adulthood and is part and parcel of being a bilingual. This latter finding itself is not novel (cf. Daller et al., 2011; Hohenstein et al., 2006; van Dijk et al., 2022), but it is noteworthy because it shows that CLI also characterises bilinguals who have relatively balanced language profiles and who function in a bilingual setting where the line between the societally dominant and non-dominant language is less clearly demarcated.
Importantly, contradicting the findings of some previous studies (Alonso, 2016, 2020; Engemann, 2022; Hendriks & Hickmann, 2015), the stronger CLI predicted for BC events did not occur. BC events are considered conceptually more complex (i.e., involving a categorical change of location), and presumably involve a more complex process of form-function mapping, rendering such events especially vulnerable to CLI. Two factors may account for our divergent finding. First, in the studies on English-French bilinguals (e.g., Engemann, 2022; Hendriks & Hickmann, 2015), it was their French that was more susceptible to CLI. We know from numerous previous studies using the same elicitation materials (and hence the same motion events) as the present study that there is much within-language variation in French, and this variability has been argued to create ‘noise’ in the language input and to underlie the learner’s prolonged developmental progression (cf. Harr, 2012; Hendriks et al., 2022; Hickmann et al., 2009; Engemann, 2016; Tusun, in press). In contrast, extant research employing these materials to investigate Chinese has revealed high systematicity rather than within-language variability (cf. Ji, 2022; Ji et al., 2011). It is therefore possible that the attested effects of boundary crossing on CLI are modulated by whether the target language presents a relatively variable or a consistent system in the domain of motion event expression. Second, these studies, and those featuring Spanish-English bilinguals (e.g., Alonso, 2016, 2020; Larrañaga et al., 2011), included either child bilinguals or bilinguals at a lower level of proficiency. Our bilingual speakers, by contrast, were relatively balanced in their L1 and L2. It could be that the effects of boundary crossing on CLI reported in prior research are more detectable at the earlier stages of bilingual language development.
The impact of developmental factors is evident in several respects. For example, AG1 bilinguals expressed manner more frequently than all others for NBC events, and a closer look at the data established that such descriptions were given for upwards motion where the manner verb pa4 ‘to climb’ was used. Although such manner verbs do not encode directionality, they display a certain degree of path salience (cf. Lewandowski & Mateu, 2020), and child language research has found that young children tend to capitalise on linguistic devices that enable them to encode more semantic components simultaneously (cf. Hendriks et al., 2022; Özçalışkan & Slobin, 2000). Similarly, for BC events, AG1 encoded only manner in the verb more frequently than other bilinguals and monolinguals, and qualitative inspection of the data revealed that such responses represented the denoted event not as crossing a spatial boundary but as taking place within a general location. This result supports our prediction and accords with insights from child language research that children experience greater difficulty expressing events that involve a categorical change of location (e.g., Engemann, 2022; Hendriks et al., 2022). Indeed, that bilinguals reached adult levels of semantic density later for BC events (at AG3) than for NBC events (at AG2) may well be linked to this constraint associated with BC events.
The distinct developmental patterns characterising our bilinguals’ semantic density merit further comment. Recall that, regardless of condition, there was a decrease of SD1 descriptions and a corresponding increase of SD2 descriptions, while full convergence on the monolingual pattern happened at AG3, that is, in adulthood. The overall increase of semantic density with age is consistent with the child language literature (cf. Hendriks et al., 2022), where the assumption is that semantic density is related to children’s developing cognitive capacities such as working memory (e.g., the number of event components they can hold in memory for verbalisation). Although the same cognitive factors might have been at work in our bilinguals, this is unlikely: the fact that Chinese monolingual children already reach adult levels from age 3 demonstrates that cognitive factors are also mediated by language-specific properties. That bilinguals converged on the monolingual pattern for all the ‘structural’ measures (i.e., information in VERB vs. OTH devices, syntactic packaging) at AG2 shows that they had the monolingual-like linguistic means for producing SD2 descriptions, but that the ability to habitually express manner and path together like Chinese monolinguals emerged only in adulthood points to the influence of other factors.
Thinking-for-speaking accounts of cognition and language use (cf. Gerwien & von Stutterheim, 2021) posit that speakers’ sustained experience of linguistic structures in a given community engenders tacit, community-level knowledge of which aspects of a motion event to make explicit or implicit, give salience to, and select for expression. This ‘cognitive pragmatic knowledge’, that is, how frequently speakers of one’s own language community profile specific aspects of motion events by using certain linguistic structures under specific conditions, is extracted from speakers’ daily use of language, and is eventually stored as pragmatic principles in long-term memory. It is the automatic and effortless retrieval of such principles that is said to motivate crosslinguistic differences in motion construal (cf. von Stutterheim et al., 2020). Uyghur-Chinese early successive bilinguals lived in a community where their two languages arguably shared similar dominance status, and their daily exposure to and use of the L2 (≈8 hours in immersion programmes) remained constant from childhood to adulthood. The developmental asynchrony observed between information locus and semantic density indicates that, while this bilingual setting was sufficient for them to become perfectly target-like in using language-specific forms for encoding motion at AG2, many more years of language exposure and use were necessary to fully develop the pragmatic principles underlying event construal in the L2 (e.g., profiling manner and path simultaneously), firmly establish them in long-term memory, and automatically retrieve them in spontaneous language use (cf. Hendriks & Hickmann, 2015; von Stutterheim et al., 2021).
Conclusion
In this study, we set out to examine early successive bilinguals’ motion event construal in L2 Chinese from a developmental perspective. We wanted to establish whether and at what point on the developmental path bilinguals would establish the L2 equipollent system, and the role of structural overlap between their L1 and L2 in the acquisition process. We focused on three facets of bilinguals’ motion descriptions, that is, information locus, semantic density, and syntactic packaging, and compared them with those of monolinguals. Analyses showed that bilinguals fully converged on the monolingual patterns for information locus and syntactic packaging at AG2, but target-like performance on semantic density was achieved at AG3 (in adulthood). Overall, bilinguals’ motion event construal was shaped by both CLI and other developmental factors, but the developmental asynchrony between information locus and syntactic packaging on the one hand, and semantic density on the other reflects the relative independence of these aspects in the acquisition process. Specifically, while the linguistic forms for expressing motion events are acquired earlier in development, the pragmatic principles constraining event construal within a language community, which are formed on the basis of extensive exposure to and use of the L2 in context (cf. Gerwien & von Stutterheim, 2021), emerge much later (cf. Lambert et al., 2022).
Appendix 1
A full list of experimental items.
| Items in non-boundary-crossing (NBC) condition | Items in boundary-crossing (BC) condition |
|---|---|
| A mouse climbs up/down a table. | A baby crawls across a street. |
| A caterpillar crawls up/down a plant. | A man runs across the road. |
| A cat climbs up/down a telephone pole. | A boy slides across a river. |
| A bear climbs up/down a tree. | A boy swims across a river. |
| A squirrel runs up/down a tree. | A girl skates across a lake. |
| A monkey climbs up/down a tree. | A woman cycles across the train tracks. |
Acknowledgements
This paper was finalised while the first author, Alimujiang Tusun, was holding a British Academy Postdoctoral Fellowship (PFSS23\230033), and he wishes to thank the British Academy for their support. We are grateful to all the participants of this study, to an anonymous reviewer for their constructive feedback, and to Professor Ad Backus for his editorial support. The usual disclaimers apply.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
