Abstract
This study investigates sources of phonological cross-linguistic influence (CLI) at the initial stages of third language (L3) acquisition in light of the predictions of the second language (L2) Status Factor Model, the Typological Primacy Model, the Cumulative Enhancement Model, the Linguistic Proximity Model and the Scalpel Model. The productions of L3 rhotic sounds, /w/ and final obstruent devoicing, elicited in a delayed repetition task, were analysed auditorily in two groups of adolescent instructed learners with L1-German–L2-English–L3-Polish and L1-Polish–L2-English–L3-German language backgrounds. The results showed that dominant articulatory routines from the L1 play an important role in determining the source(s) of phonological CLI in the initial stages of L3 acquisition, at least in a learning constellation in which L2 articulations have not yet been mastered in a consistently target-like manner. Based on loglinear and multiple correspondence analyses, the sources of phonological CLI were found in this study to vary feature by feature, thus giving some support to the Linguistic Proximity Model and the Scalpel Model. However, the high inter- and intra-individual variation that was found is not yet accounted for by any of the existing models.
Keywords
I Introduction
The discipline of third language (L3) acquisition investigates multilingual language learning and use, with a particular focus on the interaction of the different languages in the multilingual mind. While typical second language (L2) learners can rely on only one language (their first language, L1) in acquiring an L2, L3 learners have two previously learnt languages that potentially compete (or combine) as sources of cross-linguistic influence (CLI) during the acquisition of their L3. The goal of this article is to examine how this complex interaction between a multilingual’s languages works in the domain of phonology. Focussing on adolescent learners who are just beginning to acquire an L3 in a formal learning setting, we used a delayed repetition task to elicit a range of phonetic and phonological features in all three of their languages. The main questions investigated are the source(s) of phonological cross-linguistic influence at this initial stage and the factors conditioning it, thus contributing to the larger question of linguistic representation in the multilingual mind.
II Existing L3 acquisition models
The past two decades have seen a growing body of research concerned with describing and explaining the mechanisms that determine transfer/cross-linguistic influence¹ at the initial (and later) stages of L3 acquisition. Thus far, the theoretical models of transfer proposed for early L3 acquisition have mostly stemmed from the morphosyntactic domain. These models differ in their predictions regarding the type and number of sources of transfer, although they agree that all existing language systems are available. The L2 Status Factor model (L2SF; Bardel and Falk, 2007, 2012) attributes a privileged position to the L2 as the prevailing source of transfer in early L3 learning. This is explained by the L3’s greater cognitive similarity to the L2 than to the L1. Psycholinguistic experiments have shown that implicit and explicit linguistic knowledge are neurolinguistically distinct and draw on different memory systems (Paradis, 2009), the former being based on procedural knowledge and the latter on declarative knowledge. The implicitly learned L1 is thus stored in procedural memory, while the L2, if learned in an instructed setting, is based on explicit knowledge and stored in declarative memory. The L2SF predicts that the L3, if learned in the same way as the L2, will show transfer from the other non-native language, as both are stored in declarative memory; it therefore conceives of transfer at the initial L3 stage as being determined by the order and type of acquisition of the previous languages.
In contrast, the Typological Primacy Model (TPM; Rothman, 2011, 2015) stipulates that transfer at the initial L3 stage is determined by the typological proximity between the L3 and the previously acquired languages. According to the TPM, the language processor/parser initially (and unconsciously) identifies the similarity between languages on the basis of lexical, phonological, morphological and syntactic overlap between them. This is conceptualized as a hierarchical process, which stops when sufficient similarity is found in one of the domains. The language that has been determined to be most similar is then transferred wholesale to the L3. The main assumption of this model is the principle of general cognitive economy, which leads to holistic transfer from only one of the L3 learner’s previously acquired languages.
Three other models of early L3 acquisition allow for multiple sources of CLI at the initial stages of L3 acquisition. The Cumulative-Enhancement Model (CEM; Flynn et al., 2004) holds that language acquisition is cumulative in nature, i.e. all previously known languages are available for transfer, but transfer occurs only when it is facilitative. Based on the assumed lack of redundancy in linguistic representation, the model predicts that any instance of non-facilitative transfer from previously learned languages would be neutralized or blocked. In contrast, both facilitative and non-facilitative transfer are theorized in the Linguistic Proximity Model (LPM; Westergaard et al., 2016) and the Scalpel Model (SM; Slabakova, 2017). Both models predict that specific aspects of the L1 or the L2 are transferred into the L3 property by property, depending on the perceived similarity between structures, regardless of the order of acquisition or general typological grouping. While the LPM explicitly acknowledges the possibility of combined transfer from the two previously learned languages into the L3, the SM identifies factors such as processing complexity and structure frequency in the target L3 as affecting how similarity between L3 and L1/L2 properties is assessed. As the latter two models do not specify which mechanisms operate in the assessment of structural similarity or how the multitude of factors involved in determining the source of transfer are weighed, objective testing of the models may be limited (for a discussion, see González Alonso and Rothman, 2017). For this reason, in the present study we focus only on the structural similarities of each property analysed here to test the predictions of these selective transfer models.
In addition, another scenario for CLI at the initial stages of L3 learning is possible, although it has not been formulated as a theoretical model as such. In the context of L3 morphosyntax, Hermas (2010) proposed exclusive L1 transfer reflecting the order of acquisition, with the L1 as the default source of transfer. In the realm of L3 phonological production, this account would be in line with proposals suggesting that L1 phonetic settings represent a fundamental constraint on articulation resulting from a speaker’s automatized neuro-motor routines (Hammarberg, 2001; Ringbom, 1987), which may be especially evident in early L3 speakers with low L2 proficiency (Patience and Qian, 2022). This is the learning scenario of the multilinguals in the present study.
This study seeks to test the predictions of the existing models of early L3 acquisition and of the L1 transfer proposal in the domains of phonetics and phonology. Although the theoretical frameworks have mostly stemmed from (morpho)syntactic data, their authors did not limit their predictions to syntax but rather offered a general theoretical perspective without indicating explicitly, at least in our understanding, which specific language domain they are limited to. For example, Rothman (2015) in his TPM proposes an implicational hierarchy of input cue weighting (i.e. lexicon > phonology > morphology > syntax) that applies before the parser decides which source of transfer to select, thus referring to different language domains. Further, according to Rothman et al. (2019), the core of the disagreement between particular L3 models is the conflicting interpretation of the psycholinguistic literature on inhibition and activation, since both wholesale and property-by-property transfer models are motivated by language experience and cognitive economy. As such, these are broadly understood issues, related to a general rather than domain-specific perspective on language acquisition. Thus, it can be argued that these models are designed to explain L3 acquisition in its broader scope rather than in a selected domain. Also, in their keynote paper ‘The Full Transfer/Full Access model and L3 cognitive states’, Schwartz and Sprouse (2021) present a thorough overview of L3 transfer models, pinpointing their strong points as well as their conceptual and empirical flaws. They conclude by stating that ‘there are many, many research avenues to take in L3 acquisition, all in pursuit of a better understanding of language acquisition overall’ (2021: 21). Consequently, the present study follows one such avenue by testing the predictions of the existing L3 acquisition models with reference to the acquisition of L3 phonology.
Previous research on phonological CLI in L3 acquisition does not show conclusive evidence for any of the models individually: while Williams and Hammarberg’s (1998) and Hammarberg’s (2001) case study of the phonology of an L1 English and L2 German speaker producing her first words in L3 Swedish seems to support the L2SF, Wrembel (2012) and Lloyd-Smith et al. (2017) found partial confirmation of the TPM in their studies on foreign accentedness. Likewise, Cabrelli and Pichan (2021) found evidence for transfer driven by global structural similarity, at least in a language triad in which there is a clear similarity between the L3 and one of the source languages. In contrast, other studies have found some evidence consistent with the predictions of the CEM: Onishi (2016) on L3 phoneme perception, Sypiańska (2016) on L3 vowel production and Wrembel (2012) on foreign accent. Kopečková’s (2014), Patience’s (2018), and Chan and Chang’s (2019) studies further reported a privileged role of the L1 in initial L3 sound perception and production, whereas Kopečková et al. (2016) and Domene Moreno (2021) reported combined forms of phonological CLI.
There are a number of important issues to be considered with respect to the findings of these L3 phonology studies. For one, although L3 learners at early stages of acquisition were included, they did not always represent the very initial stage of L3 acquisition, having been exposed to the L3 for a couple of months rather than hours (but see Cabrelli and Pichan, 2021). Likewise, these studies mostly analysed the acquisition of a single structure rather than comparing several structures, which precludes adequate testing of models such as the LPM or the SM (for a notable exception, see Domene Moreno, 2021). In the present study, therefore, the acquisition of three unrelated phonetic and phonological structures by initial-stage L3 learners is analysed. Moreover, in some studies the L2 under examination was the language typologically closest to the examined L3, so that a potential L2 status effect could not be teased apart from the effect of typological proximity (see Wrembel, 2010, 2012). To investigate the separate effects of the L1 and/or the L2 on initial L3 phonological acquisition persuasively, data must be collected from all of the multilingual’s languages at that stage, ideally in a study design that allows for group comparisons by language status (see Geiss et al., 2021).
In the present study, we therefore aim to provide additional evidence from initial L3 phonological acquisition that probes the predictions of the L3 models and proposals discussed above in a manner that suitably tests their various predictions in this specific domain of language. To this end, L3 learners after 20 hours of exposure (5 weeks of instruction) were investigated. These learners consisted of two groups: L1 Polish speakers with L2 English and L3 German, as well as L1 German speakers with L2 English and L3 Polish. The following section presents the phonetic and phonological features under investigation and how they are expected to surface, based on the predictions of the introduced models.
III Selected phonetic and phonological features
Rhotic sounds, /w/ and final obstruent devoicing were selected as focal features for the empirical research reported here, as they have a different status in the phonological repertoires of the L3 learners in this study. As illustrated in Table 1, rhotic sounds are realized differently in each of the three languages. In Polish, they are produced as an apical trill and, intervocalically or in fast speech, as a tap (Jassem, 2003). In British English, to which our participants were mostly exposed, they are produced as a post-alveolar approximant /ɹ/ and prevocalically as [ɹ̣] (Ladefoged and Maddieson, 1996). In Standard German, the conservative uvular trill /ʀ/, present mainly in word-initial positions, is in most cases produced as the uvular fricative /ʁ/ (Kohler, 1999). Note, however, that there is substantial regional variation in the realization of the rhotic in native German (Wiese, 2003), with some variants being vocalized in intervocalic or word-final position. The second feature under examination, the labial-velar approximant /w/, exists in both Polish and English, whereas German has only the labiodental fricative /v/ as its closest equivalent; this, however, is typically realized as a weak labiodental approximant [ʋ] in German (Hamann and Sennema, 2005). Finally, the three languages differ in the realization of coda obstruents. While English retains a voicing contrast in syllable-final position, this opposition is neutralized in German and Polish (Gonet, 2001; Smith et al., 2009). However, in Polish consonant clusters, regressive voicing assimilation affects obstruents preceding voiced consonants (Rubach, 1996).
Table 1. Phonetic and phonological features under investigation in the present study.
Considering typological relatedness at the level of phonetic and phonological features, the rhotic sounds of the three languages are unrelated for both learner groups. Yet, based on the acoustics of the rhotic sounds concerned, /ɹ/ is arguably more similar to /r/ than /ʁ/ is (Lindau, 1985: 166–67; see below for more details). For L1 German speakers, the labial-velar approximant /w/ is shared by their two non-native languages, whereas for L1 Polish speakers this sound is absent from the target L3. Final obstruent devoicing is shared by the L1 and the L3 of both learner groups, although the L1 Polish learners have greater experience with voicing word-final obstruents owing to regressive voicing assimilation in their L1. For an overview of the typological relationships per feature and group, see Appendix 1.
Thus, considering the predictions of the five models of L3 morphosyntactic acquisition and the L1 transfer proposal presented above, the L2SF predicts that the learners in both groups will transfer from their L2 English only. In contrast, the L1 transfer proposal predicts that both learner groups will transfer solely from their L1 in the production of the L3 rhotic sounds, /w/ (relevant only for the L1 German group) and final obstruent devoicing, owing to articulatory routines established in their L1. The TPM also predicts wholesale transfer from only one of the learner’s previously acquired languages, namely whichever is typologically closer to their L3 according to an implicational hierarchy of similarity across lexicon, phonology, morphology and syntax. According to the TPM, and given that the parser assesses lexical similarity in the first instance, the preferred source of transfer for the L1 German group is more likely to be L1 German, owing to the vast number of German loanwords in Polish (Czarnecki, 2014), whereas for the L1 Polish group it should be L2 English, owing to the genetic relatedness (and hence lexical overlap) between German and English.
The CEM, conversely, predicts transfer from both the L1 and the L2, but only if this transfer is facilitative. For the L3 rhotic, the learners are predicted to transfer neither the rhotic of their L1 nor that of their L2 but to acquire it in the way it occurs in L1 acquisition (see Flynn et al., 2004: 14); /ʁ/ has been reported to be initially substituted with [h] and [ʔ] by L1 German children (Fox, 2015), while /r/ is substituted with [l] and, less often, [j] by L1 Polish children (Łobacz, 1996). For final obstruent devoicing, the CEM predicts that no voicing will be produced in the learners’ L3. For the L1 German learners, the CEM predicts that /w/ will be transferred to their L3 Polish, provided it has been acquired in their L2 English. Like the CEM, the LPM and the SM also predict property-by-property transfer from either of the two previously acquired languages. Unlike the CEM, however, they also predict non-facilitative transfer, such as transfer of the L1 or L2 rhotic, depending on the learners’ perceived structural similarity between the respective sounds. For the L1 German group, the L3 rhotic sounds are predicted to reflect L2 transfer, as /ɹ/ is likely to be perceived by these learners as more similar to /r/ than /ʁ/ is in terms of place of articulation, while for the L1 Polish group, the L3 rhotic sounds are predicted to reflect L1 transfer owing to the similar pulsing patterns of the alveolar and uvular trills of Polish and German, respectively (Lindau, 1985: 166–67). Note that the target L3 rhotic sounds can thus pose different degrees of perceptual as well as articulatory complexity (Colantoni and Steele, 2008; Patience, 2018; Wrembel et al., 2022). Regarding L3 Polish /w/, L1 German learners are predicted to transfer L2 English /w/ if it has been acquired in the L2.
Concerning final obstruent devoicing, the LPM and the SM predict facilitative transfer from the learners’ L1; for the SM, this additionally constitutes transfer of a universally more frequent structure.
Fundamentally, if the three features tested in this study reflect diverse transfer sources, we will have evidence for property-by-property transfer and against wholesale approaches. If these transfer sources are, in addition, both facilitative and non-facilitative in nature, we will be able to offer some evidence in line with the predictions of the LPM and the SM. Tables 2 and 3 below summarize, for each learner group respectively, the predictions put forth by the models and the L1 transfer proposal for the specific features under investigation.
Table 2. Predictions of the tested models in terms of sources of phonological cross-linguistic influence (CLI) for the L1 German group and features under investigation.
Notes. L2SF = second language Status Factor. TPM = Typological Primacy Model. CEM = Cumulative-Enhancement Model. LPM = Linguistic Proximity Model. SM = Scalpel Model.
Table 3. Predictions of the tested models in terms of sources of phonological cross-linguistic influence (CLI) for the L1 Polish group and features under investigation.
Notes. L2SF = second language Status Factor. TPM = Typological Primacy Model. CEM = Cumulative-Enhancement Model. LPM = Linguistic Proximity Model. SM = Scalpel Model.
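To make the comparison behind Tables 2 and 3 explicit, the predictions described in the preceding section can be encoded as a lookup structure and checked against an observed dominant CLI source per feature. The sketch below reflects our reading of the prose, not a reproduction of the tables. Several elements are illustrative assumptions: the label "none" (standing for the CEM’s predicted L1-acquisition-like substitutions of the rhotic), the function name matching_models, and the criterion that every observed feature must match. Conditional predictions, such as /w/ transfer only if it has been acquired in the L2, are simplified to their unconditional form.

```python
from typing import Dict, List

# Hypothetical encoding of the predictions described in the text (our reading).
# "none" = no transfer from L1 or L2 (CEM: L1-acquisition-like substitutions).
PREDICTIONS: Dict[str, Dict[str, Dict[str, str]]] = {
    "L1 German": {
        "L2SF":        {"rhotic": "L2",   "w": "L2", "devoicing": "L2"},
        "L1 transfer": {"rhotic": "L1",   "w": "L1", "devoicing": "L1"},
        "TPM":         {"rhotic": "L1",   "w": "L1", "devoicing": "L1"},
        "CEM":         {"rhotic": "none", "w": "L2", "devoicing": "L1"},
        "LPM/SM":      {"rhotic": "L2",   "w": "L2", "devoicing": "L1"},
    },
    "L1 Polish": {
        "L2SF":        {"rhotic": "L2",   "devoicing": "L2"},
        "L1 transfer": {"rhotic": "L1",   "devoicing": "L1"},
        "TPM":         {"rhotic": "L2",   "devoicing": "L2"},
        "CEM":         {"rhotic": "none", "devoicing": "L1"},
        "LPM/SM":      {"rhotic": "L1",   "devoicing": "L1"},
    },
}


def matching_models(group: str, observed: Dict[str, str]) -> List[str]:
    """Models whose predicted source equals the observed source for every observed feature."""
    return [
        model
        for model, predicted in PREDICTIONS[group].items()
        if all(predicted.get(feature) == source for feature, source in observed.items())
    ]


# Hypothetical observation: dominant L1 influence on both rhotics and devoicing
print(matching_models("L1 German", {"rhotic": "L1", "devoicing": "L1"}))
# → ['L1 transfer', 'TPM']
```

The all-or-nothing match is a simplification. A finer-grained comparison would also register whether the observed sources differ across features, which is what distinguishes the property-by-property models from the wholesale ones.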
IV Research questions
The aim of the present study is to investigate phonological CLI at the initial stages of L3 acquisition with respect to three phonetic and phonological features in two parallel groups of young multilinguals.
The following research questions are posed:
Research question 1: What is/are the source/s of phonological CLI at the initial stages of L3 acquisition?
Research question 2: Are the CLI patterns similar across the investigated features?
Research question 3: Do patterns of CLI differ across the two groups with different L1s?
Research question 4: Which of the current L3 acquisition models is best supported by the data generated in the present study?
V Method
The study involved a design with two learner groups, in which one group’s L1 was the other group’s L3 and vice versa (i.e. Group 1: L1 German, L2 English, L3 Polish; Group 2: L1 Polish, L2 English, L3 German). The groups were matched for age, proficiency level in L2 English as well as the length of initial exposure to their respective L3s.
1 Participants
The participants were 35 adolescents aged 11–13 years (mean 12.03 years), who had been learning their L2 English for 5–6 years in a school setting (pre-intermediate level, 2–3 hours/week) and who had just started learning an L3 (beginner level, 4–5 hours/week). Group 1 included 21 L1 German speakers who had been learning L3 Polish for five weeks (~20 hours, native Polish teacher). Group 2 included 14 L1 Polish speakers who had been learning L3 German for five weeks (~20 hours, L1 Polish teacher of German). None of the learners spoke any other languages. To ensure a genuinely initial stage of L3 acquisition, participants with any previous exposure to the respective L3 were excluded (as was the case for eight of the original 22 participants in Group 2).
2 Delayed repetition task
The participants performed a production task in their three languages in which they heard a stimulus word in a carrier phrase in the L1 and L2 (‘Ich sage X zu dir’ [ɪç zagə X tsu diɐ] in German, ‘Mówię X do ciebie’ [muvi̯e X dɔ tɕebi̯e] in Polish, and ‘I say X to you’ in English), each spoken by a native speaker of the respective language. After a prompt (e.g. ‘And what do you say?’ in English) spoken by a different speaker, the learners were to repeat the entire first phrase. In their L3, owing to their minimal competence in the language, the learners heard only the target word, without a carrier phrase, followed by the prompt (an inter-stimulus interval of 1,000 ms was set between the end of the carrier phrase/target word and the beginning of the prompt). Hence, they also repeated only the target word in their L3. A stimulus was not replayed even if the learner did not repeat it after the prompt. Table 4 shows the test items included in the study.
Table 4. Items elicited in the L1/L3 German, L2 English and L3/L1 Polish delayed repetition tasks in the present study.
The minor differences in the number of rhotic (n = 8 in German, n = 7 in English, n = 6 in Polish), /w/ (n = 6 in English, n = 4 in Polish) and final obstruent items (n = 6 in German, n = 6 in English, n = 6 in Polish) included in the task were unavoidable, because additional phonetic and phonological features (not reported here), present or absent in the respective languages, were also tested, and because we aimed to keep the total number of items equal across the three languages.
3 Language background interview
In order to collect information about the individual learners’ multilingual backgrounds, each participant took part in a detailed interview in their L1 at the end of the testing session, eliciting biodata, information about their language learning history (age of learning, length and intensity of instruction), language use (declared percentage in various situations/contexts), and attitudes towards foreign language learning in general and towards their new L3 in particular.
4 Procedure
The testing sessions were conducted with each participant individually at their school. Recordings were made with a lavalier lapel microphone and a Roland R-26 portable digital recorder at a 44.1 kHz sampling rate with 16-bit quantization. The data were elicited on two different days: one session with a native speaker of English as instructor, in which all of the L2 data were collected, and a second session carried out in the participants’ L1 with a native speaker of this language as instructor, in which first the L1 and subsequently the L3 data were collected. Owing to the learners’ limited competence in the L3, instructions for the task were given in their L1.
5 Coding and analysis
All target sounds in the token words produced by the L3 learners in their three languages were auditorily analyzed and transcribed separately in IPA by two trained phoneticians, who had either German or Polish as their native language as well as near-native competence in English. In cases of disagreement, a third rater was consulted, who assessed the production auditorily and, if necessary, by inspecting the spectrogram.
While the analysis of the rhotic and /w/ is generally categorical in nature, an analysis of voicing is not; therefore, the following procedure, based on an auditory analysis, was applied to the learners’ productions of word-final obstruents: a sound was coded as ‘partially voiced’ if at least two raters judged it to be ‘in-between’ voicing and devoicing, i.e. neither clearly voiced nor clearly devoiced, regardless of the preceding vowel length. Cases with short or intermediate vowel length but clear voicing were coded as ‘voiced’.
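The rater agreement rule can be expressed as a small majority-vote function. This is a minimal sketch resting on several assumptions: each rater assigns one categorical label per token, a third label is requested only when the first two disagree, and a token that does not receive the same label from at least two raters remains unresolved. The function name adjudicate and the label strings are hypothetical.

```python
from collections import Counter
from typing import Optional


def adjudicate(rater1: str, rater2: str, rater3: Optional[str] = None) -> Optional[str]:
    """Return the label given by at least two raters, or None if unresolved.

    rater3 (hypothetical third rater) matters only when rater1 and rater2 disagree.
    """
    if rater1 == rater2:
        return rater1
    if rater3 is None:
        return None  # disagreement: a third rating is needed
    label, votes = Counter([rater1, rater2, rater3]).most_common(1)[0]
    return label if votes >= 2 else None  # no two raters agree: still unresolved


print(adjudicate("devoiced", "partially voiced", "partially voiced"))  # → partially voiced
```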
The L3 sounds were subsequently categorized as follows:
CLI from L1: when an L1 sound or process was produced, e.g. an L1 German speaker who produces [ʀ] in German also produces [ʀ] for /r/ in her L3 Polish; within this category, we also coded cases of L3 realizations that originated in the L1 of the speaker, but may be a result of a many-to-one type of association, i.e. ‘when two or more languages interact with one another and concur in influencing the target language, or when one language influences another, and the already influenced language in turn influences another language’ (de Angelis, 2007: 21). This category was labelled ‘L1-combined’, since L1 influenced both the speaker’s L2 and L3 (whether cumulatively or via the influenced L2), and can be exemplified by realizations of final obstruent devoicing by an L1 German / L1 Polish speaker in all of their languages.
CLI from L2: when an L2 sound was produced, e.g. an L1 Polish speaker who produces [ɹ] in English also produces [ɹ] for [ʁ] in her L3 German; or an L1 German speaker who produces [w] for /r/ in both her L2 English and L3 Polish.
Other: two types of ‘other’ sound categories were distinguished: first, the target L3 sound, e.g. [r] in L3 Polish, and second, any other sound produced as a substitution, e.g. [l] for /r/. These two categories were labelled ‘O-L3’ and ‘O’, respectively.
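The categorization above can be sketched as a decision function that compares a speaker’s L3 realization with their primary L1 and L2 realizations and with the L3 target. The order of the checks is our assumption: L1-combined before L1, L1 before L2, and L2 before the L3 target. Under this order, [w] produced in both L2 English and L3 Polish is counted as L2-based rather than as a target realization. The function name classify_cli and the string encoding of sounds are hypothetical.

```python
def classify_cli(l3_form: str, l1_form: str, l2_form: str, l3_target: str) -> str:
    """Assign a CLI source code to one L3 realization (hypothetical helper).

    l1_form / l2_form: the speaker's primary realizations of the feature in L1 / L2.
    l3_target: the L3 norm. Check order (an assumption): L1-combined > L1 > L2 > O-L3 > O.
    """
    if l3_form == l1_form == l2_form:
        return "L1-combined"  # same realization in all three languages
    if l3_form == l1_form:
        return "L1"
    if l3_form == l2_form:
        return "L2"
    if l3_form == l3_target:
        return "O-L3"  # target-like L3 realization
    return "O"  # any other substitution, e.g. [l] for /r/


# L1 German speaker with primary [ʁ] in German and [ɹ] in English; Polish target [r]
print(classify_cli("ʁ", "ʁ", "ɹ", "r"))  # → L1
print(classify_cli("r", "ʁ", "ɹ", "r"))  # → O-L3
```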
Although the speakers in our study generally produced a single variant in their L1 and L2, in cases of intra-individual variation, the most frequent realization across all of a speaker’s productions determined their primary realization and, consequently, the coding of the CLI source in the speaker’s L3.
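Selecting the primary realization amounts to taking the mode of a speaker’s productions. This is a minimal sketch, assuming that the transcriptions are available as a list of strings per speaker and feature. Because the tie-breaking rule is not stated above, we assume that ties go to the variant encountered first, which is the documented ordering of Python’s Counter.most_common for equal counts.

```python
from collections import Counter
from typing import Sequence


def primary_realization(tokens: Sequence[str]) -> str:
    """Most frequent realization among a speaker's tokens of one feature.

    Ties go to the variant encountered first (Counter.most_common keeps
    first-encountered order for equal counts); this tie rule is an assumption.
    """
    if not tokens:
        raise ValueError("no tokens for this speaker and feature")
    return Counter(tokens).most_common(1)[0][0]


# Hypothetical L2 English rhotic tokens of one speaker
print(primary_realization(["ɹ", "w", "w", "ɹ", "w"]))  # → w
```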
Both raw numbers and percentages for each feature realization in the L1, L2 and L3 were calculated per participant group, which allowed for a first level of interpretation of our dataset. To test whether the distribution of CLI sources depended on feature and group, we ran a loglinear analysis of the structures of dependency among three categorical variables: ‘CLI source’ (3 levels, i.e. L1, L2, other), ‘feature’ (2 levels, i.e. rhotics and final devoicing; /w/ was excluded to maintain comparability between learner groups), and ‘group’ (2 levels, i.e. L1 German and L1 Polish). This yielded a crosstabulation of the categorical variables, using chi-square tests for statistical significance and maximum likelihood estimation (Stevens, 2009). Finally, a multiple correspondence analysis was run to visualize the underlying associations between the three categorical variables in our dataset. The results were computed in StatSoft STATISTICA, Version 10 (StatSoft, 2011).
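To illustrate what a loglinear analysis tests, the sketch below fits the simplest hierarchical loglinear model, mutual independence of CLI source, feature and group, to a three-way table. It returns the likelihood-ratio statistic G² together with its degrees of freedom. Under this model, the expected count of each cell is the product of its three one-way margins divided by the squared total. This is not the STATISTICA procedure used in the study, which compares a series of nested models. The function name and the counts in the usage example are hypothetical and do not represent the study’s data.

```python
import math
from typing import List, Tuple

Table3D = List[List[List[int]]]  # table[source][feature][group]


def mutual_independence_g2(table: Table3D) -> Tuple[float, int]:
    """Likelihood-ratio G2 and df for the loglinear model [Source][Feature][Group]."""
    I, J, K = len(table), len(table[0]), len(table[0][0])
    cells = [(i, j, k) for i in range(I) for j in range(J) for k in range(K)]
    n = sum(table[i][j][k] for i, j, k in cells)
    # one-way margins
    src = [sum(table[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
    feat = [sum(table[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
    grp = [sum(table[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
    g2 = 0.0
    for i, j, k in cells:
        observed = table[i][j][k]
        if observed > 0:  # empty cells contribute nothing to G2
            expected = src[i] * feat[j] * grp[k] / n ** 2
            g2 += 2 * observed * math.log(observed / expected)
    df = I * J * K - (I + J + K) + 2  # cells minus free parameters
    return g2, df


# Hypothetical counts (NOT the study's data). Outer rows: L1, L2, other;
# inner lists: rhotics, final devoicing, each as [L1 German, L1 Polish].
counts = [
    [[60, 30], [40, 35]],
    [[2, 10], [5, 8]],
    [[38, 40], [20, 12]],
]
g2, df = mutual_independence_g2(counts)
print(f"G2 = {g2:.2f}, df = {df}")
```

A G² exceeding the χ² critical value for its degrees of freedom (14.07 for df = 7 at α = .05) indicates that at least one association among the three variables is needed. Models adding, for example, a source × feature or source × group term would then be fitted and compared in the same way.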
VI Results
Descriptive analyses of L1, L2 and L3 productions of the tested features by the two learner groups are presented first, followed by the results of the loglinear and multiple correspondence analysis.
1 Production of rhotics, /w/ and final obstruents by L1-German–L2-English–L3-Polish learners
As shown in Table 5, in their L1 German, the multilinguals produced the rhotic sounds overwhelmingly as a uvular fricative, with some uvular trills and very few a-schwas (= vocalized rhotics) in intervocalic position. In their L2 English, the learners realized the rhotic as the target sound, a post-alveolar approximant, and as [w] to approximately the same extent. Other occasional realizations included [r/w, j, v]. In their L3 Polish, the learners produced the rhotic mostly as a uvular fricative or uvular trill. In a fifth of all cases it was realized target-like as an alveolar trill. In only two instances did the L1 German speakers produce the Polish rhotic as the English post-alveolar approximant; other, less frequent realizations included [x], [v] and [w].
Table 5. Rate of occurrence out of all tokens of the target feature (L1 German group, number of productions in brackets).
When it comes to the production of /w/, absent from their L1 inventory, the L1 German group mostly produced it target-like in their L2, yet some substitutions with [v] were also found. Other realizations included the post-alveolar approximant and [f], and in three cases the sound was not pronounced at all. In their L3 Polish, the learners realized /w/ even more accurately (target-like), with a comparable number of [v] substitutions and ‘other’ realizations as in their L2. Some of the non-target sounds produced by the learners seem to imitate the target sound in manner rather than place of articulation. For instance, uvular trills were produced for the alveolar trill.
Devoicing was a prevalent process in all three languages of the L1 German group, alternating in varying proportions with deletion. In their L1 German and L2 English, the learners produced mainly devoiced obstruents, with some not realized at all. In their L3 Polish, the L1 German group mostly devoiced word-final obstruents, although a fifth of their obstruent productions showed (partial) voicing in this position. As in their other languages, the L1 German speakers also deleted word-final obstruents in numerous cases in their L3 Polish.
Table 6 summarizes the results for the L1 German group, categorized according to the sources of phonological CLI at the initial stages of L3 Polish acquisition. For the production of L3 rhotics, a strong influence of the L1 was found, with only two instances of transfer from the L2. In L3 Polish, target [r] productions occurred as frequently as productions of other sounds, which included a variety of sound substitutions. For /w/, identifying the source of CLI is challenging, as the production of [w] can constitute either transfer from the L2 or a target L3 production. Only in four instances was it clearly a target production, as the speaker did not produce [w] in their L2 English. Final devoicing in L3 Polish presents an even greater challenge in that it can reflect CLI from L1 German, CLI from L2 English or a target production (i.e. in cases of final obstruent devoicing across all three languages), if a one-to-one association is assumed. A concurrent influence is also possible, however; hence the distinction of an ‘L1-combined’ category for this scenario, for which we caution against attribution to solely L1-based CLI and instead argue for a many-to-one association (de Angelis, 2007: 21). A sizeable proportion of the L1 German speakers’ productions of L3 obstruents (more specifically, partially voiced ones) could be related neither to their L1 nor to their L2.
Table 6. Sources of cross-linguistic influence (CLI) in percent for L3 Polish /r/, /w/ and final obstruent devoicing (total number of productions in brackets) in L1 German speakers.
Notes. L1 = L1-based CLI. L1-combined = the same realization in all three languages. L2 = L2-based CLI. O-L3 = L3 target realization. O = other sound substitutions.
An inspection of individual performances in the L1 German group revealed a relatively high degree of inter-individual variation in the production of all three features. As Figure 1 shows for the rhotic data, of the 21 L1 German learners, 18 produced L1 sounds (two of them exclusively), eight produced at least some L3 targets (two of them exclusively) and two produced the L2 sound once when attempting the L3 Polish rhotic, one of whom showed high intra-individual variation overall.

Individual learner (x-axis) differences in L3 Polish rhotic productions (y-axis, count): Sources of cross-linguistic influence (CLI).
2 Production of rhotics and final obstruents by L1-Polish–L2-English–L3-German learners
Table 7 shows that the L1 Polish group produced the rhotics in their L1 Polish mainly as an alveolar trill and to a much lesser degree as a tap. The rate of target-like L2 rhotic productions was higher in this group than in the L1 German group (84.5% vs. 49% of all cases, respectively). In their L3 German, the learners mostly produced the rhotic either as an alveolar trill or in a target-like manner, and in some cases as a post-alveolar approximant. There was a considerable number of other realizations, including alternative articulations as well as mispronunciations and deletions. The alternative articulations featured [v, w, x, l, dr, vr] as well as the hybrid forms [R/x] and [R/r]. Again, some of these ‘other’ realizations, such as [v], [x] and [R/x], match the target sound in manner rather than place of articulation.
Rate of occurrence out of all tokens of the target feature (L1 Polish group, number of productions in parentheses).
A strong tendency to devoice word-final obstruents in the L2 and L3 was also found in this learner group. Only 15% of English voiced obstruents were pronounced with some degree of voicing, and in L3 German the word-final obstruents (voiceless in the target) were realized with (partial) voicing in a mere 6% of cases. The high proportion of voiced obstruents in L1 Polish possibly reflects the effect of the Polish carrier phrase, in which the elicited target word was followed by a voiced obstruent. Such an environment conditions regressive voicing assimilation in Polish.
As far as the sources of CLI are concerned, Table 8 shows that for the L1 Polish speakers the L1 served as an important source of CLI in their L3 production of both investigated features, although target-like realizations of German rhotics also ranked high. For rhotics, the L1 Polish group showed greater variation than the L1 German group, with a nearly equal distribution across the categories of L1-based CLI, target-like sounds and other sound substitutions, and only a small share accounted for by L2-based CLI. Final devoicing in L3 German can again reflect CLI from L1 Polish, CLI from L2 English or a target L3 production for speakers who devoice final obstruents in all three of their languages. Most of the L1 Polish speakers in this study (9 out of 14 participants) (partially) voiced final obstruents in the L1 production task (i.e. in more than 50% of their L1 productions) and devoiced final obstruents in the L2 production task. Nevertheless, their overwhelmingly devoiced L3 obstruent productions were interpreted as evidence of combined CLI (‘L1-combined’), because for this group the L1 rule of final obstruent devoicing interacted with regressive voicing assimilation in the carrier phrases of the L1 Polish and L2 English delayed repetition tasks. Yet the possibility that these were target L3 productions cannot be ruled out with the production task used in the present study.
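The carrier-phrase effect described here follows from two interacting rules: word-final devoicing, and regressive voicing assimilation triggered by a following voiced obstruent. The Python sketch below is a deliberately simplified illustration, assuming a small hypothetical subset of obstruent pairs and ignoring word-boundary contexts before vowels and sonorants, where Polish varieties differ; the function and table names are ours.

```python
from typing import Optional

# Simplified voiced -> voiceless obstruent pairs (an illustrative subset, not a full Polish inventory).
VOICED_TO_VOICELESS = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}
VOICELESS_TO_VOICED = {vl: vd for vd, vl in VOICED_TO_VOICELESS.items()}


def final_obstruent(segment: str, next_segment: Optional[str]) -> str:
    """Surface form of a word-final segment under a simplified Polish-type rule set.

    next_segment is the first segment of the following word, or None before a pause.
    Segments that are not in the obstruent tables are returned unchanged.
    """
    if next_segment in VOICED_TO_VOICELESS:
        # Regressive voicing assimilation: a following voiced obstruent voices the final obstruent.
        return VOICELESS_TO_VOICED.get(segment, segment)
    # Otherwise (pause or voiceless obstruent; other contexts simplified here): final devoicing.
    return VOICED_TO_VOICELESS.get(segment, segment)


# A target word ending in /d/ elicited before a pause vs. inside a carrier phrase
# whose next word begins with a voiced obstruent (here /b/).
print(final_obstruent("d", None))  # t
print(final_obstruent("d", "b"))   # d
```

Under these assumptions, a carrier phrase with a voiced obstruent after the target word yields voiced final obstruents even though the language has final devoicing, which is the pattern attributed to the L1 Polish task.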
Sources of cross-linguistic influence (CLI) in percent for L3 German /r/ and final obstruent devoicing (total number of productions in brackets) in L1 Polish speakers.
Notes. L1 = L1-based CLI. L1-combined = the same realization in all three languages. L2 = L2-based CLI. O-L3 = L3 target realization. O = other sound substitutions.
A closer look at individual rhotic productions reveals high inter-individual variation in the L1 Polish group too. As shown in Figure 2, 10 out of 14 speakers replaced the L3 uvular rhotics with L1/L1-combined sounds, participant 14 doing so consistently. Nine participants produced at least some target-like L3 realizations, with participant 6 producing on-target rhotics consistently. Relatively little L2-based CLI was attested in the L1 Polish group: only three speakers produced some L2-based articulations, speaker 12 being particularly consistent. All but one participant misarticulated rhotics with various replacements.

Individual learner (x-axis) differences in L3 German rhotic productions (y-axis, count): Sources of cross-linguistic influence (CLI).
3 Sources of phonological cross-linguistic influence in the L3
To test the sources of phonological CLI in the two learner groups’ L3, we ran a loglinear analysis of the results for L3 rhotics and final devoicing, with CLI source coded as L1/L1-combined, L2 or Other (either the L3 target or a sound substitution). As /w/ is absent from L3 German and so could not be compared across the two groups, this feature was omitted from the statistical analyses.
The loglinear analysis produced a final model that retained all effects except the interaction between ‘learner group’ and ‘feature’ (for the tests of marginal and partial associations for all variables, see Table 9). The highest-order interactions that yielded significant results were ‘CLI source’ × ‘group’ (χ²(2) = 20.16, p < .001) and ‘CLI source’ × ‘feature’ (χ²(2) = 31.58, p < .001). The likelihood ratio of this model was χ²(0) = 0, p = 1, which suggests a good fit to the data.
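Read as a hierarchical loglinear model, a model that keeps ‘CLI source’ × ‘group’ and ‘CLI source’ × ‘feature’ but drops ‘group’ × ‘feature’ (and hence the three-way term) states that group and feature are conditionally independent given CLI source. For this particular model the fitted counts have a closed form, so its likelihood-ratio statistic G² can be computed without iterative fitting. The Python sketch below illustrates this, assuming a 3 × 2 × 2 table of raw token counts; the function name and counts are hypothetical and are not the study's data, and statistical software would normally fit such models by iterative proportional fitting or Poisson regression.

```python
import math


def g2_conditional_independence(counts):
    """Likelihood-ratio statistic G2 and df for the loglinear model
    [source x group][source x feature] in a source x group x feature table.

    counts[i][j][k]: frequency for CLI source i, learner group j, feature k.
    Under this model, group and feature are independent given CLI source, so the
    fitted counts have the closed form m_ijk = n_ij+ * n_i+k / n_i++.
    Assumes every CLI-source category has a non-zero total.
    """
    n_sources, n_groups, n_features = len(counts), len(counts[0]), len(counts[0][0])
    g2 = 0.0
    for i in range(n_sources):
        n_i = sum(sum(row) for row in counts[i])
        for j in range(n_groups):
            n_ij = sum(counts[i][j])
            for k in range(n_features):
                observed = counts[i][j][k]
                if observed == 0:
                    continue  # empty cells contribute nothing to G2
                n_ik = sum(counts[i][jj][k] for jj in range(n_groups))
                fitted = n_ij * n_ik / n_i
                g2 += 2 * observed * math.log(observed / fitted)
    # Residual df: the dropped group x feature and three-way terms.
    df = n_sources * (n_groups - 1) * (n_features - 1)
    return g2, df


# Hypothetical counts: sources (L1, L2, Other) x groups (German, Polish) x features (rhotic, devoicing)
counts = [
    [[10, 30], [20, 40]],
    [[2, 0], [8, 1]],
    [[25, 10], [15, 12]],
]
print(g2_conditional_independence(counts))
```

A small G² relative to a χ² distribution with the returned df indicates that dropping the group × feature association does not worsen the fit.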
Significance of the component parts of the saturated model.
Figures 3 and 4 help interpret these interactions by plotting each postulated CLI type as a percentage of total productions. Figure 3 shows the association between type of phonological CLI and learner group. Even though the L1 was the main source of CLI in both learner groups’ L3 realizations, L1-based CLI was more strongly associated with the L1 Polish group than with the L1 German group. The same was true for L2-based CLI (excluding /w/) in the learners’ L3, although this result should be interpreted with caution given the low number of occurrences overall. ‘Other’ realizations, in turn, were more frequently associated with the productions of the L1 German group; this category included L3 target realizations as well as sound substitutions and hybrid forms.

Percentage of cross-linguistic influence (CLI) types in association with learner group.

Percentage of cross-linguistic influence (CLI) types in association with investigated features.
Figure 4 shows the association between type of phonological CLI and feature. Final obstruents showed a stronger association with L1-based realizations than rhotic sounds did. CLI from the L2 was associated only with rhotic productions, which also showed a stronger association with ‘Other’ sources of CLI than final devoicing did.
Table 10 gives the effect sizes of these interactions as odds ratios. Regarding L1 group, L1 Polish speakers were almost twice as likely as L1 German speakers to show CLI from their L1, seven times as likely to show CLI from their L2, and about half as likely to show CLI from other sources (OR = 0.52; 1/0.52 = 1.92). Regarding the investigated features, rhotic sounds showed L1-based CLI approximately half as often as final obstruents. The odds of L2-based CLI were 21 times higher for rhotic sounds than for final obstruents. Finally, other sources of CLI were twice as likely for rhotic sounds as for final obstruents.
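An odds ratio compares odds (tokens showing a given CLI source relative to tokens not showing it) between two groups or two features, and an OR below 1 can be re-expressed as ‘x times less likely’ through its reciprocal, as with 1/0.52 = 1.92. The minimal Python sketch below assumes a dichotomized 2 × 2 table (e.g. one CLI source vs. all others, by learner group); the function names and counts are hypothetical, and the values in Table 10 may have been derived differently, for instance from loglinear parameter estimates.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows: the two groups (or features) compared; columns: tokens with / without
    the CLI source of interest. OR = (a / b) / (c / d) = (a * d) / (b * c).
    """
    return (a * d) / (b * c)


def reciprocal(or_value: float) -> float:
    """Express an odds ratio below 1 as 'x times lower odds' via its reciprocal."""
    return 1.0 / or_value


# Hypothetical counts: group 1 has 20 tokens with and 10 without the source,
# group 2 has 10 with and 20 without.
print(odds_ratio(20, 10, 10, 20))   # 4.0
print(round(reciprocal(0.52), 2))   # 1.92, the conversion applied to OR = 0.52
```

Swapping the two rows of the table turns an OR into its reciprocal, which is why ‘half as likely’ and ‘twice as likely’ describe the same association from opposite sides.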
Effect size (odds ratios) for cross-linguistic influence (CLI) types per group and feature.
A multiple correspondence analysis was further performed to summarize and visualize the associations between the three categorical variables in the dataset. For lack of space, we present only a two-dimensional scatterplot (Figure 5), whose two dimensions jointly explain 61.34% of the variance.

A two-dimensional plot of multiple correspondence analysis.
Dimension 1, explaining 31.93% of the variance, discriminates between CLI sources: L1-based (right), L2-based (left) and Other (centre). Dimension 2 (29.41%) discriminates between L1 German group (top) and L1 Polish group (bottom) as well as the investigated features: rhotics (top) and final obstruent devoicing (bottom). The analysis revealed the following clusters of associations in the dataset: L1-based CLI was associated with the L1 Polish group and final devoicing, while ‘Other’ sources of CLI were associated with the L1 German group and rhotics. L2-based CLI was found to belong to neither cluster.
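Multiple correspondence analysis is commonly computed as a correspondence analysis of the indicator (complete disjunctive) matrix, with one 0/1 column per category of each variable. The numpy sketch below illustrates that computation under stated assumptions: the tokens are hypothetical, the function name is ours, and no Benzécri or Greenacre adjustment of the inertias is applied. Because MCA software often applies such adjustments or uses other variants, the percentages this sketch returns are not expected to reproduce the 31.93% and 29.41% reported for Figure 5.

```python
import numpy as np


def mca_indicator(records):
    """MCA as correspondence analysis of the indicator (complete disjunctive) matrix.

    records: sequence of equal-length tuples, one categorical value per variable.
    Returns (principal inertias, proportion of inertia per dimension), unadjusted.
    """
    n_vars = len(records[0])
    columns = []
    for v in range(n_vars):
        # One 0/1 indicator column per observed category of variable v.
        for level in sorted({rec[v] for rec in records}):
            columns.append([1.0 if rec[v] == level else 0.0 for rec in records])
    Z = np.array(columns).T                  # shape: (observations, categories)
    P = Z / Z.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    singular_values = np.linalg.svd(S, compute_uv=False)  # in descending order
    inertias = singular_values ** 2
    return inertias, inertias / inertias.sum()


# Hypothetical tokens: (CLI source, learner group, feature)
records = [
    ("L1", "Polish", "rhotic"), ("L2", "Polish", "rhotic"),
    ("Other", "German", "rhotic"), ("L1", "German", "devoicing"),
    ("Other", "Polish", "devoicing"), ("L1", "Polish", "devoicing"),
    ("Other", "German", "rhotic"), ("L1", "German", "devoicing"),
]
inertias, proportions = mca_indicator(records)
print(proportions[:2], proportions[:2].sum())  # share of the first two dimensions
```

For Q variables with J categories in total, the total inertia of the indicator matrix equals J/Q − 1 (here 7/3 − 1), which is one reason why unadjusted percentages per dimension tend to be modest.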
VII Discussion
This study’s main aim was to explore the sources of CLI at the initial stages of L3 phonological acquisition. Our findings show that, across both groups and all the features analysed, the main source of CLI was the speaker’s L1. However, in the case of the rhotics, some learners showed no CLI at all (i.e. no sounds drawn from their L1 or L2) but instead produced L3 target sounds. This finding has not yet been reported in studies of the sources of CLI at the initial stages of L3 acquisition in the area of morphosyntax. In those studies (e.g. Bardel and Falk, 2007; Rothman, 2011), target-like production is always interpreted as transfer from the L1 or L2, because one of them always has a structure identical to the L3 structure (for an exception, however, see Lloyd-Smith, 2020). Our study appears to be the first on initial-state learners in L3 phonology in which a feature has a different realization in each of the L1, L2 and L3, and it thus allows the discovery of L3 productions that match structures (or, in our case, sounds) in neither the L1 nor the L2. Yet it seems counter-intuitive to assume that an L3 sound can be acquired, i.e. have a stable mental representation, at the very initial stages of learning. Possibly, our findings depend on the task involved, a delayed repetition task, in which speakers might use their phonological memory span to mimic the sounds heard (for task-related effects, see also Patience and Qian, 2022). It is therefore necessary to replicate our findings with a task that does not involve prior listening to the sound in question. Moreover, our results call for longitudinal investigations that test whether learners are also able to produce these L3 target sounds at a later stage in the acquisition process.
To answer research question 1 on the sources of CLI at the initial stages of L3 learning fully, the high inter-individual variation found in this study must also be taken into account. Our findings show that while some learners use only the L1 as a source of CLI, others use their L2 and yet others have multiple sources of CLI. The latter strategy also reflects the high intra-individual variation among those learners who produce the same target sound in many different ways. Such inter- and intra-individual variation has not been reported in studies of CLI at the initial stages in the domain of morphosyntax and might thus be characteristic of phonetic and phonological acquisition only.
Regarding research question 2, our findings show that the feature under investigation is a strong predictor of variation in the sources of phonological CLI: the L1 German learners drew on their L1 to produce the rhotic and final devoicing in their L3 Polish, while they relied predominantly on their L2 English to produce /w/ in L3 Polish (but recall the challenge of distinguishing between L2 effects and task effects in the latter case). In the L1 Polish group, rhotics and final obstruent devoicing likewise showed CLI from the L1, but also from other sources (reflected either in target realizations or in various sound substitutions). One L1 Polish speaker consistently drew on her L2 in producing L3 rhotics, and two other L1 Polish speakers produced L2 sounds at least as frequently as L1 sounds for their L3 rhotics. The loglinear and multiple correspondence analyses further showed that CLI from sources other than the L1 or L2 (both subtypes of the ‘Other’ category distinguished in this study) was stronger for rhotics than for final devoicing. In other words, the source(s) of CLI were found in this study to depend partially on the target phonological system and the phonetic realization of the segments or processes in question, allowing for CLI from either or both of the multilingual’s prior languages, be it facilitative or non-facilitative.
Research question 3 was concerned with group differences between speakers of different L1s. The loglinear analysis pointed to the L1 as the major predictor of the patterns of CLI attested in our data. Based on our findings for the L3 production of rhotic sounds and final obstruent (de)voicing, CLI from the L1 was almost twice as likely in the L1 Polish group, CLI from the L2 was seven times as likely in the L1 Polish group, and CLI from other sources was twice as likely in the L1 German group as in the L1 Polish group. This finding seems to suggest that, when learning an additional language, L3 learners initially rely on their L1 articulatory routines for sounds and processes that are perceived as similar, at least in contrast to the alternative counterparts from their previously acquired languages. For the L1 German group, this was evidenced by their realizing Polish trills mostly as uvular fricatives or uvular trills and by their devoicing of final obstruents. For the L1 Polish group, L3 German uvular fricatives were mostly produced as alveolar trills (but also target-like and with a range of substitutions), while final obstruents were devoiced, even though this learner group voiced or partially voiced final obstruents in their L1 Polish (and devoiced them in their L2 English). Articulatory routines and the perceived similarity of sounds and phonetic processes differ from speaker to speaker; hence the differences both between and within our L1 groups.
Which of the current L3 models that predict CLI at the initial stages of acquisition are supported by our findings? Our results seem to point to the limited extendibility of existing models of L3 morphosyntax, and their predictions on the source(s) of transfer at the initial stages, to the domain of L3 phonetics and phonology. The L2SF (Bardel and Falk, 2007, 2012), which predicts CLI from the L2, fails to explain most of our results: L2 influence is rare, occurring with only one feature, and only a single learner draws on the L2 exclusively for one of the features under investigation. Overall, L2 sounds or phonological processes produced in the L3 are much less frequent than L1 and L3 sounds and processes. Like the L2SF, the TPM (Rothman, 2011, 2015) cannot account for all of our results. It would predict solely L1-based CLI in the L1 German group and solely L2-based CLI in the L1 Polish group. Although a high frequency of L1-based CLI was evidenced in the L1 German speakers’ production of L3 Polish, the learners’ realization of L3 Polish /w/ contradicts the predictions of the model. Also, while significantly more CLI from the L2 was found for the L1 Polish group than for the L1 German group, the relative amount of CLI from the L2 was very low and much lower than the rate of CLI from the L1. Instead, the results of the present study are more consistent with studies observing that the L1 is the most likely source of transfer at the initial stages of L3 phonological acquisition (Kopečková, 2014; Llama and Cardoso, 2018; Patience, 2018; Pyun, 2005).
One reason why L1 transfer may persist in the phonetic and phonological domain in a way that it does not in the morphosyntactic domain could be neuro-motor routines, which tend to be established according to L1 articulatory patterns, at least in L3 learning contexts where the learner’s L2 was acquired in a formal setting and L2 articulations have not yet been mastered in a consistently target-like manner (see Hammarberg, 2001; Patience and Qian, 2022). The overwhelming evidence we found for final obstruent devoicing in the L3 (as well as the L2) of all the learners in the study, i.e. for a phonological process that the learners were likely unaware of and that thus arguably surfaced as an automatic articulatory routine, is consistent with this line of reasoning. It is also possible that L1-specific perceptual routines initially guide L3 learners’ processing (and consequently production) of L3 speech (Chan and Chang, 2019; Onishi, 2016).
The predictions of the CEM (Flynn et al., 2004) cannot be substantiated by our findings either. According to this model, no negative transfer should be observed, yet it was found for all features investigated and in both groups of learners. While positive transfer outweighed negative transfer in the production of Polish /w/ by the L1 German group and in the acquisition of final devoicing by both groups, negative transfer was still present to a substantial degree. Nor did the substitutions for target L3 sounds produced by our multilinguals correspond to those reported for L1 Polish / L1 German child phonological acquisition (Fox, 2015; Łobacz, 1996).
The LPM and the SM (Westergaard et al., 2019), which claim that specific aspects of the L1 and L2 will transfer to the L3, thus seem to receive the most support from the results of the present study. For instance, the L1 German speakers’ productions of /r/ and /w/ in L3 Polish show that CLI is clearly structure-dependent: for the production of /r/, CLI from the L1 dominates, while for /w/ the learners draw on their L2. Some of the L1 Polish speakers’ productions of /r/ in L3 German, in turn, reflected CLI from the L2 as well as cases of combined CLI, hybrid forms, sound substitutions and deletions. This latter finding is in line with the LPM’s prediction that combined CLI can occur from both previously learned languages simultaneously.
Nevertheless, even the LPM and the SM fail to account for the high intra-individual variation found in our study, although the SM can perhaps account for more of the data, as it allows for factors that influence variation in the transfer source, such as ‘processing complexity, misleading input, and construction frequency’ (Slabakova, 2017: 662). Similarly, our findings suggest that the complexity of perceptual sound assimilation and the articulatory complexity of the target sound have a bearing on variation in phonological CLI in early L3 acquisition (see Wrembel et al., 2019, 2022). Our learners’ non-target rhotic productions that imitated the L3 sound in manner rather than place of articulation are in line with those L2 studies that theorize manner to be a more salient parameter than place of articulation and therefore acquired earlier (Colantoni and Steele, 2008). It is also noteworthy in this connection that the L1 German group, which was exposed to the speech of a native Polish teacher during instruction, was less target-like in producing the articulatorily complex Polish /r/ than the L1 Polish group was in producing German rhotics, even though the latter was exposed to the speech of an L1 Polish teacher of German who realized uvular fricatives inconsistently. In other words, the perceptual and articulatory complexity of specific L3 sounds seems to have affected the specific source and degree of phonological CLI in the L3 learners’ productions.
Our findings thus do not entirely match the conclusions drawn by Puig-Mayenco et al. (2018) in their meta-analysis of studies on the initial stages of L3 morphosyntactic acquisition (see Lloyd-Smith, 2020). In the phonological domain, L1 influence appears to be stronger and L2 influence weaker. While little support for the CEM is found in either domain, typological proximity does not seem to play as central a role in phonology as it does in syntax (for previous evidence on the limited role of typology in L3 phonological acquisition, see Llama et al., 2010). Future studies should develop appropriate measures to elicit L3 learners’ comparisons of specific phonetic and phonological features in order to test their perception of structural similarity and thus address the predictions of the LPM and SM systematically.
A further point for future studies in this area to consider is that the participants’ performance in the present study may have been influenced by the design and administration of the production task. The L1 Polish group showed a clear task effect in the large proportion of voiced productions in their L1, due to the rule of regressive voicing assimilation. In the same vein, their production of L2 English in a carrier phrase in which the stimulus was followed by a voiceless consonant may have contributed to their primarily voiceless productions of final obstruents in their L2. Also, the L3 production task was administered during the L1 session to facilitate the administration of, and performance in, an experimental task in a new language. We thus cannot exclude the possibility that the L1 mode of interaction affected the reported performance, at least to some extent. Yet, as demonstrated by the overwhelmingly target-like productions of L3 Polish /w/ by the L1 German group, an L1 mode of processing is unlikely to have affected their productions across the board. Future studies into the initial stages of L3 acquisition would ideally develop novel ways of introducing linguistic tasks in a new language without relying on either of the previously learnt languages.
Another task-related issue concerns the nature of the task employed and the evidence of target realizations in the study. A repetition task may allow some speakers to produce target-like forms without engaging in any transfer strategy. Given that target productions of L3 rhotics were attested in this study, where neither the L1 nor the L2 could have been operative, this seems possible. A large number of target /w/ realizations in L3 Polish by the L1 German speakers were also found. Arguably, /w/ is easier to articulate than rhotics and may thus have been easy to repeat for some speakers. However, while previous work on L2 acquisition has shown that learners can successfully produce new sounds after short training, even when these are articulatorily complex (e.g. Rafat, 2015), it is noteworthy that the production task in the present study was a delayed rather than a pure repetition task, with a relatively long inter-stimulus interval (1,000 ms between the end of the carrier phrase and the beginning of a distractor sentence), which helps to address this potential confound in the present data.
Finally, as our investigation of voicing/devoicing shows, in a case where the L1 pattern matches the L3 and the L2 exhibits a more marked pattern, it is paramount for future studies to examine the property under investigation in all of the multilingual’s languages. In this case, we aimed to examine whether the influence of the L2 may be stronger than simple transfer from the L1 in a situation in which L1 transfer would suffice. Since the learners in the present study had not yet acquired the word-final voicing contrast in their L2 English, this aim could not be achieved within this project. In future, more advanced L2 learners who have acquired the word-final voicing contrast need to be examined. Even then, disentangling the exact sources of CLI is likely to remain a methodologically complex task that cannot be resolved in one study alone. We hope that further research into CLI in L3 phonology will establish suitable approaches to determining multiple sources of influence, thereby contributing to novel conceptualizations of phonological CLI in L3 speech learning.
VIII Conclusions
The present study demonstrated that learners mainly, yet not exclusively, draw on their L1 in the initial stages of L3 phonological acquisition and that there is high inter- and intra-individual variation. Some support was found for models predicting that the sources of CLI vary feature-by-feature, based on (perceived) structural similarity. In the learning scenario of instructed L2 and L3 foreign language acquisition, dominant articulatory routines from the L1 were identified as an important factor in determining the sources of phonological CLI in the initial stages. Our findings thus suggest that the initial stages of L3 phonological acquisition are marked by a complex interaction between cross-linguistic and universal effects, the latter of which may include articulatory configurations, markedness, and phonological rules and processes. We suggest that a dedicated model of L3 phonological acquisition (see Cabrelli-Amaro and Wrembel, 2016; Dziubalska-Kołaczyk and Wrembel, 2017) be developed in which inter- and intra-individual variation is given due consideration.
Footnotes
Appendix
Relationships between the phonetic and phonological features under investigation in the present study (per learner group).
| Feature | Learner group | L1–L3 | L2–L3 |
|---|---|---|---|
| Rhotic sounds | L1-German–L2-English–L3-Polish | Different | Different |
| | L1-Polish–L2-English–L3-German | Different | Different |
| /w/ | L1-German–L2-English–L3-Polish | Different | Same |
| | L1-Polish–L2-English–L3-German | Absent | Absent |
| Final obstruent devoicing | L1-German–L2-English–L3-Polish | Same | Different |
| | L1-Polish–L2-English–L3-German | Same | Different |
Acknowledgements
We would like to thank all anonymous reviewers and the journal editor for their insightful feedback and helpful suggestions for revision on the earlier versions of this manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Polish-German Foundation of Science [project no. 2017-10].
