Abstract
Aims and Objectives:
This study investigates the role of formulaic sequences (FS) in the interlanguage development of secondary school learners of Italian as a second language, in their second year of instruction, with a specific focus on the existential/locative structure c’è/ci sono (“there is/there are”). The research aims to determine the extent to which learners rely on FS and how these structures contribute to their broader syntactic and morphological development. Using oral speech production data elicited through a spot-the-difference task, the study applies the framework of Processability Theory to identify developmental patterns in the acquisition of c’è/ci sono.
Methodology:
The methodology involves a detailed distributional analysis of learners’ oral speech data, combined with the emergence criterion, to differentiate between formulaic and productive use of linguistic structures. In a further step, implicational scaling is applied.
Findings:
The findings reveal significant variability in learners’ syntactic development. While most learners rely heavily on single words and simplified structures, a minority demonstrates productive use of more complex syntactic forms, including c’è/ci sono. The developmental trajectory of c’è/ci sono progresses from non-production or reliance on alternative strategies to the use of formulaic patterns, followed by productive use with morphological variation. This trajectory aligns with learners’ broader morphological and syntactic development.
Originality:
The study contributes to a better understanding of FS from a learner-internal processing perspective in L2 Italian learners.
Limitations:
The study is limited by the cross-sectional nature of the data, which offers no insight into early developmental stages during the first year of instruction.
Keywords
Introduction
The role of formulaic sequences (FS) in second language (L2) acquisition (SLA) has been a central topic in applied linguistics, particularly in understanding how learners progress from relying on fixed expressions to developing productive grammatical competence.
In the context of instructed second language acquisition (ISLA), a key question is the role that FS play in the development of the learners’ interlanguage. This question is particularly significant, as learners’ language use in the early stages is often marked by a reliance on fixed expressions. Classroom instruction is typically structured around specific topics and communicative situations, with learners practising and applying appropriate phrases and expressions tailored to these contexts.
In this paper, a processing perspective on FS is adopted, following the Processability Theory (PT) framework (see Lenzing, 2013, 2015; Pienemann, 1998; Roos, 2009). According to PT, formulaic structures appear as unanalyzed forms in the learners’ interlanguage (Lenzing, 2013, p. 162). PT hypothesizes that at the beginning of the L2 acquisition process learners lack the necessary processing mechanisms to assign a lexical category to words and to exchange grammatical information within or across constituents, and therefore learners’ structural choices are limited to “utterances based on (simple or more elaborated) lexical processes” (Lenzing, 2015, p. 106).
Accordingly, this paper takes a learner-internal perspective on FS and investigates to what extent 29 secondary school learners of L2 Italian, in their second year of study, rely on FS in oral speech production. More specifically, it explores how these learners deal with the Italian existential/locative structure c’è/ci sono {X} (“there is/there are {X}”) as a potential formulaic pattern. This structure is of particular interest because it is introduced early in Italian L2 instruction and is often taught as a fixed expression. Despite its early introduction, learners frequently overgeneralize or simplify its use, making it a valuable case study for examining the interplay between formulaic and productive language use. Previous studies (e.g., Bernini, 2005) have highlighted the importance of c’è/ci sono as a strategy in early interlanguage development to compensate for gaps in morphosyntactic knowledge. However, little is known about how learners transition from using c’è/ci sono as an unanalyzed chunk to employing it productively within more complex syntactic constructions. Furthermore, the present study examines how the use of this structure ties in with learners’ broader syntactic and morphological development.
To answer these questions, oral speech production data elicited through a spot-the-difference task are used. By applying the framework of PT (Pienemann, 1998; Pienemann & Lenzing, 2025), the study aims to identify developmental patterns in the acquisition of c’è/ci sono and its role in learners’ interlanguage.
In this paper, first, the theoretical background is presented, focusing on the role of FS in SLA and their relevance to the development of syntax. Second, the methodology section outlines the participants, design of the study and data analysis procedures used to investigate the use of FS, particularly the c’è/ci sono structure, in L2 Italian learners. Finally, the results and discussion sections analyze the developmental trajectory of c’è/ci sono, highlighting its implications for L2 instruction and the broader understanding of interlanguage development.
Theoretical background
Learner-external vs. learner-internal perspectives on FS
The term “formulaic sequence” is used with a variety of meanings in SLA research. One of the most widely used definitions is Wray’s (2002): A sequence, continuous or discontinuous, of words or other elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by the language grammar. (p. 9)
Myles and Cordier (2017, p. 9), however, criticize Wray’s (2002) definition of an FS, which implies holistic storage and processing, for its lack of empirical accessibility and internal contradictions. They argue that Wray’s definition cannot be operationalized, as researchers cannot directly access speakers’ internal linguistic representations. Furthermore, Myles and Cordier find Wray’s allowance for “discontinuous” sequences with variable slots (e.g., Nice to meet/see you) somewhat contradictory to the claim of holistic storage, as discontinuous sequences imply some level of grammatical processing. To address these issues, the authors propose a “weaker” definition of FS: “A psycholinguistic FS [or processing unit] is a multiword semantic/functional unit that presents a processing advantage for a given speaker, either because it is stored whole in their lexicon or because it is highly automatised” (p. 10).
While some definitions overlap, others do not, and it is often unclear which specific aspects researchers are focusing on when studying formulaic language (see Wray, 2002, 2012). To clarify (some of) these issues, Myles and Cordier (2017) propose a conceptual and methodological framework for defining and identifying FS. This framework distinguishes between two approaches: (1) linguistic, learner-external approaches, which focus on formulaic language present in the input the learner is exposed to (e.g., idioms, idiomatic expressions, collocations and lexical bundles), and (2) psycholinguistic, learner-internal approaches, which investigate the aspects of an individual learner’s interlanguage that are formulaic in nature. Clearly, there is overlap “in what is formulaic in a given speaker and what is formulaic in the language around this speaker” (p. 5). However, Myles and Cordier (2017) claim that the two approaches investigate “different phenomena and must be investigated as such” (p. 5) to avoid misunderstandings and misleading conclusions.
Most studies on L2 learners have focused on examining how externally defined formulaic language is used by L2 speakers (Myles & Cordier, 2017, p. 6). These studies have consistently found that such formulaic language is particularly challenging for L2 learners to master, even at advanced levels of proficiency (e.g., Bardovi-Harlig & Stringer, 2017; Boers & Lindstromberg, 2012). Furthermore, several researchers have investigated whether these externally defined FS are psycholinguistically real, in other words, whether they are processed holistically or preferentially (e.g., Siyanova-Chanturia et al., 2011; Underwood et al., 2004). The results of these studies indicated that L1 speakers and L2 learners process FS differently. As Myles and Cordier (2017, p. 8) point out: “What these studies show is that, for NSs, idiomaticity usually goes hand in hand with processing advantage, whereas, for L2 learners, only transparent and/or very common FS show a processing advantage”. Studies investigating learner-external FS can only explain one part of the role FS play in SLA. To better understand the development of formulaicity in L2 learners, a speaker-internal perspective, that is, the analysis of learners’ interlanguage, is needed.
A seminal research project that investigated the use of FS from a learner-internal perspective is the one by Myles et al. (1998, 1999). They explored the development of several verbal and interrogative chunks in beginner L2 learners of French over a 2-year period. Examples of these FS deal with the exchange of personal information, such as comment t’appelles-tu? (what’s your name?) and où habites-tu? (where do you live?), or j’aime (I like) and j’adore (I love). Their research revealed that learners initially relied heavily on FS when they lacked the linguistic resources to engage in the types of conversations required in the classroom. Over time, triggered by communicative needs, learners “gradually ‘unpacked’” (Myles et al., 1998, p. 32) FS and began using their components more productively. The first step in this process involved keeping the FS intact while adding elements, for example, a lexical noun phrase for clarification (e.g., Richard j’aime le musée [Richard I love the museum] intending “Richard loves museums”). When modifying the FS, learners applied it as “a database for hypothesis testing” (Myles et al., 1998, p. 359). In a second phase, as third-person formulas entered the learners’ repertoires, the breakdown process started and fed directly into the emergence of the pronoun system. The authors therefore conclude that learning formulas and the construction of rules are interdependent processes.
The role of FS in the acquisition of syntax
The role of FS in SLA has been a topic of intense debate since the 1970s and continues to be discussed in more recent research (Bardovi-Harlig & Stringer, 2017). From the outset, one of the main research questions has been whether FS serve as the source from which learners derive syntactic rules, and distinct theoretical positions have emerged on this issue. These positions range from the view that FS are the primary driver of L2 development to the position that L2 syntax development occurs independently of FS usage. In what follows, a brief summary of the most prominent perspectives on this issue will be presented.
Researchers like Hakuta (1974) and Wong Fillmore (1976) argue that FS serve as the primary source for the L2 learners’ acquisition of syntax. For instance, Wong Fillmore (1976) showed that children acquiring L2 English initially memorize FS to build and maintain social relations and subsequently use these sequences as foundational linguistic material to facilitate their grammar acquisition. Since the early 21st century, the formula-based approach to acquisition has gained significant influence in SLA research and has become an important part of usage-based approaches (see N. Ellis, 2003, 2012). Proponents of this view argue in favor of constructions within the framework of Construction Grammar that “are categorized, generalized, and, ultimately, analysed into constitutive forms” (Beckner et al., 2009, p. 10). This allows syntax to emerge from the input without relying on any predefined rules or structures.
Researchers such as Hatch (1972) and Krashen and Scarcella (1978) acknowledge that memorised chunks are used for communicative purposes. However, they argue that the development of syntax occurs independently of the use of FS. In a comprehensive review article Krashen and Scarcella (1978) sum up their perspective: Routines appear to be immune to rules at first. This clearly implies that routines are part of a system that is separate from the process generating rule-governed, propositional language. It is also evidence that automatic speech does not ‘turn into’ creative constructions. Rather, the creative construction process evolves independently. (p. 286)
More recently, this view has been supported by, for example, Bardovi-Harlig and Stringer (2013, 2017), whose data suggest that – at least in the case of conventional expressions 1 – L2 learners’ use of formulaic language is not a bootstrapping mechanism into syntax but reflects autonomous syntactic development: “rather the acquisition of syntax as an independent process drives changes in the production of conventional expressions” (p. 61). Despite these significant contributions, the role of FS in SLA remains a topic of ongoing discussion and investigation.
In this paper, a processing perspective on FS is adopted, following the PT framework (see Lenzing, 2013, 2015; Pienemann, 1998; Roos, 2009). According to PT, formulaic structures appear as unanalyzed forms in the learners’ interlanguage (Lenzing, 2013, p. 162). PT argues that the developmental trajectory for specific aspects of the L2 system is determined by the architecture of the human language processor. A core claim of PT is that this processor is not fully developed at the beginning of the L2 acquisition process and therefore constrains SLA. In other words, learners can produce and understand only those linguistic structures that can be processed by their language processor at a given point in their L2 development (Pienemann, 1998). L2 learners can draw on general cognitive resources, but have to acquire language-specific processing routines to comprehend and produce the L2. The processing routines contain mechanisms to access words in the mental lexicon, to assign these words to syntactic categories and unify features on the phrasal level as well as between phrases and sentences. They are acquired in a hierarchically and implicationally related order, captured by a processability hierarchy that can be applied to typologically different languages (Pienemann, 1998; Pienemann & Lenzing, 2025).
The PT hypothesis states that at the beginning of the L2 acquisition process, learners lack the necessary processing mechanisms to assign a lexical category to words and to exchange grammatical information within or across constituents and therefore learners’ structural choices are limited to “utterances based on (simple or more elaborated) lexical processes” (Lenzing, 2015, p. 106). In other words, unanalyzed, fixed FS “are used as a strategy by L2 learners to overcome the communicative problems that arise due to their lack of appropriate L2-specific processing procedures at this stage” (Lenzing, 2013, p. 163). Furthermore, Lenzing distinguishes between simple lexicalized items and more elaborate lexical structures. These consist of (1) FS, on the one hand, and (2) idiosyncratic utterances, on the other hand (Lenzing, 2015, p. 106). Lenzing shows that both types are solely based on lexical processes, stored holistically in the learners’ mental lexicon and lack an underlying constituent structure.
Lenzing (2013, 2015) employs the term FS as an overarching concept to describe two distinct types of formulaic expressions: formulae and formulaic patterns. Formulae are fixed expressions that appear in learners’ textbooks and are produced by learners in an invariant form (e.g., What’s your name?, How old are you?). Lenzing argues that these structures are memorized as unanalyzed units and stored holistically in the learners’ mental lexicon, requiring no additional processing during production. Formulaic patterns are defined as “partly ‘creative’ and partly memorized wholes” following Krashen and Scarcella (1978, p. 283). They consist of chunks combined with an open slot that can be filled with variable material (e.g., a word or a phrase). According to Lenzing, these patterns represent an individual strategy employed by learners to solve communicative problems. She differentiates between formulaic patterns that are well-formed (e.g., Is it the snake? [pattern: Is it X?]) or nontarget-like (e.g., It’s a pink?), with the latter being semantically and/or syntactically ill-formed. In the present study, Lenzing’s definition of formulaic patterns and formulae is applied.
From a methodological point of view, in PT research a linguistic structure is considered to be an unanalyzed chunk stored holistically in the learner’s mental lexicon if it is not used productively. For syntax, productive use is defined as the occurrence of a specific structure with lexical variation in the speech sample. Following Pienemann (2015, p. 109) and Lenzing (2021, p. 230), a syntactic structure is considered to be acquired when the learner’s speech sample contains at least three different realizations of the respective structure. This operationalization of emergence helps to separate the use of productive structures from FS. In morphology, a similar procedure is applied, requiring both lexical and morphological variation in the learner’s data for a structure to be identified as having emerged.
To determine whether a learner uses a structure productively or as an invariant form, 2 Lenzing proposes a combination of distributional analysis of all morphosyntactic structures appearing in oral speech production data and so-called null hypothesis testing. For example, the exclusive occurrence of How are you? would suggest that the structure is stored as an invariant form in the learner’s mental lexicon. However, if variations such as How is she?, *How are he?, or *How is they? are observed, it is more likely that the learner has acquired the necessary processing procedures to produce the corresponding structure.
Lenzing’s (2013) comprehensive analysis of spontaneous oral L2 English learner data in primary schools, which involved distributional analysis and null hypothesis testing for all structures and learners, revealed that after one year of instruction, most learners were at stage 1 of the PT hierarchy. As expected, the majority of learner utterances consisted of single words (77%). Nevertheless, FS also played a significant role, accounting for approximately 15% of all utterances (Lenzing, 2015, p. 110). After 2 years of instruction, the proportion of FS decreased to 9%, while the use of single words also declined. At the same time, learners began producing more SV(O) structures. This progression suggests that as learners advance in their language acquisition, they develop the ability to use the language more productively (Lenzing, 2015, p. 110; Pienemann & Lenzing, 2025, pp. 40–44).
From chunks to morphological and syntactical processing in Italian L2
The development from chunks to more complex morphological and syntactical processing in L2 Italian, as proposed by Di Biase and Bettoni (2015, p. 121) within the PT framework, is summarized in Table 1. In morphology, after an initial stage where learners produce single words and FS via lemma access, they advance to the category procedure, which involves no information exchange with other elements in the phrase or clause. The next stage (phrasal procedure) allows for agreement processes within the phrase, such as nominal and verbal agreement in Italian. At the S-procedure, learners can unify different categories of constituents at the sentence or clause level (e.g., predicative adjective agreement). Finally, at the S-Bar-level, subordination phenomena can determine morphological form (e.g., in Italian, the use of the subjunctive).
Table 1. Processability hierarchy for Italian (Di Biase & Bettoni, 2015, p. 121) including c’è/ci sono.
Following the proposed developmental hierarchy for L2 Italian syntax, at the initial stage (lexical access), learners map single concepts to single words or formulas. At the next stage (default mapping), they arrange words in the language’s unmarked canonical order (SVO in Italian). To move beyond the canonical word order, learners must assign grammatical functions to sentence constituents. For Italian, it is hypothesized that learners first extend the canonical order by adding an adjective. With further development, they will be able to disrupt canonical order, initially by placing the subject in a postverbal position and later by producing non-canonical orders with a postverbal subject and a preverbal object (Di Biase & Bettoni, 2015, p. 136).
Existential constructions, such as c’è/ci sono (“there is/there are”), are characterized by this non-canonical morphosyntax, which diverges from canonical SVO in that the subject occupies a marked post-copular position (Bentley et al., 2013, p. 1). The pivot (the entity whose existence is asserted) is the obligatory component of existential constructions, while other elements, such as the copula and the locative clitic, are language-specific. In Italian, a copula is required (the verb essere [“to be”]), and it consistently agrees with the pivot. An additional phrase, the coda (e.g., a locative phrase), is often present but not required in all circumstances. The locative clitic ci in the subject position is obligatory in Italian and provides a focal point for introducing something new into the discourse. Without ci, the construction would imply prior knowledge of the entity’s existence, for example, C’era un elefante grande così (“There was an elephant this big”) versus Era un elefante grande così (“It was an elephant this big”) (see Table 2). From a processability perspective, the structure requires processing routines located at stage 4 of the processability hierarchy, as it follows a non-canonical word order with a potential topicalized adverbial, a clitic pronoun (ci) and a post-copular subject (see Table 1). However, within the PT framework, this structure has received little attention in either theoretical or empirical work. In early L2 Italian classes, it is taught as a fixed expression (see the Participants section below), and it seems to function as an unanalyzed chunk and a learner strategy to compensate for the absence of target-like morphosyntactic processes, making it a valuable case study for examining the interplay between formulaic and productive language use (see Mocciaro, 2019).
Table 2. Existential constructions in Italian with non-canonical morphosyntax, with and without topicalized adjective.
Research questions of the study
As shown, the status and role of FS in SLA have not yet been fully clarified.
Further investigation of different structures and languages is therefore clearly warranted (see Bardovi-Harlig & Stringer, 2017, p. 84). In the present study, I investigate:
(1) to what extent 29 secondary school learners of L2 Italian, in their second year of study, rely on FS as defined by Lenzing (2013) in an oral spot-the-difference task;
(2) how these learners deal with the Italian structure c’è/ci sono {X} (“there is/there are {X}”) as a potential formulaic pattern and how this structure feeds into their broader morphological and syntactic development.
Methodology
Participants
The participants in this study were 29 learners of L2 Italian in an upper secondary school in Austria. Italian is their second foreign language introduced at school, with English being the first. The group was predominantly female (96.6%) with a mean age of 15.6 years. Twenty-three of the 29 learners grew up with German as their L1, while six (20.7%) grew up with two or more home languages, such as Chechen-German-Russian (1), German-English (1), Hungarian-Serbian-German-English (2), Hungarian-German (1), and Turkish-German (1). All of them had learned English at school; some learners also had experience with other languages, such as French and Spanish (2), Latin (3), Dutch (1), and Hungarian (1). According to the national curriculum provided by the Austrian Ministry (BMB, 2015), their language proficiency can be located at the A2 level of the Common European Framework of Reference for Languages (CEFR). Learners were in their second year of Italian instruction; all participants were taught by the same teacher and attended three 50-minute lessons per week, using the textbooks Espresso Ragazzi 1 (CEFR-level A1) and 2 (CEFR-level A2) (Orlandino et al., 2017, 2018). According to the teacher, they strictly followed the order in which the units were introduced. In the textbook, c’è/ci sono is introduced in Unit 6, titled In giro per l’Italia (“Traveling Around Italy”). In this unit, learners are taught how to describe a city and its tourist attractions, as well as how to provide information about locations and directions. Learners listen to a conversation and complete a comic with c’è/ci sono. They are then asked to complete the “rule” for applying c’è/ci sono (Uso c’è con parole singolari. [“I use c’è with singular words”] / Uso ci sono con parole plurali. [“I use ci sono with plural words”]).
In a follow-up activity, learners are asked to perform mini-dialogues, similar to pattern-drill activities, using the following pattern: A: “Che cosa c’è in questa città?” (“What is there in this city?”) – B: “Ci sono monumenti antichi.” (“There are ancient monuments.”) / “C’è una piazza famosa.” (“There is a famous square.”). In addition, the textbook offers a less guided activity, that is, an oral task with an information gap, in which learners in pairs need to find out the location of the hospital, the theatre, the gelateria, etc. on different maps. In the workbook section, learners have to complete a cloze and form sentences with c’è/ci sono.
Design and data collection
Data elicitation for this study was conducted as part of a broader data collection that took place in Spring 2019 and included several tasks designed to elicit spontaneous oral speech production data (see Schmiderer, 2023; Schmiderer & Hinger, 2023). Participants provided written informed consent prior to their participation in the study, including consent for the study’s findings to be published. For this specific study, learners were asked to work in pairs and complete a two-way information gap task – an adapted version of the spot-the-difference task described in Di Biase (2007, p. 237) (see Appendix 1). The task was aligned with the vocabulary introduced in the textbook and piloted with a similar group of learners. It resulted in dialogues ranging from a minimum of 10:09 minutes to a maximum of 16:27 minutes per group. The learner corpus comprises a total of 26,876 tokens and is accessible upon request.
Data analysis
The spontaneous spoken data were transcribed in MAXQDA 2018 (VERBI Software, 2018). To track the morphological and syntactic development of L2 Italian learners after one and a half years of instruction, their stage of acquisition according to the PT hierarchy was determined. For this purpose, I first conducted a detailed distributional analysis of various relevant morphological and syntactic structures in the learners’ oral speech production data.
In a distributional analysis, it is indicated whether a specific feature (e.g., [TopAdv] + ci + V + NPsubj or attributive adjective agreement) is used (‘+’) or not used (‘-’) in an obligatory context for the corresponding structure.
In a second step, the emergence criterion – rather than an accuracy criterion – was applied. Following Pienemann (1998), a linguistic structure is considered to be acquired, or to have emerged, if it occurs productively in the learner’s interlanguage. Productive use refers to the first non-formulaic occurrence of a certain structure (Pienemann, 1998). The underlying logic is that the productive use of a structure reflects the presence of the corresponding processing procedure in the learner. In practice, this means checking whether a given structure occurs in the learner data only as an unanalyzed unit, stored holistically in the learner’s mental lexicon, or also in contexts that are lexically and morphologically different.
Following Pienemann (2015, p. 109) and Lenzing (2021, p. 230), a syntactic structure is considered to be acquired when the learner’s speech sample contains at least three different realizations of the respective structure (e.g., SVO structure: L4: no / io solo ho undici [no / I only have eleven]; okay / io visto tre [okay / I seen three]; no / questo è blu [no / this is blue]). In morphology, a similar procedure is applied, requiring both lexical and morphological variation in the learner’s data for a structure to be identified as having emerged (e.g., plural marking: gatto [cat] – gatti [cats], case [houses]). A comprehensive coding manual for all relevant morphological and syntactic structures was prepared (see Schmiderer, 2023), and ambiguous cases were discussed in regular data meetings within the author’s research group.
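The emergence criterion described above can be illustrated with a minimal Python sketch. It is not the coding procedure used in the study: the function names, the (lemma, form) coding of tokens, and the decision to count morphological variation as one lemma occurring in at least two forms are assumptions made for illustration only. The example utterances are those quoted above for learner L4 and for plural marking.

```python
def has_emerged_syntax(realizations, minimum=3):
    """Return True if the sample holds at least `minimum` different realizations.

    `realizations` is a list of utterances already coded as instances of
    one syntactic structure (hypothetical input format).
    """
    # Normalise case and whitespace so that exact repetitions count only once.
    distinct = {" ".join(r.lower().split()) for r in realizations}
    return len(distinct) >= minimum


def has_emerged_morphology(tokens):
    """Return True if the tokens show both lexical and morphological variation.

    `tokens` is a list of (lemma, form) pairs (hypothetical coding scheme).
    Assumption: lexical variation = at least two lemmas; morphological
    variation = at least one lemma attested in two or more forms.
    """
    forms_by_lemma = {}
    for lemma, form in tokens:
        forms_by_lemma.setdefault(lemma, set()).add(form)
    lexical_variation = len(forms_by_lemma) >= 2
    morphological_variation = any(len(f) >= 2 for f in forms_by_lemma.values())
    return lexical_variation and morphological_variation


# SVO realizations quoted for learner L4
svo_l4 = ["io solo ho undici", "io visto tre", "questo è blu"]

# Plural marking example: gatto / gatti, case
plural_tokens = [("gatto", "gatto"), ("gatto", "gatti"), ("casa", "case")]

svo_emerged = has_emerged_syntax(svo_l4)
plural_emerged = has_emerged_morphology(plural_tokens)
```

In this sketch, three different SVO realizations satisfy the syntactic criterion, whereas a sample consisting only of repetitions of one utterance would not.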
Another key methodological principle in this analysis is implicational scaling, a tool in SLA research for identifying developmental patterns in cross-sectional as well as longitudinal studies. The central idea underlying implicational scaling is that “variables can be ordered in such a way that the presence of variable x [structure x] in a data sample implies the presence of variable y [structure y], but not vice versa” (Lenzing, 2019, p. 29) (also see Table 3). Perfect scales such as the one in Table 3, however, are rare with natural data. Therefore, it needs to be clarified when an implicational scale can be considered valid. Deviations from the ideal pattern are usually captured by the coefficient of reproducibility (CR) (Rickford, 2002, p. 149), calculated by dividing the total number of errors identified in the matrix by the total number of opportunities for error and subtracting the result from 1; the CR should be higher than .96 (Hatch & Lazaraton, 1991, p. 212).
Table 3. Sample implicational scaling in a cross-sectional study (based on Lenzing, 2019, p. 29).
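As an illustration of how deviations from a perfect implicational scale can be quantified, the following Python sketch computes a coefficient of reproducibility under the common formulation CR = 1 − (errors / opportunities for error). The matrices are invented toy data, not the study's results, and the error count is an assumption: each learner's row is compared with the closest ideal pattern (a run of emerged stages followed by a run of non-emerged stages). Other error-counting conventions exist and can yield different values.

```python
def row_errors(row):
    """Minimum number of cells deviating from an ideal implicational row.

    `row` lists, from lowest to highest stage, whether each structure has
    emerged (True) or not (False). An ideal row is k True values followed
    by False values; the best-fitting k is chosen (assumed convention).
    """
    n = len(row)
    return min(
        sum(cell != (i < k) for i, cell in enumerate(row))
        for k in range(n + 1)
    )


def coefficient_of_reproducibility(matrix):
    """CR = 1 - errors / opportunities, with one opportunity per cell."""
    errors = sum(row_errors(row) for row in matrix)
    opportunities = sum(len(row) for row in matrix)
    return 1 - errors / opportunities


# Toy matrices (hypothetical): rows = learners, columns = stages 2 to 4
perfect = [
    [True, True, True],
    [True, True, False],
    [True, False, False],
]
with_deviation = [
    [True, False, True],   # a higher stage emerged without the one below it
    [True, True, False],
    [True, False, False],
    [True, True, True],
]

cr_perfect = coefficient_of_reproducibility(perfect)
cr_deviation = coefficient_of_reproducibility(with_deviation)
```

A perfect scale yields a CR of 1.0; every deviating cell lowers the value in proportion to the size of the matrix.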
Lenzing adopts the analytical approach just described to identify formulaic patterns. She conducts a distributional analysis of the speech sample of each learner. This analysis makes it possible to identify whether the structure under investigation occurs only in an invariant form or whether it is also used productively. In particular, the test of the null hypothesis helps to determine whether a specific formulaic pattern (e.g., “It’s a”) appears with lexical and/or morphological variation (e.g., “they are the,” “*they is a,” etc.). The null hypothesis describes the situation in which the tested hypothesis (“the learner uses the structure in a formulaic manner”) is incorrect. If lexical and morphological variation occurs in the data, it indicates that the structure is created productively by the learner. In Table 4, there is no variation in the pattern: Learner 6 produces the pattern “TopAdv *è X [TopAdv *is X]”, in which the clitic “ci” is missing, five times in her speech. Because the pattern occurs without variation in the learner’s data, it is identified as a formulaic pattern used as a communicative strategy to approach the task. In contrast, Learner 5 (Table 5) uses the same pattern with some variation (TopAdv *è X [TopAdv *is X] and TopAdv *sono X [TopAdv *are X]), which indicates that the structure is being used productively.
Table 4. Distributional analysis of ‘TopAdv è (X)’ – Learner 6.
Table 5. Distributional analysis of ‘TopAdv è (X)’ – Learner 5.
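The contrast between Learner 6 and Learner 5 can be sketched in Python as a simple check for variation in the fixed part of a pattern. This is an illustrative simplification, not the procedure applied in the study: the function name, the string coding of frames, and the three-way labels are assumptions. The five invariant occurrences for Learner 6 follow the description above, whereas the number of occurrences listed for Learner 5 is invented, since only the two variants are reported.

```python
from collections import Counter


def classify_pattern(frames):
    """Classify a learner's use of a pattern from its coded occurrences.

    `frames` holds one string per occurrence, with the open slot already
    abstracted to X (hypothetical coding), so that only variation in the
    fixed part of the pattern is counted.
    """
    counts = Counter(frames)
    if not counts:
        return "not produced"
    # A single invariant frame is compatible with formulaic use;
    # two or more distinct frames point to productive use.
    return "formulaic" if len(counts) == 1 else "productive"


# Learner 6: the same frame five times, without variation
learner_6 = ["TopAdv è X"] * 5

# Learner 5: both variants attested (occurrence counts are illustrative)
learner_5 = ["TopAdv è X", "TopAdv sono X", "TopAdv è X"]

status_6 = classify_pattern(learner_6)
status_5 = classify_pattern(learner_5)
```

Abstracting the slot to X before counting reflects the definition of a formulaic pattern as a chunk with an open slot: lexical variety in the slot alone does not count as evidence of productive use in this sketch.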
Following Lenzing (2013, p. 171), utterances that are predominantly in German and contain only one or two lexical items expressed in Italian were excluded from the analysis. The same applies to structures in which the verb is expressed in German.
Results
Overall morphological and syntactic development
First, the overall morphological and syntactic development of the 29 learners is presented in terms of their developmental stage following PT. Regarding their morphological development (see Table 6), all learners have reached stage 2 (category procedure). According to the emergence criterion applied, ten learners (34.48%) have reached stage 3 (phrasal procedure), and another six learners (20.69%) are at stage 4 (S-procedure) (see Schmiderer & Hinger, 2023).
Overview of morphological development per learner (n = 29) (Schmiderer & Hinger, 2023), emerged stages are shaded in grey.
The syntactic development is slightly different. Not all learners have reached stage 2 (default mapping), although most have (89.66%). For stage 3, the data are insufficient to determine emergence; the small number of contexts for default mapping with an additional argument is most likely due to the task. Five learners (17.24%) have reached non-default mapping at stage 4. Interestingly, only three of these learners reached the highest stage in morphology; the other two are at lower developmental stages in morphology than in syntax. The coefficient of scalability amounts to 1.0 for morphological developmental stages and to .965 for syntactic development. Therefore, Tables 6 and 7 are considered valid implicational scales.
Overview of syntactic development per learner (n = 29), emerged stages are shaded in grey.
FS in the learner data
Oral speech production data of L2 Italian learners in their second year of formal instruction reveal that, on average, 58% of their utterances consist of either simple lexicalized items, that is, single words (e.g., sì [yes, learner 19]; quanto? [how many?, learner 4]; anche? [also?, learner 4]), or elliptic structures (e.g., mia marrone [mine brown, learner 15]; uno blanco [one white, learner 25]). These structures can be morphologically more complex, as there might be feature unification at the phrase level, but they remain syntactically simplified. Notably, the maximum usage of single words and elliptic structures reaches 52% (learner 3) and 70% (learner 10), respectively. However, there is considerable variation in learners’ reliance on these structures, as illustrated in Table 8. The data further indicate that the most frequently used syntactic structure beyond simple lexicalized items and elliptic structures is the canonical SVO order, which accounts for an average of 28% of all utterances in the data set. This structure also shows significant variability, with a maximum usage of 78% (learner 28) and a minimum of 4% (learner 10).
Overview of syntactic structures in L2 Italian learners in their second year of instruction (in %).
More complex syntactic structures, such as (TopAdv) ci V (S), are used less frequently, with an average of 12%. However, some learners demonstrate a higher degree of reliance on this structure, with a maximum usage of 34%, while others do not use it at all (minimum 0%). Similarly, TopAdv S V O constructions are rare, with an average usage of only 1%, a maximum of 10%, and a minimum of 0%. This is likely attributable to the task, which did not explicitly elicit this structure. Not surprisingly, relative clauses are the least frequently used structure, with an average of 1%. One learner uses them in 13% of cases, while most learners do not employ them at all (minimum 0%).
These findings highlight the variability in syntactic complexity in the utterances of L2 learners and suggest that, while some learners begin to experiment with more advanced structures, the majority of learners rely heavily on single words or syntactically simplified structures.
An examination of fixed expressions that the learners produce in an invariant form and that appear as such in the learners’ textbook (formulae, see definition above) shows that only a few learners apply these structures in the specific task under consideration. The formulae presented in the textbook Espresso Ragazzi (Orlandino et al., 2017, 2018) and used by the learners in an invariant form include locative adverbs (e.g., a destra [on the right], a sinistra [on the left], in centro [in the middle], vicino a [next to], in fondo [at the back]), the conventional expression non lo so (I don’t know), descriptions of clothing (e.g., a quadri [checked], a righe [striped]), the question di che colore è? (what colour is it?), as well as phrases like tutti e due (both) and the compound negozio di vestiti (clothing store). Interestingly, the latter phrase appears to function as a pattern with an open slot for one learner (learner 4), who produces variations such as negozio di pizza (store of pizza) and negozio di moda (store of fashion).
Formulaic patterns in the developmental path of “C’è/Ci sono”
A detailed analysis of the syntactic structures occurring in the learners’ interlanguage is presented below. The distributional analysis focuses on the syntactic features identified for Italian within the PT framework (Di Biase & Bettoni, 2015; Di Biase & Kawaguchi, 2002) and is expanded in this paper to include the existential/locative c’è/ci sono structure. These findings are summarized in Table 9. For all 29 learners, the use of a structure (in an obligatory context) is indicated with a (+), while the non-use of the structure is marked with a (-).
Distributional analysis of syntactic structures per learner.
At first glance, one gets the impression that the learners produce structures from various stages. However, the application of the emergence criterion reveals that most learners appear to have acquired stage 2 (canonical word order). Due to a lack of contexts for stage 3 – likely attributable to the nature of the task – little can be concluded about the emergence of this stage. Nevertheless, four learners seem to have reached stage 4, as they produce structures such as (TopAdv) + ci + V + NPsubj in various realizations. In addition, one of these learners also demonstrates the ability to topicalize a direct object with a clitic object pronoun.
A more detailed analysis of the learners’ ability to produce c’è/ci sono reveals that learners can be divided into several groups. The least advanced learners (in terms of syntactic and morphological development) seem to (1) produce no contexts for this particular structure (3.45%); (2) adopt an alternative strategy for solving the picture description task, using forms such as “ho” (I have) or “*visto” (I seen) instead of “c’è/ci sono” (there is/are) (e.g., L28, total: 10.34%); or (3) apply one of two types of formulaic patterns (51.72%). These two patterns are (A) the use of a topicalized adjective and a subject, omitting both the locative clitic and the copula (marked in lilac in Table 10; 13.79%), or (B) the use (with or without TopAdv) of è, the third person singular form of the verb essere (“to be”), and a noun, while omitting the locative clitic (marked in yellow in Table 10; 37.93%). Interestingly, one learner (learner 17, see Table 11) applies the pattern consistently with the second person singular form (sei) throughout the data. This learner, however, also uses “sei” in other copula contexts (e.g., eh semaforo *
Distributional analysis of ‘(TopAdv) + ci + V + NPsubj’ including formulaic patterns per learner.
a The learner mainly pursues different strategies: “visto” (seen), “ho” (I have).
b Utterance could have been copied from the conversation partner (L3: è otto persone? [is eight people]; L10: è dieci personi / persone? [is ten people]).
c L17 uses the pattern with sei [are, second person singular] and not è [is, third person singular].
Distributional analysis of ‘TopAdv è (X)’ – Learner 17.
More advanced learners (20.69%) still omit the clitic pronoun ci. However, their utterances exhibit some degree of morphological variation; these learners therefore do not appear to rely fully on the patterns identified in the least advanced learner groups (marked in green in Table 10; see, for example, learner 5).
Finally, as shown in Table 10, in the oral speech production data of four learners (13.79%) the structure c’è/ci sono has emerged productively. These learners use the patterns with morphological and/or lexical variation (as in blue in Table 10, see example in Table 12). A closer examination of the morphological development of these learners reveals that learner 16 and learner 12 are also among the most advanced learners in terms of morphological development (see Table 6). For learner 2 and learner 23, however, this is not the case. They appear to have developed more in terms of their syntax than in terms of their morphology, whereas for most of the other learners, the opposite seems to be the case.
Distributional analysis of ‘TopAdv c’è (X)’ – Learner 16.
In summary, based on the cross-sectional data from this study on Italian L2 learners, a developmental path for the structure c’è/ci sono can be identified (see Table 13). This path progresses from the non-production of relevant contexts to the use of formulaic patterns or alternative structures as communicative strategies, followed by the use of c’è/ci sono with morphological variation but omitting the clitic pronoun, and finally to the productive use of the structure. This developmental trajectory appears to parallel the learners’ morphological development.
Summary of the developmental path for the structure c’è/ci sono in Italian L2 learners.
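The group shares reported for this developmental path can be cross-checked with a short Python sketch. The group sizes below are not stated in the text; they are back-computed from the reported percentages on the assumption that every percentage refers to all 29 learners, and the group labels and the helper percentage_shares are hypothetical.

```python
from typing import Dict

TOTAL_LEARNERS = 29

# Assumed head counts, inferred from the reported percentages (n = 29).
group_sizes: Dict[str, int] = {
    "no contexts produced": 1,
    "alternative strategy (ho, *visto)": 3,
    "formulaic pattern A (no clitic, no copula)": 4,
    "formulaic pattern B (è without clitic)": 11,
    "morphological variation, clitic omitted": 6,
    "productive c'è/ci sono": 4,
}


def percentage_shares(sizes: Dict[str, int], total: int) -> Dict[str, float]:
    """Convert head counts into percentages rounded to two decimals."""
    return {name: round(100 * n / total, 2) for name, n in sizes.items()}


shares = percentage_shares(group_sizes, TOTAL_LEARNERS)

# The inferred groups should account for every learner exactly once.
assert sum(group_sizes.values()) == TOTAL_LEARNERS

for name, share in shares.items():
    print(f"{share:5.2f}%  {name}")
```

Under these assumed counts, the two formulaic patterns together cover 15 of the 29 learners, which matches the 51.72% reported for the formulaic group as a whole.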
Discussion and conclusion
The analysis of oral speech production data from second-year L2 Italian learners in an upper secondary school context reveals significant variability with regard to their syntactic development. In terms of developmental stages, following PT, 89.66% of all learners have reached stage 2 (default mapping), and 17.24% can be located at stage 4 (non-default mapping).
In their second year of instruction, on average, 58% of learners’ utterances consist of either single words or elliptic structures and phrases, with individual learners relying on single words in up to 52% and on elliptic structures in up to 70% of their speech production. These results are similar to the findings by Myles (2012, p. 82). In her study on L2 French development in primary school learners, she observes that when learners lack access to FS to meet their communicative needs, they frequently resort to the simple juxtaposition of noun and/or prepositional phrases, doing so in over 80% of cases. The findings of the present study align with this observation, as learners similarly employ this strategy to compensate for gaps in their linguistic repertoire.
Beyond simplified structures, in the learner data, the canonical SVO order emerges as the most frequently used syntactic structure, accounting for 28% of utterances on average. However, its usage varies widely among learners, ranging from 4% to 78% of their utterances. More complex syntactic constructions, such as TopAdv CI V S and TopAdv SVO, are used less frequently, with average usage rates of 12% and 1%, respectively.
These findings highlight the reliance of most learners on simpler structures, while a minority begins to experiment with more advanced syntactic forms. Interestingly, the use of textbook formulae appears to be relatively limited, with only a few learners incorporating these into their speech. Some learners, however, also adapt textbook formulae creatively, as seen in variations of the phrase negozio di vestiti (“shop of clothes”), for example negozio di pizza (“shop of pizza”) and negozio di moda (“shop of fashion”).
The developmental trajectory of the existential/locative structure c’è/ci sono offers valuable insights into learners’ syntactic progression. Although this structure is introduced as a chunk early in Italian L2 instruction and is presented in the textbook at an early stage, only a few learners in their second year appear to have acquired it according to the emergence criterion. A detailed distributional analysis shows that learners apply different strategies in their use of this structure. Four groups of learners can be identified: (1) learners who do not produce the structure or rely on alternative strategies (e.g., ho [“I have”] or *visto [“seen” > “I see”] instead of c’è/ci sono [“there is/there are”]); (2) learners who use formulaic patterns that omit the locative clitic ci, the copula, or both; (3) more advanced learners who demonstrate morphological variation but still omit the clitic ci; and (4) learners who productively use c’è/ci sono.
Compared to other studies (e.g., Bernini, 2005), the learners in this study do not frequently produce c’è as an interlanguage construction or as an overgeneralization. As shown in Table 6, the learners in their second year of instruction are already demonstrating morphosyntactic development, which may explain the decline in the use of interlanguage constructions. However, to fully understand the developmental trajectory, data from the very early stages of the acquisition process, particularly from the first year of learning, would be necessary. The finding that formulaic patterns are initially used as unanalysed units in learners’ interlanguage but are gradually segmented, analysed, and made available for the acquisition process over time is supported by studies such as those by Wong Fillmore (1976) and Myles et al. (1998). These studies report that the children they observed gradually deconstructed FS during the first 2 years of foreign language learning.
Myles et al. (1998) emphasize the importance of opportunities for interaction in the target language for this process to occur: “The more opportunity he [a pupil] had to engage in spoken interaction, the more likely he was to begin to deconstruct the chunks” (p. 358).
Similarly, Roos (2009), in her study on FS in early EFL primary school learners, concludes that communicative interaction is essential for fostering the development of productive, rather than solely formulaic, language use, including in beginner classes. She advocates for didactic approaches such as task-based language teaching (R. Ellis, 2003; Long, 2015; Nunan, 2004) to promote productive language use. On the one hand, tasks could offer plenty of opportunities for L2 interaction and creative use of the L2 in this interaction; on the other hand, they allow for a focus on formulaic language in certain phases.
Supplemental Material
Supplemental material, sj-docx-1-ijb-10.1177_13670069261438580 for Formulaic patterns in L2 Italian secondary school learners’ interlanguage development by Katrin Schmiderer in International Journal of Bilingualism
Footnotes
Acknowledgements
I would like to express my gratitude to Anke Lenzing for her valuable comments and insightful discussions on an earlier version of the manuscript. Furthermore, I would like to thank all the students and the teacher who participated in the study for their time and engagement.
Ethical Considerations
All students provided written informed consent prior to their participation in the study.
Consent to publication
Informed written consent for publication was provided by all students.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
Notes
Author biography
References
