Abstract
This study investigates how adult learners perceive segmental length in L2 Italian. We tested 104 learners from 5 L1 backgrounds that differ in their phonological treatment of quantity: Finnish (vowel and consonant length), Czech and Slovak (vowel length), German (restricted vowel length), and Spanish (no phonemic length). These groups were compared with 34 native speakers of Italian, a language with distinctive consonant length (e.g., papa ‘Pope’ vs. pappa ‘porridge’). Using an AX discrimination task with 45 pseudowords, we examined the influence of L1, stress position, consonant type, and proficiency. The results reveal that L1 strongly predicts perceptual sensitivity to consonantal length contrasts: Finnish learners showed the highest sensitivity, followed by Slovak, Czech, German, and Spanish learners. Discrimination of vowel-length contrasts was weaker across all groups and showed more variation, likely because in Italian, vowel duration does not cue phoneme identity. Stress position also played a significant role: discrimination of consonantal quantity contrasts was most robust in post- and pre-stressed syllables and weakest in unstressed contexts. Consonant type further influenced performance, with quantity being easier to discriminate in sonorants than in obstruents, although L1-specific effects emerged. Proficiency did not emerge as a uniform global predictor, but descriptive patterns indicated higher performance among advanced learners than among beginner-level and intermediate-level learners. Overall, the multifactorial analysis highlights the role of L1 in shaping learners’ perception of L2 length, alongside prosodic effects, segmental properties, and proficiency. The study also underscores the importance of perceptual training in acquiring new phonological contrasts.
1 Introduction
Perception plays a central role in the acquisition of second language (L2) speech and provides a crucial window into how learners build new sound systems (e.g., Chládková et al., 2022, 2025; Escudero, 2005, 2007; Flege, 1995; Flege et al., 1997, 1999; Saito & Van Poeteren, 2018). 1 Before learners can reliably produce new categories, they must be able to detect, interpret, and integrate relevant acoustic cues from the input. Sensitivity to L2 segmental or suprasegmental contrasts is shaped by multiple influences: the availability of contrasts in the L1 (e.g., Best & Tyler, 2007; Escudero et al., 2008; McAllister et al., 2002), phonetic variability in the input (e.g., Bradlow et al., 2017), input frequency (e.g., Flege & Bohn, 2021), and individual cognitive factors (e.g., Lengeris, 2009; Perrachione et al., 2011). In addition, learners’ complex linguistic biographies, encompassing age of acquisition, amount and quality of exposure to the L2, aptitude, and motivation, play an important role (see, e.g., Colantoni et al., 2015; Derwing & Munro, 2015; Piske et al., 2001, for overviews). These findings point to the need for approaches that integrate multiple linguistic and extra-linguistic factors, rather than attributing outcomes to a single source (e.g., Flege & Bohn, 2021; Hanulíková et al., 2012).
In light of this, the present paper examines how adult learners of Italian perceive segmental length, an understudied phenomenon in L2 acquisition compared to segmental quality. Italian is one of the few languages worldwide that employs contrastive consonant length (e.g., Blevins, 2004; Ladefoged & Maddieson, 1996; Maddieson, 1984). Since long consonants (or geminates) are attested in only a small proportion of the world’s languages, 2 they are generally considered marked consonants and, according to Eckman’s (1977) Markedness Differential Hypothesis, are predicted to pose particular challenges for L2 learners. Previous studies confirm that learners of Italian show clear L1-based patterns in production, often struggling with quantity (e.g., De Clercq et al., 2014; Einfeldt et al., 2019; Giannini & Costamagna, 1998; Kabak et al., 2011; Sorianello, 2014). In perception, too, L2 learners generally show weaker sensitivity to consonant length than L1 Italian speakers (e.g., Altmann et al., 2012). Language-internal factors such as manner and place of articulation, voicing, and stress position further affect both perception and production (e.g., Sorianello, 2014, for production of geminates in L2 Italian; Dmitrieva, 2012, 2017 for the perception of the contrast between short and long consonants by different L1 listeners). Orthography has also been shown to play a role in L2 speech perception more generally, as visual cues can influence learners’ phonological representations (e.g., Bassetti, 2017; Hamann & Colombo, 2017; Nimz, 2016; Pešková et al., 2017; Repetti, 1993); this may be particularly relevant for Italian, where consonant length is generally marked in the orthography.
The persistent difficulty of geminates in both production and perception highlights the importance of theoretical models of cross-linguistic perception. Trubetzkoy’s (1939/1969) notion of the ‘phonological sieve’ suggests that L2 sounds are filtered through L1 categories. Major models of L2 speech perception make similar predictions: according to the Speech Learning Model (SLM, Flege, 1995), the Perceptual Assimilation Model (PAM; Best et al., 1988) and the PAM-L2 (Best & Tyler, 2007), if phonological length does not exist in the L1, non-native listeners tend to assimilate short and long consonants into a single category (e.g., perceiving the geminate [nː] in nonno ‘grandfather’ as a short [n]). The Second Language Linguistic Perception (L2LP) model by Escudero (2005) further predicts that learners can establish new phonological categories only through sufficient and informative exposure to L2 contrasts, which allows them to update their initial L1-based perception grammar (see also van Leussen & Escudero, 2015).
Segmental length provides an ideal testing ground for multifactorial approaches to L2 acquisition. Although consonant-length contrasts are often discussed in terms of acoustic duration, their phonetic realization is multidimensional and language-specific. Cross-linguistically, gemination interacts with vowel duration, prosodic structure, and articulatory timing in different ways, even though increased consonant duration remains the most consistent cross-linguistic cue (e.g., Burroni et al., 2025; Ridouane, 2010). In the present study, the learners’ L1s differ typologically in their treatment of quantity: Finnish has robust vowel and consonant length; German, Czech, and Slovak restrict length to vowels; and Spanish lacks phonemic length altogether. 3 Building on this variation, the present study examines how learners from these L1 backgrounds perceive length contrasts in Italian. Using an AX discrimination task, we compare their performance with an L1 control group. By combining a large and diverse learner sample with a systematic analysis of L1, stress position, consonant type, and proficiency, the study captures the multifactorial nature of L2 length perception. The findings contribute to integrated models of L2 phonological development and highlight the importance of perception as a prerequisite for effective pronunciation training (e.g., Colantoni et al., 2021; McGill, 2024; Nagle, 2018, 2021).
The paper is structured as follows. Section 2 compares the segmental length patterns of the languages under study, outlines relevant L2 acquisition models and formulates the research questions and hypotheses. Section 3 describes the experimental design, participants, and data analysis procedures. Results are presented in Section 4 and followed by discussion in Section 5. Section 6 concludes the paper.
2 Theoretical Background
To address the question of how length is perceived in L2 Italian, this study compares learners from languages with and without phonological length contrasts. While the main focus lies on consonantal length, vowel duration is also assessed to examine sensitivity to segmental duration and potential differences in the perception of durational cues across consonants and vowels.
2.1 Segmental Length in Cross-Linguistic Perspective
Segmental quantity (or length) refers to the use of segment duration (short vs. long) to distinguish lexical meaning. The sections below outline how these length contrasts are implemented differently across the languages under study, including their interaction with vowel duration, prosodic structure, and segmental properties (see also Pešková, 2025, for a summary of their cross-linguistic differences and similarities in the consonantal inventory).
2.1.1 Italian
Italian contrasts singleton and geminate consonants in more than 1,800 words (e.g., pala [ˈpaːla] ‘shovel’ vs. palla [ˈpalːa] ‘ball’) (Mairano, 2024; Mairano & Calabrò, 2016). It features 15 consonants that can occur either as singletons or geminates and also includes five inherently long consonants in word-internal position: /ɲː/, /ʎː/, /ʃː/, /tːs/, and /dːz/ (e.g., Canepari, 2009; Marotta & Vanelli, 2021). Moreover, Italian has four intrinsic singletons, /z/, /j/, /w/, and /ʒ/, the last of which occurs only in loanwords (a glide geminate /jː/ is mentioned in some studies as a regional exception; see, e.g., Bertinetto & Loporcaro, 2005; Kaschny, 2011) (Figure 1).

Figure 1. Consonant inventory of Italian (blue = singletons or geminates, pink = intrinsic geminates). 4
Although some Northern Italo-Romance varieties exhibit processes of degemination (e.g., Bertinetto & Loporcaro, 2005; Krämer, 2009), geminate consonants are part of all regional Italian varieties (e.g., Chang, 2000; Mairano & De Iacovo, 2019). The reported ratio between the duration of long and short consonants depends in general on variety, speaker, and consonant (Farnetani & Kori, 1986, p. 32; Šimko et al., 2014, p. 130; see also Payne, 2005). For Italian, Kingston et al. (2009, p. 299) report duration ratios ranging from 1.65 to 2.35. According to Chang (2000, p. 60), geminates in Southern Italian are nearly twice as long as singletons (mean ratio ≈ 1:1.95), while in Northern Italian, the ratio is slightly lower (≈ 1:1.69), at least in the case of /t/ vs. /tː/.
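The ratios cited above are quotients of geminate and singleton durations. A minimal Python sketch of that arithmetic follows. The function name and the millisecond values are hypothetical illustrations, not data from Chang (2000) or Kingston et al. (2009), and the cited studies may have derived their ratios differently (for example, per speaker or per item).

```python
from statistics import mean


def geminate_singleton_ratio(singleton_ms, geminate_ms):
    """Return mean geminate duration divided by mean singleton duration."""
    if not singleton_ms or not geminate_ms:
        raise ValueError("both duration lists must be non-empty")
    return mean(geminate_ms) / mean(singleton_ms)


# Hypothetical closure durations in ms for /t/ and /tː/ (invented, not measured).
singleton_t = [80.0, 90.0, 100.0]
geminate_t = [160.0, 170.0, 180.0]

ratio = geminate_singleton_ratio(singleton_t, geminate_t)
print(f"geminate:singleton ratio = {ratio:.2f}")
```

With these invented values the ratio is about 1.89, which happens to fall inside the 1.65–2.35 range reported by Kingston et al. (2009).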
While orthography marks lexical or ‘true’ geminates with double letters, offering a useful visual cue for L2 learners (e.g., sette ‘seven’), intrinsic geminates are less consistently marked in writing, which may reduce learners’ performance in both production and perception (e.g., grazie [ˈɡratːsje] ‘thank you’). Phonologically, geminates are ambisyllabic (e.g., bello /ˈbɛl.lo/ ‘beautiful’) and appear in both stressed and unstressed positions (e.g., arrabbiato [arːaˈbːjaːto] ‘angry’). In addition to marking lexical contrasts, Italian gemination also conveys a grammatical function (e.g., parleremo [m] ‘we will talk’ vs. parleremmo [mː] ‘we would talk’).
Moreover, (Standard) Italian exhibits the post-lexical process of raddoppiamento fonosintattico (RF), whereby word-initial consonants are lengthened across word boundaries under specific prosodic or lexical conditions, as in caffè freddo [kaˈfːɛˈfːredːo] ‘iced coffee’ (e.g., Loporcaro, 1997; Marotta, 2011; Passino, 2013). While RF is not directly relevant for the present study, it illustrates that geminates represent a very frequent and functionally significant feature of Italian phonology.
Vocalic length is not distinctive in Italian, but vowels are systematically long in non-final open stressed syllables (e.g., casa [ˈkaːza] ‘house’) (e.g., Bertinetto & Loporcaro, 2005; D’Imperio & Rosenthall, 1999; Loporcaro, 1996). Many native speakers might be unaware of this automatic process, yet vowel duration serves as a crucial cue to prosodic structure and stress placement (e.g., papa [ˈpaːpa] ‘Pope’ vs. papà [paˈpa] ‘Dad’) (e.g., Bertinetto, 1981; Burroni & Greca, 2024; Eriksson et al., 2016). Importantly, long vowels never precede geminate consonants (e.g., Loporcaro, 1996). The overall ratio between short and long stressed vowels is approximately 0.62 in Italian (Loporcaro, 2015, pp. 183, 190); however, the ratio also depends on syllable, word structure, and speaking rate (Farnetani & Kori, 1986, p. 32). Interestingly, experimental studies have reported that Italian listeners are able to discriminate durational differences similarly for both consonants and vowels, either as a cue to phonological length, stress, or both (e.g., Altmann et al., 2012).
It should be noted that the perception of short and long consonants in Italian may depend not only on absolute segment duration but also on the length of the consonant relative to the preceding vowel; the subsequent segments appear to play a limited role in perception (Pickett et al., 1999).
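One way to make the notion of relative duration concrete is to divide the consonant's duration by that of the preceding vowel. The Python sketch below does this for a papa/pappa-type pair. All durations are invented for illustration (chosen so that the vowel before the geminate is roughly 0.62 times as long as the vowel before the singleton, in line with the ratio cited above), and the C/V quotient is only one possible operationalization, not necessarily the measure used by Pickett et al. (1999).

```python
def cv_ratio(consonant_ms: float, preceding_vowel_ms: float) -> float:
    """Consonant duration relative to the preceding vowel (C/V)."""
    if preceding_vowel_ms <= 0:
        raise ValueError("vowel duration must be positive")
    return consonant_ms / preceding_vowel_ms


# Hypothetical durations in ms (invented, not measured).
papa = {"vowel": 200.0, "consonant": 100.0}   # singleton after a long stressed vowel
pappa = {"vowel": 125.0, "consonant": 190.0}  # geminate after a short vowel

# Contrast based on consonant duration alone.
absolute_contrast = pappa["consonant"] / papa["consonant"]

# Contrast based on consonant duration relative to the preceding vowel.
relative_contrast = (
    cv_ratio(pappa["consonant"], pappa["vowel"])
    / cv_ratio(papa["consonant"], papa["vowel"])
)

print(f"absolute C contrast:   {absolute_contrast:.2f}")
print(f"relative C/V contrast: {relative_contrast:.2f}")
```

Under these assumed values the relative measure separates the two words more sharply (about 3.04) than consonant duration alone (1.90), because the vowel shortens where the consonant lengthens.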
2.1.2 German
Standard German does not exhibit phonemic consonant-length contrasts. It allows only fake or quasi-geminates in morphologically complex words, particularly at affixation boundaries (e.g., ummelden [mm] ‘to re-register’; Balletttänzerin [tt] ‘female ballet dancer’) (e.g., Kotzor et al., 2016). These fake geminates are mostly longer than singletons; for instance, stops show extended closure durations (Mikuteit, 2007).
In orthography, double consonant letters do not indicate consonant length but either mark morphological boundaries or signal a contrast in vowel length (e.g., muss [mʊs] ‘(I / (s)he) must’ vs. Mus [muːs] ‘mash’; Ratte [ˈʁatə] ‘rat’ vs. Rate [ˈʁaːtə] ‘ratio’). Like Italian, German features contrastive lexical stress. However, unlike Italian, it has a phonemic opposition between short and long vowels. This contrast is closely tied to stress position and typically involves pairs of short lax vowels (/ɪ ʏ ʊ ɛ œ ɔ/) and their long tense counterparts (/iː yː uː eː øː oː/) (e.g., Ramers, 1988; Weiss, 1974, 2000). Perception studies indicate that listeners tend to be sensitive to vowel duration, especially in the case of the /a/–/aː/ contrast, whereas for the other vowels, quality often plays a more prominent role than length (Sendelmeier, 1981). With respect to the consonant inventory, German lacks certain segments that are present in Italian, such as /ɲ/, /ʎ/, /dz/, and /dʒ/, which occur only in a very limited set of loanwords (e.g., Dschungel [ˈdʒʊŋl] ‘jungle’). Although intrinsic geminates are not tested in the present study, they are worth noting as they may pose specific challenges for German-speaking learners of Italian.
2.1.3 Czech and Slovak
Czech and Slovak are closely related West Slavic languages that function as quantity languages, displaying high sensitivity to vowel length (e.g., Palková, 1994; Šimáčková et al., 2012; Skarnitzl et al., 2016 for Czech; Hanulíková & Hamann, 2010; Pavlík, 2004 for Slovak). Both languages allow short and long vowels in both stressed and unstressed syllables (e.g., Czech lovy [ˈlovɪ] ‘(a) hunt.PL’ vs. loví [ˈloviː] ‘(s)he hunts’, Slovak rad [rat] ‘line’ vs. rád [raːt] ‘glad.M’).
In orthography, long vowels are consistently marked with an acute accent (e.g., Berlín [ˈbɛrliːn] ‘Berlin’), or with a ring above the letter <u> in Czech (e.g., růže [ˈruːʒɛ] ‘rose’). Both languages are fixed-stress languages, where the primary stress is on the first syllable, but it is not cued by duration. Crucially, vowel quality remains relatively stable across most short–long pairs. Standard Czech has five short vowels /a ɛ ɪ o u/ and five long counterparts /aː ɛː iː oː uː/, and the same holds for Slovak with /a ɛ i o u/ and /aː ɛː iː oː uː/. In Czech, reported long–short duration ratios vary with vowel quality: 1.66 for /iː/–/ɪ/, ≈ 1.78 for /ɛː/–/ɛ/, ≈ 1.73 for /aː/–/a/, 1.97 for /oː/–/o/, and ≈ 1.65 for /uː/–/u/ (Paillereau & Chládková, 2019). However, the short–long distinction in the high front vowel pair also involves a qualitative difference in Bohemian Czech (Western Czech), whereas for Moravian speakers (Eastern Czech) and in Slovak the distinction is realized primarily through duration (e.g., Šimáčková et al., 2012, p. 229 for Czech; Beňuš & Mády, 2012, p. 516 for Slovak). In Slovak, long vowels are up to about 1.5 times longer than their short counterparts (Beňuš & Mády, 2012, p. 517).
Like German, Czech and Slovak exhibit numerous fake geminates that arise at morpheme boundaries (e.g., ochranná [ˈoxranaː] ‘protective.F’). These are typically produced as short consonants. However, in certain cases where a near-minimal pair would otherwise lead to ambiguity, double consonants are supposed to be pronounced as two identical segments, as in raci [ˈratsi] ‘crayfish.PL’ versus racci [ˈratstsi] ‘seagulls’. In terms of consonant quality, the two languages do not differ substantially from each other or from Italian. However, the palatal lateral /ʎ/ is absent and the affricate /dz/ is very marginal in Czech. The affricate /dʒ/ is restricted mostly to loanwords in both languages (e.g., Czech džus [dʒus]; Slovak džús [dʒuːs] ‘juice’).
Despite their overall phonological similarity, Slovak differs from Czech in one important respect, relevant to the present study: the existence of syllabic long sonorants /rː/ and /lː/, which are written with an acute accent (e.g., vŕba ‘willow’; vĺča ‘wolf cub’). Their short syllabic counterparts /r/ and /l/ also exist (e.g., prst ‘finger’, vlk ‘wolf’). While some scholars have treated the long liquids as allophones of /r/ and /l/ (e.g., Ďurovič, 1975), others argue they should be considered separate phonemes, since their distribution is not fully complementary: the long sonorants occur only in syllable nuclei, whereas the short ones may appear both in nuclei and in other positions (Bujalka et al., 1996, p. 42, cited in Hanulíková & Hamann, 2010, p. 374). We adopt this analysis, treating /r rː l lː/ as four distinct phonemes in Slovak. This phonological distinction is particularly relevant to the present study, as it introduces an additional source of cross-linguistic variation in segmental duration, one that may shape differences between Czech and Slovak learners in perception of length contrasts in L2 Italian.
2.1.4 Finnish
Finnish is one of the few languages worldwide that employs both vowel and consonant length contrastively and systematically. The quantity system is highly productive and can distinguish lexical meanings through vowel and consonant duration in both stressed and unstressed syllables (e.g., muta [ˈmutɑ] ‘mud’ vs. muuta [ˈmuːtɑ] ‘other.PART’ vs. mutta [ˈmutːɑ] ‘but’) (e.g., Lehtonen, 1970; Nakai et al., 2012; Suomi, 1980). Unlike Italian, Finnish geminates may follow long vowels (e.g., undulaatti [ˈundulɑːtːi] ‘parrot’). Finnish orthography consistently reflects the length opposition using double letters for both vowels and consonants (e.g., tuli ‘fire’ vs. tulli ‘customs’). Stress is fixed on the first syllable, and vowel length is independent of stress.
Finnish short vowels /æ e i ø y ɑ o u/ do not differ in quality from their long counterparts /æː eː iː øː yː ɑː oː uː/. Actual durations are highly variable and context-dependent, influenced by speech rate, word length, and speaker (e.g., Harrikari, 2000). Reported durational ratios range from 1:1.5 to as high as 1:3.5 (see Eerola et al., 2012 for an overview). Suomi (2007) further suggests that Finnish has a fine-grained durational system beyond a simple binary contrast, with up to four degrees of duration for single vowels and three for long vowels, depending on prosodic context. These findings underscore the gradient and multidimensional nature of Finnish quantity.
For consonants, measurements show that obstruents /p t k s/ have smaller singleton–geminate ratios (≈ 1:1.99), whereas nasals and liquids (/n/, /l r/) exhibit larger ratios (≈ 1:2.50) (Lehtonen, 1970, p. 97). Almost all Finnish geminates occur intervocalically and include /lː/, /rː/, /mː/, /nː/, /pː/, /kː/, /sː/, and /ŋː/ (e.g., alla ‘below’, tarra ‘sticker’, lammas ‘sheep’, kissa ‘cat’, kengät [ˈkeŋːæt] ‘shoes’). In addition, voiceless stops /p t k/ and the fricative /s/ can be geminated after sonorants, a pattern not permitted in Italian (e.g., kansa ‘people’ vs. kanssa ‘with’). By contrast, the gemination of less common voiced stops (/b d ɡ/), fricatives (/f ʃ/), and approximants (/ʋ j/) is marginal, mostly limited to dialects or loanwords. The glottal fricative /h/ appears geminated only in rare cases, such as hihhuli ‘religious fanatic’ (Suomi et al., 2008, p. 41).
In addition, Finnish allows morphophonological alternations such as degemination in suffixation (e.g., kaappi ‘cupboard’ vs. kaapissa ‘in the cupboard’) or gemination at morphological boundaries (e.g., mene ‘go!’ vs. mene pois [ˈmeneppois] ‘go away’) (Spahr, 2011, pp. 8–10).
The Finnish consonant inventory lacks some Italian segments, including the palatal nasal /ɲ/, palatal lateral /ʎ/, the fricative /z/ and the affricates /dz/, /ts/, /dʒ/, /tʃ/. Nevertheless, the robust quantity system and consistent orthographic marking likely provide Finnish learners with an advantage in perceiving durational contrasts compared to learners from languages without such features.
2.1.5 Spanish
Spanish is the only non-quantity language in this study, as its phoneme inventory lacks both vowel and consonant length. Segment duration does not carry lexical meaning, and vowel or consonant lengthening does not change phoneme identity. As a result, Spanish speakers may have limited perceptual sensitivity to length distinctions, which could affect their acquisition of Italian quantity. Alternatively, Spanish speakers may be able to exploit duration effectively as a blank-slate dimension and may revert to it in the L2 even for contrasts for which duration is not the primary cue (Escudero & Boersma, 2004).
Phonetically, Spanish vowels vary in duration depending on stress and syllable structure, but these differences are not phonemic. Alfano et al. (2009) report that stressed vowels are significantly longer than unstressed ones only in oxytones (words stressed on the final syllable), whereas paroxytones and proparoxytones show no systematic durational differences. Other studies confirm that stressed syllables tend to be longer than unstressed syllables independently of the word position (Hualde, 2005, p. 244; Ortega-Llebaría, 2006, p. 111), but duration is only a secondary cue to lexical stress. All five oral vowels /a e i o u/ are realized with a relatively stable quality across contexts and show no systematic tense–lax or long–short opposition.
Spanish also lacks contrastive consonant length, except for the length contrast in the rhotics (see below). Fake geminates may occur at morpheme or word boundaries (e.g., subbloque ‘block segment’, innato ‘innate’, conmovido ‘moved’), but they are not phonologically contrastive. Sequences of nasals [nn] or laterals [ll] are typically realized as a single long consonant, while stops and rhotics tend to reduce to singletons (Hualde, 2005, p. 97). Although Spanish lacks phonemic consonant-length contrasts, it has two contrastive rhotics, the trill /r/ and the tap /ɾ/, which stand in opposition only in word-internal intervocalic position (e.g., perro ‘dog’ vs. pero ‘but’). The two differ substantially in duration (ca. 82–88 ms vs. 23–30 ms; Martínez Celdrán & Fernández Planas, 2007; Quilis, 1993) and in the number of brief occlusions (trill: typically two to three or more; tap: a single occlusion; Hualde, 2005, p. 181). Here Spanish listeners might have a relative advantage in perceiving the short–long contrast of Italian rhotics, which alternate between singleton and geminate. Meisenburg (2007) further observed that rhotic sequences across word boundaries may be produced with increased duration, though Spanish native listeners often do not perceive them as such.
In terms of consonant inventory, Spanish generally lacks segments such as /ts/, /dz/, /dʒ/, and /ʃ/, although the latter two may occur marginally in some American varieties (Argentina, Paraguay, Uruguay). The palatal lateral /ʎ/ is disappearing in many dialects due to yeísmo, that is, a process in which the /ʎ/ merges with the palatal fricative [ʝ].
2.2 Acquisition of Segmental Length in a Second Language
2.2.1 Conceptual Background
Current models consistently suggest that learners’ perception and production reflect the interaction between L1 phonological structure and the perceptual salience of durational cues. As outlined in the preceding section, segmental length serves distinct phonological functions across languages. For L2 learners, acquiring this feature involves not only detecting durational differences but also reinterpreting their phonological relevance. Moreover, geminates entail complex temporal coordination across syllable boundaries and require precise articulatory timing (e.g., Burroni et al., 2024; Dmitrieva, 2017; Ridouane, 2010). This poses challenges for learners unfamiliar with quantity contrasts, who must form new perceptual categories or reweight familiar acoustic cues (e.g., Best & Tyler, 2007; Bohn, 1995; Escudero, 2005; Flege, 1995). In sum, acquiring length involves both phonetic sensitivity (detecting durational differences) and phonological categorization (assigning functional meaning).
Speech perception research has a long tradition, providing numerous models that aim to explain how listeners interpret and categorize new or familiar contrasts (see Strange, 1995 for a historical overview). The acquisition of length in L2 can be viewed within general models of L2 speech perception, which assume that new contrasts are assimilated to existing L1 categories (SLM, Flege, 1995, 2003) or perceived in terms of L1–L2 similarity (PAM-L2, Best & Tyler, 2007). When durational differences do not correspond to any L1 contrast, learners may shift their attention to secondary cues such as intensity, vowel quality in the case of vowels, and closure duration, spectral correlates, or adjacent vowel duration in the case of consonants (e.g., Esposito & Di Benedetto, 1999). Conversely, when learners’ L1 includes quantity contrasts, they can transfer the functional short–long distinction, though the exact phonetic realization may differ (e.g., Bohn, 2020; Guillemot, 2018).
L2 perception of Italian consonant length can be understood through complementary theoretical perspectives. For example, according to the Feature Hypothesis (see McAllister et al., 2002), learners are less sensitive to acoustic cue dimensions that are not used contrastively in their L1. Consequently, speakers of languages without phonemic consonant length are expected to be less attuned to durational differences and may instead rely on alternative, more robust cues, such as consonant quality or contextual information. Learners can also re-use or adapt phonological features available in their L1 (Brown, 1998); for instance, they may extend the L1 feature [± long], which encodes vowel length, to mark consonant length in L2 (Mah & Archibald, 2003, p. 211; De Clercq et al., 2014, p. 3). In contrast, other models posit that a phonetic cue not used in the L1, such as duration as a cue to vowel identity in Spanish, represents an uncolonized, blank-slate continuum to which L2 learners may revert (and which they may overuse) when acquiring L2 contrasts (Escudero & Boersma, 2004).
The L2LP model (Escudero, 2005) predicts that learners perceive L2 sounds in terms of their L1 categories. L2 sounds similar to an L1 category may be assimilated, while distinct sounds may form new categories. Thus, learners without geminate contrasts in their L1 may initially map singletons and geminates onto a single category, whereas learners with L1 length contrasts can form separate L2 categories, facilitating more accurate perception and production.
Empirical evidence from different L2 contexts supports this view (Section 2.2.2) while also revealing its limitations. Guillemot (2018), for example, examined the production of Japanese geminate consonants by Italian, French, and English learners and found that L1 timing patterns strongly influenced L2 realization. All learner groups produced a clear durational contrast between singletons and geminates, comparable to that of native speakers, but Italian learners transferred their L1 vowel–consonant timing, producing shorter vowels before geminates, the reverse of the Japanese pattern, in which the preceding vowel is longer before geminates. Learners without L1 length contrasts (French, English) showed no significant variation in vowel duration. Thus, the presence of an L1 length feature may facilitate category formation but hinder native-like timing control at the level of phonetic implementation.
2.2.2 Evidence From L2 Studies
Compared to vowel length, the perception of consonantal length has received relatively little attention in L2 research (Altmann et al., 2012, p. 392). Difficulties in perceiving or producing consonant quantity have been reported for several learner groups, such as L1 American English learners of Finnish (e.g., McGill, 2024), and learners of Japanese from various L1 backgrounds (e.g., Guillemot, 2018; Han, 1992; Hardison & Saigo, 2010; Hayes, 2002; Hayes-Harb, 2005; Lee & Mok, 2017; Mah & Archibald, 2003; Tsukada & Yurong, 2022). Existing studies addressing the perception and production of geminates in L2 Italian are still relatively scarce but encompass learners from a range of L1 backgrounds, as outlined below.
Altmann et al. (2012) examined the perception of non-native vowel and consonant-length contrasts in Italian by L1 Italian speakers, German learners of Italian, and German non-learners. The stimuli, which were presented in a speeded same–different discrimination task, consisted of several nonwords drawn from the GEMMA project (Di Benedetto, 2000). Results showed asymmetries in the perception of non-native consonantal and vocalic length contrasts: non-native vowel-length contrasts were perceived almost as reliably as native consonant-length contrasts, whereas non-native consonant contrasts were substantially more difficult. Sensitivity was highest among L1 Italian speakers, followed by German learners, with non-learners performing least accurately. Reaction-time data indicated that prior experience with consonant quantity affects the type rather than the degree of perceptual difficulty, suggesting that learners’ processing had not yet reached native-like efficiency.
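Sensitivity in same–different tasks is commonly summarized with a signal-detection index such as d′. The summary above does not specify how Altmann et al. (2012) computed sensitivity, so the following Python sketch is only an illustration under stated assumptions: it uses the simple yes/no formula d′ = z(H) − z(FA), treating "different" responses on different trials as hits and "different" responses on same trials as false alarms, with a log-linear correction to avoid infinite z-scores. Dedicated same–different decision models yield different d′ values; the function name and trial counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Yes/no d' with a log-linear correction (an assumed, simplified measure).

    hits / misses: 'different' / 'same' responses on different trials.
    false_alarms / correct_rejections: 'different' / 'same' responses on same trials.
    """
    # Add 0.5 to each count so that neither rate can reach exactly 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one listener (invented, not data from any cited study).
sensitivity = d_prime(hits=18, misses=2, false_alarms=3, correct_rejections=17)
print(f"d' = {sensitivity:.2f}")
```

A listener who responds "different" equally often on same and different trials obtains d′ = 0 under this formula, whereas larger positive values indicate better discrimination.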
De Clercq et al. (2014) focused on the perception and production of Italian geminates by L1-Dutch learners. In an AXB discrimination task, participants perceived the singleton–geminate contrast with relatively high accuracy and produced longer consonants in geminate contexts, although the durational contrast was smaller than in native Italian speech. The authors interpret this success as a redeployment of the L1 vowel-length contrast, phonemically relevant in Dutch, to the consonantal domain (as reported in Mah & Archibald, 2003). The higher accuracy relative to Altmann et al. (2012) likely reflects methodological and lexical factors: De Clercq et al. used real, frequent words (e.g., sette ‘seven’ vs. sete ‘thirst’), which provided lexical support, whereas Altmann et al. employed nonwords, requiring listeners to rely solely on acoustic cues. Further factors, such as language experience, cannot be excluded.
Extending this line of research to learners from typologically diverse backgrounds, Tsukada et al. (2018) compared the identification of Japanese and Italian geminates by Korean and Australian learners of Japanese, as well as by native Japanese and Italian listeners. Korean and Australian (English) speakers, who had studied Japanese as a foreign language but had no knowledge of Italian, achieved over 80% accuracy even in the Italian task, suggesting a possible transfer of perceptual sensitivity to consonant length from one foreign language to another, previously unknown, language. In contrast, native Japanese and Italian listeners, although familiar with the feature in their L1, were slightly less accurate when perceiving geminates in the other, unfamiliar language. These findings indicate that the ability to perceive quantity contrasts is subject to limitations, even when the feature [+ long] exists in the listener’s native phonological system.
Finally, Feng and Busà (2022) examined Mandarin learners’ perception and production of Italian geminates. Thirty Chinese students at different proficiency levels and 10 native Italian speakers participated in a perception task and a reading task, involving minimal pairs. The results showed that, in both perception and production, Chinese learners could distinguish short and long consonants to some extent, but not in a native-like manner. Increased learning experience did not appear to enhance accuracy, suggesting that mastering Italian timing requires more than extended exposure to the target language.
Taken together, these studies indicate that the perception of consonantal length in L2 Italian is shaped not only by L1-specific phonological experience but also by a range of other factors, including task, consonant type, lexical familiarity, and the amount and quality of language exposure.
Production data largely mirror perceptual tendencies, while also showing that accurate perception does not automatically result in accurate production. Learners without an L1 quantity system often shorten geminates or fail to maintain stable singleton–geminate ratios, relying instead on prosodic or contextual cues (e.g., Pezzella, 2020, on L1-Albanian learners of Italian). By contrast, learners from quantity languages and advanced L2 speakers tend to approximate native-like durational contrasts earlier, although subtle phonetic deviations frequently persist.
Evidence from L1-German learners of Italian shows that while both naïve and advanced learners establish a durational contrast, native-like timing remains difficult to achieve even at higher proficiency levels (Kabak et al., 2011). Importantly, learners primarily expand the contrast by adjusting existing L1 categories (e.g., shortening singletons) rather than by developing a fully target-like representation of gemination. Similar partial adaptation is reported for L1-Estonian learners, whose production reflects transfer from a ternary length system (short–long–extra-long), resulting in overlengthening in specific consonant classes (Celata & Costamagna, 2012).
The role of proficiency and contextual demands is further supported by studies on L1-French learners, showing that advanced learners approach native-like realizations, whereas lower-proficiency learners exhibit greater variability, especially in cognitively richer contextual conditions (D’Apolito & Gili Fivela, 2019). Beyond controlled speech, spontaneous production data reveal generally modest accuracy rates across learner groups (English, German, Spanish), with performance modulated by manner of articulation, voicing, stress, and prosodic salience (Sorianello, 2014). Prosodic structure emerges as a crucial factor shaping durational control, as confirmed also by evidence from L1-Chinese learners, for whom stress and tonal prominence guide geminate production (Costamagna et al., 2014).
Sociolinguistic and contact-related factors further modulate durational patterns. In multilingual urban settings, segmental length can be acquired through dialect contact, as shown by the adoption of RF by Nigerian speakers in Turin via accommodation to southern Italian varieties (Romano & Mazzaferro, 2014). In heritage contexts, perception and production of gemination reflect reduced or variable exposure: while first-generation immigrants retain more robust contrasts than later generations (Celata & Cancila, 2010), heritage speakers (HSs) often pattern intermediate between L1 and L2 speakers in perception (De Iacovo et al., 2025), with mixed evidence for cross-linguistic influence in production (Einfeldt et al., 2019).
Overall, production evidence confirms that the acquisition of Italian consonant quantity is shaped by the interaction of L1 phonology, prosodic structure, proficiency, and exposure conditions. While learners may redeploy existing durational or prosodic resources to approximate native patterns, achieving target-like gemination requires fine-grained integration of segmental and prosodic timing, a challenge that often persists even in advanced L2 speakers and heritage varieties.
2.3 Research Questions and Hypotheses
Previous research on the acquisition of length in L2 Italian has highlighted several interacting factors that shape learners’ performance. Studies on production have shown that L2 speakers frequently reduce or neutralize consonant gemination, with variation depending on stress position, consonant type, and proficiency level. Research on perception has further demonstrated that durational cues are not equally salient across contexts, but are modulated by prosodic prominence and segmental properties. Against this background, the present study addresses four core research questions (RQ), focusing on the impact of L1 background (RQ1), stress position (RQ2), consonant type (RQ3), and proficiency level (RQ4) on the perception of length contrasts in L2 Italian. For each research question, we formulate a hypothesis grounded in previous findings, which guides the empirical investigation presented in this paper.
For vowels, perceptual studies have shown that differences in (short–long) vowel duration are comparatively easy to perceive, even for learners from non-quantity languages (Bohn, 1995). Accordingly, we do not expect major group differences in pure discrimination of vowel duration. Furthermore, the task in this study does not involve judging whether two items are acoustically the same or different, but whether they belong to the same or to different lexical items (see Section 3). In this respect, learners from languages with phonemic vowel quantity (Finnish, Slovak, Czech, German) may be more prone to misinterpreting vowel-duration differences as lexically contrastive, whereas Spanish learners, whose L1, like Italian, has no vowel quantity, are expected to be closest to target. In addition, German learners might show a slightly different tendency because vowel quantity in German is largely stress-dependent and often accompanied by quality differences (e.g., tense vs. lax vowels), while Czech, Slovak, and Finnish share a more systematic quantity and quality. For vowel-length discrimination, we therefore predict the following hierarchy: (L1 Italian >) Spanish > German > Czech / Slovak / Finnish.
Overall, we expect L1 background to be one of the strongest predictors of learners’ perceptual performance on length contrasts in L2 Italian.
In addition, to ensure that listeners relied on consonant rather than vowel duration, a manipulated condition was created in which the vowel in post-stress position was shortened (e.g., sà
In addition, we report accuracy patterns across individual consonant categories (laterals, nasals, rhotics, stops, fricatives, and affricates) and learner groups. This allows us to assess whether cross-linguistic differences in segmental inventories modulate the perceptibility of length contrasts.
3 Methods
3.1 Participants
A total of 104 adult learners of Italian participated in the study. The sample comprised 25 Czech speakers (6M, 19F, mean age = 29), 23 Finnish speakers (2M, 20F, 1 non-binary, mean age = 35), 28 German speakers (6M, 22F, mean age = 27), 20 Spanish speakers (2M, 18F, mean age = 26), and 8 Slovak speakers (8F, mean age = 27). 5 Almost all participants had grown up in monolingual families, except 14 individuals who reported knowledge of an additional, non-dominant L1 in the sense used by, for example, Grosjean (1982, 2008) or Montrul (2011, 2013). These included six German participants, two with French (Ge02, Ge15), one with Serbo-Croatian (Ge10), one with Hindi (Ge07), one with Russian (Ge06), and one with English (Ge12); one Finnish participant with English (Fi18); and three Spanish participants, two with Galician (Sp10, Sp11) and one with Catalan (Sp16). Of these additional languages, only Hindi and Serbo-Croatian are quantity languages and could therefore give these learners an advantage. 6 In addition, two German (Ge01, Ge04), one Finnish (Fi22), and one Slovak (Skv06) participant reported a direct family connection to Italian, through an Italian parent or grandparents. One of these HSs (Ge01) acquired Neapolitan during his childhood, while the other three reported exposure to a Southern or Central Italian variety. All four HSs, university students of Italian, had reached B2 (Ge01) or C1 level (Ge04, Fi22, Skv06), showed high motivation, and displayed balanced competences across pronunciation, writing, and vocabulary. We acknowledge that HSs require special consideration (e.g., Montrul, 2011; Polinsky, 2015). Nevertheless, they were included in the sample to enable meaningful comparison with other L2 learners, and to explore whether growing up with Italian or another quantity language, albeit with reduced input, offers any perceptual advantage in acquiring geminate consonants.
It should be noted at the outset that the HSs did not reach the highest accuracy levels within the learner group, indicating that they did not display a specific perceptual advantage compared to other L2 learners. Their results will be reported separately and transparently.
Furthermore, a control group of 35 native speakers of Italian was recruited: 21 participated in a laboratory and 14 online; in both cases, the Labvanced platform (Finger et al., 2017) was used. No systematic differences were observed between the two testing modes based on descriptive inspection of accuracy and response-time measures. However, one online participant was excluded from the final analysis because her response-time profile indicated irregular task engagement, leaving 34 control participants in the final dataset. All control participants had a high level of education and came from different regions of Italy, providing an authentic and representative baseline of native perception. This regional diversity is important, since it reflects the natural variation learners are likely to encounter in real communicative situations.
The participants reported normal hearing and no history of speech or language disorders. Learners were recruited through university courses and social networks. Their linguistic background was documented via a detailed biographical questionnaire (Pešková, 2025), and proficiency was assessed with the DIALANG vocabulary size test (Alderson, 2005; Chapelle, 2006). 7 Proficiency levels ranged from A (basic user) to C (proficient user) according to the CEFR. Participation in the study was voluntary, and all participants provided written informed consent prior to the experiment. The study adhered to the ethical guidelines of the Free University of Berlin and was approved by the relevant ethics committee. Participants received a small financial compensation for their time (1 hour).
The Czech participants were originally from central Bohemia and southern Moravia, the German participants from northern Germany, the Finnish participants from Helsinki and its surroundings, the Slovak participants from western Slovakia, and the Spanish participants from various regions of the Iberian Peninsula, with three additional participants originating from Latin America, who had been living in Spain for more than 5 years (one from Colombia and two from Venezuela, all from areas that do not exhibit any gemination). 8
Further details on the linguistic biographies of the learners are provided here. All L2 learners studied Italian primarily in their home countries, either at university or in language schools. Their learning histories typically involved 1–5 years of classroom instruction, mostly with fewer than 10 hours per week. Many described themselves as visual or auditory learners with high motivation. Within this group, a study-abroad subgroup had spent several months in Italy, most commonly as Erasmus students. Their proficiency was typically B2–C1, and their self-ratings indicated comparatively strong pronunciation and writing skills. Some learners were motivated by personal relationships with Italians (friends or partners).
Overall, the dataset reflects diverse trajectories, which is very typical of home-country learners of any L2 and highlights their strong heterogeneity and variability. It should be noted that it was very difficult to find learners who would form a homogeneous group in terms of a single L1 variety, time spent abroad, and other factors, so the experiment was open to everyone interested in participating. The participants differ not only in age, onset, and length of learning but also in their learning settings and learning styles. This heterogeneity extended to the varieties of Italian to which learners had been exposed. A substantial subgroup responded ‘I don’t know’ (N = 67) when asked about the Italian variety they had learned or spoke, suggesting that many classroom-based learners conceived of Italian primarily as an abstract standard. Where specified, the most common variety was Standard (N = 11) or Central Italian (N = 13), while Southern (N = 6) and Northern (N = 7) varieties were less frequent. Many learners reported exposure to different Italian varieties through teachers of diverse regional origins, contacts with Italians from different parts of Italy, and travel to various regions. Despite this heterogeneity, we consider the results representative, since the diversity of learner backgrounds offers a realistic picture of the conditions under which Italian is typically learned outside Italy.
3.2 Stimuli
To assess sensitivity to changes in Italian consonant length and differences in the duration of vowels, we designed an AX discrimination task that is easy for the listeners, time-efficient, and that ‘reduces the load on auditory memory’ (Gerrits & Schouten, 2004, p. 364). Audio stimuli consisted of 45 three-syllable pseudoword targets, prerecorded by a male native speaker of (Standard) Italian who is also a trained phonetician. The speaker read a list of isolated pseudowords using the same declarative (falling) intonation throughout. The stimuli were presented to the speaker with a grave accent marking the stressed syllable (e.g., tècino, sàpolo, nògere). Participants were explicitly instructed that they would hear words in Standard Italian. All pseudowords conformed to Italian orthography and phonotactic constraints and did not correspond to any meaningful referent (see, e.g., Stark & McClelland, 2000). We also ensured that the words did not exist in any of the other languages under investigation.
To verify the durational differences of the stimuli in the length conditions, the durations of the respective long and short consonants were calculated based on L1 speaker productions. The critical consonants showed clear durational contrasts between geminate and singleton consonants across all items, with mean duration of 185 ms for geminates and 96 ms for singletons (for details, see Pešková, 2025).
Each target pseudoword pair differed only in consonant length (singleton vs. geminate, cued by short vs. long consonant interval). Nine consonant categories were tested—voiced and voiceless stops, affricates, fricatives, nasals, laterals, and rhotics (1)—and the contrasts were systematically embedded in three prosodic conditions (2). We tested only segments that present lexical geminates in Italian.
(1) Voiced stop [bː]/[b] sabburo / saburo
    Voiceless affricate [tːʃ]/[tʃ] teccino / tecino
    Voiced affricate [dːʒ]/[dʒ] noggere / nogere
    Voiceless fricative [fː]/[f] caffaro / cafaro
    Voiced fricative [vː]/[v] dovveda / doveda
    Rhotic [rː]/[r] murrovi / murovi
    Nasal [mː]/[m] pammodo / pamodo
(2) a. Post-stress (e.g., sàppolo vs. sàpolo) ˈV+Cː vs. ˈVː+C
    b. Pre-stress (e.g., sappòlo vs. sapòlo) V+Cː vs. V+C
    c. Unstressed (e.g., sappolò vs. sapolò) V+Cː vs. V+C
Mean consonant durations (excluding rhotics) revealed a consistent singleton–geminate contrast across all prosodic conditions: Geminates were longer than singletons in post-stress (184 ms vs. 98 ms), pre-stress (203 ms vs. 101 ms), and unstressed positions (168 ms vs. 93 ms). The rhotics differed in the number of occlusions, with one occlusion in all singleton realizations, and three occlusions in geminate realizations in pre-stress and unstressed positions, and four occlusions in post-stress position.
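As a worked illustration of the durations just reported, the short sketch below computes the geminate-to-singleton ratio in each prosodic condition. The duration values are taken from the text; the variable and function names are ours and purely illustrative.

```python
# Mean consonant durations in ms as (geminate, singleton), as reported in the text
# (rhotics excluded).
durations_ms = {
    "post-stress": (184, 98),
    "pre-stress": (203, 101),
    "unstressed": (168, 93),
}


def geminate_ratio(geminate_ms, singleton_ms):
    """Return the geminate-to-singleton duration ratio."""
    return geminate_ms / singleton_ms


# Ratio per prosodic condition, rounded to two decimals.
ratios = {
    condition: round(geminate_ratio(g, s), 2)
    for condition, (g, s) in durations_ms.items()
}

for condition, ratio in ratios.items():
    print(f"{condition}: geminate/singleton = {ratio}")
```

On these means, the geminates are roughly 1.8 to 2 times as long as the singletons in every condition, with the largest ratio in pre-stress position.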
To control for the potential influence of vowel duration in post-stress syllables (2a), an additional set of manipulated items was created. Recall that in Italian, non-final stressed open syllables typically contain a long vowel before singletons (2a) (e.g., sàpolo [ˈsaːpolo]); if left unadjusted, learners might rely on vowel duration rather than on consonant duration. These manipulated items were not intended to represent licit Italian contrasts, but to test how learners interpret consonant duration when the vowel cue is missing. To minimize this potential bias, long consonants in this condition were manually shortened in Praat (Boersma & Weenink, 1992–2025) by deleting several full waveform cycles from the central part of the segment (3a). This yielded mean durations of 184 ms for long consonants and 87 ms for short consonants.
In addition, a separate manipulated condition targeting only vowel duration (e.g., sàpolo, [ˈsaːpolo] vs. [ˈsapolo]) was incorporated into the experiment (3b). In this condition, long vowels had a mean duration of 201 ms, and their short counterparts had a mean duration of 126 ms.
(3) a. Consonant duration only ˈV+Cː vs. ˈV+C
    b. Vowel duration only ˈVː+C vs. ˈV+C
In total, the discrimination test comprised 45 target pairs (36 consonant-length contrasts and 9 vowel-length contrasts) and 30 filler pairs with identical duration to reduce response bias. The experiment was implemented in the Labvanced platform (Finger et al., 2017) and stimuli were presented via Sony MDR-7506 professional headphones, which provide a flat frequency response and ensure faithful reproduction of fine acoustic detail.
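The totals above can be checked with a small sketch. The text states nine consonant categories, three prosodic conditions, 36 consonant-length pairs, 9 vowel-length pairs, and 30 fillers; the per-category breakdown used below (one manipulated consonant-only pair and one vowel-only pair per category) is our reading of these totals, not something the text states explicitly.

```python
CONSONANT_CATEGORIES = 9   # stated in the text
PROSODIC_CONDITIONS = 3    # post-stress, pre-stress, unstressed
FILLER_PAIRS = 30          # stated in the text

# Natural consonant-length pairs: one per category and prosodic condition.
natural_pairs = CONSONANT_CATEGORIES * PROSODIC_CONDITIONS

# Assumption: one manipulated consonant-only pair (3a) per category.
manipulated_consonant_pairs = CONSONANT_CATEGORIES

# Assumption: one manipulated vowel-only pair (3b) per category.
vowel_pairs = CONSONANT_CATEGORIES

consonant_pairs = natural_pairs + manipulated_consonant_pairs
target_pairs = consonant_pairs + vowel_pairs
total_trials = target_pairs + FILLER_PAIRS

print(consonant_pairs, vowel_pairs, target_pairs, total_trials)
```

Under these assumptions the counts reproduce the reported 36 consonant-length pairs, 9 vowel-length pairs, 45 targets, and 75 trials per participant.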
3.3 Procedure and Data Analysis
All participants were tested individually in soundproof laboratories or silent cabins at the participating universities (Berlin, Brno, Prague, Helsinki, Salamanca). The experiment was run on a laptop, and each participant completed 75 randomized trials (45 target pairs and 30 fillers). A trial began with 500 ms of silence, followed by the first stimulus, then 1,600 ms of silence, and finally the second stimulus (Altmann et al., 2012, p. 398). A further 500 ms of silence concluded the trial. Participants indicated whether the two items corresponded to the same word or to different words by clicking ‘YES’ or ‘NO’ on the screen using a mouse. Before each response, they first positioned the mouse on a designated circle in the center of the screen; the audio playback started once the mouse was in place. In addition to discrimination responses (‘YES’ vs. ‘NO’), reaction times (RTs) and mouse-tracking (MT) trajectories were recorded to capture processing and decision dynamics. RTs and MTs were time-locked and measured simultaneously. A short training phase with feedback preceded the main task to ensure that participants fully understood it. All answers and timing data were logged with a button box. Participants were not given a response deadline. In the present study, we focus on the 45 target pairs that differed in either vowel or consonant length and report only discrimination rates and RTs.
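The trial structure described above can be laid out as a simple timeline. The sketch below is illustrative only: the silence intervals come from the text, whereas the stimulus durations are not reported here, so the 700 ms value in the example is a hypothetical placeholder.

```python
# Silence intervals stated in the text (ms).
LEAD_IN_MS = 500
INTER_STIMULUS_MS = 1600
LEAD_OUT_MS = 500


def trial_timeline(stim1_ms, stim2_ms):
    """Return event times (ms from trial start) for one AX trial."""
    stim1_onset = LEAD_IN_MS
    stim2_onset = stim1_onset + stim1_ms + INTER_STIMULUS_MS
    trial_end = stim2_onset + stim2_ms + LEAD_OUT_MS
    return {
        "stim1_onset": stim1_onset,
        "stim2_onset": stim2_onset,
        "trial_end": trial_end,
    }


# Hypothetical stimulus durations of 700 ms each, for illustration only.
timeline = trial_timeline(700, 700)
print(timeline)
```

The fixed silences alone add up to 2,600 ms per trial, regardless of stimulus length.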
Participants’ discrimination accuracy was coded as 1 (correct) and 0 (incorrect). These binomial accuracy scores were analyzed using generalized linear mixed-effects models (GLMERs) in R (R Core Team, 2017) with the lme4 package (Bates et al., 2015). The ggeffects package (Lüdecke, 2018) was used to estimate means and 95% confidence intervals for pairwise comparisons. The first model analyzed discrimination accuracy for the consonantal length contrasts, modeling the main effects of the fixed factors L1 (reference level = Italian, five contrasts with Finnish, Czech, Slovak, German, Spanish), Consonant category (reference level = voiced obstruents, two contrasts with voiceless obstruents and sonorants), and Stress position (reference level = post-stress, two contrasts with pre-stress and unstressed), as well as the two-way interactions L1:Consonant category and L1:Stress position. The random-effects structure consisted of by-participant random intercepts. We attempted to fit models including random slopes for within-participant predictors (Stress position and Consonant category). However, maximal models resulted in convergence failures. Following standard recommendations (Matuschek et al., 2017), we adopted the maximal random-effects structure supported by the data. Note that L1 is a between-subjects factor and therefore cannot be estimated as a by-participant random slope.
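For readers who prefer an explicit formulation, the first model can be written out as follows. This is a sketch based on the verbal description above; the notation (indices, coefficient labels) is ours and does not appear in the original analysis.

```latex
\begin{aligned}
\operatorname{logit} P(y_{ij} = 1) ={}& \beta_0
  + \beta_{\mathrm{L1}[i]}
  + \beta_{\mathrm{Cons}[j]}
  + \beta_{\mathrm{Stress}[j]} \\
 &+ \beta_{\mathrm{L1}[i] \times \mathrm{Cons}[j]}
  + \beta_{\mathrm{L1}[i] \times \mathrm{Stress}[j]}
  + u_i, \\
u_i \sim{}& \mathcal{N}(0, \sigma_u^2)
\end{aligned}
```

Here $y_{ij}$ is the accuracy (1 or 0) of participant $i$ on trial $j$, the $\beta$ terms are the fixed effects relative to the stated reference levels (Italian, voiced obstruents, post-stress), and $u_i$ is the by-participant random intercept. In lme4 formula syntax this would correspond to something like `accuracy ~ L1 * ConsCat + L1 * Stress + (1 | participant)`, where the column names are hypothetical.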
The second model analyzed discrimination accuracy for vowel-duration differences, with L1 as a fixed factor and per participant random intercepts. The third model analyzed only the L2 learner data and tested the fixed effects of L1, Proficiency (numeric, 0.25 representing A-level, 0.5 for B-level, and 0.75 for C-level), and Sound category (sum-coded, vowel vs. consonant length).
Two further models, linear mixed-effects models (LMERs), were fitted to log-transformed RTs, the transformation serving to normalize the distributions. The fixed- and random-effects structures of these LMERs were identical to those of the first two GLMERs described above. Significance of fixed effects was determined using α = .05.
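The two numeric transformations mentioned in this section, the proficiency coding and the log transformation of RTs, can be sketched as follows. The proficiency values are those given in the text; the use of the natural logarithm and all function and field names are assumptions made for illustration.

```python
import math

# Numeric proficiency coding as given in the text.
PROFICIENCY_CODE = {"A": 0.25, "B": 0.5, "C": 0.75}


def encode_trial(cefr_band, rt_ms):
    """Encode one trial's predictors for modeling.

    cefr_band: "A", "B", or "C" (CEFR band of the participant).
    rt_ms: reaction time in milliseconds (must be positive).
    Assumption: the natural logarithm is used; the text does not state the base.
    """
    if rt_ms <= 0:
        raise ValueError("Reaction time must be positive to take its logarithm.")
    return {
        "proficiency": PROFICIENCY_CODE[cefr_band],
        "log_rt": math.log(rt_ms),
    }


# Hypothetical trial: a B-level learner responding after 950 ms.
example = encode_trial("B", 950)
print(example)
```

Because the logarithm compresses large values more than small ones, it reduces the influence of the long right tail that RT distributions typically show.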
4 Results
4.1 General Picture
Across all participants (both L2 Italian learners and L1 controls), the overall discrimination rate was 69.3%. This means, in about 70% of all cases (N = 6,210), the length contrast between two stimuli was detected. Individual discrimination scores and density plots of the data for consonant length and vowel-duration discrimination, per participant L1, are shown in Supplemental Figures S1 and S2 and the output of the models is given in Supplemental Tables S1–S5 in the Appendix.
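The reported number of observations can be reconstructed from the sample sizes given in Section 3. The sketch below assumes that every participant contributed all 45 target trials (no missing data); the resulting count of detected contrasts is approximate because the overall rate is itself rounded.

```python
LEARNERS = 104       # L2 learners (Section 3.1)
CONTROLS = 34        # L1 Italian controls retained in the final dataset
TARGET_PAIRS = 45    # target trials per participant
OVERALL_RATE = 0.693 # overall discrimination rate reported in the text

participants = LEARNERS + CONTROLS
observations = participants * TARGET_PAIRS

# Approximate number of trials in which the contrast was detected.
approx_detected = round(observations * OVERALL_RATE)

print(participants, observations, approx_detected)
```

The product of 138 participants and 45 target pairs matches the reported N = 6,210.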
There is large variation in the data that can be attributed to several sources. First, learners’ L1 background determined to what extent they could rely on pre-existing length categories when perceiving Italian contrasts (4.2). Second, stress position influenced perceptual discrimination, with some stress contexts facilitating discrimination more than others (4.3). Third, consonant type played a role, as length contrasts were easier to process in certain types of consonants than in others (4.4). Finally, proficiency level contributed to interindividual differences, with more advanced learners performing more accurately than beginners (4.5).
4.2 Language of Learners
We first examined whether the learners’ L1 influenced their ability to discriminate singleton–geminate contrasts in Italian. The mixed-effects logistic regression revealed a main effect of L1 background: for all L1 contrasts with Italian, the estimated slopes ranged from -1.7 (Finnish) to -3.3 (Spanish), all p ≤ .001. Numerically, the mean discrimination scores suggested a non-significant trend of Finnish > Slovak > Czech > German > Spanish, which seems to follow the hierarchy of the role of phonemic length in those languages (Figure 2).

Figure 2. Modeled results for consonant-length discrimination.
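Because the model is logistic, the L1 slopes reported above are on the log-odds scale. As a reading aid, the sketch below converts the two extreme slopes into odds ratios relative to the Italian reference group. Only the two slope values come from the text. Since the model also contains interactions with L1, we assume these coefficients describe the L1 difference at the reference levels of the other factors (voiced obstruents, post-stress) rather than an average over all conditions.

```python
import math

# Logit-scale slopes for the L1 contrasts with Italian, as reported in the text.
l1_slopes = {"Finnish": -1.7, "Spanish": -3.3}


def odds_ratio(log_odds_slope):
    """Convert a logit-scale coefficient into an odds ratio."""
    return math.exp(log_odds_slope)


odds_ratios = {l1: round(odds_ratio(b), 2) for l1, b in l1_slopes.items()}

for l1, ratio in odds_ratios.items():
    print(f"{l1} vs. Italian: odds ratio = {ratio}")
```

Converting these slopes into predicted probabilities would additionally require the model intercept, which is reported in the supplemental tables rather than in the running text.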
Finnish participants, whose L1 includes both vowel and consonant length, were numerically closest to Italians, but still had significantly lower accuracy than the Italians. The confidence intervals suggest that L1 German and L1 Spanish learners had lower accuracy than the Finnish, Czech, and Slovak learners. This result is consistent with the learners’ experience with length in their L1s. Spanish lacks length contrasts (apart from the rhotics), and although German possesses a phonemic vowel-length contrast, duration does not occur as a cue consistently across all vowel or consonant categories, unlike in Finnish, Czech, and Slovak.
RT analyses provided further evidence of L1 effects. L1 speakers of Italian responded faster than all learner groups, suggesting that processing of consonant length is automatic for them but effortful for L2 learners.
A different picture emerged with vowel duration. Unlike for consonants, the Italian control group showed very low sensitivity, confirming that vowel quantity is not used contrastively in Italian (see Bertinetto & Vivalda, 1978; Rochet & Rochet, 1995). Recall that participants were asked whether two stimuli represented the same word, not whether they merely sounded the same or different. Thus, lower discrimination rates do not signal auditory insensitivity, but rather an adjustment to the functional role of length in the target language.
Learners appear to have been less successful at ignoring the length contrast in vowels than at discriminating the length contrast in consonants. The results of the GLMER (Supplemental Table S3) show that all learner groups differed from L1 Italian speakers (estimated slopes were between +1.7 and +2.4, all p < .001). The estimated mean accuracy of vowel-duration discrimination ranged approximately between 50% and 65% across groups (Figure 3), that is, either at chance level or significantly above it (for Finnish and Czech learners), indicating that learners had not learned to ignore the length contrast in Italian vowels.

Figure 3. Modeled results for vowel-length discrimination.
RTs further underscored the difficulty of this condition. Responses were consistently slower in vowel-length than in consonant-length trials, indicating greater uncertainty and less efficient processing, even in native Italian listeners (mean RT approx. 950 ms for consonants and 1,500 ms for vowels).
Taken together, the status of phonemic length in the L2 learners’ L1 seems to predict discrimination of Italian consonantal length relatively well. For the discrimination of Italian vowel length, it seems that the existence of the L1 length cue at least partially hinders the learners’ ability to ignore the vowel-duration differences in Italian.
We now briefly report the results for individual participants. The average discrimination rate of each listener is summarized in Figure 4, with HSs highlighted. The two German HSs (Ge01, Ge04) performed rather similarly and reflected the general tendency: their average discrimination rates (60%) were lower than those of the Finnish and Slovak HSs. Although the Finnish participant (Fi22) reached a rate of 87%, she was outperformed by seven other intermediate or advanced learners. The Slovak HS (Skv06) showed a high score (84%) but was surpassed by a B2 learner (Skv01, 97.8%). Interestingly, some variation was also observed within the L1-Italian control group. The participant Ita15 (male, Tuscany–Florence) and two young female speakers from Rome (Ita02, Ita03), a variety with clear gemination, showed markedly lower scores (particularly in the discrimination of consonantal length) compared to the rest of the L1 group. While no dialectal or age-related explanation seems likely, such variability could stem from task-related, attentional, or individual perceptual factors, though this remains speculative at present.

Figure 4. Observed average discrimination rates per participant (HSs in blue).
4.3 Prosodic Condition
The ability to perceive differences in length according to stress position follows the hierarchy Post-stress > Pre-stress > Unstressed (cf. Dmitrieva, 2017; Sorianello, 2014). The mixed-effects logistic regression revealed a main effect of Stress position, showing that consonant length was more difficult to discriminate in the unstressed position than in post-stress position (slope = -1.704, p < .001) (see Figure 5). Interactions between stress position and learner L1 (Supplemental Table S1) further showed that German listeners also exhibited significantly lower discrimination accuracy in pre-stress position compared to post-stress position. By contrast, for Finnish, Czech, and Spanish learners, the decrease in discrimination accuracy associated with the unstressed position was smaller than that for native Italian listeners.

Figure 5. Estimated accuracy by participant L1 and stress position (consonant length only).
Overall, the pattern of results confirms that prosodic prominence facilitates discrimination, with unstressed syllables being the weakest context overall.
4.4 Type of Consonant
The type of consonant also influenced the perception of length contrasts and reflects previous findings on the perception of consonant duration (Dmitrieva, 2017): length in sonorants (nasals, laterals, rhotics) was significantly easier to discriminate than length in voiceless obstruents (plosives, fricatives, affricates; mean slope = 0.864, p = .006), while no significant difference between voiceless and voiced obstruents was detected (see Supplemental Table S1 and Figure 6).

Figure 6. Estimated accuracy per Participant L1 and Consonant category (consonant length only).
Language-specific effects were also detected: the two-way interaction of L1 group and Consonant category reveals that the difference between sonorants and obstruents was larger than in native Italian listeners for L1 Finnish learners (slope = 1.314, p = .006) and marginally so for L1 Spanish learners (slope = -.676, p = .073).
Examining the individual consonant types (Figure 7), Finnish learners reported a difference in 97% of trials with voiceless plosives, a category that is very frequently geminated in Finnish. Spanish learners, in turn, even achieved 100% in rhotics, most likely because tap and trill exist in their L1 (e.g., pero ‘but’ vs. perro ‘dog’). The Italian listeners showed almost uniformly high sensitivity across all consonant types, confirming that they can reliably detect durational differences irrespective of segmental class. In addition, voiced affricates and fricatives were perceived with higher accuracy than their voiceless counterparts, whereas the opposite pattern emerged for stops, with voiced stops eliciting lower accuracy than voiceless stops.

Figure 7. Observed discrimination rate in the raw data (%) by Consonant category and L1 group.
Figure 7 shows that sonorants and voiceless plosives seemed to be easiest to discriminate overall, whereas affricates remained the most challenging. Importantly, the relative advantage of individual consonant categories varied across learner groups, suggesting that transfer from the L1 inventory can facilitate perception (as with voiceless plosives in Finnish), whereas the absence of a sound class from the L1 can hinder it (as with affricates in Finnish).
The findings indicate that manner of articulation plays a systematic role in shaping the perception of consonant length and further underscore the interaction between universal perceptual tendencies and language-specific transfer: learners draw on contrasts familiar from their L1 whenever possible, yet they continue to experience difficulties with segments that carry little functional load, that is, a limited role in distinguishing meanings, in their native language.
4.5 Proficiency
For consonant length, the proficiency data suggest a gradual improvement in discrimination ability with higher proficiency. Within some groups, however, intermediate learners seem to have performed on par with advanced learners, and in others, differences across proficiency levels were minimal. In the Finnish group, even lower-proficiency learners already reached relatively high discrimination rates compared with the German and Spanish learners.
As for vowel-duration discrimination, we see more variation and a tendency toward the native-like range (see Figure 3) only in the Slovak, Czech, and Spanish C-level groups.
It is worth noting that the distribution of proficiency levels is unbalanced (Table 1), with level B clearly overrepresented across all language groups, while level A occurs least frequently.
Table 1. Distribution of Proficiency Levels by Language.
The GLMER results shown in Figure 8 and Supplemental Table S5 reveal a significant positive effect of proficiency, in the baseline condition, indicating an association between higher proficiency and better discrimination of consonantal length in the Finnish group (i.e., the reference levels for sound type and L1, respectively). There was also a significant interaction between proficiency and sound type, suggesting that (again, in the Finnish group) proficiency modulated consonant-length discrimination more strongly than vowel-duration discrimination. Significant three-way interactions between proficiency, sound type, and L1 indicated that the sound type-specific effect of proficiency in the Czech and German groups differed from that of the Finns. As seen in Figure 8, in German speakers, the effect of proficiency appears to be only slightly larger for vowels than for consonants. Crucially, however, in Czechs, proficiency affects duration discrimination in vowels and consonants in opposite ways: higher proficiency improves Czechs’ discrimination of Italian consonant length and at the same time improves their ability to ignore differences in Italian vowel duration (yielding a pattern closer to native Italian durational discrimination).

Estimated accuracy per Learner L1, Proficiency and Sound category (consonant vs. vowel).
Considering all learner groups, the pattern of results indicates that global proficiency can be a partial predictor of perceptual success in distinguishing consonantal length but is somewhat less reliable for predicting success in ignoring the non-contrastive durational variation. These findings reflect earlier observations that perceptual development in L2 cannot be reduced to proficiency level alone (e.g., Feng & Busà, 2022).
5 Discussion
5.1 General Discussion
The first hypothesis (H1) predicted that learners’ sensitivity to segmental length in Italian would reflect the structure of their L1 phonological systems. This prediction was confirmed. The overall hierarchy observed in the discrimination of consonantal length (L1 Italian > Finnish, Slovak, Czech > German, Spanish) demonstrates that learners whose native language encodes a contrastive length feature, whether in vowels or consonants, are better able to detect durational differences in the L2. Finnish learners, whose L1 distinguishes both vowel and consonant quantity, came closest to native performance and responded fastest, suggesting that they could transfer a well-established durational feature and adapt it to the consonantal domain in Italian. Several Finnish participants even outperformed the native speakers, particularly for stops and laterals.
Czech and Slovak learners, familiar with vowel quantity but not consonant gemination, achieved an intermediate position, supporting the view that L1 vowel-length distinctions can facilitate the perception of L2 consonant length through partial feature redeployment (De Clercq et al., 2014; Mah & Archibald, 2003; Tsukada et al., 2018). In contrast, Spanish learners, whose L1 lacks a systematic consonant-length contrast, displayed the lowest overall accuracy. However, their exceptionally high performance with rhotics (/r/–/rː/) suggests that positive transfer can occur when the L1 provides a comparable opposition within the same segmental class, reflected here in sensitivity to durational cues in rhotics. Although the phonological status of the Spanish tap–trill distinction remains debated—whether it reflects a length-based contrast or a segmental opposition (Hualde, 2005, pp. 182, 184)—this ambiguity does not affect the present interpretation, as the stimuli differed systematically in the number of occlusions (three to four brief occlusions in geminates vs. a single brief occlusion in singletons; see Pešková, 2025, pp. 172–173). Interestingly, Slovak learners did not show a comparable advantage and did not outperform Czech learners, despite the presence of long liquids in their L1. A plausible explanation lies in the phonological distribution of these segments: Slovak long /lː/ and /rː/ occur only in syllable nuclei, and therefore in a prosodic context that differs from Italian geminates and from Spanish trills, which are restricted to onset position.
As for the German learners, they performed less accurately than the Czech and Slovak groups despite the presence of vowel quantity in German. The finding suggests that German vowel length, being closely tied to vowel quality and stress (e.g., Sendelmeier, 1981), does not provide a suitable perceptual analog for Italian gemination and may even obscure the relevant durational cue. Whereas Spanish learners must establish consonantal length as a new phonological feature, German learners may experience interference from the interaction of vowel length and vowel quality in their L1. This may hinder the establishment of a clear consonant-duration category in their Italian.
Turning to the discrimination of vowel duration, the expected hierarchy could not be confirmed as there were no reliable differences between the learner groups (partially due to high variation); the data showed the pattern L1 Italian > German, Spanish, Slovak, Czech, Finnish. Considering the observed means in the data, the German group followed the L1 controls more closely than anticipated, while the remaining groups performed comparably. Notably, RTs were longer in the vowel-length than in the consonant-length condition, suggesting that learners, as well as the native controls, required more processing time when evaluating non-lexical durational cues. This delay likely reflects the additional cognitive effort needed when duration is not phonologically contrastive or when cues are inconsistent across the learners’ L1 and the target language (Altmann et al., 2012). The wide individual variation, ranging from near-ceiling to floor-level performance, further indicates that vowel-length perception may be unstable for L2 listeners. This is plausibly due to the allophonic status of vowel duration in Italian, which may confuse learners accustomed to its phonemic use in their L1.
More generally, the present findings underscore the role of the L1 background in shaping perceptual sensitivity to durational contrasts. Another recent cross-linguistic study by Lee et al. (2026) suggests that the role of duration in L1 phonology alone is not sufficient to predict the perception of non-native quantity contrasts. Although Japanese listeners, whose L1 has systematic phonemic quantity contrasts, outperformed the other listener groups in discrimination and identification tasks, French listeners, despite lacking phonemic quantity, did not perform worse than Cantonese or English listeners, whose languages make more limited use of durational contrasts. One important difference from the present study lies in the experimental design. Lee et al. tested naïve listeners using resynthesized stimuli with a restricted set of consonants and duration as the sole cue. In contrast, our study examined L2 learners in a lexical discrimination task and included a broader range of consonant types. Also, as our results show (see H3), discrimination performance varied across consonant classes. This greater segmental diversity and phonological complexity may partly account for the small discrepancy between the two studies.
As for the second hypothesis (H2), the effects of stress position were confirmed. Listeners discriminated length contrasts most successfully in pre- and post-stress positions, whereas accuracy dropped in unstressed, less salient positions. This pattern also aligns with our test materials: in most cases, the stimuli showed smaller geminate–singleton ratios precisely in the unstressed context (e.g., sappolò/sapolò, see Pešková, 2025). This outcome is compatible with Dmitrieva (2017), who found that consonant-length perception is facilitated after stressed vowels and in contexts that enhance durational contrasts. The manipulated post-stress condition ([ˈV]+ Cː vs. [ˈV]+ C), in which vowel duration was shortened, yielded the best results, suggesting that excessive vowel length can mask the relevant consonantal cue. Once vowel duration was neutralized, discrimination improved markedly across all five learner groups (observed discrimination rates in the raw data rose from 40%–82% in the unmanipulated condition to 75%–95% in the manipulated one). This pattern may be explained by the fact that in the natural, unmanipulated condition (ˈV + Cː vs. ˈVː + C) both items contain a long segment, either a vowel or a consonant. Learners may therefore perceive the two words as equally long overall and struggle to decide whether they are the same or different. When vowel duration is neutralized, the contrast is clearer, since the items differ only in consonant duration. Importantly, this effect was not observed in the L1 Italian group, who showed almost equally high performance across both conditions (95.4% post-stress vs. 93.5% post-stress-manipulated). This indicates that L1 Italian listeners prioritize consonant duration in lexical discrimination, without implying that vowel duration plays no role in Italian prosodic processing more generally.
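The masking argument above can be illustrated with simple arithmetic. The following is a minimal sketch with hypothetical segment durations (the millisecond values and function names are invented for illustration and are not the study's measurements). It compares the overall duration difference between the two members of a pair in the natural condition (ˈV + Cː vs. ˈVː + C) and in the manipulated condition (ˈV + Cː vs. ˈV + C).

```python
# Hypothetical durations in milliseconds; illustrative assumptions only,
# not values measured in the study's stimuli.
SHORT_VOWEL_MS = 100
LONG_VOWEL_MS = 180
SINGLETON_MS = 90
GEMINATE_MS = 200


def vc_sequence(vowel_ms, consonant_ms):
    """Describe a stressed vowel + consonant sequence by its durations."""
    return {
        "vowel": vowel_ms,
        "consonant": consonant_ms,
        "total": vowel_ms + consonant_ms,
    }


def compare_pair(geminate_item, singleton_item):
    """Return the duration differences between the two members of an AX pair."""
    return {
        "vowel_difference": geminate_item["vowel"] - singleton_item["vowel"],
        "consonant_difference": geminate_item["consonant"] - singleton_item["consonant"],
        "total_difference": geminate_item["total"] - singleton_item["total"],
    }


# Natural condition: short vowel + geminate vs. long vowel + singleton.
natural = compare_pair(
    vc_sequence(SHORT_VOWEL_MS, GEMINATE_MS),
    vc_sequence(LONG_VOWEL_MS, SINGLETON_MS),
)

# Manipulated condition: the vowel is short in both items.
manipulated = compare_pair(
    vc_sequence(SHORT_VOWEL_MS, GEMINATE_MS),
    vc_sequence(SHORT_VOWEL_MS, SINGLETON_MS),
)

print("natural:", natural)
print("manipulated:", manipulated)
```

With these invented values, the vowel and consonant differences partly cancel in the natural pair, so the two items differ little in overall duration, whereas in the manipulated pair the whole difference is carried by the consonant. A listener who attends to overall duration rather than to the consonant alone would thus find the natural pair harder to tell apart.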
In line with the L2LP framework, these findings suggest that learners gradually adjust the weighting of perceptual cues, reallocating attention from vowel to consonant duration as they acquire the Italian contrast (Escudero, 2005; van Leussen & Escudero, 2015).
The third hypothesis (H3) was also confirmed. The effect of consonant type revealed that sonorants and voiceless plosives yielded the highest discrimination scores, whereas fricatives and affricates posed the greatest difficulty. This pattern is broadly consistent with previous studies showing that durational differences are more perceptible for segments with continuous acoustic energy (e.g., nasals, laterals, vibrants) or with clearly defined closure–release transitions (e.g., voiceless stops), and less so for affricates or fricatives, whose complex temporal structure can obscure durational cues (cf. Dmitrieva, 2017). Interestingly, voiced stops (e.g., sabburo/saburo) elicited lower accuracy (with the exception of Finnish learners and native listeners). One plausible explanation is that voicing reduces the perceptual salience of closure duration, a pattern consistent with Esposito and Di Benedetto (1999), who reported smaller geminate–singleton contrasts and weaker perceptual distinctiveness for voiced stops in Italian. In this line, our naturally produced test items (Pešková, 2025, pp. 172–173) showed that the geminate–singleton ratio was substantially larger for voiceless than for voiced stops (approximately 2.5–2.8:1 vs. 1.6–1.7:1). Sonorants exhibited relatively large durational ratios (around 2.5:1), whereas fricatives and affricates displayed more variable patterns—sometimes with higher nominal ratios in voiced contexts, though their complex temporal structure likely masked the perceptual cue.
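For readers unfamiliar with the measure, the geminate–singleton ratio referred to above is the duration of the geminate divided by the duration of its singleton counterpart. The sketch below shows the computation; the durations are hypothetical values chosen only to fall within the approximate ranges quoted in the text, and the function and variable names are illustrative.

```python
def geminate_singleton_ratio(geminate_ms, singleton_ms):
    """Return the geminate duration divided by the singleton duration."""
    if singleton_ms <= 0:
        raise ValueError("singleton duration must be positive")
    return geminate_ms / singleton_ms


# Hypothetical (geminate, singleton) durations in milliseconds.
# These are assumptions for illustration, not measurements from the stimuli.
hypothetical_durations_ms = {
    "voiceless stop": (200.0, 80.0),
    "voiced stop": (128.0, 80.0),
    "sonorant": (150.0, 60.0),
}

ratios = {
    consonant_class: round(geminate_singleton_ratio(geminate, singleton), 2)
    for consonant_class, (geminate, singleton) in hypothetical_durations_ms.items()
}

# Order the classes from the largest to the smallest durational contrast.
ranked = sorted(ratios, key=ratios.get, reverse=True)

for consonant_class in ranked:
    print(f"{consonant_class}: {ratios[consonant_class]}:1")
```

A larger ratio means a larger relative durational difference between the two members of a pair, which is why the smaller ratio for voiced stops is a plausible source of their lower discrimination accuracy.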
L1-based effects also emerged. Slovak learners, whose L1 includes several affricates, performed more accurately than Finnish learners, whose phonological system lacks this consonant type. This indicates that familiarity with specific segmental categories facilitates the perception of durational differences even within the same manner class; that is, learners transfer fine-grained phonetic knowledge from similar L1 segments to the corresponding L2 categories (Chládková et al., 2013). Conversely, Finnish learners excelled with voiceless plosives, a gemination type that is both frequent and phonologically contrastive in Finnish, while Spanish learners showed a relative advantage with rhotics, reflecting the functional tap–trill opposition in their L1, as already discussed above.
Finally, the fourth hypothesis (H4), predicting an effect of proficiency, was partially supported. Proficiency did not emerge as a robust main predictor across learner groups and sound types; however, higher proficiency was associated with improved performance in specific conditions. In particular, C-level learners tended to show higher accuracy than A-level learners, especially for consonantal length discrimination (Altmann et al., 2012; Hardison & Saigo, 2010; Hayes-Harb, 2005; Tsukada & Yurong, 2022). For vowels, L1-specific patterns were observed, most notably in the Slovak, Spanish, and Czech groups, where increasing proficiency was associated with reduced sensitivity to non-contrastive durational variation, yielding perceptual behavior more consistent with Italian phonological contrasts.
The limited effect of proficiency and the individual patterns align with Feng and Busà (2022), who observed that increased exposure or advanced proficiency does not necessarily lead to more native-like sensitivity to timing contrasts. It appears that perceptual tuning develops relatively early and stabilizes thereafter, while later gains in lexical or grammatical proficiency have a smaller or unstable impact on segmental perception. The finding also raises the question of how well vocabulary-based tests, such as DIALANG, capture phonological competence. To our knowledge, there are no standardized measures that reliably assess phonological skills in L2. Since phonological representations depend crucially on the amount and quality of auditory input, limited or inconsistent exposure may lead learners to fossilize suboptimal patterns early in acquisition (see, e.g., Derwing & Munro, 2015). These findings highlight the need for perceptual training at the initial stages of L2 learning and underscore the benefits of explicit focus on segmental perception. Individual variation was substantial across all groups, pointing to the influence of extra-linguistic factors such as prior language experience, weekly study time, exposure to Italian outside the classroom, time spent in Italy, social contact with Italian speakers, motivation, and general aptitude (e.g., Abu El Adas et al., 2025; Colantoni et al., 2015; Wen et al., 2023). These variables merit systematic investigation in future research, as they may modulate perceptual sensitivity together with the L1 background.
Our findings can be interpreted within a feature-based perspective on L2 phonological acquisition. According to the theory of phonological interference in Brown (1998, 2000), learners transfer individual phonological features from their L1 to the L2 when these features are perceptually salient and active in the native system. In the present study, the feature [± long] facilitated learning when it was contrastive in the L1 and associated with the same segmental class. Finnish learners, whose L1 encodes length in both vowels and consonants, transferred this feature successfully to Italian. Czech and Slovak learners, familiar with vowel quantity but not consonant gemination, could only partially redeploy the feature across segment types. In contrast, German learners showed limited benefit despite having vowel quantity, likely because in their L1 the feature interacts with vowel quality and stress, which reduces its transparency. Spanish learners, whose L1 lacks quantity contrasts altogether, had to acquire the feature anew. This gradient pattern illustrates how the availability and distribution of phonological features constrain the acquisition of new contrasts.
Interpreting these results within the SLM (Flege, 1995, 2003; Flege & Bohn, 2021), perceptual similarity between L1 and L2 categories determines whether new phonetic categories can be established or whether existing ones are merely adapted. When an L2 contrast is perceived as similar rather than new, it tends to be merged with an existing L1 category, hindering the formation of a distinct perceptual boundary. This helps explain why learners from non-quantity languages, or from systems where duration interacts with other cues, showed reduced accuracy and slower processing.
The L2LP model (Escudero, 2005; van Leussen & Escudero, 2015) offers a complementary explanation by focusing on developmental reorganization. Learners begin from the Full Copying hypothesis: their initial L2 perceptual system is a replica of their L1 grammar, attuned to native acoustic and phonological patterns. With exposure to the L2, this copied grammar is gradually restructured, so that connections between acoustic input, phonetic representation, and lexical meaning become appropriate for the new system. In Italian, this entails learning to interpret duration as a property of the consonant itself rather than of the syllable or the preceding vowel. The strong improvement observed in the manipulated post-stress condition supports this account: once vowel duration was neutralized, learners focused more successfully on consonantal timing. This shift illustrates the reorganization of perceptual mappings predicted by L2LP and shows how learners progressively reshape their L1-based perception system into an L2-specific one. Moreover, because each learner’s perceptual grammar reflects their individual L1 and experience, differences in prior exposure, proficiency, and auditory sensitivity can lead to diverse developmental trajectories toward the same target contrast.
5.2 Limitations of the Study
The present study has several limitations that should be considered when interpreting the results. First, the experimental design focused exclusively on duration, although segmental length contrasts are known to be multidimensional and may involve additional phonetic and prosodic cues. Cross-linguistically, the implementation of quantity differs not only in whether duration is contrastive, but also in how durational patterns interact with vowel quality, stress, and syllable structure.
Relatedly, the inclusion of a vowel-duration manipulation necessarily introduced a degree of artificiality into the stimuli. In particular, stressed vowels in open syllables were manipulated in ways that do not correspond to canonical Italian realizations. While these items were intentionally designed as diagnostic probes of cue weighting rather than as models of Italian phonology, we acknowledge that such manipulations may deviate from listeners’ expectations and could have contributed to increased uncertainty or variability in performance.
A further limitation concerns factors that were not systematically examined in the present analyses. Frequency of consonant types and response-time patterns across different contrast types may play a role in perceptual performance. However, obtaining reliable cross-linguistic frequency estimates would require dedicated corpus-based analyses that go beyond the scope of the present study. Similarly, detailed response-time analyses by consonant category were not part of the original research questions and are left for future work.
Finally, the selection of test items was necessarily constrained, as the AX discrimination task formed part of a larger experimental project that also included an identification (perception) task, reading tasks, and short semi-directed interviews (Pešková, 2025). As a result, the set of pseudowords does not capture the full range of segmental and prosodic variation found in Italian. Nevertheless, the design ensured strict comparability across learner groups and the Italian control group, providing a robust baseline for cross-linguistic comparison. An additional advantage of this integrated design is that the perceptual results can later be directly related to production data collected from the same participants.
More generally, the study highlights the difficulty of categorizing languages into clear-cut typological classes based on the presence or absence of ‘length contrasts’. Production data from the learners’ L1s would be required to assess how durational patterns are realized across languages and how these realizations relate to perceptual strategies in L2 Italian. Addressing these issues represents an important direction for future research.
6 Conclusions
The present study yields several empirical findings on the perception of segmental length in L2 Italian. Perception of consonant length was strongly modulated by learners’ L1 background: learners from languages with phonemic consonant quantity (Finnish) achieved the highest accuracy, followed by learners with vowel quantity systems (Czech, Slovak), while learners from non-quantity (Spanish) or quantity-limited languages (German) showed the lowest overall performance. Perceptual accuracy further depended on prosodic position, with consonant-length contrasts being most reliably discriminated in post-stress conditions and least reliably in unstressed positions. Sensitivity to length also varied across consonant categories, with these patterns partly interacting with the presence or absence of corresponding segments in the learners’ L1 inventories. Finally, the factor Proficiency suggests a tendency toward higher perceptual accuracy among more advanced learners.
By contrast, durational manipulations of vowels did not yield stable group differences. These conditions were associated with longer response times and substantial individual variability across both learners and native controls, consistent with the non-contrastive status of vowel duration in Italian. Overall, the results indicate that perception of length in L2 Italian is shaped by an interaction of L1-specific phonological representations, prosodic structure, and segmental properties, rather than by sensitivity to duration alone as a uniform acoustic cue.
Taken together, these findings support a multidimensional view of L2 phonological acquisition in which perceptual sensitivity reflects both cross-linguistic influence and adaptation to the functional role of phonological cues in the target language. From a pedagogical perspective, the results point to the relevance of perceptual experience in acquiring segmental contrasts. While perceptual training alone cannot be assumed to guarantee accurate production, targeted attention to phonological distinctions may help learners establish more appropriate cue weighting in L2 perception. Nevertheless, the precise dynamics between perception and production remain an open question. Our future research will examine the relationship between perception and production of geminates to clarify whether the two domains develop symmetrically or asymmetrically and how targeted interventions can facilitate the integration of length contrasts into learners’ phonological systems.
Supplemental Material
Supplemental material, sj-docx-1-las-10.1177_00238309261440514 for A Cross-Language Comparison of Length Perception in Italian-Learning Adults by Andrea Pešková and Kateřina Chládková in Language and Speech
Footnotes
Acknowledgements
We would like to thank the editors and the anonymous reviewers for their helpful comments and suggestions. We are also grateful to the speakers who participated in the study, and to colleagues in Berlin, Brno, Prague, Helsinki, and Salamanca for providing laboratory facilities and recording opportunities, as well as for their feedback at various stages of the project.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: DFG-project ‘Production and perception of geminate consonants in Italian as a foreign language: Czech, Finnish, German and Spanish learners in contrast’ is funded by the German Research Foundation (Project number 521229214).
Supplemental material
Supplemental material for this article is available online.
