Abstract
We examined differences in speech timing between the Hadari and Bedouin dialects of Kuwaiti Arabic using speech cycling. In our version of the paradigm, speakers repeated six-syllable phrases, aligning the start of each repetition with a metronome beat. Previous speech cycling work has found that stressed vowel onsets tend to occur at harmonic phases (e.g., 1/2) within the Phrase Repetition Cycle (PRC), potentially reflecting coordination between prosodic units. Hadari has phonetically stronger stress contrast than Bedouin, with greater reduction of unstressed syllables, which may afford closer alignment to harmonic phases. Eighteen Hadari and 18 Bedouin speakers read aloud six trochaic and six iambic sentences at three metronome rates: slow, medium, and fast. We analyzed the phases, relative to the PRC, of the final (external) and medial (internal) stressed syllables, which were either heavy, CVV(C), or light, CVC. For the external phase, Hadari speakers tended to align light syllables earlier in the PRC than Bedouin speakers, and more similarly to heavy syllables. Vowel duration analysis suggested that this pattern may be due to greater compression of preceding unstressed vowels in Hadari. These results suggest that local timing patterns, in particular variation in durational stress contrast, modulate rhythmic coordination between prosodic units.
1 Introduction
We examine aspects of the durational realization of lexical stress in the Hadari (urban) and Bedouin (nomadic) dialects of Kuwaiti Arabic, two dialects that are understudied, particularly as regards their prosodic systems. We explore the timing of stress beats in these dialects via speech cycling (Cummins & Port, 1998; Tajima, 1999), a constrained task in which speakers coordinate stress beats with metronome beats. This task provides a means to study the effect of local timing cues, in particular durationally realized stress contrast, on the coordination between different levels of the prosodic hierarchy.
We first provide an overview of studies of cross-linguistic and cross-dialectal variation in the durational marking of prominence and then consider the status and realization of lexical stress in Arabic dialects generally, before focusing on Hadari and Bedouin dialects of Kuwaiti Arabic, in particular. Finally, we review previous studies that have employed the “speech cycling” paradigm (Cummins & Port, 1998; Tajima, 1999). We use this time-constrained speech production task to examine how dialect-specific constraints on durational realization of stress in Hadari and Bedouin may influence surface timing patterns.
1.1 Durational correlates of prominence across languages and dialects
Speech timing is affected in various ways by the prosodic hierarchy, which refers to the hierarchical organization of nested speech constituents (e.g., Bolinger, 1965; Liberman, 1975). In particular, in some languages, prosodic heads (lexically stressed syllables, phrasally stressed words) and constituent edges may be lengthened (Price et al., 1991; Turk & Shattuck-Hufnagel, 2000, 2007; see Fletcher, 2010, for a review). The magnitude of durational contrast between stressed and unstressed syllables is, however, highly variable cross-linguistically (e.g., Delattre, 1966). Moreover, some languages lack prominence levels altogether, at the lexical and/or phrasal levels (see White, 2014, for a review).
It has been hypothesized that the degree of prominence-related durational contrast, in particular, stress contrast between strong and weak syllables, covaries with use of duration to cue higher prosodic structure. In languages like English and Dutch, with a strong durational contrast between stressed and unstressed syllables, the magnitude of accentual lengthening and phrase-final lengthening is also strong (White, 2014; White & Mattys, 2007b; Wightman et al., 1992). By contrast, in Spanish, which has relatively weak temporal stress contrast, pitch accents and phrase boundaries are not strongly marked by duration (Frota et al., 2007; Ortega-Llebaria & Prieto, 2007).
Studies of cross-linguistic timing differences showed that language-specific phonotactics and vowel reduction influence the degree of temporal stress contrast, with—for example—some Germanic languages having more complex syllable structures, longer vowels, and greater vowel reduction than some Romance languages (Dauer, 1983; Roach, 1982). For example (using the then extant but now widely discredited “rhythm class” terminology), so-called “stress-timed” languages were reported to have greater vocalic and consonantal interval standard deviations (ΔV and ΔC, respectively) than languages then described as “syllable-timed” (Dellwo & Wagner, 2003; Ramus et al., 1999; White & Mattys, 2007a).
Gradient temporal stress contrast also exists within languages. For instance, White and Mattys (2007b) found that Southern Standard British English (SSBE) utterances had lower proportions of vocalic (as opposed to intervocalic) intervals (%V) and higher vocalic interval standard deviation (ΔV) than Welsh Valleys English and Bristolian English, reflecting greater vowel reduction and stronger stressed vowel lengthening in SSBE. Hamdi et al. (2004) found that western Arabic dialects (Moroccan, Algerian, and Tunisian) had lower %V but higher ΔC than eastern dialects (Jordanian, Lebanese, and Egyptian). Such data do not necessarily relate directly to durational stress contrast, however, given the tendency of western dialects to have shorter vowel durations overall, both stressed and unstressed, and a more widespread occurrence of onset consonant clusters (Hamdi et al., 2005). Indeed, eastern dialects tend to have strong lengthening of stressed (long) vowels but relatively weak unstressed vowel reduction (for Jordanian, see De Jong & Zawaydeh, 2002, and Vogel et al., 2017; for Lebanese, see Kelly Niamh, 2021).
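The interval-based metrics mentioned above (%V, ΔV, ΔC) can be computed directly from segmented interval durations. The sketch below is a minimal illustration, assuming an utterance has already been divided into vocalic and intervocalic (consonantal) intervals measured in milliseconds. The durations are invented, the function name is hypothetical, and it uses the sample standard deviation; published implementations differ in such details (e.g., computing per sentence and then averaging).

```python
from statistics import stdev


def rhythm_metrics(vocalic_ms, consonantal_ms):
    """Return %V, deltaV and deltaC for one utterance.

    vocalic_ms: durations of vocalic intervals (ms)
    consonantal_ms: durations of intervocalic (consonantal) intervals (ms)
    """
    total = sum(vocalic_ms) + sum(consonantal_ms)
    return {
        # proportion of utterance duration that is vocalic, in percent
        "pct_V": 100.0 * sum(vocalic_ms) / total,
        # variability of vocalic intervals (sample standard deviation)
        "delta_V": stdev(vocalic_ms),
        # variability of consonantal intervals (sample standard deviation)
        "delta_C": stdev(consonantal_ms),
    }


# Hypothetical utterance: alternating long (stressed) and short (reduced) vowels.
metrics = rhythm_metrics([100, 50, 100, 50], [80, 80, 80, 80])
print(metrics)
```

A stronger alternation between long stressed and short unstressed vowels raises delta_V, which is the sense in which ΔV has been linked to durational stress contrast.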
The various degrees of durational stress contrast across languages and dialects have been suggested to arise from the interaction between hierarchical prosodic units. The coupled oscillators model (O’Dell & Nieminen, 1999) views timing differences between languages (and potentially between dialects) as arising from the mutual influence of hierarchically nested oscillators representing timing units. In languages with strong stress contrast, such as English, the oscillator representing inter-stress intervals is said to be more dominant than the syllable oscillator, while the opposite is true for languages with lower durational stress contrast, such as Spanish. Where the stress foot oscillator dominates, widespread compression effects should be observed, wherein increasing the number of syllables in inter-stress intervals leads to compensatory decreases in syllabic duration (Kim & Cole, 2005; Krivokapić, 2013; O’Dell & Nieminen, 2009). When the syllabic oscillator is dominant, increasing the number of syllables in the inter-stress interval should be associated with increases in inter-stress interval duration rather than with syllabic compression effects (see also Asu & Nolan, 2006).
This view of the timing interaction between prosodic levels has been criticized, however, because of a lack of empirical support for syllable compression effects, also referred to as polysyllabic shortening (e.g., White, 2014). As noted by White and Turk (2010), most studies that reported word-level or foot-level polysyllabic shortening (e.g., Nakatani et al., 1981; Port, 1981) measured only the duration of stressed syllables in phrasally accented words, e.g., “I say [target word] again.” White and Turk suggest that what appears to be polysyllabic shortening may result from the sharing out of prosodic timing effects according to word length (cf. Beckman, 1992; Turk & White, 1999; White, 2002). In particular, pitch accents lengthen stressed syllables and, to a lesser extent, unstressed syllables (e.g., Turk & White, 1999, for English). Because stressed syllables in monosyllabic words receive all of the lengthening due to accent, they will be longer than the equivalent primary stressed syllables in bi- or trisyllabic words, where some of the accentual lengthening is distributed to the unstressed syllables. In non-accented words, by contrast, White and Turk report that stressed syllables do not shorten as a function of word length, although there may be positional effects (initial or final lengthening) on subconstituents of stressed syllables.
The contrasting claims of these views of speech timing (hierarchically interacting timing units vs. local lengthening effects) may be explored in various tasks that require responses to be coordinated with stress beats. For example, Allen (1972a, 1972b) showed that English speakers tapped their fingers in alignment near the stressed vowel onsets of English utterances and, in a related task, also adjusted metronomes to align similarly with vowel onsets (see also Rathcke et al., 2021, for English; Wagner et al., 2019, for German). Such entrainment between listeners’ taps and stress beats may be characterized by hierarchically coupled oscillators in which listeners’ taps are phase-locked to stress beats (cf. Wagner et al., 2012). However, local timing effects may also play a role in such entrainment tasks, as the increased duration of stressed syllables, along with other acoustic cues, serves to boost the salience of stress beats (see Malisz et al., 2016, for theoretical modeling that brings together aspects of the local and the hierarchical coupling views).
1.2 Lexical stress in Arabic dialects
Stress-based lengthening in Arabic dialects is influenced by phonological vowel length, which has a phonemic lexical function (McCarthy & Prince, 1990). De Jong and Zawaydeh (2002) showed that, in Jordanian Arabic, the difference in duration between stressed and unstressed long vowels (28 ms, 19%) was larger than that between stressed and unstressed short vowels (10 ms, 16%), while the difference between stressed long vowels and unstressed short vowels was 98 ms (66%). The greater stress-related lengthening of long vowels may relate to functional load effects: specifically, strong stress-related lengthening of short vowels could obscure the phonemic contrast between short and long vowels (cf. Ahn, 2002). This is even more clearly demonstrated in Vogel et al. (2017), where Jordanian Arabic stressed short vowels were only 2 ms (4%) longer than unstressed short vowels. Similarly, for Palestinian Arabic, Kelly Niamh (2023) showed that only long vowels were lengthened when stressed, while stressed short vowels were differentiated from unstressed vowels only via other acoustic cues, such as intensity. Likewise, Bruggeman (2018) found that, in Moroccan Arabic, duration did not differentiate between short stressed and unstressed vowels, taking this as indicative of the lack of lexical stress in Moroccan, and Bouchhioua (2008) found that duration was not a reliable correlate of stress for short vowels in Tunisian Arabic.
Thus, Arabic dialects tend to have low degrees of durational stress contrast, particularly between stressed short vowels and unstressed vowels, compared with, for example, Germanic languages, such as English and Dutch (cf. Dauer, 1983; Delattre, 1966).
1.3 Hadari and Bedouin Kuwaiti Arabic dialects
Bedouin speakers in Kuwait have origins in the Najd region of Saudi Arabia, and they belong to (originally) nomadic Arabian tribes (cf. Ingham, 1994). Hadari speakers include those who migrated from the central part of the Najd region but are described as Hadari (i.e., sedentary rather than nomadic) even before migration. Other groups that constitute the Hadari community migrated from Iran, the eastern part of Saudi Arabia (ħasa), Bahrain, and Iraq (Holes, 2006). The two dialects have similar stress assignment rules, whereby stress is weight sensitive. Super-heavy syllables, CVVC/CVCC, attract stress, and if there are no super-heavy syllables, heavy syllables, CVV/CVC, receive stress. Typically, where there are two heavy syllables in the same word, for example, CVV.CVC, the heavy syllable with the long vowel, CVV, takes stress (see also Section 2.2). If two heavy syllables with similar structures occur in the same word, CVC.CVC, the penultimate receives stress. Unlike many other Arabic dialects (e.g., Egyptian or Levantine), Kuwaiti dialects, as well as some Gulf dialects, assign stress to the final heavy CVC in CV.CVC sequences (cf. Watson, 2011a, 2011b). Below, we provide examples of stress assignment in different word structures in Kuwaiti Arabic dialects.
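The weight-sensitive rules just described amount to a priority ordering, super-heavy > CVV > CVC > CV, from which the Kuwaiti pattern of final stress in CV.CVC falls out. The sketch below encodes this ordering for disyllabic words only, like those in the speech cycling materials. The tie-break toward the penultimate follows the CVC.CVC rule above; extending that tie-break to CV.CV and to two super-heavy syllables is an assumption not stated in the text. The names are hypothetical, and this is an illustration of the rules as summarized here, not a complete phonological analysis.

```python
# Priority of syllable templates for attracting stress (higher wins).
PRIORITY = {
    "CVVC": 3, "CVCC": 3,  # super-heavy
    "CVV": 2,              # heavy with long vowel: preferred over CVC
    "CVC": 1,              # heavy with short vowel
    "CV": 0,               # light
}


def stressed_syllable(word):
    """Return the 0-based index of the stressed syllable of a disyllabic word.

    `word` is a pair of syllable templates, e.g. ("CV", "CVC").
    """
    if len(word) != 2:
        raise ValueError("this sketch only handles disyllabic words")
    first, final = (PRIORITY[s] for s in word)
    # Ties (e.g. CVC.CVC) go to the penultimate, i.e. the first syllable here.
    # Applying the same tie-break to CV.CV is an assumption.
    return 0 if first >= final else 1


examples = {
    "xa.latˤ": ("CV", "CVC"),    # final CVC stressed: Kuwaiti/Gulf pattern
    "ʕa.bi:r": ("CV", "CVVC"),   # final super-heavy attracts stress
    "beet.na": ("CVVC", "CV"),   # Hadari form with initial super-heavy
    "CVC.CVC": ("CVC", "CVC"),   # template only: tie goes to penultimate
}
results = {form: stressed_syllable(s) for form, s in examples.items()}
print(results)
```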
There are, however, some structural differences between Hadari and Bedouin dialects. Bedouin dialects, in general, are said to be more conservative in their phonological features than Hadari, because the former preserve features from classical Arabic that have undergone phonological changes in Hadari (Abu-Haider, 2006). These differences in phonological properties lead us to expect different degrees of temporal stress contrast. The Kuwaiti Hadari dialect allows super-heavy syllables, CVCC and CVVC, to surface word-medially (Examples 1.1 and 2.1), while the Bedouin dialect tends to restructure these syllables with epenthesis (Examples 1.2 and 2.2; cf. Versteegh, 2001); hence, temporal contrast may be stronger in Hadari than in Bedouin. (For brevity, from this point, we will use “Hadari” and “Bedouin” to refer just to the Kuwaiti versions of these dialects; other Hadari or Bedouin dialects may not share the timing features discussed.)
1. darb + na “our road”
1.1. Hadari: ˈdarb.na
1.2. Bedouin: ˈdar.ba.na
2. beet + na “our house”
2.1. Hadari: ˈbeet.na
2.2. Bedouin: ˈbee.ta.na
Also, in Bedouin, unstressed open syllables are usually produced with a full low vowel /a/ (Examples 3.2; 4.2; cf. Ingham, 1994), while these vowels are more centralized in Hadari (Examples 3.1; 4.1; cf. Holes, 2006).
3. xa.latˤ “he mixed”
3.1. Hadari: xə.ˈlatˤ
3.2. Bedouin: xa.ˈlatˤ
4. ga.lab “he overturned”
4.1. Hadari: gə.ˈlab
4.2. Bedouin: ga.ˈlab
Both of these differences are expected to lead to stronger durational stress contrast in Hadari than Bedouin, given that Hadari tends to have a more widespread occurrence of heavy stressed syllables and tends to reduce unstressed syllables.
1.4 Speech cycling
Speech cycling is an experimental paradigm developed by Cummins and Port (1998) and Tajima (1999) to investigate temporal coordination of hierarchically nested speech units.
In speech cycling, speakers repeat a phrase together with metronome beeps. The interval between repetitions is called the phrase repetition cycle (PRC). Previous studies have shown that acoustically salient points, that is, the vowel onsets of strong syllables, tend to lie at certain privileged phases within the PRC (Figure 1). These phases divide the PRC into simple integer ratios (“rhythmic modes”), such as 1/3, 1/2, and 2/3, that reflect a metrical structure within the PRC. These simple phase angles, also known as harmonic phases, are considered attractors for prominences in the PRC; the attractors emerge from task-specific demands, in particular the demand to repeat utterances at a constant period. Importantly, the alignment of stress beats at simple phases reflects a hierarchical structure, wherein the lower-level units, that is, stress beats, are constrained to lie at privileged time intervals within the higher-level units, that is, the PRC (Cummins & Port, 1998).

Figure 1. Schematic representation of the speech cycling task. Interval a, defined as the interval from the first (Phase 0) to the final (here, second) stress beat, is divided by interval b, the phrase repetition cycle, to calculate the phase angle of the final stress.
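Because the phase measure in Figure 1 is a ratio of two intervals, it can be computed once the relevant onsets are annotated. The sketch below assumes, as a simplification, that interval b runs from one cycle's Phase 0 landmark to the next cycle's Phase 0 landmark. The onset times are invented, the helper names are hypothetical, and the candidate harmonic phases (1/3, 1/2, 2/3) are taken from the examples given above rather than from any fixed analysis protocol.

```python
from fractions import Fraction

HARMONIC_PHASES = (Fraction(1, 3), Fraction(1, 2), Fraction(2, 3))


def phase_angles(phase0_s, stress_s):
    """Phase of a stress beat within each PRC.

    phase0_s: Phase 0 times (s) of successive cycles; needs one more entry
              than stress_s so that every cycle has a closing boundary.
    stress_s: onset time (s) of the target stress beat in each cycle.
    """
    phases = []
    for k, onset in enumerate(stress_s):
        a = onset - phase0_s[k]            # interval a
        b = phase0_s[k + 1] - phase0_s[k]  # interval b (the PRC)
        phases.append(a / b)
    return phases


def nearest_harmonic(phase, candidates=HARMONIC_PHASES):
    """Return the closest candidate phase and the signed deviation from it."""
    best = min(candidates, key=lambda c: abs(phase - float(c)))
    return best, phase - float(best)


# Hypothetical onsets for three cycles at a 1.8-s metronome period.
phases = phase_angles([0.00, 1.80, 3.61, 5.40], [0.95, 2.72, 4.50])
for p in phases:
    print(round(p, 3), nearest_harmonic(p))
```

A positive deviation means the stress beat falls later than the nearest harmonic phase, which is the kind of "later alignment" discussed in the following sections.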
Languages have been found to vary in the alignment of prominent syllables with simple phases within the PRC. For instance, Cummins (2002) asked English, Spanish, and Italian speakers to read sentences with two lexically stressed syllables, each followed by an unstressed syllable, for example, “MANning the MIDdle,” and to produce the first stressed syllable in alignment with a high-tone beep and the second stressed syllable in alignment with a low-tone beep, whose timing was varied so that it occurred at different phases within the cycle defined by the high-tone beeps. English speakers found the task easy to perform and showed close and consistent alignment with simple harmonic phases. By contrast, Italian and Spanish speakers found it more difficult, even after more than 30 min of practice: their phase alignments were not close to simple harmonic phase angles, and they exhibited extreme variation in alignment. Cummins suggested that the more consistent performance of English speakers was due to what he characterizes as the relevance of stress feet as a unit in speech timing in English, by contrast with Italian and Spanish.
Another possible explanation for these cross-linguistic differences may lie in variation in the magnitude of durational contrast between strong and weak syllables. English has stronger stress contrast than Spanish and Italian, due to substantial stress-based lengthening and vowel reduction (e.g., Oller, 1973; White & Mattys, 2007a). Repeating phrases with metronome beats gives rise to prominence attractors at simple phases, for example, 1/2 (Cummins & Port, 1998). Close alignment of stressed syllables with metronome beats may therefore be more natural in a language like English, where stress is salient due to its strong durational contrast with unstressed syllables. By contrast, in Spanish and Italian, lower stress contrast tends to make prominences (stressed syllables) less acoustically salient and thus may make the coordination of prominences with attractors less compelling.
In a speech cycling study of Medumba, a Grassfields Bantu language, Franich (2021) showed that phrasally accented syllables, which are associated with greater lengthening, were more closely aligned with simple phase angles than non-accented syllables. This timing pattern in Medumba supports the notion that more prominent (hence more perceptually salient) syllables afford closer coordination with external metrical attractors.
Another factor that may influence speakers’ performance in speech cycling is phonetic compressibility. In an unconstrained version of speech cycling, in which metronome beats mark only the beginning of each phrase, Tajima (1999) examined how phase alignment in English and Japanese was affected by manipulating metronome rate from slow to fast. English speakers demonstrated consistent alignment of stressed vowel onsets with simple phase angles, for example, 1/2, across metronome rates, while Japanese speakers’ alignment showed incremental shifts between distinct rhythmic modes as the metronome rate increased. The more consistent phase alignment in English may be facilitated by greater tolerance of unstressed syllable compression as metronome rate increases. Phonetic compressibility may thus also depend on the magnitude of durational stress contrast. Here, we tentatively distinguish two somewhat separable influences of durational stress contrast: first, the “top-down” effect of greater phonetic salience of stressed syllables, which increases the affordance for alignment of stresses at harmonic phase angles, and second, the “bottom-up” effect of greater unstressed syllable reduction, which allows more compression of segments and hence more flexible alignment of stressed syllables with harmonic phase angles.
Zawaydeh et al. (2002) compared the speech cycling performance of speakers of American English and Jordanian Arabic. English speakers tended to align stressed syllables closer to a simple phase of 1/2 than Arabic speakers, who tended toward later alignment. As discussed in Section 1.2, vowel reduction in Jordanian Arabic is relatively circumscribed (e.g., Vogel et al., 2017), which may contribute to this later stressed syllable alignment. Thus, this study appears to accord with our interpretation of one mechanism (i.e., variation in unstressed syllable compression) through which stress realization may affect phase alignment.
It should be noted that even languages that lack lexical stress may exhibit temporal coordination patterns with attractors. For example, Chung and Arvaniti (2013) showed that, within accentual phrases in Korean (which lacks lexical stress), initial syllables were consistently aligned at simple phase angles. Similarly, Medumba lacks lexical stress, yet Franich (2021) showed that phrasally accented Medumba syllables were more closely aligned with simple phase angles than non-accented syllables.
1.5 Aims and motivation of the study
The aims of this study are two-fold:
To compare patterns of phase alignment (i.e., the temporal coordination of stressed syllables) in speech cycling in Hadari and Bedouin Kuwaiti Arabic dialects.
To investigate how the durational realization of lexical stress contrast may influence the observed variations in phase alignment between dialects.
Here, we elicit spoken utterances using speech cycling, which has been suggested to be informative as regards the timing influences arising from hierarchical prosodic units. We also consider how cross-dialectal differences in the realization of local prominence (specifically, lexical stress) influence the observed alignment of syllables within externally imposed PRCs. Cross-dialectal comparisons using speech cycling ought to be more controlled than cross-linguistic studies, as dialects (such as Kuwaiti Hadari and Bedouin) tend to be similar in many structural aspects of speech (morphology, lexicon, syntax, etc.), thereby allowing the same phrases to be spoken across dialects and allowing more focus on the impact of the phonetics of lexical stress (cf. Smith & Rathcke, 2020). The differences between Hadari and Bedouin Kuwaiti dialects in temporal stress contrast, as demonstrated in Section 1.3 using similar syllable structures, motivate their use to study how stress realization differences affect the temporal organization of hierarchical structures.
Speech cycling potentially offers a means to explicitly examine the nature and interaction of language-specific or dialect-specific prosodic timing patterns. In such tasks, speakers are required to coordinate their repeated utterances with explicit external phase cycles of variable duration. Thus, the relative strength and significance of different timing constraints are tested; for example, is strong temporal marking of lexically stressed syllables preserved at the expense of compression of unstressed syllables? As we report, such enforced speech entrainment does indeed show evidence of differential timing constraints between Hadari and Bedouin Kuwaiti dialects, such as might be less evident in temporally unconstrained elicitation paradigms.
1.6 Predictions
As discussed in Section 1.4, various factors may influence the alignment of stressed syllables to simple phases within the PRC in this task, including dialectal variation in both phonological vowel length and tolerance of syllabic compression under speech rate pressure. Moreover, several factors in our materials and design might be expected to influence Hadari and Bedouin speakers’ external and internal phase alignment in the speech cycling task, notably syllable weight, stress pattern, and metronome period. These are detailed below.
1.6.1 Syllable weight
The predicted effects associated with syllable weight are top-down in nature. Both dialects are expected to preferentially align stressed heavy syllables closer to simple phase angles than stressed light syllables, because the phonological length of stressed, heavy syllables provides a greater affordance—via beat salience—for simple phase alignment. In addition, there is a greater tendency for unstressed vowel reduction in Hadari, which thus has stronger stress contrast than Bedouin. Thus, even stressed light syllables are relatively salient, compared with unstressed syllables, in Hadari: this salience contrast should also provide stronger affordance for simple phase alignments than in Bedouin.
1.6.2 Stress pattern
Trochaic phrases (i.e., phrases composed of three disyllabic SW words) are expected to facilitate stressed syllable alignment to simple phases more than iambic phrases (i.e., phrases composed of three disyllabic WS words), as phrase-initial stressed vowel onsets can easily be aligned with the metronome beep signaling the start of the cycle, that is, Phase 0. By contrast, in the iambic pattern, phrase-initial unstressed syllables naturally intervene between the metronome beep and the initial stressed syllable. Thus, a delay in Phase 0 (corresponding to the start of the cycle) is predicted for the iambic pattern, which would, other things being equal, also cause later alignment for medial and final stressed syllables (and thus a shift from simple phase angles). This trochaic versus iambic difference is expected to have less effect on phase alignment in Hadari than in Bedouin, because greater unstressed syllable compressibility in Hadari should facilitate closer alignment of the first stressed syllables in iambic phrases with metronome beeps, with consequently less impact on later alignment of stressed syllables with simple phase angles.
1.6.3 Metronome period
As the metronome period decreases, speaking rate is likely to increase, which may disrupt the temporal alignment of syllables. The availability of unstressed vowel reduction in Hadari is predicted to make compression more tolerable, thus allowing more stable phase angles across metronome periods. By contrast, limited vowel reduction in Bedouin should afford less compression at shorter metronome periods (i.e., faster speech rates), so that, at these periods, medial and final stressed syllables in Bedouin may be aligned later than the simple phase angles predicted at slower, less constrained speech rates.
2 Method
To investigate possible hierarchical timing differences between Hadari and Bedouin Kuwaiti dialects, we used a “non-targeted” speech cycling paradigm (Tajima, 1999), wherein speakers align the start of each of a succession of short phrases with metronome beeps. We avoided “targeted” speech cycling, in which speakers align stresses with high and low metronome beeps (Cummins & Port, 1998), because previous experiments found that this task can be challenging for speakers of languages with low stress contrast (Cummins, 2002). Given that Arabic dialects in general have lower degrees of stress contrast than English, we therefore used a non-targeted speech cycling task.
We first conducted a pilot study to find out whether speakers could comfortably adapt to the task demands. The participants in the pilot study were three Hadari and three Bedouin male speakers. The Arabic trial phrases used were each composed of six syllables. Each trial featured 12 metronome beeps at a constant rate. The participants were instructed to listen to the first four metronome beeps at the beginning of the trial, to familiarize themselves with the metronome rate, then to start repeating the trial phrase on the fifth beep and stop when no more beeps were heard. There were ten trials, which varied in metronome period, with the first and longest being 1,800 ms and the last and shortest being 963 ms. The metronome period of each trial was 93% of that of the previous one (following Tajima, 1999). Speakers were instructed to repeat phrases in each trial on a single breath, without taking a breath between repetitions, as breathing between repetitions has been found to bias timing toward certain phases (Cummins & Port, 1998; Tajima, 1999) and may not reflect prosodic constraints on phase production. Speakers were told that if they needed to take a breath during a trial, they should skip a repetition cycle and continue with the next metronome cycle. Although speakers were able to repeat phrases in a single trial without inserting breaths between repetitions, there were substantial disfluencies in their productions, especially at the shorter metronome periods. We therefore decided to use three comfortable metronome periods: “long” at 1,800 ms, “normal” at 1,512 ms, and “short” at 1,270 ms. Speakers in the pilot produced fewer disfluencies when repeating sentences at these metronome periods.
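The pilot's period schedule is a geometric series in which each trial lasts 93% of the previous one. The short sketch below simply applies that stated rule; applied strictly, it gives roughly 937 ms for the tenth trial, which differs from the 963 ms stated above. Incidentally, the three periods retained for the main experiment also form a geometric series, with a ratio of 0.84 (1,800 × 0.84 = 1,512; 1,512 × 0.84 ≈ 1,270), although the text does not say whether they were chosen that way. The function name is hypothetical.

```python
def period_schedule(first_ms, ratio, n_trials):
    """Metronome periods (ms) where each trial is `ratio` times the previous one."""
    return [first_ms * ratio ** k for k in range(n_trials)]


pilot = period_schedule(1800, 0.93, 10)    # stated pilot rule
retained = period_schedule(1800, 0.84, 3)  # reproduces the three retained periods
print([round(p) for p in pilot])
print([round(p) for p in retained])
```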
It is also possible that using fewer metronome periods—three, compared with the fourteen metronome periods in Tajima (1999)—across a narrower range of rates may allow for a more controlled comparison between Hadari and Bedouin dialects. A greater number of metronome periods may lead to more variability in rhythmic modes (e.g., 1/3, 1/2, 2/3, as observed in Tajima, 1999), whereas fewer metronome periods may promote a single distinct rhythmic mode (e.g., just 1/3), thus focusing the comparison of Hadari and Bedouin speakers’ alignment to this mode.
2.1 Participants and recordings
Recordings took place in a soundproofed room in the media lab at Kuwait University. Speakers were all students at Kuwait University and participated voluntarily. We recorded 23 Bedouin speakers and 22 Hadari speakers, all male speakers between 21 and 28 years old. Speakers were monodialectal, typically with elementary or intermediate knowledge of English. Participants completed a questionnaire about their dialectal background. All Bedouin speakers belonged to two Bedouin tribes: “Al-Shammari” and “Al-Enzi.” Most of them lived in “Al-Jahra” city, which is located to the north of Kuwait City and where the majority of the population are Bedouin. Hadari participants originated from Iraq, Iran, and Hasa (the eastern part of Saudi Arabia).
The recordings of five Bedouin speakers were excluded from further analysis because the first author, who made the recordings, judged that their accents were not typical of their self-declared dialect: specifically, they seemed to have Hadari accents. To have an equal number of speakers across the two dialects, 18 per dialect, we excluded four Hadari speakers with relatively frequent disfluencies in their productions.
Participants read aloud sentences presented on a computer screen and were recorded using a Zoom H4 recorder (sampling rate 44,100 Hz, quantization 16 bit) and an AKG D5 stand-mounted dynamic microphone. Metronome beeps were presented via Sennheiser headphones from a MacBook Pro laptop. The metronome beeps were generated in Praat (Boersma & Weenink, 2018) via a script by Quené (2002; https://www.hugoquene.nl/tools/index.html), adapted by the first author for this task. Each beep was a 400-Hz pure tone.
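For readers who want comparable stimuli without Praat, the sketch below generates a metronome track of 400-Hz pure-tone beeps with Python's standard library and encodes it as 16-bit mono WAV data. It is an independent, hypothetical alternative, not the Praat script used in the study; the beep duration (50 ms), amplitude (0.5), and default of 12 beeps per trial (as in the pilot) are assumptions chosen for illustration.

```python
import io
import math
import struct
import wave


def metronome_samples(period_ms, n_beeps=12, sr=44100, freq_hz=400.0,
                      beep_ms=50.0, amp=0.5):
    """Mono float samples: `n_beeps` pure-tone beeps, one every `period_ms`."""
    period_n = round(sr * period_ms / 1000)   # samples per metronome period
    beep_n = round(sr * beep_ms / 1000)       # samples per beep (assumed 50 ms)
    if beep_n > period_n:
        raise ValueError("beep must be shorter than the metronome period")
    samples = [0.0] * (period_n * n_beeps)    # silence between beeps
    for b in range(n_beeps):
        start = b * period_n                  # beep onset = start of period b
        for i in range(beep_n):
            samples[start + i] = amp * math.sin(2 * math.pi * freq_hz * i / sr)
    return samples


def to_wav_bytes(samples, sr=44100):
    """Encode float samples in [-1, 1] as 16-bit mono PCM WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit quantization, matching the recordings
        w.setframerate(sr)
        w.writeframes(b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples))
    return buf.getvalue()


# One medium-rate trial (1,512 ms period), kept in memory.
wav_bytes = to_wav_bytes(metronome_samples(1512))
```

Writing `wav_bytes` to a file opened in binary mode yields a playable WAV file; starting each beep at the start of its period keeps the beep onsets exactly one period apart.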
2.2 Materials
Participants read 12 sentences, each composed of three disyllabic words. Six sentences contained three words with a trochaic stress pattern, and six contained three words with an iambic stress pattern. Table 1 shows the sentences used in the study, with typical Bedouin pronunciation shown, as that is more canonical and does not feature reduction of unstressed vowels. All sentences had the same syntactic structure, that is, verb-subject-object (VSO), since the use of multiple syntactic structures may induce variability in prosodic boundary realization (Price et al., 1991) and hence introduce an uncontrolled influence on timing. Stressed syllables had CVC and CVV(C) structures. CVV(C) syllables are here termed “stressed heavy,” while CVC will be termed “stressed light.” While it is unconventional for CVC to be called “light,” this was done purposefully to differentiate stressed CVC syllables from stressed CVV(C) for two reasons: (a) syllables with long vowels are potentially more prominent than CVC, due to greater vowel lengthening and greater contrast with unstressed syllables and (b) in Kuwaiti dialects and various other Arabic dialects, when CVC and CVV syllables are in the same word, CVV receives stress (Watson, 2011a, 2011b). Syllables with a medial geminate consonant, CVG, were also labeled as “stressed light,” since the geminate consonant behaves as both a coda and an onset for the following syllable; thus, regarding syllable weight, CVG is comparable to CVC.
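The study's two-way labeling of stressed syllables can be written as a small lookup. The sketch below is a minimal illustration, assuming syllables are given as CV templates with G marking a geminate, as in Table 1; the function name is hypothetical.

```python
def stressed_weight_label(template):
    """Map a stressed-syllable template to this study's weight label."""
    t = template.upper()
    if t in ("CVV", "CVVC"):
        return "stressed heavy"   # long vowel: CVV(C)
    if t in ("CVC", "CVG"):
        return "stressed light"   # short vowel plus coda, or plus geminate
    raise ValueError(f"unexpected stressed-syllable template: {template!r}")


for t in ("CVVC", "CVV", "CVC", "CVG"):
    print(t, stressed_weight_label(t))
```

Treating CVG like CVC reflects the reasoning above that the geminate acts as both a coda and the following onset.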
Read Sentences for the Speech Cycling Task, With the Syllable Structure of Each Word Shown (G Indicating a Geminate Consonant).
Stressed syllables in bold are heavy.
2.3 Experimental procedure
The participants produced the test materials according to a protocol that was finalized (particularly regarding the chosen metronome rates) after the pilot study (see Section 2).
Each participant produced eight repetitions of each of the 12 sentences at three metronome periods: slow (1,800 ms), medium (1,512 ms), and fast (1,270 ms). At each metronome rate, half of the participants read the trochaic sentences first and then the iambic sentences, while the other half read the iambic sentences first. Each participant completed the task in approximately 20 min. Before the experimental recordings, each participant had a training session at the medium metronome rate. In this session, they repeated two sentences, one trochaic and one iambic, which were not from Table 1 but had a similar structure. The training session lasted around 2 min on average. Participants began the experimental recordings once they reported that they understood the task. One continuous recording was made for each speaker across all trials, proceeding from the longest to the shortest metronome period.
The first and final (8th) repetitions were excluded from analyses to avoid potential boundary effects. Cycles during which speakers inserted a breath were also excluded. Some speakers inserted an intrusive vowel. For example, in the sentence “soˈbag naˈbe:l ʕaˈbi:r,” some speakers inserted a vowel after the second syllable, producing “soˈbag[i].” Such cases were excluded from the analysis, so that the analyzed productions conformed to the syllable structures shown in Table 1.

As regards stress assignment, speakers’ productions were monitored during the recordings for errors, including non-standard stress placement. No subsequent auditory checks were made for stress assignment. Stress is highly predictable given the utterance meters and syllable structures. In addition, the materials had been informally piloted, as deemed appropriate by a native speaker, to ensure between-speaker agreement on stress placement.

The total number of tokens (i.e., cycles) analyzed was 5,390 for the external phase measure (Bedouin: 2553; Hadari: 2522) and 5,118 for the internal phase measure (Bedouin: 2619; Hadari: 2499). The disparity between the two measures is due to a procedural discrepancy in the measurement and extraction of the external and internal phase data. The internal phase boundaries were determined and extracted in a second round of measurements, with the external phase boundary as the reference landmark. (External phase boundaries were added manually after segmenting vowel onsets; see Section 2.4.) In some cases, due to a processing oversight, the first repetition of a phrase within a cycle was not extracted for the internal phase measurement, although it had been included in the external phase data. This accounts for the lower number of internal phase measurements.
The exclusion of a small subset of the internal phase data was not systematic with respect to the experimental design, and, as the external phase and internal phase measures are analyzed independently, the different numbers do not have a material consequence for the interpretation of our results.
2.4 Data preparation and measurements
Repetitions of each phrase from each metronome period trial were marked with Praat TextGrid boundaries. We then used a Praat script (Arantes (2018): https://github.com/parantes/slicer) to extract repetitions of each phrase at each metronome rate into separate audio files.
The recorded phrases were initially annotated automatically with the BeatExtractor Praat script (Barbosa, 2003). This script implements Cummins and Port’s (1998) Beat Extractor algorithm, with some modifications. BeatExtractor locates vowel onsets where the low-pass filtered energy envelope crosses an energy threshold. It detects vowel onsets reasonably well when the preceding consonant is an oral stop or a nasal. Nevertheless, on most occasions the automatically placed vowel onsets required some manual adjustment. The script also added spurious boundaries at fricatives and approximants, which were removed.
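The threshold-crossing idea can be illustrated with a minimal Python sketch. It is not a reimplementation of BeatExtractor or of Cummins and Port’s algorithm. The frame size, smoothing width, and relative threshold (the hypothetical parameters frame_ms, smooth_frames, and rel_threshold) are illustrative assumptions. A simple moving average stands in for a proper low-pass filter.

```python
def frame_energy(samples, frame_len):
    """Mean squared amplitude of consecutive, non-overlapping frames."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def smooth(values, width):
    """Centred moving average: a crude stand-in for low-pass filtering the envelope."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out


def vowel_onsets(samples, sr, frame_ms=10, smooth_frames=3, rel_threshold=0.3):
    """Times (s) at which the smoothed energy envelope rises through a threshold.

    The threshold is a fixed fraction of the envelope maximum (an assumption;
    the actual scripts may define it differently).
    """
    frame_len = int(sr * frame_ms / 1000)
    env = smooth(frame_energy(samples, frame_len), smooth_frames)
    threshold = rel_threshold * max(env)
    return [
        i * frame_len / sr
        for i in range(1, len(env))
        if env[i - 1] < threshold <= env[i]
    ]
```

Consider a synthetic signal sampled at 1,000 Hz that alternates 0.2 s of silence with 0.2 s of a full-scale square wave, twice. On this signal, the function returns onsets at about 0.19 s and 0.59 s. These are one frame early because of the centred smoothing, which is why automatic placements generally still need manual checking.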
All automatically determined vowel onset boundaries were checked manually by the first author, according to standard segmentation procedures (e.g., Turk et al., 2006). The PRC was determined as the span from the first stressed vowel onset of a phrase to the first stressed vowel onset of the following repetition of the same phrase (Figure 2).

Example annotation of a phrase repetition cycle (here “C6”) from the sentence “
2.5 Phase measurements
Two phase measurements, shown in Figure 3, were computed:
External Phase: The duration from the first beat, that is, the vowel onset of the first stressed syllable within a phrase, to the vowel onset of the last stressed syllable divided by the duration of the whole cycle. This measure defines the phase of the last stressed syllable.
Internal Phase: The duration from the vowel onset of the first stressed syllable to the vowel onset of the second stressed syllable, divided by the duration from the vowel onset of the first stressed syllable to the vowel onset of the third (final) stressed syllable. The latter is marked on the last tier of the Praat TextGrid, as in Figure 2. This measure defines the phase of the medial stressed syllable.

Schematic representation of the computation of the external and internal phases. The external phase is b/c. The internal phase is a/b.
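Given the stressed vowel onset times, the two ratios in Figure 3 reduce to simple arithmetic. The Python sketch below assumes that the onsets have already been exported from the TextGrids as a flat list of times in seconds, three per repetition. The names Cycle and cycles_from_onsets are ours, for illustration only. The exclusion of first and final repetitions is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Cycle:
    """Stressed vowel onset times (s) for one phrase repetition cycle (PRC)."""
    s1: float       # first stressed vowel onset of this repetition
    s2: float       # medial stressed vowel onset
    s3: float       # final stressed vowel onset
    next_s1: float  # first stressed vowel onset of the next repetition

    def external_phase(self) -> float:
        """b / c: first-to-final interval divided by the whole cycle."""
        return (self.s3 - self.s1) / (self.next_s1 - self.s1)

    def internal_phase(self) -> float:
        """a / b: first-to-medial interval divided by the first-to-final interval."""
        return (self.s2 - self.s1) / (self.s3 - self.s1)


def cycles_from_onsets(stressed_onsets):
    """Group consecutive stressed onsets (three per repetition) into PRCs.

    The last repetition yields no cycle, because its PRC would need the
    first stressed onset of a following repetition.
    """
    if len(stressed_onsets) % 3 != 0:
        raise ValueError("expected three stressed onsets per repetition")
    reps = [stressed_onsets[i:i + 3] for i in range(0, len(stressed_onsets), 3)]
    return [Cycle(*cur, nxt[0]) for cur, nxt in zip(reps, reps[1:])]
```

Consider onsets at 0.0, 0.4, and 0.8 s, with the next repetition starting at 1.6 s. Both ratios then equal 0.5, which is the evenly spaced four-beat pattern.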
2.6 Vowel duration measurements
From the sound files that included multiple repetitions of a single phrase, each phrase repetition was extracted into a separate sound file for vowel duration analysis. Boundaries for the six vowels in each sentence repetition were added in Praat, following the segmentation criteria described earlier. Vowel duration was extracted using a custom-made Praat script. The total number of vowels analyzed was 36,665 (Bedouin: 19,513; Hadari: 17,152). To maximize the size of the vowel duration dataset, we measured vowels from some utterances that were excluded from the phase analysis. These exclusions were due to factors that may substantially influence phase measures but have little or no effect on vowel duration. In particular, we took vowel duration data from utterances that included breaths and from the first and final repetitions in a set.
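The custom Praat script itself is not reproduced here. As an illustration only, the Python sketch below computes vowel durations from a TextGrid saved in Praat’s long text format. It assumes a single interval tier on which vowels carry non-empty labels and all other intervals are empty. The regular expression and this labeling convention are our assumptions, and labels containing quotation marks are not handled.

```python
import re

# Matches one interval of a long-format TextGrid: its bounds (s) and its label.
INTERVAL_RE = re.compile(
    r'intervals\s*\[\d+\]:\s*'
    r'xmin\s*=\s*([-+\d.eE]+)\s*'
    r'xmax\s*=\s*([-+\d.eE]+)\s*'
    r'text\s*=\s*"([^"]*)"'
)


def vowel_durations_ms(textgrid_text):
    """Return (label, duration in ms) for every labelled interval."""
    durations = []
    for xmin, xmax, label in INTERVAL_RE.findall(textgrid_text):
        if label.strip():  # empty intervals are treated as non-vowel stretches
            durations.append((label, (float(xmax) - float(xmin)) * 1000.0))
    return durations
```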
2.7 Statistical analysis
2.7.1 Statistical models for the phase measures
We constructed separate linear mixed-effects models for each phase measure (external and internal). There were four predictors: dialect (Bedouin vs. Hadari), stress pattern (iambic vs. trochaic), syllable weight (heavy vs. light), and metronome period (slow vs. medium vs. fast), which was treated as a continuous predictor. As there were expectations of variable influences, according to dialect, of the latter three predictors (see Section 1.6), two-way interactions between dialect and stress pattern, dialect and syllable weight, and dialect and metronome period were included in the model.
All categorical variables (dialect, stress pattern, syllable weight) were sum-coded, and the continuous variable (metronome period) was centered. The intercept therefore represents the grand mean phase ratio, averaged over the levels of the categorical predictors at the mean metronome period.
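A brief Python illustration shows why this coding makes the intercept a grand mean. With ±1 (sum) codes in a balanced design, the predictor columns are mutually orthogonal. Each least-squares coefficient is then the code-weighted mean of the responses, and the intercept is the unweighted mean of the cell means. The 2 × 2 cell values below are arbitrary toy numbers, not data from this study; only the three metronome periods come from the text. After exclusions, real data are unbalanced, so the correspondence there is only approximate.

```python
# Metronome periods (ms) from the text: slow, medium, fast.
PERIODS_MS = [1800, 1512, 1270]


def center(values):
    """Subtract the mean so that 0 corresponds to the average period."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def ols_orthogonal(columns, y):
    """Least squares for mutually orthogonal predictor columns.

    With orthogonal columns, X'X is diagonal, so each coefficient is
    dot(x_j, y) / dot(x_j, x_j).
    """
    for j, xj in enumerate(columns):
        for xk in columns[j + 1:]:
            if abs(dot(xj, xk)) > 1e-12:
                raise ValueError("columns are not orthogonal")
    return [dot(xj, y) / dot(xj, xj) for xj in columns]


# Toy balanced 2 x 2 design: factor A (e.g., dialect) x factor B (e.g., weight),
# one observation per cell, sum-coded as +1 / -1.
a = [1, 1, -1, -1]
b = [1, -1, 1, -1]
ab = [ai * bi for ai, bi in zip(a, b)]
intercept = [1, 1, 1, 1]
y = [1.0, 2.0, 3.0, 6.0]  # arbitrary toy responses

coefs = ols_orthogonal([intercept, a, b, ab], y)
# coefs[0] is the mean of the four cell values, i.e., the grand mean.
```

Here the coefficients come out as 3.0 (the grand mean), −1.5, −1.0, and 0.5. Centering the three periods gives roughly +273, −15, and −257 ms, which sum to zero.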
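In defining the random structure of the model, we initially included speaker and sentence as random intercepts. We also included by-speaker random slopes for stress pattern, syllable weight, and metronome period, and by-sentence random slopes for dialect and metronome period. In the external phase analysis, the model did not converge with the by-sentence random slope for dialect, so this slope was removed.
The external phase model thus comprised the fixed effects described above and random intercepts for speaker and sentence. Its random slopes were by-speaker slopes for stress pattern, syllable weight, and metronome period, and a by-sentence slope for metronome period.
For the internal phase measure, the model converged with the maximal random structure, which additionally included the by-sentence random slope for dialect.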
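For reference, the random structure described here can be written out as a sketch in conventional mixed-model notation. The symbols are ours rather than the authors’ original specification:

- i indexes speakers, j sentences, and k cycles.
- D, S, and W are the sum-coded dialect, stress pattern, and syllable weight predictors.
- P is the centered metronome period.
- u and v are by-speaker and by-sentence random effects.
- φ is the phase ratio.

In lme4-style R syntax, with hypothetical variable names, the external phase model corresponds roughly to phase ~ dialect * (stress + weight + period) + (1 + stress + weight + period | speaker) + (1 + period | sentence).

```latex
\begin{aligned}
\varphi_{ijk} ={} & \beta_0 + \beta_1 D_i + \beta_2 S_j + \beta_3 W_j + \beta_4 P_{ijk}
  + \beta_5 D_i S_j + \beta_6 D_i W_j + \beta_7 D_i P_{ijk} \\
& + u_{0i} + u_{1i} S_j + u_{2i} W_j + u_{3i} P_{ijk}
  + v_{0j} + v_{1j} P_{ijk} + \varepsilon_{ijk}
\end{aligned}
```

The internal phase model adds the by-sentence slope for dialect, a term v_{2j} D_i, corresponding roughly to (1 + dialect + period | sentence) in the same syntax.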
We conducted likelihood ratio tests to test the significance of each of our predictors as well as the interaction terms, comparing the full model to the nested models with single terms dropped for each comparison. Likelihood ratio tests were conducted using the afex package (Singmann et al., 2015) in R software (R Core Team, 2024). Pairwise comparisons, through by-subject two-tailed t-tests, with the Bonferroni correction for multiple comparisons, were conducted using the R package phia (Rosario-Martinez et al., 2015).
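The afex package carries out these tests itself. The Python sketch below only illustrates the arithmetic behind the reported statistics. It uses the closed-form chi-square upper-tail probabilities for 1 and 2 degrees of freedom; other degrees of freedom would need a general routine such as scipy.stats.chi2.sf. The function names are ours.

```python
import math


def lrt_statistic(loglik_full, loglik_reduced):
    """Likelihood ratio statistic: twice the log-likelihood gain of the fuller model."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1 and df = 2 only."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # chi2_2 is exponential with mean 2
        return math.exp(-x / 2.0)
    raise NotImplementedError("use a general routine for other df")


def bonferroni(pvalues):
    """Multiply each p value by the number of comparisons, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

For example, chi2_sf(4.43, 1) is about .035, and chi2_sf(13.80, 1) is about .0002, well below .001.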
2.7.2 Statistical models for the vowel duration measures
A linear mixed-effects model was fitted to the data, with vowel duration in milliseconds as the dependent variable. There were five predictors:

- dialect (Hadari vs. Bedouin)
- stress pattern (iambic vs. trochaic)
- vowel stress (stressed heavy, i.e., long, vs. stressed light, i.e., short, vs. unstressed)
- phrasal position (initial vs. medial vs. final)
- metronome period, as a continuous variable

Phrasal position was included because vowel duration may vary with position in the phrase. For example, phrase-initial words may be lengthened by phrasal accents, since they align with the metronome beat that signals the start of the cycle. Phrase-final words may be lengthened due to boundary adjacency.
Two-way interactions between dialect and the other four predictors were included in the model. Because phrasal-accent lengthening effects could vary based on vowel stress (cf. White & Turk, 2010), we included a two-way interaction between position and vowel stress and a three-way interaction between dialect, position and vowel stress.
Speaker, sentence, and vowel type (/a/, /e/, /o/, /i/) were included as random intercepts, with by-speaker random slopes for metronome period, vowel stress, and stress pattern. The model did not converge with a more complex random structure.
3 Results
3.1 Results for external phase analyses
The estimated intercept, at the mean values of all predictors, is 0.445, as shown in the density plot of the external phase ratios (Figure 4). This value is close to a 0.5 external phase ratio, reflecting a 1/2 rhythmic mode in the PRC. In this four-beat structure, the three stressed syllables of the phrase are followed by a fourth, silent beat.

Kernel density estimation, y-axis, of phase ratios, x-axis. The dashed line represents the value of the intercept, which is 0.445.
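A short Python sketch makes the link between the intercept and a four-beat mode explicit. We assume, as a simplification, that the PRC is divided into n equal beats with the three stressed syllables on the first three beats. The predicted external phase is then 2/n and the predicted internal phase is 1/2. A four-beat mode therefore predicts an external phase of 1/2. A three-beat mode, with no silent beat, would instead predict 2/3.

```python
from fractions import Fraction


def predicted_phases(n_beats, stress_beats=(0, 1, 2)):
    """External (b/c) and internal (a/b) phases if stresses fall on the given beats
    of a cycle divided into n_beats equal beats."""
    first, medial, final = stress_beats
    external = Fraction(final - first, n_beats)         # b / c
    internal = Fraction(medial - first, final - first)  # a / b
    return external, internal


def nearest_mode(observed_external, candidates=(3, 4)):
    """Beat count whose predicted external phase is closest to the observed value."""
    return min(
        candidates,
        key=lambda n: abs(observed_external - float(predicted_phases(n)[0])),
    )


print(predicted_phases(4))  # (Fraction(1, 2), Fraction(1, 2))
print(predicted_phases(3))  # (Fraction(2, 3), Fraction(1, 2))
print(nearest_mode(0.445))  # 4
```

The observed external-phase intercept of 0.445 lies much closer to 1/2 than to 2/3. The sketch cannot distinguish the two modes on internal phase, since both predict 1/2.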
There was no effect on external phase ratios of dialect, χ2(1) = 0.12, p = .272, or of syllable weight, χ2(1) = 0.62, p = .430.
There was an effect of stress pattern on external phase ratio, χ2(1) = 4.58, p = .030, β = −0.009, SE = 0.004. Figure 5 shows the model’s prediction for the trochaic external phase, 0.452, and the iambic external phase, 0.436, illustrating that final stressed vowel onsets in trochaic sentences are aligned later than in the iambic sentences.

External phase ratios for iambic and trochaic phrases. Within boxes: dotted lines represent the means, solid lines represent the medians, and whiskers represent the standard deviations.
The effect of metronome period on external phase ratios was significant, χ2(1) = 65.16, p < .001, β = 0.038, SE = 0.002. Figure 6 shows the model’s predicted effects of metronome period on external phase: the shortest period is 0.483, the medium period is 0.446, and the longest period is 0.408. Thus, final stressed vowel alignment is closer to the simple phase of 0.5 (and later in the cycle) at shorter metronome periods (and hence faster speech rates).

Metronome period effects on the external phase ratio. Within boxes: dotted lines represent the means, solid lines represent the medians, and whiskers represent the standard deviations. Means and medians here typically coincide; hence, dotted lines are not usually visible. The fitted regression line is shown in blue.
For the external phase, the likelihood ratio tests showed no two-way interaction between dialect and stress pattern, χ2(1) = 0.56, p = .454, or between dialect and metronome period, χ2(1) = 1.36, p = .244. There was an interaction between dialect and syllable weight, χ2(1) = 4.43, p = .035. As shown in Figure 7, Hadari stressed light and stressed heavy syllables aligned similarly. In Bedouin, by contrast, stressed light syllables aligned later in the PRC than stressed heavy syllables.

External phase ratio by dialect and syllable weight. Within boxes: dotted lines represent the means, solid lines represent the medians, and whiskers represent the standard deviations.
In post hoc analyses exploring the interaction, there were no significant differences in phase ratio between stressed light and stressed heavy syllables within either dialect (Bedouin and Hadari both p > .05). However, we also compared the size of these within-dialect differences (stressed light vs. stressed heavy syllables). This comparison showed that the difference was significantly greater in Bedouin than in Hadari (see Table 2).
External Phase Ratio Differences Between Stressed Light and Stressed Heavy Syllables in Bedouin and Hadari.
The later alignment of stressed light syllables in Bedouin, compared with Hadari, may reflect differences in unstressed vowel reduction. Thus, less temporal compression of unstressed vowels in Bedouin would lead to later alignment of subsequent stressed light syllables in the PRC than in Hadari. An alternative (or possibly complementary) explanation is that the similar alignment of stressed light and stressed heavy syllables in Hadari could reflect a top-down effect: given greater unstressed vowel reduction in Hadari, stressed light syllables may be conferred greater relative prominence and thus tend to align similarly to heavy syllables.
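The first explanation can be made concrete with a toy timing model in Python. It assumes, as a simplification, that the cycle duration is fixed by the metronome. It also assumes that only the unstressed stretches between the first and final stressed vowel onsets change. All durations below are hypothetical round numbers, not measurements from this study. Shortening each of the two intervening unstressed stretches by d ms moves the final stressed onset earlier, reducing the external phase by 2d/c.

```python
def external_phase(intervals_ms, cycle_ms):
    """External phase b/c, where b is the summed duration of the stretches
    separating the first stressed vowel onset from the final one."""
    return sum(intervals_ms) / cycle_ms


CYCLE_MS = 1512  # medium metronome period from the text, used as the PRC

# Hypothetical stretches from the first to the final stressed vowel onset:
# stressed, unstressed, stressed, unstressed (ms).
less_compressed = [200, 140, 200, 140]
more_compressed = [200, 120, 200, 120]  # unstressed stretches 20 ms shorter

phase_less = external_phase(less_compressed, CYCLE_MS)
phase_more = external_phase(more_compressed, CYCLE_MS)
shift = phase_less - phase_more  # equals 2 * 20 / 1512
```

With these toy values, the external phase moves from about 0.450 to about 0.423. In real speech, speakers may also adjust other durations to stay with the metronome, which this sketch ignores.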
3.2 Results for internal phase analyses
The internal phase is the interval between the vowel onsets of the initial and medial (second) stressed syllables, divided by the interval between the vowel onsets of the initial and final (third) stressed syllables within the phrase. The model estimate of the intercept is 0.503. Vowel onsets of medial stressed syllables thus tend to lie midway through this interval, at a harmonic phase of 1/2. The intercept value is shown in the density plot of the phase ratios in Figure 8.

Kernel density estimation, y-axis, of internal phase ratios, x-axis. The dashed line represents the value of the intercept, which is 0.503.
There was no effect on the internal phase of dialect, χ2(1) = 0.13, p = .721, or of syllable weight, χ2(1) = 2.87, p = .090.
There was a significant effect of stress pattern on internal phase ratio, χ2(1) = 5.71, p = .017, with β = −0.017 and SE = 0.006. The predicted internal phase ratio for trochaic phrases is 0.520, and for iambic phrases, it is 0.486 (Figure 9).

Internal phase ratio for iambic and trochaic phrases. Within boxes: dotted lines represent the means, solid lines represent the medians, and whiskers represent the standard deviations.
There was an effect of metronome period on internal phase ratio, χ2(1) = 13.80, p < .001, β = −0.0047, SE = 0.001. The predicted internal phase is 0.498 for the shortest metronome period, 0.503 for the medium period, and 0.507 for the longest period (Figure 10). Phase alignment is therefore closer to the simple phase of 0.5, and earlier in the cycle, at shorter metronome periods.

Metronome period effect on the internal phase ratio. Within boxes: dotted lines represent the means, solid lines represent the medians, and whiskers represent the standard deviations. The fitted regression line is shown in blue.
There were no two-way interactions between dialect and syllable weight, χ2(1) = 3.10, p = .078, or between dialect and stress pattern, χ2(1) = 1.37, p = .242, or dialect and metronome period, χ2(1) = 0.78, p = .376.
3.3 Discussion of phase ratio results
Two main factors influenced both external and internal phase ratios. First, the stress pattern of the phrases (trochaic vs. iambic): both external and internal phase alignments were earlier in the cycle for iambic than for trochaic phrases. In iambic phrases, the first stressed syllables are separated from the metronome beep by unstressed syllables. We had therefore predicted a shift toward later phase angles, but our results indicate the opposite. This unexpected result may, however, reflect differences in the text materials rather than prosodic timing constraints. Unstressed syllables in iambic phrases have a CV structure, whereas unstressed syllables in trochaic phrases have a CVC structure. Other things being equal, this difference would make iambic phrases shorter overall than trochaic phrases, so that stressed syllables in iambic phrases would occur earlier in the cycle. Figure 11 shows a schematic of this effect of the text materials on phase alignment in the iambic and trochaic patterns.

Illustration of phase alignment in iambic phrases (left) and trochaic phrases (right), where simpler unstressed syllables in the former lead to earlier phase alignment in the cycle, as indicated by the blue arrows.
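The text-materials account can be stated as a short derivation, using the notation of Figure 3 (a, b, c). This formalization is ours, not an analysis reported above. Assume, other things being equal, that iambic (I) and trochaic (T) phrases differ only in the codas of their unstressed syllables. Let d_1 and d_2 be the coda durations of the two unstressed CVC syllables that lie between the first and final stressed vowel onsets in trochaic phrases; the first of these also lies between the first and medial onsets. Assume also that the cycle duration c is fixed by the metronome.

```latex
\begin{aligned}
b^{\mathrm{T}} &= b^{\mathrm{I}} + d_1 + d_2, &
\varphi^{\mathrm{T}}_{\mathrm{ext}} - \varphi^{\mathrm{I}}_{\mathrm{ext}}
  &= \frac{b^{\mathrm{I}} + d_1 + d_2}{c} - \frac{b^{\mathrm{I}}}{c} = \frac{d_1 + d_2}{c} > 0, \\
a^{\mathrm{T}} &= a^{\mathrm{I}} + d_1, &
\varphi^{\mathrm{T}}_{\mathrm{int}} > \varphi^{\mathrm{I}}_{\mathrm{int}}
  &\iff \frac{a^{\mathrm{I}} + d_1}{b^{\mathrm{I}} + d_1 + d_2} > \frac{a^{\mathrm{I}}}{b^{\mathrm{I}}}
  \iff \frac{a^{\mathrm{I}}}{b^{\mathrm{I}}} < \frac{d_1}{d_1 + d_2}.
\end{aligned}
```

Under these assumptions, the external phase is always later for trochaic phrases. With equal codas (d_1 = d_2), the trochaic internal phase is predicted to be later only if the iambic internal phase is below 1/2. This is consistent with the predicted internal phases of 0.486 (iambic) and 0.520 (trochaic) in Section 3.2.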
It should be noted, in passing, that a number of Arabic dialects assign stress to the initial syllable in CV.CVC structures (Watson, 2011a, 2011b). This is not the norm in Kuwaiti and other Gulf dialects (see Section 1.3). Nevertheless, some speakers of Bedouin and Hadari Kuwaiti dialects, influenced by stress assignment in other Arabic dialects, might produce initial stress in CV.CVC structures. This would affect the phase measurements reported earlier, because initial CV syllables, rather than final CVC syllables, would then be attracted to simple phases. We are confident, however, that any such influence of dialect contact on our data is minimal at most. The materials were carefully piloted with native speakers of Hadari and Bedouin, and the experimenter monitored the speakers’ productions during the recordings.
Second, the metronome period influenced phase alignment. For both external and internal phase ratios, alignment was closer to a simple phase of 0.5 at shorter metronome periods. For the external phase this meant later alignment in the cycle, and for the internal phase slightly earlier alignment. As shorter metronome periods are associated with faster speaking rates, there may be a preferred speaking rate for temporal coordination with a 0.5 phase. When speakers are forced to speak more rapidly at shorter metronome periods, temporal variation appears to decrease and the speaking mode becomes more rhythmic. That is, speakers tend to space stress beats more evenly, closer to the harmonic 0.5 phase. Tajima’s (1999) study also suggests that speaking rate mediates temporal coordination, as stable rhythmic modes varied across metronome rates, at least for Japanese. In Tajima’s study, however, there were distinct rhythmic modes at different rates. In our Kuwaiti data there was a single rhythmic mode, with faster rates encouraging closer alignment with the 0.5 phase. This difference is probably due to the number of metronome periods and the size of the differences between them: Tajima’s study used between 10 and 14 rates, whereas we used only three.
Dialectal differences were found only for the external phase measure, in the form of a two-way interaction between dialect and syllable weight. Hadari had smaller differences than Bedouin between the phase alignments of stressed heavy, CVV(C), and stressed light, CVC, syllables. Specifically, Bedouin (compared with Hadari) exhibited later alignment of stressed light syllables, which were therefore less similar to stressed heavy syllables. We suggest that stressed light syllables align later in the PRC in Bedouin because the preceding unstressed syllables are less compressed than in Hadari. Alternatively, or additionally, as pointed out in Section 3.1, the greater reduction of unstressed syllables in Hadari may boost the relative prominence of stressed light syllables. This would make them behave more like stressed heavy syllables in their propensity for rhythmic alignment.
3.4 Results for vowel duration analyses
To better understand how dialectal differences in segment timing affect the observed patterns of external and internal phase alignment, we examined vowel duration. In addition to dialect, we considered the effects of vowel stress (stressed heavy, stressed light, unstressed), stress pattern (iambic vs. trochaic), phrasal position, and metronome period.
In our mixed-effects linear regression model, the intercept, representing the grand mean of vowel duration, was 74 ms. There was no effect of dialect, χ2(1) = 0.07, p = .785. There was an effect of stress pattern (iambic vs. trochaic), χ2(1) = 4.00, p = .046, β = 2.04 ms, SE = 0.94 ms. However, the difference in mean vowel duration between the two stress patterns is small: iambic 76 ms versus trochaic 72 ms. The effect of metronome period was significant, χ2(1) = 60.00, p < .001, β = −4.86 ms, SE = 0.46 ms. The predicted mean duration is 78 ms for the slow metronome period, 74 ms for the medium period, and 69 ms for the fast period. Figure 12 shows vowel duration decreasing as the metronome period shortens.

Vowel duration as a function of metronome period. Within boxes: dotted lines represent the means, solid lines represent the medians, and whiskers represent the standard deviations. The fitted regression line is shown in blue.
The effect of vowel stress was significant, χ2(2) = 132.52, p < .001. The estimates are deviations from the grand mean (sum coding): β = 34 ms, SE = 0.88 ms, for stressed heavy vowels and β = −11 ms, SE = 0.45 ms, for stressed light vowels. The predicted means were:

- stressed heavy vowels: 109 ms, χ2(1) = 1,639.14, SE = 1.45, p < .001
- stressed light vowels: 62 ms, χ2(1) = 453.29, SE = .513, p < .001
- unstressed vowels, the reference level for pairwise statistics: 51 ms

Figure 13 shows the observed vowel duration distributions by stress. Stressed heavy vowels are phonologically long, whereas the vowels in stressed light and unstressed syllables are phonologically short. The greater duration of the former is therefore attributable to a combination of vowel length and lexical stress.

Duration by vowel stress. Within boxes: dotted lines represent the means, solid lines represent the medians, and whiskers represent the standard deviations.
The effect of phrasal position was significant, χ2(2) = 1,624.42, p < .001. As deviations from the grand mean, the estimates were β = −1.01 ms, SE = 0.14 ms, for the initial position and β = −4.26 ms, SE = 0.13 ms, for the medial position. The predicted mean vowel durations were:

- initial position: 73 ms, χ2(1) = 611.90, SE = .253, p < .001
- medial position: 70 ms, χ2(1) = 1,645.76, SE = .234, p < .001
- final position, the reference level: 80 ms

Figure 14 shows predicted vowel duration distributions by position.

Vowel duration by phrasal position. Within boxes: dotted lines represent the means, solid lines represent the medians, and whiskers represent the standard deviations.
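We assume here that the vowel duration model used the same sum coding as the phase models (Section 2.7.1). Under that coding, each coefficient of a three-level factor is the deviation of that level from the grand mean, not a difference from a reference level. The Python sketch below checks this reading against the rounded predicted means reported above. The intercept of 74 ms equals the mean of the three vowel-stress means. The deviations agree with the reported coefficients to within about 1 ms, plausibly reflecting rounding and unbalanced cells.

```python
def sum_coded_deviations(level_means):
    """Grand mean of the level means and each level's deviation from it."""
    grand = sum(level_means.values()) / len(level_means)
    return grand, {level: m - grand for level, m in level_means.items()}


# Predicted means (ms) as reported in the text.
stress_means = {"stressed heavy": 109, "stressed light": 62, "unstressed": 51}
position_means = {"initial": 73, "medial": 70, "final": 80}

stress_grand, stress_dev = sum_coded_deviations(stress_means)
position_grand, position_dev = sum_coded_deviations(position_means)

# Reported coefficients (ms) for the first two levels of each factor.
reported = {"stressed heavy": 34, "stressed light": -11, "initial": -1.01, "medial": -4.26}
for level, beta in reported.items():
    dev = stress_dev.get(level, position_dev.get(level))
    print(f"{level}: deviation {dev:+.2f} ms vs reported beta {beta:+.2f} ms")
```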
The increased duration of phrase-final vowels is likely due to phrase-final lengthening (Turk & Shattuck-Hufnagel, 2000). Vowels in phrase-initial position (73 ms) are longer than in phrase-medial position (70 ms), but the difference is too small to indicate accentual lengthening in phrase-initial position.
However, accentual lengthening may vary with vowel stress. The two-way interaction between position and vowel stress, χ2(4) = 768.10, p < .001, indicates potential accentual effects in phrase-initial position (Figure 15). The contrast between stressed heavy and unstressed syllables is greater phrase initially (108 − 49 = 59 ms, SE = .690) than phrase medially (100 − 51 = 49 ms, SE = .561), χ2(2) = 757.750, p < .001. Likewise, the contrast between stressed light and unstressed syllables is greater phrase initially (62 − 49 = 13 ms, SE = .508) than phrase medially (59 − 51 = 8 ms, SE = .538), χ2(2) = 113.99, p < .001.

Mean vowel duration by position and stress.
The three-way interaction between dialect, position, and vowel stress gives a clearer picture of positional differences between dialects, χ2(4) = 53.33, p < .001 (Figure 16). Pairwise comparisons showed only one dialectal difference: the contrast between stressed light and unstressed vowels in phrase-initial position, χ2(1) = 12.113, SE = 1.134, p = .004. This contrast was greater in Hadari (62 − 48 = 14 ms) than in Bedouin (61 − 51 = 10 ms).

Mean vowel duration by stress and position in each dialect.
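The position-by-stress comparisons and the dialect comparison above both test differences between contrasts, that is, interaction contrasts (differences of differences). The minimal Python sketch below uses the rounded means reported in this section. The standard errors and χ2 values come from the fitted model and are not reproduced here.

```python
def contrast(stressed_ms, unstressed_ms):
    """Durational stress contrast: stressed minus unstressed vowel duration."""
    return stressed_ms - unstressed_ms


def interaction_contrast(contrast_a, contrast_b):
    """Difference of two contrasts (e.g., initial vs. medial, Hadari vs. Bedouin)."""
    return contrast_a - contrast_b


# Position x vowel stress (ms), from the text.
heavy_initial, heavy_medial = contrast(108, 49), contrast(100, 51)  # 59, 49
light_initial, light_medial = contrast(62, 49), contrast(59, 51)    # 13, 8

# Dialect x vowel stress, phrase-initial position (ms), from the text.
light_hadari, light_bedouin = contrast(62, 48), contrast(61, 51)    # 14, 10

print(interaction_contrast(heavy_initial, heavy_medial))  # 10
print(interaction_contrast(light_initial, light_medial))  # 5
print(interaction_contrast(light_hadari, light_bedouin))  # 4
```

The 4-ms Hadari advantage in phrase-initial position comes mainly from shorter unstressed vowels (48 vs. 51 ms) rather than from longer stressed light vowels (62 vs. 61 ms).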
Accentual prominence may have had an effect in phrase-initial position. It is therefore not clear whether the greater contrast between stressed light and unstressed vowels in Hadari is due to lexical stress lengthening alone or also to lengthening from a phrasal accent in initial position.
3.5 Discussion of vowel duration results
We earlier found differences in external phase alignment between dialects. In particular, stressed light syllables tended to occur earlier in the cycle in Hadari than in Bedouin and were aligned similarly to stressed heavy syllables. We speculated that these differences could be due to differences between dialects in unstressed syllable reduction.
Indeed, in phrase-initial position, we found a greater durational contrast between stressed light and unstressed vowels in Hadari than in Bedouin, reflecting shorter unstressed vowels in Hadari. This reduction likely leads to earlier alignment of stressed light syllables in the PRC (see Section 3). Greater unstressed vowel reduction would also make stressed light syllables stand out in prominence, which may lead them to align similarly to stressed heavy syllables.
Note that the differences between dialects in stress contrast may well be due to accentual rather than lexical prominence, since the dialect difference was confined to phrase-initial position. The contrast between stressed light and unstressed syllables in Hadari may reflect how accentual lengthening is distributed in Hadari. Compared with unstressed syllables, stressed light syllables may absorb most of the accentual lengthening (cf. White & Turk, 2010).
We should point out, however, that differences between dialects in vowel duration are small. This is likely due to the highly constrained task of speech production in speech cycling. Future research on these dialects should consider more natural speech elicitation while also controlling for key sources of durational variation, such as lexical stress and accentual effects.
4 General discussion
Using the speech cycling paradigm (Cummins & Port, 1998; Tajima, 1999), we found differences between Hadari and Bedouin dialects in how the stressed vowel onsets of six-syllable trochaic and iambic phrases were coordinated within the PRC. The differences between the two dialects were small. Nevertheless, the variable alignment of stressed vowel onsets appears to be mediated by dialect-specific structural differences, namely, different degrees of durational stress contrast. Specifically, phrase-final stressed light syllables were aligned earlier in the PRC in Hadari, where their phase alignment was also more similar to that of stressed heavy syllables. We inferred that this dialectal difference in phase alignment may be attributed, in particular, to the greater degree of unstressed vowel reduction in Hadari. Vowel duration analysis corroborated this. In phrase-initial position, the contrast between stressed light and unstressed vowels was greater in Hadari than in Bedouin, driven by unstressed vowel reduction. Greater unstressed vowel reduction in Hadari would afford earlier alignment of stressed light syllables. It would also boost their relative salience, making them more similar to stressed heavy syllables in rhythmic coordination.
We earlier discussed cross-linguistic differences in speech cycling that may reflect differences in temporal stress contrast: in particular, Zawaydeh et al. (2002) showed that the number of preceding unstressed syllables had more impact on the later alignment of phrase-medial stressed syllables in Jordanian Arabic than in American English. This pattern in Jordanian Arabic is likely influenced by the relatively limited unstressed vowel reduction in this dialect (Vogel et al., 2017) compared with English (e.g., Dauer, 1983), that is, there is less possibility for compression of unstressed vowels, and thus, phase alignment moves later as the number of such vowels increases.
We note, however, that the differences between Hadari and Bedouin dialects in the durational contrast between stressed light and unstressed vowels may not be entirely due to differences in lexical stress but may also be influenced by (non-systematic) lengthening due to phrasal accent. This points to a potential shortcoming of the speech cycling task, whereby speakers have to utter multiple repeated phrases in coordination with metronome beeps. Elucidating timing differences that are specifically due to stress contrast needs to control for phrasal-accent effects (cf. White & Turk, 2010), but controlling such effects (potentially variable between and within speakers) is very challenging in speech cycling tasks, and risks coaching speakers into atypical productions that are not representative of their normative timing patterns.
We found that shorter metronome periods were—for both dialects—associated with closer alignment with 0.5 phase. We suggested that as shorter metronome periods lead to faster speaking rates, temporal variation reduces, and speech production becomes more rhythmic (in a periodic sense). This encourages the alignment of stress beats at relatively evenly spaced, harmonic, time intervals in the PRC.
Our findings on the effect of metronome period can be compared with those reported by Tajima (1999). This comparison suggests that phase alignment in Kuwaiti dialects is more similar to that found for Japanese than for English. In Tajima’s study, Japanese, like the Kuwaiti dialects, exhibited incremental changes in phase alignment as the metronome period decreased. English, by contrast, showed more consistent phase alignment across rates. This may be due to greater unstressed syllable compression in English than in Japanese and the Kuwaiti dialects.
Speech cycling in Hadari and Bedouin dialects reflected a pattern of entrainment of stressed vowel onsets to extrinsic rhythmic cycles. Greater degrees of temporal stress contrast afforded stronger entrainment, that is, closer phase alignment with the PRC. This finding corroborates our earlier suggestion that hierarchical interaction between prosodic units (e.g., O’Dell & Nieminen, 1999) may be influenced by gradient local timing effects such as temporal stress contrast (e.g., White, 2014).

These findings may have implications for patterns of timing interaction between interlocutors in conversation. For instance, Włodarczak et al. (2012) showed that in overlapping conversational speech, English speakers tend to interrupt their dialogue partners’ speech at the end of vowel-to-vowel (VTV) intervals. This establishes an in-phase coupling relation with VTV boundaries. Similar findings were reported for German and French (Włodarczak et al., 2012). It is unknown, however, how different degrees of prominence in the three languages may influence the salience of such interaction anchor points. Studying dialects in conversation could shed light on how gradient prosodic prominence affects timing interaction between speakers of different dialects. Because dialects share semantic and syntactic properties at many levels, such comparisons would not be confounded by semantic and syntactic differences.
Research on timing interaction and the influence of prosodic prominence in natural conversation should also consider the effects of speech rate. In speech cycling in Hadari and Bedouin, faster metronome rates provided a timing window in which rhythmic production was boosted. We attributed this effect of faster speech rate to the shorter duration and reduced temporal variation of the produced utterances, which lead speakers into a more rhythmic mode of production. Faster speech rates also affect the perception of stress. Perceptual studies indicate that listeners’ entrainment to the preceding speech rate provides the timing background against which prosodic timing effects are evaluated (cf. Reinisch et al., 2010, 2011; White et al., 2022). The perception of durational stress is facilitated when the preceding speech rate is fast, as stress contrasts become more salient when the preceding segments are shorter (Reinisch et al., 2011).
5 Conclusion
Recordings elicited using a speech cycling task with speakers of Hadari and Bedouin Kuwaiti Arabic dialects showed patterns of hierarchical temporal coordination between prosodic units. Medial and final stressed vowel onsets were aligned at a simple phase angle, 1/2, establishing a rhythmic mode of four beats per cycle, with the fourth beat silent in utterances with three stressed syllables. Metronome period had a noticeable effect: at shorter metronome periods, stressed vowel onsets aligned more closely with the 1/2 phase angle. This indicates that the faster speech rate associated with shorter metronome periods facilitates rhythmic production. Differences between dialects, although small, emerged in the external phase of stressed light syllables: in Hadari, these were aligned earlier in the cycle and more similarly to stressed heavy syllables than in Bedouin. Vowel duration analysis showed greater reduction of unstressed vowels in phrase-initial position in Hadari than in Bedouin, which may explain both the earlier alignment of stressed light syllables and their closer alignment with stressed heavy syllables in Hadari. This shows how gradient degrees of prominence influence speech timing in a highly constrained speech task. The findings regarding the effects of speech rate and gradient prominence on temporal coordination in speech cycling have implications for timing interaction in natural conversation, and in particular for the extent to which prosodic prominence may shape timing patterns between interlocutors.
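The step from "medial and final onsets at phase 1/2" to "a rhythmic mode of four" can be made explicit with a little phase arithmetic. The sketch below is illustrative only and rests on assumed definitions, not necessarily those used in this study: the PRC is taken to run from one phrase-initial stressed vowel onset to the next; the external phase is the final stressed onset's position within the PRC; the internal phase is the medial onset's position within the interval from the initial to the final stressed onset. The function names and onset times are hypothetical.

```python
from fractions import Fraction


def phase(onset, start, end):
    """Relative position of `onset` between `start` (0) and `end` (1)."""
    return (onset - start) / (end - start)


def stress_phases(initial, medial, final, next_initial):
    """Return (external, internal) phases for one phrase repetition.

    Assumed (hypothetical) definitions:
    - external: final stressed onset within the PRC, taken here as the
      interval between successive phrase-initial stressed onsets;
    - internal: medial stressed onset within the initial-to-final interval.
    """
    external = phase(final, initial, next_initial)
    internal = phase(medial, initial, final)
    return external, internal


def beat_grid(external, internal):
    """Beat positions (fractions of the PRC) implied by evenly spaced stresses.

    Under the assumed definitions, the medial onset lies at
    internal * external of the PRC. If the final onset falls exactly two
    such steps in, that step defines a regular grid over the whole cycle.
    """
    step = internal * external
    if 2 * step != external:
        raise ValueError("medial and final stresses are not evenly spaced")
    n_beats = 1 / step
    if n_beats != int(n_beats):
        raise ValueError("step does not divide the PRC evenly")
    return [k * step for k in range(int(n_beats))]


# Hypothetical onset times (seconds), as exact fractions to avoid float error.
external, internal = stress_phases(Fraction(0), Fraction(2, 5), Fraction(4, 5), Fraction(8, 5))
print(external, internal)            # 1/2 1/2
print(beat_grid(external, internal))  # [Fraction(0, 1), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
```

With both phases at 1/2, the three stressed onsets fall at 0, 1/4, and 1/2 of the cycle, and the grid's fourth position (3/4) carries no stressed syllable, corresponding to the silent beat. Measured onsets would be floating-point values, so the exact equality checks in `beat_grid` would need a tolerance in practice; `Fraction` is used here only so that the idealized case can be verified exactly.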
Footnotes
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: S.G. would like to thank Kuwait University for funding the PhD program, which resulted in this publication. J.A.-T. acknowledges partial support from the Laboratoire de Linguistique Formelle, under the investment program “France 2030” launched by the French Government and implemented by the University Paris Cité as part of its program “Initiative d’excellence” IdEx, with the reference ANR-18-IDEX-0001.
