Abstract
While relatively rare, implosives can be observed in numerous African languages such as Fulfulde, isiXhosa, and isiZulu. Research on the acquisition of implosives remains limited; however, the few existing studies suggest that these sounds are typically acquired by around age 3;0. Despite these findings, the developmental processes underlying this class of sounds in children are still not well understood. Notably, most studies look at the bilabial implosive /ɓ/ and tend to emphasize categorical acquisition, with relatively little attention given to the acoustic characteristics involved in implosive development. To address this lacuna, this study investigates implosive acquisition in Shimaore, a Bantu language spoken on the French island of Mayotte. Cross-sectional methodologies are used to examine the way Shimaore-French bilingual children produce /ɓ/ and /ɗ/ during a nonword repetition task. Participants were 53 children aged 3;0 to 7;1. Sounds were analyzed using acoustic features, including Voice Onset Time, f0, H1*–H2*, and Cepstral Peak Prominence. The results indicated that while children produce implosives at 3;0, they also frequently substitute these sounds with nasals, voiceless stops, and approximants. Nonetheless, substitutions decrease with age. Acoustic analyses demonstrate that, over time, children’s productions of implosives increasingly resemble those of adult speakers. These findings are consistent with contemporary phonological theory on language acquisition, particularly the A-Map model, which focuses on constraints related to both accuracy and precision.
Introduction
Research on phoneme acquisition in child language has a long-standing history, tracing back to the late 19th century (see Rose, 2017) and continuing through the 20th century with foundational studies by Wellman et al. (1931), Templin (1957), and many others that followed. Over time, study findings have helped not only linguists further their comprehensive understanding of how children acquire language but also speech pathologists to better treat young patients (McLeod & Crowe, 2018). Today, we understand various aspects about phoneme development as related to language-specific phenomena and translingual tendencies. Children often acquire certain classes of sounds earlier than others, while others are developed at varying rates, resulting in a wide range of acquisition periods (McLeod & Crowe, 2018).
The phonemes previously studied, while numerous, are limited to those found in certain language families or those that are the most common cross-linguistically. One class of sounds yet to be thoroughly studied in child language acquisition is implosives, a relatively rare class of sounds found in around 13% of the world’s languages (Maddieson, 2013). Many African languages have implosives in their phonemic inventory (Greenberg, 1970; Maddieson, 2003), including isiZulu (Naidoo, 2010; Pascoe & Jeggo, 2019), Mpiemo (Nagano-Madsen & Thornell, 2012), isiXhosa (Maphalala et al., 2014), and Shimaore (Mori, 2023). Given that these sounds require complex articulatory movement to create rarefaction in the oral cavity, they may be challenging to learn. The present study examines the production of bilabial and alveolar implosives in young children’s primary language, Shimaore, a Bantu language spoken on the island of Mayotte in the Mozambique Channel.
Implosives: Acoustic Features of a Rare Natural Class
The IPA (International Phonetic Alphabet) identifies seven voiced implosives: /ɓ/, /ɗ/, /ᶑ/, /ʄ/, /ɠ/, /ʛ/, and /ɠ͜ɓ/ and their seven voiceless counterparts. The most common is the voiced bilabial /ɓ/, with descending frequency moving from anterior to posterior place of articulation (Greenberg, 1970). In addition, languages with implosives privilege them in syllable-initial position (Easterday, 2023). Implosives have not been easy to define, partially because they vary in their phonetic properties (see Catford, 1973; Clements & Osu, 2002; Lindau, 1984). For example, Clements and Osu (2002) argue that the phonological features of this sound class are at once [-obstruent] and [-sonorant]. In response to this claim, Sande and Oakley (2020, 2023) show that some African languages with implosives exhibit phonological patterns that align with sonorants, whereas others pattern with obstruents; in some cases, languages display both types of behavior. In addition, while it was once assumed that the defining feature was ingressive airflow caused by larynx lowering (Ladefoged, 1971), many now agree that implosives should be defined by what they do not do: Implosives do not have egressive airflow (Lex, 1994). Rather, airflow is either neutral or ingressive, with Hausa (Nihalani, 1986) being an example of the former.
In terms of articulation, speakers across languages employ different strategies to reduce air pressure build-up during the production of voiced implosives, including larynx lowering (Ladefoged & Maddieson, 1996), and in some cases, cheek puffing (Demolin et al., 2002; Grimm, 2019). Indeed, voiced implosives implicate more complex articulatory actions than their pulmonic, plosive counterparts because the speaker must at once vibrate the vocal cords during closure and reduce the resulting increased air pressure that builds up in the oral cavity. It is thus accepted that these complex sounds need to be carefully reconsidered (Ashby, 1990).
Measuring aerodynamic values to check for positive, neutral, or negative oral pressure is arguably the most straightforward method for identifying implosive production. However, implosives can also be analyzed using acoustic metrics, which is helpful in situations when researchers lack access to intraoral pressure measurement equipment. One key metric is Voice Onset Time (VOT), which measures the interval between the release of a stop consonant and the start of vocal cord vibration; when voicing begins before the release, VOT is negative. Implosives tend to have short negative VOT when compared to plosives (Demolin, 1995; Hussain, 2018; Ladefoged & Maddieson, 1996; Nihalani, 1986).
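As a hedged illustration of the VOT measure described above, the following minimal sketch computes VOT from two hand-annotated time points. The function name and the example times are hypothetical, and the sign convention (voicing onset minus release) is assumed here so that prevoiced tokens yield negative values; it is not the script used in the study.

```python
def voice_onset_time_ms(release_s: float, voicing_onset_s: float) -> float:
    """Return VOT in milliseconds as voicing onset minus stop release.

    A negative value means voicing began before the release (prevoicing),
    which is the pattern reported for voiced implosives and voiced plosives.
    """
    return (voicing_onset_s - release_s) * 1000.0


# Hypothetical annotation times (in seconds) for one prevoiced token.
example_vot = voice_onset_time_ms(release_s=0.250, voicing_onset_s=0.192)
print(round(example_vot, 1))
```

Under this convention, a token with a shorter stretch of prevoicing simply has a negative VOT closer to zero.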
Another measure concerns pitch, as implosives can impact the fundamental frequency (f0) during consonant closure and into the following vowel (CF0). During closure duration, when the vocal cords are vibrating, f0 is higher in implosives than in plosives (Hombert et al., 1979; Ohala, 1976). For some languages with implosives, this high frequency (CF0) carries into the vowel, resulting in a higher pitch than after voiced plosives, making them more like voiceless plosives (Hombert, 1978). This has been observed in Mpiemo (Nagano-Madsen & Thornell, 2012). In fact, in terms of pitch, implosives appear to fall somewhere between obstruents and sonorants, as shown in Siswati (Wright & Shryock, 1993). In this language, the f0 of vowels after the bilabial implosive was found to be lower than after the obstruent [pʰ], but higher than after the sonorant [m].
The last acoustic dimension useful for identifying implosives concerns voice quality, or spectral tilt, because glottal constriction of varying degrees tends to occur during implosive production. Voice quality lies on a continuum, with breathy voice (less glottal constriction) on one end and creaky voice (high glottal constriction) on the other, with modal voice quality in the middle (Jackson et al., 1986). This feature is used in various aspects of linguistics, from identifying pathologies in speakers’ voices (Caldeira Martinez & Cassol, 2015) to describing sociolinguistic phenomena, like vocal fry (Davidson, 2021). In addition, some languages, such as Hausa, show a correlation between creaky voice and implosives (Gordon & Ladefoged, 2001; Ladefoged & Maddieson, 1996). As such, voice quality can also be useful when looking at implosives.
To measure this feature, First Harmonic to Second Harmonic (H1–H2) ratios and Cepstral Peak Prominence (CPP) can be used together (Garellek & Esposito, 2023). While H1–H2 measures glottal constriction (low values indicate higher glottal constriction), it is often paired with CPP values to identify voice quality as approaching breathy, modal, or creaky. For example, when two individuals with similar H1–H2 measurements have differing CPP measurements, it can be said that the person with lower CPP values demonstrates more glottal constriction, thus having a voice qualified as more creaky or less modal than the other individual (see Garellek, 2019). Because it is less sensitive to background noise (Thomas, 2011), which is frequent in recording environments outside of acoustic cabins in, for example, tropical conditions, CPP is excellent for measuring harmonics-to-noise ratios.
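To make the H1–H2 measure concrete, the sketch below expresses it as the level difference, in decibels, between the first and second harmonics. It assumes that linear harmonic amplitudes have already been extracted from a spectrum; the function name and the example amplitudes are hypothetical. It shows only the uncorrected H1–H2, not the formant-corrected H1*–H2* used later in the study, and it does not compute CPP.

```python
import math


def h1_minus_h2_db(h1_amplitude: float, h2_amplitude: float) -> float:
    """Return the level difference (dB) between the first two harmonics.

    Both inputs are linear amplitudes, assumed to have been measured
    beforehand. Lower results are read as more glottal constriction.
    """
    if h1_amplitude <= 0 or h2_amplitude <= 0:
        raise ValueError("harmonic amplitudes must be positive")
    return 20.0 * math.log10(h1_amplitude / h2_amplitude)


# Hypothetical amplitudes: the first harmonic is twice as strong as the second.
print(round(h1_minus_h2_db(2.0, 1.0), 2))
```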
The Shimaore Language Contextualized in Mayotte’s Linguistic Landscape
Before detailing the phonetic and phonemic characteristics of Shimaore, a brief sociolinguistic overview of Mayotte is necessary because the study’s participants are bilingual Shimaore-French speakers. Located in the Mozambique Channel, this small, tropical, and multilingual island is one of four in the Comorian Archipelago (for a history, see Walker, 2019). It is also France’s newest department, putting it on equal constitutional footing with mainland France. For instance, schooling is compulsory from the age of three and is conducted in French, which is spoken by approximately 82% of the population between the ages of 15 and 24, but only 29% of the population over 60 years of age (Dehon & Louguet, 2022).
The two main local languages were recently recognized as regional languages of France, creating the possibility to teach them in schools. The primary local language, Shimaore, is classified as a G44d Bantu language (Guthrie, 1948, see Nurse & Hinnebusch, 1993) and is spoken by most of the population (approximately 80% according to Dehon & Louguet, 2022). The other language, Kibushi, is an Austronesian language originating from Malagasy and is spoken by about 20% to 30% of the population (Dehon & Louguet, 2022; Jamet, 2016). Notably, there are several varieties of these two official, local languages, such as Shindzuani (see Mohamed, 2017), which comes from the nearby island of Anjouan and is mutually intelligible with Shimaore (for a map from Mori, 2023, see Supplemental Material: SM1). The Shimaore language is thus found within a dynamic and intricate linguistic milieu, notably characterized by the mundane nature of language contact and multilingualism.
As for the language itself, Table 1 (from Mori, 2023) shows the sound inventory of Shimaore. The language has a voiced bilabial implosive /ɓ/ and a voiced alveolar implosive /ɗ/ phoneme, and a non-phonemic palatal implosive [ʄ] (Mori, 2023). This palatal implosive is an allophone of /ɖ/ and can mostly be observed in the varied pronunciations of the verb udya “to eat,” in which the pronunciation is [uʄa]. The bilabial is the most prevalent implosive in the language. In addition, Shimaore has penultimate word stress and requires open syllables. Implosives (and other consonants) are therefore only found in onset positions (i.e., word-initial and intervocalic positions). 1
Consonants and Vowels of Shimaore, With Marginal Consonant Phonemes Shown in Parentheses.
Source. Adapted from Mori (2023).
The post-alveolar retroflex plosives /ʈ/ and /ɖ/ have also been observed as retroflex affricates [ʈʂ] and [ɖʐ] by Rombi (1983) and Mori (2023). Marginal consonant phonemes are indicated in parentheses.
In terms of acoustic properties, there is only one study to date that looks at implosives in Shimaore, and of that, only the bilabial: Mori (2023) conducted a comparative analysis of the acoustic characteristics of the phonemes /ɓ/ and /b/, articulated by 28 adult Shimaore speakers. Quantitative analyses revealed that the speakers exhibited an average VOT of −57.9 ms (SD 22.67) for /ɓ/, while the plosive /b/ demonstrated an average VOT of −105.55 ms (SD 33.32). This discrepancy was found to be statistically significant. Furthermore, f0 and amplitude were elevated in implosives relative to plosives during closure duration. In terms of voice quality, while not classified as creaky, Shimaore implosives exhibited evidence of glottal constriction. Qualitative analyses of spectrograms and waveforms showed that implosives tended to have a build-up of amplitude during closure, and that some had visible bursts at release.
Implosive Development in African Languages
Various studies have looked at consonant development in children. For example, McLeod and Crowe’s (2018) review article examined consonant acquisition in 27 languages, including 5 languages from the African continent: Afrikaans, English, Setswana, Swahili, and isiXhosa (Arabic was also analyzed but from countries outside of Africa). The article included three studies that looked at implosives, which were all on the same language, isiXhosa (Maphalala et al., 2014; Mowrer & Burger, 1991; Tuomi et al., 2001). Table 2 shows these and other studies looking at implosive acquisition in African languages, most of which focus on the voiced bilabial implosive. As can be seen, /ɓ/ has been observed to be acquired before 3 years of age in Bantu languages, specifically isiXhosa (Maphalala et al., 2014; Mowrer & Burger, 1991; Tuomi et al., 2001), isiZulu (Pascoe & Jeggo, 2019), and Swahili (Gangji et al., 2015). However, in a case study on Ikwere of the Niger-Congo family, Alerechi (2019) found that voiceless implosives were not yet acquired by 4 years of age. In his doctoral dissertation based on case studies of several children, Cissé (2014) observed implosives being produced as early as 7 months of age. Notably, most studies look at children starting at around 3, and Matlhaku (2023) has underscored the lack of studies on children younger than this.
Studies Looking at Acquisition of Implosives in African Languages.
Cissé’s PhD dissertation involved several participants, but for simplicity and scope, only one child is considered here, who was a bilingual Fulfulde-Bambara speaker. Bambara has no implosives.
In terms of methodology, most studies used categorical measures (i.e., acquired or non-acquired) determined by investigator judgment (auditory ratings). This is typically accompanied by a ratio of the quantity of phonemes correctly produced to the total amount targeted, called the Percentage of Consonants Correct (PCC) Metric (see Shriberg et al., 1997). For instance, McLeod and Crowe’s (2018) review paper identified phonemes as acquired when produced 75% to 90% of the time by children in the studies. Such measures allow for cross-linguistic comparisons as well as application in clinical settings. In fact, the studies listed in Table 2 used descriptive and categorical methods to assess acquisition, and none used acoustic measures. Given the complexity of the phonetic-phonological interface in language acquisition (Pierrehumbert, 2003), this methodological addition could be important. Phoneme development is gradual and non-discrete; for instance, children as old as 6 are still acquiring certain sounds in isiZulu (Naidoo et al., 2005). Throughout this extended developmental trajectory, target phonemes undergo phonetic variation in their realizations. Studying these phonetic changes can provide important insights into the processes underlying language development, including issues related to natural classes and specific acoustic features. In conjunction with standard measures such as PCC, incorporating this type of metric could foster innovation in the field and help address existing methodological and perhaps theoretical blind spots.
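The PCC ratio described above can be sketched as follows. This is a minimal illustration: the function names and the example counts are hypothetical, and the default cutoff of 75% is only the lower bound of the 75% to 90% range mentioned above, not a criterion adopted by any particular study.

```python
def percent_consonants_correct(correct: int, targeted: int) -> float:
    """Return PCC: correctly produced targets as a percentage of all targets."""
    if targeted <= 0:
        raise ValueError("at least one target consonant is required")
    if not 0 <= correct <= targeted:
        raise ValueError("correct must lie between 0 and targeted")
    return 100.0 * correct / targeted


def meets_threshold(pcc: float, threshold: float = 75.0) -> bool:
    """Check a PCC value against an acquisition cutoff (assumed: 75%)."""
    return pcc >= threshold


# Hypothetical child: 30 of 40 target tokens judged correct.
pcc = percent_consonants_correct(correct=30, targeted=40)
print(pcc, meets_threshold(pcc), meets_threshold(pcc, threshold=90.0))
```

Because the verdict depends on the cutoff, the same child can count as having acquired a phoneme under a 75% criterion but not under a 90% criterion, which is one reason categorical measures benefit from being complemented by acoustic ones.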
Several theories exist regarding phoneme acquisition in children, some of which rely on articulatory considerations, whereas others consider cognitive development. For the former, Sander (1972) posited that children’s phoneme development follows a hierarchy influenced by place and manner of articulation. For example, phonemes with anterior place of articulation (such as bilabials) and either plosive or nasal manner of articulation (such as /p/ or /m/) are acquired at a young age, most likely because of poor mastery of the tongue coordination required for other places of articulation. During development, various phonological processes can be observed, including substitution, which depend on various factors, such as syllable form, vowel co-articulation, place of articulation, and manner of articulation (Vihman, 2014). For example, palatal consonants may be replaced by bilabial ones, thus requiring less tongue manipulation. For the voiced and voiceless bilabial implosive, Alerechi (2019) observed a substitution with the nonphonemic plosive counterparts [p] and [b] in children between the ages of three and four.
Phoneme acquisition is not just a matter of articulatory constraints related to vocal tract and motor development, however. A growing body of research grounded in the emergent phonology framework (see Archangeli & Pulleyblank, 2022; Vihman & Keren-Portnoy, 2013) highlights the dynamic nature of language development, emphasizing the crucial role of input and exposure within a child’s linguistic environment. These factors include issues related to word frequency (Demuth, 2007; Fikkert & Levelt, 2008; Sosa & Stoel-Gammon, 2012), perception (Bennett et al., 2018), and the role of constraints as explained by Optimality Theory. For example, McAllister Byun et al. (2016) have posited an innovative model for understanding child phonemic development, the A(rticulation)-Map model (the A-Map model). Their theoretical article examines the interplay of different constraints that influence speech production, with the goal of understanding the developmental reasons why children produce phonemes that differ from those of adult speakers. They concluded that constraints are influenced by two main factors: the need to be accurate (thus producing speech like adults) and the need to be precise (thus being consistent in their speech). That is, when a target phoneme cannot be accurately pronounced, children may opt for regularly producing an alternative sound.
Phonemic development in children is a complex process shaped by both cognitive and articulatory influences. Gaining a fuller understanding would involve examining substitution patterns and acoustic characteristics. Such a combined perspective aligns with certain strands of emergentist theory, which suggest that phonological representations evolve through experience. Tracking measurable acoustic changes over time can provide useful insights into the dynamic nature of phoneme acquisition.
Methods
Participants
Fifty-three children (21 of whom were boys) between the ages of 3;0 (3 years, 0 months) and 7;1 (7 years, 1 month) participated in the study (average age 5;2, median 5;9, mode 5;9). Specifically, 12 three-year-olds, 11 four-year-olds, 12 five-year-olds, 15 six-year-olds, and 3 seven-year-olds participated. This age range of 3 to 6 years old is used in various studies looking at child language acquisition in African languages (Mahura & Pascoe, 2016; Maphalala et al., 2014; Mowrer & Burger, 1991; Naidoo et al., 2005; Pascoe & Jeggo, 2019), and this range harkens back to studies done in the 1980s elsewhere in the world on child language acquisition (see Irwin & Wong, 1983). Due to the voluntary nature of the study, age and gender were not representative, but care was taken to select children born throughout the calendar year of their age group.
The participants were identified by their teachers based on age and language background, specifically that they spoke Shimaore. Neither parents nor teachers mentioned any developmental difficulties, particularly those related to language proficiency. Standard parent, teacher, and institutional consent was obtained. Due to the nature of the project (unfinanced with minimal resources and focused on quantity rather than case-study methods), detailed information about each participant’s language background was not collected. What is known is that the children spoke both French and Shimaore, the latter to such an extent that the parents and teachers thought it was appropriate for them to participate in a study on Shimaore-speaking children. At the time of the study, there was a paucity of research on first language use in Shimaore-speaking homes on the island.
Setting
The study was conducted in a Shimaore-speaking village on Mayotte Island at a private primary school. In terms of demographics, participants were from households with stable socioeconomic standing since monthly school costs are significantly higher than those of the free public education system. While some students came from adjacent villages where the target language was also spoken, many if not most of the children were from the town itself.
Stimuli
A Non-Word Repetition Task (NWRT; Chiat, 2015) was created for the experiment in which children are encouraged to repeat aloud nonce words. The protocol took inspiration from Alkhudidi’s doctoral dissertation (2024), which used fictional alien names and Polišenská and Kapalková’s (2014) research, which used magical beads to elicit data in efforts to increase child participation during NWRTs. This two-story design was developed to prevent the youngest participants from becoming disinterested in the task. The alien task involved showing cartoons of aliens and explaining that they had to return to their planet but that their spaceships were having problems taking off. The participants could pronounce the aliens’ names out loud to help light the spaceship and guide the aliens on their journey. Once the child said the name of the alien five times, it disappeared from the screen. The second task scenario involved a child named Moussa, who was trying to string a beaded pearl necklace for his friend, but ran into trouble because the beads did not want to cooperate. The participant could help Moussa string the beads by saying each bead’s name aloud five times, after which it disappeared. During the experiment, children interacted with a tablet which showed the images of the individuals to be named out loud and which described the experiment before saying aloud each target sound twice.
Four phonemes were targeted in the experiment: /ɓ/, /ɗ/, /ʈʂ/, and /mb/. Each phoneme appeared word-initially in eight disyllabic nonce words (alien names or bead names), and each word was produced five times, for a total of 40 tokens per phoneme. Care was taken to vary vowel type, as can be seen in Table 3. Four randomized sets were created and randomly assigned to the participants. That is, in the first task with the aliens, participants would hear a randomly selected combination of names beginning with the target phonemes. Then, in the second task, they would also hear a random mixture of words containing the target phonemes. Due to risks of test fatigue, given the limited attention span of young children, it was decided to only look at phonemes in word-initial position, such that phonemes in intervocalic position were not tested.
Nonce Word Names of Aliens and Beads, in IPA.
Note. IPA = International Phonetic Alphabet.
The person whose voice was heard during the tasks is a woman who speaks Shimaore and French and lives in the village where the study was done. To facilitate cultural understanding of the story, instructions were given in French. For instance, in Shimaore, the word “magic” has a negative association with sorcery, such that interacting with magical beads for the second task might have been poorly seen. Instructions were recorded in a quiet room using an H1N Zoom microphone. See Supplemental Material (SM2 and SM3) for waveforms and spectrograms of the nonce words /ɓavi/ and /ɗista/ used as stimuli.
Procedure
The children were recorded individually in relatively quiet spaces on the school premises during school hours. The recording equipment utilized was an H1N Zoom microphone, positioned a few inches from each child’s mouth to mitigate potential sound nuisance issues from outside. The stimuli were displayed on an 8-inch tablet equipped with standard speakers, positioned approximately one foot from the child. The stimuli were presented in the form of a video on the tablet, accompanied by still images and a voiceover that explained the purpose and instructions of each task. The voiceover articulated the names of the extraterrestrial beings and the beads. The experiment was designed to last approximately 10 min per participant. The children were encouraged to articulate their words clearly and at a moderate pace. They were also instructed to repeat the name of the alien or bead in question five times.
Analyses
Data were annotated to TextGrids in Praat (Boersma & Weenink, 2019). The author coded, segmented, and identified each consonant of the first syllable for the 16 target words with the help of waveforms and spectrograms via Praat. Regarding the identification of implosives, all consonants exhibiting a negative VOT, a visible release, and non-nasal formant qualities were segmented as “ɓ” or “ɗ.” 2 This decision was made to circumvent potential errors in categorization: Mori’s (2023) study revealed variations in Shimaore bilabial implosives, including instances where they exhibited acoustic properties similar to bilabial plosives, which were namely a long-VOT and minimal amplitude build-up prior to release. In fact, the acoustic properties of implosives can be conceptualized as a continuum, with certain types exhibiting characteristics analogous to those of their plosive counterparts. Thus, a more conservative approach to categorization was adopted, whereby implosive versus plosive bilabials and alveolars were not discretely categorized, since acoustic analyses were privileged. That is, we have not differentiated between b/ɓ and d/ɗ.
Some potential tokens were discarded due to noise or unclear pronunciation. A total of 3,725 target words were annotated. Various Praat scripts were used to extract measurements, including PraatSauce for f0, H1*–H2*, CPP, and other spectral tilt measurements (Kirby, 2020). It is common procedure to use the corrected H1*–H2* values rather than the original H1–H2 since this standardizes measurements and helps comparison across vowels (see Garellek, 2019). f0 was converted to semitones to compare values across participants.
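The conversion of f0 from hertz to semitones can be sketched with the standard logarithmic formula, 12 × log2(f0 / reference). The reference frequency is not stated above, so the 100 Hz default below is purely an assumption for illustration, as are the function name and the example values.

```python
import math


def hz_to_semitones(f0_hz: float, reference_hz: float = 100.0) -> float:
    """Convert a frequency in Hz to semitones relative to a reference.

    The 100 Hz default is an assumed reference, not one taken from the study.
    """
    if f0_hz <= 0 or reference_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f0_hz / reference_hz)


# Hypothetical f0 values: each doubling of frequency adds 12 semitones.
for f0 in (100.0, 200.0, 400.0):
    print(f0, hz_to_semitones(f0))
```

Because the scale is logarithmic, equal pitch intervals map to equal semitone differences regardless of a speaker's overall pitch range, which is what makes values comparable across participants.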
Data visualization and statistical analyses were done using various R (R Core Team, 2023) and Python packages, such as ggplot2 (Wickham, 2016). For visualization of waveforms and spectrograms, PraatPicture (Puggaard-Rode, 2024) was used. Due to the non-normal distribution of the data, non-parametric tests were used, and for analyzing variation, interquartile range (IQR) rather than standard deviation (SD) was calculated. The IQR measures the spread of the middle 50% of the calculated values, giving readers a sense of the range of variation. Outliers (more than 3 SD from the mean) were discarded when calculating means. To check for patterns over time (age in months) regarding acoustic data, linear regression models were also employed.
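The two descriptive steps named above, the IQR and the exclusion of values more than 3 SD from the mean before averaging, can be sketched with Python's standard library. The function names, the quartile method ("inclusive"), and the example values are assumptions made for illustration; the packages and settings actually used in the study are not specified beyond what is stated above.

```python
import statistics


def interquartile_range(values):
    """Return Q3 - Q1, the spread of the middle 50% of the values."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def mean_without_outliers(values, n_sd=3.0):
    """Return the mean after dropping values beyond n_sd SDs of the raw mean."""
    raw_mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    kept = [v for v in values if abs(v - raw_mean) <= n_sd * sd]
    return statistics.fmean(kept)


# Hypothetical VOT-like values (ms) with one extreme measurement.
sample = [-10.0] * 20 + [-1000.0]
print(interquartile_range(sample), mean_without_outliers(sample))
```

Note that the IQR is unaffected by the single extreme value in this example, whereas the untrimmed mean would be pulled strongly toward it, which illustrates why a rank-based spread measure suits skewed data.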
Results
Target Phoneme Production: Type and Token of Substitutions
The first observation is that children of all ages substitute the target implosive sounds with other phonemes. Figure 1 displays a waffle chart of each token by type and by age in years for the target sound /ɓ/. As can be seen, three main consonants are produced: [ɓ], [m], and [pʰ]. Token count of the correctly produced target sound /ɓ/ increases with age. However, other phonemes were also produced, such as [h]. Figure 2 displays the information for /ɗ/, in which there is more variation of phoneme type when compared to the bilabial implosive. While the target sound /ɗ/ increases with age, participants mainly substitute the sound with [tʰ], [l], and even [ɓ].

Types and tokens of bilabial implosive sounds by year of age.

Types and tokens of alveolar implosive sounds by year of age.
These tokens and categories are shown by age in years and frequency in percentages in Table 4. In terms of change over time, the realization of the target sounds /ɓ/ and /ɗ/ increases with age, reaching 50% by 5 years for bilabials and 7 years for alveolars. Across all ages, the target sound was produced in 53% of tokens for /ɓ/ but only 31% for /ɗ/, showing that substitution for this latter sound was more frequent than for the former. These PCCs are much lower than the 75% to 90% thresholds typically used to attest phoneme acquisition.
Phoneme Type and Token in Descending Order of Frequency, in Years.
Acoustic Analyses
The following measurements were taken from the tokens produced as /ɓ/ and /ɗ/ only. That is, this section does not further analyze other sounds discussed in section “Target phoneme production: Type and token of substitutions.” Thus, a total of 989 tokens for /ɓ/ and 576 for /ɗ/ were analyzed (see Table 4). Because some participants did not produce exploitable tokens, not all participants were represented on the graphs. This was a small number of participants: two for /ɓ/ and four for /ɗ/.
Voice Onset Time
Results for VOT are shown in Figures 3 and 4 for the bilabial and alveolar implosive, respectively. VOT means are shown across age in months with confidence intervals. As can be seen in the figures, individual VOT varies across time, and there does not appear to be a correlation between VOT and age, as the slope of the linear regression is not significant (slope = −.058 for /ɓ/ and .202 for /ɗ/). Table 5 reports means and IQR by age in years for each implosive. Across phoneme type, VOT averages range between −37.24 ms and −55.61 ms, which are more like the values found for /ɓ/ (−57.9 ms) than for the plosive /b/ (−105.55 ms) in Shimaore spoken by adults (Mori, 2023). The IQR decreases over time for the bilabial implosive, meaning that the VOT averages have less spread, suggesting an incremental increase in precision. For the alveolar implosive, the IQR measures show a non-linear pattern, with values decreasing and increasing over time before reaching levels similar to the bilabials by age 7.
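The kind of trend line reported above can be sketched as an ordinary least-squares fit of mean VOT on age in months. The function name and the data points below are hypothetical and are not the study's measurements; the sketch returns only the slope and intercept and does not carry out the significance test on the slope.

```python
def ols_slope_intercept(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical per-participant data: age in months and mean VOT in ms.
ages_months = [36, 48, 60, 72]
mean_vot_ms = [-50.0, -48.0, -46.0, -44.0]
slope, intercept = ols_slope_intercept(ages_months, mean_vot_ms)
print(slope, intercept)
```

A slope near zero, as in the results above, indicates that mean VOT changes little per month of age; whether a given slope differs reliably from zero requires the accompanying test statistic, which is omitted here.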

Bilabial implosive mean VOT values per participant across months.

Alveolar implosive mean VOT values per participant across months.
VOT Averages in ms With IQR.
Note. Kruskal–Wallis H-test: for /ɓ/, H = 15.47, p < .01; for /ɗ/, H = 6.47, p = .167. IQR = interquartile range; VOT = Voice Onset Time.
Pitch and Amplitude of /ɓ/ and /ɗ/
Pitch and amplitude averages during the last 20 ms of oral closure were analyzed (see Supplemental Material [SM4–SM7] for figures showing means and confidence intervals over time). Several observations can be made, mainly that there is no change in pitch nor amplitude over time (global trendlines’ slopes are not statistically significant), and individual variation exists. Table 6 displays averages by year, showing the variation of pitch and amplitude during oral closure. It appears that from the youngest age, implosives are being produced in ways like adults in terms of pitch and amplitude (Mori, 2023).
Pitch and Amplitude Averages During Last 20 ms of Closure Duration in the Target Phonemes /ɓ/ and /ɗ/.
Note. Kruskal–Wallis H-test for age is statistically significant for all measurements. For /ɓ/: Pitch = 66.62, p < .01; Amplitude = 371.46, p < .01. For /ɗ/: Pitch = 106.52, p < .01; Amplitude = 291.46, p < .01. IQR = interquartile range.
For example, amplitude averages at the consonant release of bilabial implosives are 56.69 dB (SD = 10.62) versus 51.95 dB (SD = 11.00) for bilabial plosives of adult Shimaore speakers, which is a statistically significant difference (Kruskal–Wallis χ2 = 410.58, df = 1, p < .01) (Mori, 2023). While the amplitude measures in Table 6 differ from the plosive amplitude averages, they are comparable to or higher than the adult implosives. For a visual illustration, Figure 5 shows the spectrogram and waveform for [ɓen] from /ɓenu/, produced by the youngest participant (3;0). Increased amplitude is visible in the waveform before release, and VOT appears short, just like in Figure 6 of the same production made by the oldest participant (7;1). As for alveolars, Figure 7 shows [ɗiʃ] from /ɗiʃa/, produced by a child aged 3;8, which also shows short VOT, increased amplitude, and visible formants as in Supplemental Material (SM3).

[ɓen] from /ɓenu/, produced by the youngest participant (3;0).

[ɓen] from /ɓenu/, produced by the oldest participant (7;1).

[ɗiʃ] from /ɗiʃa/, produced by a participant aged 3;8.
Voice Quality and Spectral Tilt: H1*–H2* and CPP
Table 7 shows averages for H1*–H2* and CPP, the two measures used to assess voice quality and spectral tilt. Although values decrease slightly with age, the linear trends are not significant for H1*–H2* in either implosive, nor for CPP in /ɓ/. However, the slope is statistically significant for CPP in /ɗ/, and this value declines as age increases (slope = −.059, p = .01). See Supplemental Material (SM8–SM11) for figures. For comparison, adult bilabial implosives have an H1*–H2* mean of 8.10 dB (SD = 6.60) and a CPP mean of 14.45 (SD = 3.31), whereas adult bilabial plosives have an H1*–H2* mean of 11.15 dB (SD = 7.11) and a CPP mean of 13.82 (SD = 3.38) (Mori, 2023).
H1*–H2* and CPP Averages for 20 ms of the Vowel (From 5 ms to 25 ms After Oral Closure).
Note. Kruskal–Wallis H-test. For /ɓ/: CPP, H = 118.97, p = 8.86 × 10⁻²⁵; H1*–H2*, H = 20.16, p = .00046. For /ɗ/: CPP, H = 40.21, p = 3.91 × 10⁻⁸; H1*–H2*, H = 27.16, p = 1.84 × 10⁻⁵. CPP = Cepstral Peak Prominence; IQR = interquartile range.
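As an illustration of the age-group comparisons reported in the table notes, the following minimal sketch computes a tie-corrected Kruskal–Wallis H statistic and its chi-square p-value using only the Python standard library. It is not the author's analysis script (the software used is not stated in this section); the function names and the VOT values are hypothetical. The assumption of five age groups (df = 4) is ours, although the p-values in the notes appear consistent with it.

```python
import math
from collections import Counter


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    # Pool every observation, tagging it with the index of its group.
    pooled = sorted((value, g) for g, sample in enumerate(groups) for value in sample)
    n = len(pooled)

    # Assign 1-based midranks: tied values share the mean of the ranks they span.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1

    # Sum the ranks within each group.
    rank_sums = [0.0] * len(groups)
    for (_, g), rank in zip(pooled, ranks):
        rank_sums[g] += rank

    h = 12 / (n * (n + 1)) * sum(
        rs ** 2 / len(sample) for rs, sample in zip(rank_sums, groups)
    ) - 3 * (n + 1)

    # Correct for ties; the statistic is undefined if all values are identical.
    tie_sizes = Counter(value for value, _ in pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


# HYPOTHETICAL VOT values (ms) for five age groups; not the study's data.
vot_by_age = {
    3: [-95, -80, -120, -60],
    4: [-90, -85, -100, -70],
    5: [-88, -92, -75, -81],
    6: [-86, -79, -83, -90],
    7: [-84, -82, -87, -80],
}
h_stat = kruskal_wallis_h(list(vot_by_age.values()))
p_value = chi2_sf_even_df(h_stat, df=len(vot_by_age) - 1)
print(f"H = {h_stat:.2f}, p = {p_value:.3f}")
```

Under the df = 4 assumption, `chi2_sf_even_df(6.47, 4)` evaluates to approximately .167, in line with the value reported for /ɗ/ in the VOT table note.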
Discussion
Several conclusions can be drawn from the analyses of Shimaore-speaking children’s bilabial and alveolar implosive development. Perhaps the most notable finding is that both sounds are pronounced in adult-like ways as early as 36 months, supporting previous research on bilabial implosive development in Swahili (Gangji et al., 2015), isiXhosa (Maphalala et al., 2014; Mowrer & Burger, 1991; Toumi et al., 2001), and isiZulu (Naidoo et al., 2005; Pascoe & Jeggo, 2019). In terms of acoustic characteristics, as early as 3 years of age, the VOT, f0, amplitude, and spectral tilt measures have qualities similar to those of adults (Mori, 2023). However, a closer examination of certain acoustic measures reveals a nonlinear developmental pattern, with sometimes different trajectories for the bilabial and alveolar implosives.
First, regarding VOT, while the global trend lines are not significant, the VOT slope of the bilabial implosive (see Figure 3) is smaller than that of the alveolar implosive (see Figure 4), suggesting the (mostly) stable realization of the former over time. In addition, the range of variation in VOT for the bilabial decreases steadily, suggesting a constant increase in precision. This is not the case for the alveolar implosive, where we see more variation in the mean and IQR. In fact, the range of variation shows a nonlinear trajectory across age groups, with an initial decrease, followed by an increase, and then a subsequent decrease. A similar, though less pronounced, pattern is also observed for pitch and amplitude. These findings suggest that the alveolar implosive is not fully acquired and that children go through different stages in producing the target phoneme, as the A-Map model theorizes (McAllister Byun et al., 2016).
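The use of IQR as an index of precision can be sketched as follows. This is an illustrative example only: the helper names and the VOT values are hypothetical, and the quartile method (`"inclusive"`) is an assumption, since the section does not state how quartiles were computed. The sketch summarizes each age group by its median and IQR and then checks whether the IQR shrinks from one age group to the next.

```python
import statistics


def median_and_iqr(values):
    """Return (median, IQR) for one group's measurements."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q3 - q1


def is_steadily_decreasing(sequence):
    """True if every value is strictly smaller than the one before it."""
    return all(later < earlier for earlier, later in zip(sequence, sequence[1:]))


# HYPOTHETICAL VOT values (ms) per age group; not the study's data.
vot_by_age = {
    3: [-120, -95, -80, -60, -40],
    4: [-110, -92, -85, -70, -55],
    5: [-100, -90, -84, -76, -66],
    6: [-96, -88, -83, -78, -70],
    7: [-92, -87, -84, -80, -75],
}

# IQR per age group, in ascending age order.
iqr_by_age = {age: median_and_iqr(values)[1] for age, values in sorted(vot_by_age.items())}
print(iqr_by_age)
print(is_steadily_decreasing(list(iqr_by_age.values())))
```

A steadily shrinking IQR corresponds to the pattern described for the bilabial implosive, whereas a sequence that falls, rises, and falls again (for which `is_steadily_decreasing` returns `False`) corresponds to the nonlinear pattern described for the alveolar implosive.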
As for spectral tilt (voice quality) and glottal constriction, we see that CPP values decrease over time (see Supplemental Material: SM10 and SM11), along with IQR values. In particular, the CPP for /ɗ/ shows a statistically significant decrease with age, gradually approaching adult-like levels (Mori, 2023). That is, the implosives produced by children reflect some glottal constriction that has been observed in adults for the bilabial implosive. This pattern indicates that children refine voice quality features as they age, echoing the developmental trajectory observed in McAllister Byun’s (2012) study on velar fronting (as discussed in McAllister Byun et al., 2016).
However, another key finding is that children do not consistently produce the target phonemes at a PCC rate of 75% to 90% from a young age, as required to demonstrate acquisition according to McLeod and Crowe (2018). Rather, at 7 years of age, rates are 64% for /ɓ/ and 52% for /ɗ/, which shows that these two implosives pattern differently in production. Specifically, implosive production increases with age, with bilabial implosives emerging earlier and occurring more consistently than alveolar implosives during the initial stages of development. Although the source of this discrepancy remains uncertain, it is likely the result of several contributing factors. In the case of the alveolar implosive, these may include lower input frequency (it is less frequent in Shimaore; see Demuth, 2007), perceptual challenges (Pierrehumbert, 2003), and greater articulatory complexity associated with tongue movement (McAllister Byun et al., 2016).
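The comparison against the acquisition criterion can be made concrete with a short sketch. It assumes that the rate for each target is the share of tokens transcribed as the target phone, and it uses the lower bound of the 75% to 90% range as the threshold; both are simplifying assumptions, and the function names and the token list are hypothetical.

```python
def percent_correct(productions, target):
    """Percentage of transcribed productions that match the target phone."""
    if not productions:
        raise ValueError("no productions to score")
    return 100 * sum(p == target for p in productions) / len(productions)


def meets_criterion(rate, threshold=75.0):
    """True if the rate reaches the chosen acquisition threshold (in percent)."""
    return rate >= threshold


# HYPOTHETICAL transcriptions of eight attempts at target /ɓ/; not the study's data.
attempts = ["ɓ", "m", "ɓ", "pʰ", "ɓ", "ɓ", "m", "ɓ"]
rate = percent_correct(attempts, "ɓ")
print(rate, meets_criterion(rate))

# Rates reported in the text for 7-year-olds, checked against the 75% lower bound.
for phoneme, reported in {"/ɓ/": 64.0, "/ɗ/": 52.0}.items():
    print(phoneme, meets_criterion(reported))
```

With the reported rates of 64% and 52%, neither implosive reaches even the lower bound of the criterion at age 7.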
In addition, across age groups, children replace implosive sounds with phonemes that are similar in manner and/or place of articulation. For /ɓ/, these were mainly [m] and [pʰ], with a consistent decrease in the use of the sonorant over time (see Table 4). At age 3, nearly one third of tokens are realized as the sonorant [m], but this proportion decreases to roughly one tenth by age 7. In contrast, the aspirated stop remains relatively stable, appearing as a substitution in about one fifth of cases. For /ɗ/, the pattern is somewhat reversed: the obstruent [tʰ] is the most frequent substitute, followed by the sonorant [l]. The obstruent initially accounts for nearly one third of substitutions, but its rate declines to about one tenth by age 7. Although the reasons for these substitution patterns are not fully clear, they may reflect the interaction of several factors already mentioned, including input, articulatory constraints, and perception.
Another possible reason is that children assign greater weight to certain acoustic properties of implosives, specifically VOT, f0, and amplitude. This ranking may in turn influence consonant substitution choice. For instance, the acoustic properties of nasals and glides overlap with those of implosives in ways characteristic of sonorants, particularly with respect to VOT and amplitude. By contrast, the acoustic profile of aspirated stops, particularly the rise in f0 during the following vowel, resembles the elevated pitch values reported for Shimaore implosives at release (Mori, 2023), a pattern that has also been observed in Siswati (Wright & Shryock, 1993) and Mpiemo (Nagano-Madsen & Thornell, 2012). Such patterns suggest that children may prioritize certain acoustic cues when selecting substitution phones. This would be consistent with the findings of Mori (2023), who identified VOT, amplitude, and f0 as the most salient features distinguishing implosives from plosives (more so than, say, voice quality via glottal constriction).
Finally, it was observed that children sometimes produce the bilabial implosive (9% on average) when the alveolar implosive is the target. While it is unclear why this occurs, we theorize that this may be due to perceptual issues related to frequency, which could affect perception and language development (see Menn et al., 2013). Other factors, such as the lack of visual cues (e.g., lip rounding), could also have influenced perception. In any case, all of these hypotheses warrant future investigation.
The study highlights several promising opportunities for future research. First, phoneme perception research, particularly with respect to frequency effects, may complement analyses of consonant substitution by shedding light on the relative weighting of acoustic features in children’s productions. Given that implosives were most frequently substituted with sonorants and obstruents, and given the apparent distinction between alveolar and bilabial implosives, perceptual studies could provide a valuable framework for understanding the basis of these production patterns. For example, Oakley and Sande (2023) found that speakers with and without the bilabial implosive in their language repertoire (Guébie and English speakers, respectively) tend to identify implosives as plosives (obstruents), rather than nasals or glides (sonorants). They argue that the features that implosives share with plosives are more significant than those shared with nasals and glides. It is unclear what this pattern might look like in Shimaore and whether perception and subsequent categorization differ for bilabial versus alveolar implosives.
Another area that warrants further investigation involves the role of language exposure—particularly in terms of frequency (Fikkert & Levelt, 2008; Sosa & Stoel-Gammon, 2012) and perception (Bennett et al., 2018)—in the acquisition of implosives in Shimaore, paralleling cross-linguistic findings on rhotic acquisition reported by Rose and Penney (2022). This is of particular importance given the bilingual status of the children, and the inherent difficulty of controlling language use and exposure levels in bilingual environments. Incorporating measures of language exposure into acquisition analyses would be a valuable direction for future research.
A methodological consideration that merits attention when interpreting the results and their limitations pertains to the analysis of implosives in word-initial position only. Due to the resting state of the vocal cords, producing word-initial implosives poses a greater articulatory challenge and may therefore be considered less stable than production in intervocalic position (Didier Demolin, personal communication). Although Mori (2023) observed no substitutions in adult productions of word-initial bilabial implosives, it remains possible that this positional factor influenced the realization of implosives in the present data.
Conclusion
In conclusion, this study sought to better understand implosive phoneme acquisition in Shimaore-speaking children by means of cross-sectional methods. It examined children aged 3;0 to 7;1 as they produced nonce words containing /ɓ/ and /ɗ/ in word-initial position. The findings indicate that while the phonemes are pronounced with target-like accuracy as early as 3;0, they are produced less frequently at that age than at later ages. As children mature, implosive use increases, at times in a nonlinear manner. The most frequent substitutes were sonorants and obstruents, especially aspirated voiceless stops, nasals, and approximants at the respective places of articulation. Acoustic measurements show that bilingual Shimaore-French children produce bilabial and alveolar implosives similar to those of adults from an early age (see Mori, 2023), and that their precision increases over time. Age-related improvements in features like VOT and voice quality support the A-Map model of phoneme development (McAllister Byun et al., 2016), in that precision and accuracy increase as children grow older and changes sometimes follow an expected nonlinear trajectory, particularly for the alveolar implosive.
Footnotes
Acknowledgements
The author would like to acknowledge the following people for their help with this project: Titia Benders, keynote speaker at the 20th ICPhS in Prague, who inspired and then encouraged the author to do this work; Anwar Alkhudadi, Dr. Benders’s former PhD student, whose methodological feedback helped during the design of the protocol; Anliati Ahmed Abdallah, who helped validate and produce the wordlist; Saraïta Mouhiddine for her help recording participants; the teachers and staff where the study took place, including “maîtresse Lise” for organizing participant recruitment; and the children who participated and their parents for giving consent to participate. The author would also like to thank the anonymous reviewers for their constructive feedback.
Authors’ Note
The images used as stimuli in the experiment can be provided upon request.
Consent to Participate
Consent forms were obtained from the parents of the participants.
Author Contributions
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The author received financial support from the Université de Mayotte for publication of this article.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental Material for this article is available online.
Notes
References
