Abstract
This article focuses on the early acquisition of fricative and affricate consonants in Setswana, a Bantu language spoken in Botswana and South Africa (Tswana S30). We describe a series of intriguing patterns displayed in the speech of Setswana-learning children between the ages of 1 and 3 years. The data display clear trends also expected from the larger literature on phonological and phonetic development, including the stopping, affrication and debuccalization of target fricatives |s, ʃ, f, χ|, and the simplification (deaffrication; deaspiration) of target affricates |tɫ, tɫʰ, ʦ, ʦʰ, ʧ, ʧʰ, ʤ|. Beyond these general trends, the data reveal intriguing asymmetries: (a) the fact that coronal fricatives |s/ʃ| display much less debuccalization to laryngeals than the non-coronal fricatives |f, χ|; (b) the observation that while the non-lateral affricates |ʦ, ʦʰ, ʧ, ʧʰ, ʤ| are generally produced with their target place of articulation, the lateral affricates |tɫ, tɫʰ| can be variably replaced by velar stops [k, kʰ] instead. We first discuss why an analysis of the observations stated above transcends traditional models of phonology based on phonological features. We then argue that, in addition to phonological conditioning, issues in speech phonetics may also influence how children analyze the speech forms of their language. Our analyses reconcile the data with phonological theory, also in ways that offer additional insight on the origins of speech sound substitutions in child language.
Introduction
Mainstream models of phonological theory coalesce around sets of phonological features, taken as the atoms of phonological representation. Features can be used to express formal contrasts between speech sounds or encode phonological operations, for example, in terms of contextual allophonic variation or as the result of morpho-phonological alternations (Ewen et al., 2011; Goldsmith et al., 2011). Likewise, models of phonological acquisition often adopt similar formalisms to capture the types of substitutions we observe in child speech: Rules and/or constraints making reference to phonological features can be used to capture the types of speech sound substitutions that children commonly display as they learn to produce the sounds and sound combinations of their native languages. For example, it is common for children to add voicing to voiceless obstruents, in words such as papa [papa] produced as [baba]. This type of substitution can be captured via rules or constraints making reference to phonological features such as [±voiced] or [voiced]/[voiceless] feature contrasts in monovalent systems.
However, as discussed throughout the literature on child language acquisition, children often display speech sound substitutions that defy the type of logic that normally defines phonological patterning in adult systems (Priestly, 1977; Rose & Inkelas, 2011). Further, while substitution of |r| by [w] (e.g. ring produced as [wɪŋ]) is frequently observed in the speech of English-learning children, it is not clear whether it should be captured in terms of place of articulation (labiality) or sonority (liquid-to-glide substitution) (see Rose & Penney, 2022, for recent discussions).
In this article, we focus on similar challenges, this time from the early acquisition of fricative and affricate consonants by first-language learners of Setswana, the national language of Botswana, aged between 1 and 3 years. Beyond well-documented general trends, the data reveal asymmetrical patterns between different types of fricatives as well as completely unexpected patterns of place substitutions affecting lateral affricates, at least if considered from traditional analyses based on phonological features. In order to reconcile data with theory, we claim that the patterns observed have their origins in the children’s own analyses of the target speech forms, which can be influenced, at least in part, by phonetic conditioning, especially in the area of auditory perception. Our interpretation of the data embraces an emergentist approach to phonological knowledge, in particular concerning how speech phonetics can shape the child’s own representation of the speech forms they attempt to reproduce through their own speech. We take as examples two prominent asymmetries. The first is the observation that while debuccalization of target fricatives to laryngeals is noticeable across the sample set, coronal fricatives |s/ʃ| are much less affected by this process than the non-coronal fricatives |f, χ|. Second, while the non-lateral affricates |ʦ, ʦʰ, ʧ, ʧʰ, ʤ| are generally reduced to simpler stops or fricatives with their target places of articulation, as coronals, the lateral affricates |tɫ, tɫʰ| may also be replaced by velar stops [k, kʰ], a place substitution pattern that is clearly surprising at first sight. We frame our discussion of these patterns within the A-map model of phonological emergence (McAllister Byun et al., 2016; Rose et al., 2021), which focuses on the mechanisms by which the child learner must map auditory categories of speech into the types of articulatory gestures required to reproduce these categories in speech.
As predicted by the A-map, early auditory-articulatory mappings of speech forms may yield variable and/or surprising results, as the child must first discover the correct mappings, a non-trivial challenge and, from there, learn to reproduce them faithfully and reliably in their own speech productions.
This study does not contribute developmental norms for the language, a task that extends well beyond the scope of our empirical investigation. However, this study offers keys to understanding some of the challenges that Setswana-learning children must cope with while learning the sets of fricative and affricate consonants that compose the language. By extension, it contributes insight toward the development of speech assessment materials and services for Setswana-learning children, an emerging area of public service in Botswana.
We begin with an overview of the phonological system of Setswana, in the next section. We then provide background to some theoretical as well as empirical issues in the acquisition of Setswana phonology, in section ‘Background’. Building on this background, we introduce our study methods in section ‘Current study’, followed in section ‘Results’ by the relevant data obtained from our empirical sampling. In section ‘Phonetic factors affecting the acquisition of fricative and affricate places of articulation’, we discuss the theoretical implications of these findings and, for each pattern, the types of phonetic factors that may give rise to their emergence in the children’s speech. We conclude briefly in section ‘Conclusion’.
Setswana Phonology
Consonant Phonemes
Setswana, also known as Tswana (S30), is a Bantu language spoken in Botswana, South Africa, Namibia, and Zimbabwe. Setswana is the official language of Botswana. In Botswana, it is spoken by approximately 1.3 million individuals (approximately 79% of the population; Batibo et al., 2003). In South Africa, Setswana is 1 of 11 official languages, while in Namibia and Zimbabwe it is a minority language. Table 1 provides the 28 distinctive consonants of Setswana (Department of African Languages & Literature [DALL], 1999).
Table 1. Setswana Consonant Inventory.
Source. Adapted from DALL (1999, p. 10).
[d] is an allophone of /ɫ/ and the two occur in complementary distribution. [d] is found before [+high] vowels /u/ and /i/ while /ɫ/ precedes the vowels /ɪ ɛ a ɔ ʊ/. In addition to the 28 consonants, Setswana includes in its inventory a series of consonants with a secondary, labio-velarized place of articulation (Cole, 1955; Gouskova et al., 2011; Mathangwane, 1999; Mogapi, 1984; Otlogetswe, 2017; Rogers, 2009), including the plosives (/tʷ tʰʷ kʷ kʰʷ qʰʷ/), fricatives (/sʷ ʃʷ χʷ/), affricates (/ʦʷ ʦʰʷ tɫʷ tɫʰʷ ʤʷ ʧʷ ʧʰʷ/), nasals (/nʷ ɲʷ ŋʷ/), and the liquids (/rʷ ɫʷ/).
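The complementary distribution of [d] and /ɫ/ stated above can be sketched as a small lookup. This is only an illustration under stated assumptions: the function name `realize_lateral` is hypothetical, and the vowel sets contain only the vowels named in the preceding paragraph.

```python
# Illustrative sketch only: the function name is hypothetical, and the vowel
# sets contain just the vowels named in the paragraph above.
HIGH_VOWELS = {"i", "u"}                      # [+high] context: [d] surfaces
NON_HIGH_VOWELS = {"ɪ", "ɛ", "a", "ɔ", "ʊ"}   # elsewhere context: [ɫ] surfaces


def realize_lateral(following_vowel: str) -> str:
    """Return the surface form of /ɫ/ before the given vowel."""
    if following_vowel in HIGH_VOWELS:
        return "d"
    if following_vowel in NON_HIGH_VOWELS:
        return "ɫ"
    # Vowels not listed in the text are left undecided rather than guessed.
    raise ValueError(f"vowel not covered by this sketch: {following_vowel!r}")


for vowel in ["i", "u", "a", "ɔ"]:
    print(vowel, "->", realize_lateral(vowel))
```

Because the two realizations never share a context, a single following-vowel check is enough to choose between them in this sketch.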
The discussion below focuses on Setswana’s rich series of fricative and affricate consonants. Voiceless fricatives are attested across all major places of articulation (labial, coronal, dorsal), in addition to the laryngeal /h/. Coronal affricates are attested at the alveolar, alveo-palatal and latero-alveolar places of articulation. As we will see in the data, different places of articulation for fricatives and affricates in Setswana may yield different patterns of development.
Syllable Structure
Setswana, just like many other Bantu languages, displays a maximally CV syllable structure. A word can begin either with a single consonant or a single vowel, but must always end with a vowel, and there are no consonant clusters word medially (e.g. mosadi /mʊsadi/ ‘woman’, peo /pe.ʊ/ ‘seed’, aba /a.ba/ ‘give away’; DALL, 1999, p. 31; Otlogetswe, 2017, p. 404). The nasals, the trill, and the lateral are syllabic, and display the same distribution as vowels (Cole, 1955) in word-initial position (e.g. mpho [m.pʰɔ] ‘gift’; ntate [
Given the relative simplicity of Setswana syllable structure, the children’s data in the area of syllable structure development in the language are relatively unremarkable, also given the early acquisition of syllabic consonants (Tsonope, 1993). While it is possible that extremely early developmental data (e.g. during the transition between the babble and early speech stages) reveal phonological conditioning in the development of Setswana syllables, for example, concerning the syllabic nature of the sonorant consonants, our data do not suggest any particular patterns. On this note, we introduce the background to our study, in the next section.
Background
Phonological Processes in Child Language
Stampe (1973, p. 1) defines child language phonological processes as ‘mental operations that apply in speech to substitute, for a class of sounds or sound sequences presenting a common difficulty to the speech capacity of the individual, an alternative class identical but lacking in the difficult property’. Phonological processes are important because they reveal the types of constraints at play, be they phonological, in relation to various issues in structural complexity, perceptual, from the perspective of the child’s auditory categorization of speech sounds in context, or in relation to issues in oro-motor development (Green et al., 2002; McAllister Byun et al., 2016). As recent research clearly highlights, phonological processes must also be interpreted in the context of individual languages, given that they do not manifest themselves the same across languages or even, at times, between individual learners of the same languages (Rose & Penney, 2022, for a recent summary). It is thus important to document child speech across a maximum number of languages, to properly identify each contributing factor and its relative influence on the data.
In light of these general observations, our understanding of phonological development requires that we interpret the data both phonologically and phonetically, for example in terms of the phonological contrasts relevant to the learner’s target language(s) as well as how these contrasts manifest themselves phonetically. For example, while rhotic consonants (or ‘r’ sounds) are commonly analyzed through the feature [rhotic] within the phonological literature, these consonants employ a wide range of places and manners of articulation across languages (e.g. apical, retroflex or uvular, produced as taps, trills or even fricatives). In turn, these different phonetic attributes may influence how rhotics are acquired by learners of different languages (Bernhardt & Stemberger, 2018; Rose & Penney, 2022). These learners must indeed acquire rhoticity as a phonological category, its phonological distribution as well as how each position manifests itself within the speech stream. Moving from rhotics, which have been discussed within the recent literature, we now turn to the acquisition of fricatives and affricates in Setswana.
Fricatives and Affricates Across Languages and in Setswana
From an articulatory standpoint, fricatives are produced by forcing air through a narrow point of constriction, resulting in turbulent noise within the speech signal throughout the duration of the consonant. As turbulent airflow may involve different speech articulators and configurations (e.g. tongue grooving or spreading), the production of fricatives involves high degrees of articulatory precision (Kent, 1992; Rose et al., 2021).
In comparison, affricates begin with a complete blocking of the airflow released into a fricative constriction at the same point of articulation (Ladefoged & Maddieson, 1996, pp. 90–91; Stevens, 1993). Affricates are thus structurally more complex than fricatives, given their sequenced manners of articulation; the fricative release involves similar levels of articulatory complexity at each point of articulation (Chomsky & Halle, 1968; Ladefoged & Maddieson, 1996; Lombardi, 1991; Rubach, 1994; Sagey, 1986; Stevens, 1993).
From an acoustic standpoint, the frication present in fricatives and affricates generally falls into one of two categories: louder, or strident, versus weaker, or non-strident. The former involves noticeably higher sound energy (Stevens, 1993, p. 251), compared to the latter. At a more formal level, one may assign the feature [+strident] to fricatives articulated within the alveolar and alveo-palatal places of articulation (e.g. [s z ʃ ʒ]) as well as corresponding affricates (see LaCharité, 1993, for an early analysis; see also Kim et al., 2015, and references therein). In contrast to this, fricatives articulated across virtually all other places of articulation are assigned the feature [-strident] (e.g. [f, v, θ, ð, x, ɣ, h, ɦ]). As defined by Jakobson et al. (1952): ‘Strident phonemes are primarily characterized by a noise which is due to turbulence at the point of articulation. This strong turbulence, in its turn, is a consequence of a more complex impediment which distinguishes the strident from the corresponding mellow consonants [. . .]’ (p. 24).
From an auditory standpoint, stridency translates into robust cues to consonantal places of articulation. In contrast to this, non-strident fricatives and affricates display weaker cues to place of articulation. This also holds true of lateral affricates, whose second formant, their main cue to place of articulation, is of generally low amplitude (Ladefoged & Maddieson, 1996, p. 206). Below we show how the respective characteristics of these consonants influence their acquisition.
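The [±strident] assignment described above can be sketched as a simple lookup. The first two sets reproduce the example segments given in the text. Adding uvular [χ] to the non-strident set is an assumption, based on the article's grouping of |f, χ| as non-strident fricatives. The function name `stridency` is hypothetical.

```python
# Illustrative lookup. The example segments come from the text above; adding
# uvular χ to the non-strident set is an assumption based on the article's
# grouping of |f, χ| as non-strident fricatives.
STRIDENT = {"s", "z", "ʃ", "ʒ"}
NON_STRIDENT = {"f", "v", "θ", "ð", "x", "ɣ", "h", "ɦ", "χ"}


def stridency(fricative: str) -> str:
    """Return '+strident' or '-strident' for a fricative symbol."""
    if fricative in STRIDENT:
        return "+strident"
    if fricative in NON_STRIDENT:
        return "-strident"
    raise ValueError(f"segment not covered by this sketch: {fricative!r}")


# The target fricatives discussed for Setswana in this article.
for segment in ["s", "ʃ", "f", "χ", "h"]:
    print(segment, stridency(segment))
```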
As predicted by the phonetics of both frication (partial constriction) and affrication (sequencing of constrictions), patterns of acquisition, in the spirit of Stampe (1973), predictably include stopping (complete closure at the point of constriction) and deaffrication to a single manner, either stop or fricative. These general predictions are borne out by the data, as we summarize in the next paragraphs.
Across languages, fricatives and affricates have been reported among later-developing sound categories, as they tend to be mastered after stops, nasals, and glides (Ferguson, 1978; Jakobson, 1941; Smit et al., 1990; Stoel-Gammon, 1985). Dodd et al. (2003) report the late acquisition of fricatives going beyond the age of 6 years (e.g. age 6;11). Ferguson (1978), Stoel-Gammon (1985), Dyson (1988) and Shriberg and Kwiatkowski (1994) observe the following order of acquisition for fricatives for English-speaking children: the earlier acquisition of |f s ʃ|, followed by |v z| and, later, by |θ ð ʒ|. Affricates, on the other hand, are reported to emerge later than the fricatives in this general order: |ʧ ʤ| are acquired later than the fricatives |f s|. However, these affricates can be acquired alongside |ʃ| (Ingram, 1978; Shriberg & Kwiatkowski, 1994). The same general order was reported for Putonghua (Mandarin Chinese) by Hua and Dodd (2000). The production of these consonants can vary significantly between learners, including both typically developing children (Ferguson & Farwell, 1975; Smith, 1973) as well as children with speech sound disorders (Ingram et al., 1980; Shriberg, 1993; see also Bernhardt et al., 2015). Finally, fricatives and affricates may display different orders of acquisition across languages. For example, Cook (2006) reports that children learning Chipewyan (Dëne Sųłiné) acquire affricates and stops before fricatives. Further, Smith (1973) reports substitutions of both affricates and fricatives by stops in his longitudinal study of English acquisition; Smit (1993) reports similar patterns in her general survey of English acquisition.
Building on such observations, stopping appears to be the most common pattern affecting the acquisition of fricatives and affricates across languages (McLeod, 2007; Smith, 1973; Watts, 2018). Concerning place of articulation, depalatalization of alveo-palatal or palatal fricatives and affricates to their alveolar counterparts is frequently observed, with the opposite substitution robustly attested as well (Hodson & Paden, 1981; Hua & Dodd, 2000; McLeod, 2007). Finally, the literature reveals a general pattern of fricative debuccalization to [h] across languages, which appears to be much more prominent in younger than in older learners (Levelt, 1994; Smit, 1993). These general results are borne out by studies on the acquisition of African languages as well. This includes the acquisition of Akan (Amoako, 2020), isiXhosa (Lewis, 1994; Mowrer & Burger, 1991; Tuomi et al., 2001), isiZulu (Naidoo et al., 2005), Sesotho (Demuth, 1992, 2007), Swahili (Gangji, 2012) and Setswana (Mahura, 2014; Mahura & Pascoe, 2016; Matlhaku, 2023).
Concerning the general pattern of variation affecting the acquisition of precise coronal places of articulation, we follow the general literature on acquisition suggesting that during early stages of acquisition children have not yet acquired the fine distinctions between different coronal articulations. This observation can be captured in terms of lack of feature or feature specification within phonological representations (Fikkert & Levelt, 2008; Levelt, 1994; Levelt & van Oostendorp, 2007), also in relation to models of phonological underspecification (Lahiri & Reetz, 2010).
In sum, the general patterns affecting the manner of articulation of target fricatives and affricates can be analyzed straightforwardly, both phonologically and phonetically. However, as we argue below, additional, and at times more subtle, patterns exist in the data which reveal additional ways in which child learners may interpret the speech data of their target language. We introduce these patterns next, which we discuss in light of recent proposals on phonological emergence.
Factors Influencing the Emergence of Phonological Processes
According to McAllister Byun et al. (2016) and Rose et al. (2021), phonological explanations alone, which focus on phonological features and their development within the learners’ systems, cannot capture in sufficient detail the range of facts observed in the acquisition data. Following Priestly (1977), they note that many of the phonological processes observed in child language can hardly, if at all, be encoded within formal models of phonology. They claim that these challenging patterns may in fact arise from pressures that lie outside the realm of phonology, including the general factors we overview in the next paragraphs.
One general factor concerns the physiological configuration of the vocal tract, in particular the differences in vocal tract sizes and configurations that exist between young children and adults. For example, a young child’s vocal tract is smaller, with the tongue disproportionately large and forward positioned, as it almost completely fills the entire length of the vocal cavity (Kent, 1992). This configuration in turn significantly affects the child’s lingual manoeuvrability and, consequently, hinders accurate production of linguo-palatal sounds (Kent, 1992; Vorperian et al., 2005).
Another general factor relates to the fact that children are coping with motor control limitations which may also hinder the production of certain sounds (Inkelas & Rose, 2003; Rose et al., 2021). Kent (1992) broadly differentiates between two types of constrictions, ballistic vs. controlled, as follows: Ballistic sounds typically involve movements of short duration, high velocity with rapid acceleration and deceleration of the speech articulators (Kent, 1992, p. 85). These include oral stops and nasals. In contrast to this, controlled sounds require more refined levels of articulatory precision and timing, for example the fine constriction required to create airflow turbulence, or the precise manipulation of tongue body gestures involved in the production of rhotics and laterals (Green et al., 2002; Kent, 1992). Together, physiological and articulatory constraints on speech production ‘contribute to the emergence of phonological processes in child language’ (Rose et al., 2021, p. 579).
A third factor relates to the acquisition of the various components of the human speech chain more generally, given that children must first be able to accurately perceive the speech sounds of their languages before they can learn to reproduce them in their own speech (Curtin et al., 2017; Rose & Penney, 2022). In a nutshell, auditory issues may result in the child misinterpreting the speech signal of the sound at hand, especially during the early period of development, resulting in either an incomplete (ill-defined) or erroneous speech sound or sound combination to reproduce. If a sound is not perceived correctly (or well enough), it logically cannot be reproduced accurately. Rose and Penney (2022) take as an example the acquisition of the uvular fricative rhotic |ʀ| in four languages (Dutch, German, French, Portuguese), whose articulation involves subtle constrictions of the velopharyngeal area affecting the aerodynamic control of airflow making its way through these constrictions (Ohala, 1983). They found that |ʀ| was predominantly substituted by [h] in German and, to a lesser extent, in Dutch, while it was deleted for children learning French and Portuguese. Rose and Penney link these observations to language-specific differences in aspiration between German and Dutch, even if both of these languages display an /h/ in their respective phonemic inventories, while French and Portuguese display neither aspiration nor a laryngeal fricative in their phonological inventories.
Considerations such as these are inherent to emergentist models of phonology such as the Linked-Attractor model (Menn et al., 2009, 2013) and the A-map model (McAllister Byun et al., 2016). Similar views form the basis of computational models of speech acquisition (Guenther, 1994; Guenther et al., 2006; see also Lin & Mielke, 2008). For example, within the A-map, learning consists of attaining phonological representations that successfully relate the speaker’s internal knowledge of auditory inputs (present in the ambient language) and the motor plans required to reproduce these inputs through their own speech articulations.
In the next section, we introduce the data relevant to the current discussion. We begin with a general description of our study. We then turn our focus to the more subtle patterns we have been alluding to across our different discussions above.
Current Study
Our study follows the general tradition of naturalistic, longitudinal research on child language acquisition. In the next subsections we overview how we recorded, processed, and analyzed speech data from three young children learning Setswana as their first language.
Participants
Our study documents three typically developing monolingual learners of Setswana code-named W, T, and B, respectively. We also use WTB to refer to this group of learners as a whole. WTB were learning the SeKwena dialect of Setswana, which is predominantly spoken in villages located in the southeastern part of Botswana. W is from Molepolole; T and B are from Mankgodi. The children were selected on the basis of their families’ use of Setswana in everyday communication as well as the caregivers’ willingness to participate in the study. Table 2 provides an overview of these participants.
Table 2. Participants in the Study.
WTB were recorded during a period of approximately 4 months, between W’s ages of 1;10.18 and 2;02.02; T’s ages of 2;05.03 and 2;08.15; and B’s ages of 3;02.22 and 3;06.05. We also highlight that W and B are boys and T is a girl. W has an older sibling and both are living within an extended family setup. B also has an older sibling and the children live with their mother. T is the first-born child with a younger sibling and, like W, lives within an extended family setup. Given the small size of our study and the children’s similar socio-economic backgrounds, we did not take into consideration individual factors such as biological sex or socio-economic status (SES), which have been reported to play an important role in children’s acquisition (Winitz, 1969; Wells, 1985, 1986). However, we note that while T, the girl, was not the oldest child in our study, she generally displayed more accurate productions than the two boys. This sex-based difference was also observed by Mahura (2014) (see also Dodd et al., 2003; Maccoby & Jacklin, 1974; McCormack & Knighton, 1996; Smit et al., 1990; So & Dodd, 1995). In comparison, W’s productions were the most variable of the three children, as we expected given that he was both a boy and the youngest participant in our study.
Audio Recording
We audio recorded each of the children in their own homes. The recordings were done by a native speaker of Setswana with advanced training in linguistics, also in the presence of a parent or caregiver. The recordings took place at regular intervals, once or twice a week, for a period of approximately 4 months. None of the children attended pre-school prior to the recordings or were exposed in any meaningful way to a secondary language such as English. Prior to carrying out the recordings, we obtained from the caregivers their informed written consent concerning both the data recording procedures and the later use of the children’s audio-recorded speech data for research. The caregivers also completed a questionnaire consisting of a list of common lexical items such as nouns (i.e. depicting domestic animals, modes of transportation, nature, food items, utensils), verbs, adjectives, and question words to ensure that all children had the expected knowledge of basic Setswana words. We further verified with the caregivers that all the children were developmentally and socially healthy and that they also had no vision or hearing problems.
We used picture books to elicit words that together maximally cover the sounds of Setswana across all CV positions (initial, medial and final) within which they can appear. Elicitation involved making use of single-word naming and image description, which started with the children being asked to informally and spontaneously name objects or actions they identified in the picture books; we did not use any fixed word lists or structured speech elicitation protocols. However, prompting questions were offered to the child if they did not immediately name the object or action they saw in the picture, for example, golo mo ke eng? ‘what is this?’; ke eng selo se? ‘what is this thing?’; o dira eng? ‘what is he/she doing?’; se dira eng? ‘what is it doing?’; or ba/di dira eng? ‘what are they doing?’ Besides our use of prompts, the children were generally in control of the trajectory of the conversation, during which they were positively encouraged to spontaneously produce their own speech. As part of these interactions, the adult interviewer focused on repeating the child’s productions using the adult form. This provided the child with a stimulating environment for speech production and language learning, and also served the subsequent identification of the speech forms attempted by the child. However, a consequence of this approach is that it could not guarantee that all children produced all the sounds in all of the recordings. This limitation had no measurable impact on the results we present below.
Data Transcription and Analysis
The children’s speech productions were orthographically and phonetically transcribed and analyzed by a native speaker and trained linguist using the Phon software program (Rose et al., 2006, 2013; Rose & MacWhinney, 2014). Phon enables us to make one-on-one comparisons between attempted (target) and produced (actual) sounds and sound combinations. Using the analytic and query functions of Phon, we then extracted the data relevant to our study, in particular the patterns of phonological production that manifest themselves in the data such as consonant substitution or deletion across all syllable positions within which they appear in the children’s productions.
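The target-versus-actual comparison described above can be illustrated with a generic tally. This sketch does not use Phon or reproduce its query functions. The function name `accuracy_by_target` and the toy (target, actual) pairs are hypothetical and are not drawn from the corpus.

```python
from collections import Counter

# Hypothetical toy data: (target, actual) consonant pairs. These are NOT
# corpus data; they only illustrate the shape of the comparison.
toy_pairs = [
    ("s", "ʦ"), ("s", "s"),
    ("f", "f"), ("f", "h"), ("f", "f"),
    ("χ", "χ"),
]


def accuracy_by_target(pairs):
    """Return the proportion of target-matching productions per target."""
    attempts = Counter()
    matches = Counter()
    for target, actual in pairs:
        attempts[target] += 1
        if actual == target:
            matches[target] += 1
    return {target: matches[target] / attempts[target] for target in attempts}


rates = accuracy_by_target(toy_pairs)
for target, rate in rates.items():
    print(f"|{target}|: {rate:.0%} accurate")
```

Keeping attempts and matches in separate counters makes it easy to report both the raw number of attempts and the accuracy rate for each target.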
Note that we combined the results for target alveolar |s| and post-alveolar |ʃ| within a single category below for the following two reasons: (a) they displayed similar patterns of behaviours, (b) |ʃ| had significantly fewer attempts (n = 55) relative to alveolar |s| (n = 1,539), 43 of which were attempted by T and 11 by B. T mostly affricated |ʃ| to [ʧʰ] (in 65% of her attempts) and |s| to [ʦ] and [ʦʰ] (66% and 11%, respectively). B displayed coronal stopping for both |s| and |ʃ|, while W’s few attempts yielded variable outcomes. We return to these results in the next section, where we describe the trends observed in the children’s acquisition of Setswana’s fricatives and affricates.
Results: WTB’s Acquisition of Fricatives and Affricates
In this section, we present the children’s patterns of acquisition for the target fricatives and affricates of Setswana. The data descriptions below combine target consonants across initial, medial and final syllable onsets, as we have not found differences in behaviours across these positions.
Fricatives
Our corpus documents 4,630 attempts at fricatives, including 532 labiodental |f|, 1,594 coronal |s/ʃ|, 2,437 uvular |χ|, and 67 laryngeal |h|. This results in a category of fricatives for each major place of articulation (Labial, Coronal, Dorsal, and Laryngeal). We present in Table 3 the general rates of substitution affecting each category of fricatives.
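As a quick arithmetic check on the counts reported above, the sketch below sums the attempts per category and derives each category's share of the total. The counts are taken from the text. The variable names are illustrative, and the percentage shares are a derived convenience that is not reported in the article.

```python
# Attempt counts as reported in the text.
attempts = {"f": 532, "s/ʃ": 1594, "χ": 2437, "h": 67}

# The coronal category combines alveolar |s| and post-alveolar |ʃ|.
coronal_parts = {"s": 1539, "ʃ": 55}
coronal_total = sum(coronal_parts.values())

total = sum(attempts.values())

# Derived share of each category, as a percentage of all fricative attempts.
shares = {target: 100 * count / total for target, count in attempts.items()}

print("coronal |s/ʃ| attempts:", coronal_total)
print("total fricative attempts:", total)
for target, share in shares.items():
    print(f"|{target}|: {share:.1f}% of attempts")
```

The check shows that the four category counts add up to the reported total, and that the separate |s| and |ʃ| counts add up to the combined coronal figure.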
Table 3. WTB’s Combined Attempts at Fricatives.
As we can see in this table, the coronals |s/ʃ| display an outsized proportion of substitutions compared to the other categories, with an accuracy rate of less than 12%, while |f, χ, h| range in accuracy from 53% to 81%. T made the most attempts at all the fricatives. W and B made a similar number of attempts. Table 4 presents the individual differences in the children’s attempts at each fricative.
Table 4. Overview of WTB’s Target Fricatives |s/ʃ, f, χ, h|.
Of all the fricatives, target |h| recorded the lowest number of attempts; because of these low numbers, and as the development of |h| is tangential to the current discussion, we will not discuss it further. We identified six patterns for target fricatives. We classify these patterns by place features and manner features, in particular, in the case of affrication. Patterns involving place and manner substitutions include debuccalization and gliding. The category ‘other’ is used to group together all substitutions that failed to form a distinguishable pattern. We now turn to a more detailed description of each of these patterns, starting with |s/ʃ| in the next subsection.
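The kind of pattern labelling described above can be sketched as a rule-ordered classifier over target and actual segments. This is a simplified illustration and not the coding scheme used in the study. The segment sets, the label names and the function `classify_substitution` are assumptions, and any outcome not covered by a rule falls into ‘other’.

```python
# Simplified, hypothetical segment classes (not the study's coding scheme).
TARGET_FRICATIVES = {"s", "ʃ", "f", "χ"}
LARYNGEALS = {"h", "ʔ"}
AFFRICATES = {"ʦ", "ʦʰ", "ʧ", "ʧʰ", "ʤ", "tɫ", "tɫʰ"}
ORAL_STOPS = {"p", "b", "t", "d", "k", "kʰ", "qʰ"}
GLIDES = {"j", "w"}


def classify_substitution(target: str, actual: str) -> str:
    """Label the production of a target fricative with a broad pattern."""
    if target not in TARGET_FRICATIVES:
        raise ValueError(f"not a target fricative in this sketch: {target!r}")
    if actual == target:
        return "accurate"
    # Laryngeals are checked first so that [ʔ] counts as debuccalization,
    # not as stopping.
    if actual in LARYNGEALS:
        return "debuccalization"
    if actual in AFFRICATES:
        return "affrication"
    if actual in ORAL_STOPS:
        return "stopping"
    if actual in GLIDES:
        return "gliding"
    # Everything else, e.g. [r] or [l], is grouped together.
    return "other"


for target, actual in [("s", "ʦ"), ("s", "t"), ("f", "h"), ("χ", "k"), ("s", "l")]:
    print(f"|{target}| -> [{actual}]: {classify_substitution(target, actual)}")
```

The order of the rules matters in this sketch: a segment is labelled by the first class it matches.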
Substitution Patterns for Coronal |s/ʃ|
As we can see in Table 5, the children display more or less individual patterns of variation in their productions for |s/ʃ|. These target fricatives yield relatively stable substitutions for T and B; T predominantly substituted |s/ʃ| by the affricates [ʦ ʦʰ]. In contrast to T, B displayed a combination of stopping to coronals [t/d] and affrication [ʦ]. We note in this context that the majority of B’s stop productions appear to be the result of consonant harmony with another coronal consonant present within the word. Finally, W, the youngest of the three children, displayed much more variable patterns, with productions ranging between affrication, stopping, gliding, and debuccalization to laryngeal consonants throughout the recording period.
WTB’s Production Patterns for Target |s/ʃ|.
The ‘other’ productions for T’s and B’s |s/ʃ| all involve substitutions to [r/l].
We also note that, proportional to each child’s level of productivity, most of the marginal cases of debuccalization we find in the data for |s/ʃ| come from W, the youngest and least advanced learner. This developmental observation offers additional context for our description of |f| and |χ| below, characterized by higher rates of debuccalization in all three children’s productions.
Substitution Patterns for Non-Coronal |f| and |χ|
As we can see in Tables 6 and 7, the non-coronal consonants |f| and |χ| were produced with lower rates of substitutions, especially by T, our most proficient learner. We identified six different patterns in these data, involving different manner and place dimensions; in particular, we highlight the high rate of debuccalization to [h ʔ], especially for |f|.
WTB’s Production Patterns for Target |f|.
WTB’s Production Patterns for Target |χ|.
The category ‘other’ for W consists of substitutions to [l]; T’s consists of substitutions to [f]/[l]/[r]; B’s consists of a vast majority of substitutions to [l].
We also observe a more marginal tendency for |f| to be stopped to [p b], while the uvular |χ| displays a wide range of variation. W and T mainly stopped |χ| to uvular [qʰ] and velar [k], while B displayed substitutions of this consonant by coronal and labial stops.
In sum, the children displayed much less accuracy in their realization of places of articulation for the non-coronal (and non-strident) fricatives compared to the coronal (strident) ones. Recall from above the observation that coronal fricatives, the strident targets, tend to maintain their general places of articulation even when they are affected by phonological processes. Before we discuss our interpretation of these facts, we turn to our second segmental context, this time defined by the series of affricate consonants that Setswana presents.
Affricates
Our corpus documents 4,701 attempts at affricates, namely 3,298 non-lateral |ʦ ʦʰ ʤ ʧ ʧʰ| and 1,403 lateral |tɬ tɫʰ|. 8 By far the most common process affecting the children’s productions of these consonants is deaffrication. Table 8 breaks down the deaffrication rate of each affricate for each child. In the speech of both boys (W, B), lateral affricates were much more prominently affected by deaffrication than non-lateral ones, irrespective of the place of articulation of the resulting consonant. In comparison, T continued to display higher levels of achievement, with much lower rates of deaffrication overall.
WTB’s Rates of Deaffrication for Non-Lateral and Lateral Affricates.
Substitution Patterns for Non-Lateral |ʦ ʦʰ ʧʰ ʤ|
We begin in Table 9 with a summary of each child’s accuracy rates, for each target affricate.
Overview of WTB’s Target Affricates |ʦ ʦʰ ʧʰ ʤ|.
The following four tables break down the patterns, for each child, for target |ʦ ʦʰ ʧʰ ʤ|, respectively. Starting with |ʦ ʦʰ|, in Tables 10 and 11, we observe nine different patterns of substitution, which we categorized in terms of place and manner of articulation, similar to our classification of fricative productions in the previous section.
WTB’s Production Patterns for Target |ʦ|.
WTB’s Production Patterns for Target |ʦʰ|.
For target |ʧʰ| in Table 12 below, we also observed nine different patterns of substitution, categorized mainly by place and manner of articulation.
WTB’s Production Patterns for Target |ʧʰ|.
Finally, we report on target |ʤ| in Table 13. As we can see, this voiced affricate yielded more variable patterns, also with higher rates of deletion overall.
WTB’s Production Patterns for Target |ʤ|.
Including substitutions to non-palatal |ʦ ʦʰ| and palatal |ʧ ʧʰ|.
W’s and T’s gliding is exclusively to [j], while B glides to [w].
Despite the relative variability observed in the data, the majority of which involve manner substitutions, we highlight that these affricates were generally produced with their target coronal place of articulation. 9 Further, we note the absence of any pattern of velar substitution. As we describe next, lateral affricates present a much different picture in both these respects.
Substitution Patterns Affecting the Lateral Affricates |t͡ɫ t͡ɫʰ|
As can be seen in Table 14, the lateral affricates |t͡ɫ| and |t͡ɫʰ| also displayed noticeable rates of substitution, for each child, in line with our general predictions about affricate development.
Overview of WTB’s Target Affricates |t͡ɫ t͡ɫʰ|.
In addition to the high rates of substitution summarized in Table 14, the data in Table 15 reveal a more puzzling pattern of velar substitution for the target affricates, especially prominent in W’s productions (e.g. tlaya |t͡ɫaːja| → [kàːja] ‘come’; tlola |t͡ɫʊːla| → [kʊːla] ‘jump’; |t͡ɫaːla| → [káːja] ‘hunger’). Tables 15 and 16 (i.e. target |t͡ɫ| and |t͡ɫʰ|, respectively) further break down the substitution patterns and rates for each child.
WTB’s Production Patterns for Target |t͡ɫ|.
WTB’s Production Patterns for Target |t͡ɫʰ|.
Similar asymmetries can be observed in the data for target |t͡ɫʰ|, as we can see in Table 16:
Over the course of the observation period, we could also witness W’s productions evolving from velar to coronal stopping. We can also see instances of lateral affricates being substituted by non-lateral affricates, but the children never displayed lateral substitutions for the non-lateral counterparts.
Interim Summary
In sum, our findings about WTB’s acquisition of the fricatives and affricates of Setswana fall largely in line with general expectations based on the cross-linguistic literature. However, a few trends in the data transcend these expectations, in particular (a) the relative persistence of debuccalization as a process affecting non-coronal (and non-strident) fricatives and (b) the unexpected pattern of velar substitution affecting lateral affricates. Arguably, both of these processes are representative of early stages in phonological development. As reported above, fricative debuccalization is more prominent in the speech of younger child speakers cross-linguistically. Similarly, velar substitution for lateral affricates (also a non-strident category of consonants) manifests itself much more prominently in the early productions of W, our youngest participant. We discuss the potential origins of these production patterns in the next section.
Phonetic Factors Affecting the Acquisition of Fricative and Affricate Places of Articulation
In this section, we address our developmental observations from the perspective of segmental emergence. We first highlight some of the difficulties involved in the analysis of these patterns using formal models of phonology. We then consider these patterns in light of the phonetic properties of the target system and how these properties may ultimately be interpreted by the child learners. We frame this discussion using the A-map model of segmental development (McAllister Byun et al., 2016; Rose et al., 2021).
Factors Affecting the Emergence of Fricatives
In section ‘Results’, we observed in WTB’s data an asymmetry whereby debuccalization as a substitution process affecting target fricatives is much more prominently observed with non-coronal, non-strident fricatives than with coronal, strident ones.
To our knowledge, no model of phonology can encode this asymmetry in a straightforward fashion. First, we are aware of no language where fricative debuccalization is a necessary function of coronality, except perhaps in the case of lenition processes related to specific syllable positions (e.g. /s/ debuccalization or deletion in syllable codas across different dialects of Spanish; Harris, 1969). To distinguish between categories of consonants more or less prone to debuccalization, one option would be to rely on [±strident] distinctions, given that labial and velar fricatives cannot be easily combined within a single place category. However, even if a formal distinction can be established based on this feature, any rule of debuccalization affecting [-strident] sounds would merely consist of a formal restatement of the phenomenon observed. Further, a rule should predict categorical behaviours, which makes it challenging to capture optionality within the data; such a rule would also need to be associated with relatively early stages of phonological development. The same criticisms apply to constraint-based models of phonology unless one were to ground the formalism in speech phonetics (see Archangeli & Pulleyblank, 2022, for a recent argument), also in relation to the acquisition of phonological knowledge.
Building on these observations and related criticisms, we take as a starting point the observation that stridency (or absence thereof) played a role in the children’s acquisition of each type of fricative consonants, as follows: The speech cues to the place of articulation of coronal fricatives were carried through strident frication, enhancing the learner’s ability to identify the coronal fricatives within the signal, which were then reliably produced as coronals as soon as the children had acquired the ability to reproduce coronality through their own speech. While frication (and affrication) remained a challenge, yielding generalized patterns of stopping, the children displayed no problem in realizing these stopped consonants within their general place of articulation.
The only exception to this general scenario within the data on coronal fricatives concerns the noticeable pattern of debuccalization presented by W, our youngest learner, whose productions were highly variable. For example, throughout the observation period, W produced setilo |sɪtilɔ| ‘chair’ as [hɪtiwɔ]/[ʔɪtiwɔ]; sekuta |sɪkuta| ‘motor-bike’ as [ʔututa]; and the possessive pronoun saaka |saːka| ‘mine’ as [ʔaːka]. These facts are also compatible with the general framework of phonetically grounded phonological emergence, as they represent a state where the child clearly perceives the presence of a fricative sound but has yet to acquire the place of articulation for this consonant.
In contrast to this, the place of articulation of non-coronal, non-strident fricatives may at times be hindered or, minimally, not perceived as accurately by the learner, given that cues to place of articulation are weaker and more variable when carried through non-strident frication. Under an emergentist approach, this predicts the same general developmental trajectory as with strident fricatives, albeit with slower rates of development as well as more variability in the data leading to the mastery stage. To the overall weakness of the acoustic signal, we also add the fact that non-strident fricatives involve a contrast between two different places of articulation (labial, velar), a factor potentially adding to the challenge of acquiring these fricatives. Not only do the learners of these contrasts have to cope with weak auditory cues, they also must identify and replicate contrasting places of articulation based on these cues.
This explanation captures the characteristics of early stages of development for non-strident fricatives concerning their slower rates of acquisition, the substitution patterns affecting their early productions and the overall variability observed throughout the acquisition period. In contrast to this, as stated above, it would be extremely challenging to capture these data and the variability within them using a formal, rule- or constraint-based model of phonology. Any such analysis would also have to contend with explaining the origins of these data in the first place, which the current analysis captures in a straightforward way, based on the properties of the speech to which the children are exposed. We expand on this discussion based on WTB’s substitution patterns for affricates, in the next section.
Factors Affecting the Emergence of Affricates
The general pattern of deaffrication displayed by WTB reflects the inherent complexity of affrication. Substitution by deaffrication can thus be analyzed, in general terms, as the child’s reduction of both the structural complexity of the consonant and of its related phonetic attributes (see also Demuth, 2007; Mowrer & Burger, 1991; Tuomi et al., 2001).
As we highlighted above, more intriguing in the data is the place asymmetry we observe between non-lateral and lateral affricates in the children’s substitutions. While non-lateral affricates are overwhelmingly reduced to coronal consonants, the lateral affricates display more variable behaviours, including unexpected substitutions to the velar place of articulation.
Building on the logic developed above in the context of fricatives, we contend that formal modelling of these observations is possible, for example, through a combination of [±strident] and [±lateral] features, whether encoded in terms of phonological rules or constraint-based analyses. However, both the developmental characteristics of the phenomenon and its variability in children’s speech make it extremely challenging to capture within formal models. Likewise, this modelling would hardly provide an answer as to the origins of the phenomenon. Similar to fricative development above, we argue that an approach grounded in speech phonetics offers better insight into the phenomenon.
Starting with stridency, we first note that, in comparison to the non-lateral affricates, the lateral affricates tend to display relatively impoverished cues to their places of articulation, through a second formant (F2) of generally low amplitude (Ladefoged & Maddieson, 1996, p. 206). We also note that coronal + lateral sequences of speech articulations generally display a blurry contrast with velar + lateral sequences (Davidson & Shaw, 2012; Hallé & Best, 2007; Hallé et al., 1998). For example, Hallé et al. (1998), Hallé and Best (2007), and Pitt (1998) show that /tl/ and /dl/ sequences, which are unattested in word-initial positions in languages such as French or English, generally tend to be perceived by speakers of these languages as /kl/ and /gl/, respectively. Note that the speakers’ misperceptions must relate at least in part to the fact that French and English listeners are phonotactically biased against /tl/ and /dl/ clusters, given the absence of such sequences in their languages. However, these speakers never associate individual alveolar stops or laterals with velar segments. The perception of /tl/ and /dl/ sequences as involving a velar place of articulation is thus specific to the coronal stop + lateral phonetic sequence.
We argue that any analysis of the optional velar substitutions observed in the data must incorporate these phonetic facts, given that they provide an answer about the origins of the perception of coronal stop + lateral phonetic sequences as involving a velar articulation. While the learners of Setswana are evidently exposed to speech phonotactics different from those of English or French, in particular to the presence of genuine lateral affricates in the language, these learners nonetheless face the challenge of auditorily interpreting the lateral affricates as coronal, given also the presence of velar consonants within the language. In light of the relative confusability of coronal stop + lateral phonetic sequences, it is thus not surprising that at least a portion of the target lateral affricates present in the input were misinterpreted by the children as involving a velar place of articulation. As stated by Davidson and Shaw (2012), the phonotactics of a language are not the only triggers of perceptual illusion; other triggers which are not language specific, such as acoustic similarity, may also yield perceptual confusion. For example, fricative-initial sequences may lead to prothesis illusions; stop-nasal sequences may lead to the illusion that the initial consonant is either not present in the string or is present in some modified form, while stop-stop sequences may lead to vowel epenthesis. Building on the research above and on our own observations from the acquisition of Setswana lateral affricates, we add to this list alveolar stop + lateral sequences as potential triggers of velar place perception. This hypothesis also dovetails with the typological observation reported above that /tl/ and /dl/ sequences, either as consonant clusters or, in the case of Setswana, as lateral affricates, are hardly attested in the phonological inventories of the world’s languages (Hallé et al., 1998; Maddieson, 2005).
While universal constraints on the perception of given phonetic sequences can be incorporated into constraint-based models of phonology, such analyses are also typically grounded in speech phonetics. We leave the development of such formalism for future research.
Conclusion
In summary, we highlighted the general trends observed in the speech of three first-language learners of Setswana, most of which find parallels across languages involving similar sound categories. We then focused on asymmetries in the data which pose challenges to formal models of phonology. We analyzed these patterns under the lens of speech phonetics, which can capture both the a priori unexpected patterns observed in the data and the variability that these patterns display within our data.
This research thus highlights that while phonological systems encode systematic phone distribution and patterns within languages, their acquisition must be understood at least in part from the perspective of the ways these systems present themselves to the learners, through the phonetics of the ambient language. This forms the basis of recent emergentist models of phonological acquisition, as opposed to any attempt at describing child phonological behaviours as fully similar to those of adult speakers of their target language.
In regard to potential formalisms to encode these phenomena, we highlight that our data descriptions embrace both the main patterns and the variation present in the data. Any formal analysis of these phenomena would thus require similar contextualization of the variable data, a topic which transcends the goal of the current paper. It is, however, our hope that the type of understanding that stems from our discussion above will offer useful steps toward this goal; we also hope that this understanding will offer insight for speech clinicians and educators who may encounter these phenomena in the speech of children learning Setswana or any other typologically similar language.
Footnotes
Author Contributions
Keneilwe Matlhaku: Conceptualization; Formal analysis; Investigation; Methodology; Software; Validation; Writing – original draft; Writing – review & editing. Yvan Rose: Conceptualization; Software; Validation; Writing – review & editing.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
