
Abstract

Prenatal language exposure has been shown to influence early speech perception, with evidence suggesting that the acquisition of certain aspects of native-language prosody may begin before birth. While this notion is well supported from a perceptual perspective, findings from production studies are inconsistent. This study therefore explored, via a systematic review of quantitative evidence, whether the pitch of newborns’ early vocalisations is influenced by prenatal exposure to their native language. The review included 18 studies analysing the pitch patterns of cry and non-cry vocalisations in healthy newborns aged 0 to 3 months with monolingual prenatal exposure to one of 11 languages. It systematically describes the characteristics and outcomes of these studies and synthesises the converging and diverging trends across them through cross-linguistic comparisons. Within this sample, the available evidence for cross-linguistic differences in the pitch patterns of early vocalisations is sparse and inconsistent. Importantly, analysis methods varied substantially across studies, which further obscures cross-study comparisons. The studies were also limited in number and in the diversity of languages and vocalisation types represented. Taken together, our review suggests that there is at best tentative evidence for an influence of prenatal language exposure on newborns’ pitch production. Based on this review, we offer recommendations to improve the replicability and representativeness of future research, with the aim of arriving at a more definitive answer to the research question.

Keywords

prenatal language exposure; newborn cry and non-cry vocalisations; prosodic development; pitch; systematic review; cross-linguistic

Introduction

The melodic and rhythmic properties of spoken language are increasingly acknowledged as the starting point of language acquisition (for review, see Gervain et al., 2021). From the start of the third trimester, the foetus’s auditory system is comparable to that of a newborn in structure and functional connectivity (for review, see Ghio et al., 2021). In addition to intrauterine sounds such as the maternal heartbeat and respiration, the foetus also perceives low-pass filtered speech signals, which preserve native-language prosody. This opportunity to become familiar with the prosody of the native language in utero supports the hypothesis that language acquisition may have prenatal origins. The first weeks of life provide an invaluable window in which the influence of prenatal language exposure on early speech and language development can be investigated¹. This neonatal window has been studied extensively with respect to early speech perception. Numerous studies have demonstrated both neonates’ perceptual preference for the prosodic information to which they were exposed prenatally and their ability to discriminate native-language prosody from the prosodic patterns of non-native languages (for reviews, see Gervain, 2015, 2018; Nallet & Gervain, 2021; Ortiz Barajas & Gervain, 2021). The present study concerns the role of prenatal language exposure in early speech production.

Early speech production skills, such as the ability to produce a variety of vocalisation types both in the presence and in the absence of communicative partners, are acknowledged as an important first building block for social communication and speech-language development (Long et al., 2022; Pisanski et al., 2022). Newborns’ vocal repertoire includes signals such as cries and laughs, which typically serve fixed communicative functions, and speech-like sounds known as protophones, which are used flexibly (Oller et al., 2021). Protophones are the most frequently produced vocalisations (Oller et al., 2021) and should therefore be studied alongside other vocalisation types, such as cries, which have traditionally been the primary focus of research on newborn vocalisations. The prosodic features of preverbal vocalisations are of special significance, as infants gradually develop the ability to intentionally modulate the prosody of their vocalisations to encode specific meanings (prosodic form-meaning mappings). For example, Catalan-learning infants aged 7 to 11 months produced vocalisations directed at caregivers with shorter durations and a wider pitch range than vocalisations not directed at caregivers (Esteve-Gibert & Prieto, 2013). These infants also adjusted the prosody of their vocalisations to their communicative intent: requests and expressions of discontent featured wider pitch ranges and longer durations, whereas responses to caregiver speech had narrower pitch ranges and shorter durations. This development continues through the babbling, first-word, and later developmental stages until native-like prosodic competence is reached, allowing the child to fulfil various communicative functions in a language-specific manner (Wermke, 2015).
The prosodic features of early vocalisations, including not only cries but also protophones, may thus serve newborns much as spoken language serves older children and adults, that is, to signal meaning effectively to caregivers and the surrounding environment. The entire repertoire of early vocalisations may therefore be considered a key foundation for developing prosodic form-meaning mappings that approximate the adult model of native prosody.

From early on, newborns are tasked not only with perceiving but also with learning to produce the prosodic patterns characteristic of their native language, patterns that can differ substantially across languages with different prosodic typologies. Pitch variation, in particular, serves multiple linguistic functions, and languages differ markedly in how they use it for lexical, structural, and communicative purposes (Gussenhoven & Chen, 2021). While it is not yet clear whether foetuses can perceive the full spectrum of pitch-related cues in utero, it is plausible that the prenatal sound environment differs depending on the prosodic characteristics of the ambient language. For instance, foetuses may be exposed to phrase-level pitch variation that marks phrase boundaries or conveys communicative meanings such as illocutionary force or information status. In tone and lexical pitch-accent languages, they will also encounter word-level pitch variation that carries lexical meaning. Moreover, even when it serves similar functions, pitch variation can take considerably different forms across languages. To understand whether such prenatal exposure shapes the pitch features of newborns’ early vocalisations, it is necessary to critically review existing findings and weigh the varied answers that have been proposed across a broad range of data.

The question of whether early vocalisations exhibit language-specific pitch characteristics has been explored in infant populations of varying ages. DePaolis et al. (2008) compared disyllabic vocalisations of American English-, Finnish-, French-, and Swedish-learning infants aged 10 to 18 months and found that these productions were only beginning to show language-specific patterns of pitch use. They concluded that there is limited evidence for controlled pitch use during the preverbal period, suggesting that finely tuned use of prosody may require a level of linguistic sensitivity that emerges only once word production is well established. In contrast, Whalen et al. (1991) identified cross-linguistic differences in a younger age range (5–11 months), reporting that French-learning infants produced falling and rising pitch contours in a 1:1 ratio, whereas English-learning infants produced them in a 3:1 ratio. Going further, Mampe et al. (2009) reported cross-linguistic differences in the cries of German- and French-learning newborns as early as the first week of life: French-learning newborns predominantly produced cries with rising pitch contours, whereas German-learning newborns predominantly produced cries with falling pitch contours. These findings were presented as the first evidence for an influence of prenatal language exposure on the prosody of newborn vocalisations, in line with evidence on foetal hearing and newborns’ speech perception.

However, Mampe et al.’s (2009) findings have since been challenged on methodological grounds. Specifically, Gustafson et al. (2017) pointed out that Mampe et al.’s (2009) statistical analysis treated individual cries as independent observations rather than nesting them within participants, which inflates the likelihood of detecting group-level differences that may not exist. Since then, a growing body of research has further explored the role of pitch in early infant vocalisations by comparing the pitch characteristics of infants aged 0 to 3 months exposed to non-tone languages versus tone or lexical pitch-accent languages (e.g., Prochnow et al., 2019; Wermke et al., 2013, 2016, 2017). Additionally, several studies, including some predating Mampe et al. (2009), have investigated pitch use in newborn vocalisations within single-language contexts (e.g., Baeck & de Souza, 2007; Lind & Wermke, 2002; Wermke et al., 2007). In light of this broader research landscape, the current study systematically reviewed all relevant studies on newborns’ cry and non-cry vocalisations to assess whether newborns’ use of pitch is influenced by prenatal exposure to the prosody of their native language. The review first describes the methodological and analytical features of these studies and then identifies converging and diverging patterns through cross-linguistic comparisons.
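To make the statistical point concrete, the sketch below shows how the unit of analysis changes when cries are grouped by infant. The infant IDs and pitch values are hypothetical placeholders, not data from any study. Averaging within infants is only the simplest way to respect the nesting; multilevel (mixed-effects) models with a random effect for participant are a common alternative that keeps every cry while accounting for within-infant correlation.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical (infant_id, mean pitch in Hz) pairs; placeholder values only.
CRIES = [
    ("infant_A", 400.0), ("infant_A", 410.0), ("infant_A", 420.0),
    ("infant_B", 500.0), ("infant_B", 520.0),
    ("infant_C", 450.0),
]


def per_infant_means(cries):
    """Collapse cry-level values to one mean value per infant."""
    by_infant = defaultdict(list)
    for infant_id, value in cries:
        by_infant[infant_id].append(value)
    return {infant_id: fmean(values) for infant_id, values in by_infant.items()}


if __name__ == "__main__":
    infant_means = per_infant_means(CRIES)
    # Treating cries as independent gives n = 6; nesting within infants gives n = 3.
    print(len(CRIES), len(infant_means))
    print(infant_means)
```

In this toy example, a group comparison based on cries would treat six observations as independent, although they come from only three infants.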

To the best of our knowledge, no previous review has addressed a comparable research question. To verify this before starting the current systematic review, the broad query (infant* AND vocalisation* AND acoustic*) was used to search JBI Evidence Synthesis, Open Science Framework Registries, the Cochrane Database of Systematic Reviews, Prospero, and Google Scholar. Three completed reviews focusing on the broad topic of infant cries were found (Gabrieli et al., 2019; Lingle et al., 2012; Vermillet et al., 2022). However, the current systematic review differs substantially from these reviews in its goal of comparing the prosodic patterns of early vocalisations across languages and in its coverage of both cries and non-cry vocalisations.

Our research question is motivated by both theoretical and clinical considerations. Theoretically, answering it can further our understanding of the mechanisms driving the earliest phases of language acquisition and, more specifically, of how infants learn to produce the prosodic patterns of their native language. Clinically, this review may highlight the importance of prenatal language input to practitioners working in prenatal care, as it has the potential to show that speech learning already begins before birth. Moreover, since preverbal vocalisations depend on a well-coordinated laryngeal-respiratory system, their prosodic patterns are being explored and used as a non-invasive, cost-effective tool for screening and diagnosing a range of clinical conditions, including anatomical and physiological abnormalities, Autism Spectrum Disorder, Sudden Infant Death Syndrome, and auditory-related difficulties (Gabrieli et al., 2019). For example, Manigault et al. (2023) highlighted acoustic cry characteristics as a biomarker for the early identification of preterm infants at risk of long-term developmental and behavioural deficits. Furthermore, a Newborn Cry Diagnostic System has been developed that uses cries to distinguish healthy newborns from newborns with a wide range of pathologies, regardless of their race, gender, or reason for crying (Khalilzad & Tadj, 2023). Establishing whether the prosody of newborns’ vocalisations is influenced by prenatal exposure to their native language will clarify whether language background should be taken into account when developing such clinical tools, or whether differences in early vocalisations between newborns with different language backgrounds are negligible for clinical purposes.

Methodology

Protocol and Registration

The present systematic review of quantitative evidence was conducted based on a pre-registered review protocol available on Open Science Framework [https://doi.org/10.17605/OSF.IO/XKWYR], and is structured according to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines (Tricco et al., 2018; Tufanaru et al., 2017).

Eligibility Criteria

The review focused on quantitative production studies that analysed the pitch patterns of vocalisations produced by healthy newborns aged 0 to 3 months with monolingual exposure to any language. The detailed study-related inclusion criteria are summarised in Table 1 using the Population-Intervention-Comparator-Outcome (PICO) framework (Eriksen & Frandsen, 2018). Regarding publication status, all sources of evidence up to October 2023 with an accessible English version were considered eligible. This included published studies accessible via peer-reviewed scientific databases; grey literature, such as dissertations, theses, conference proceedings, and abstracts; and unpublished studies shared by researchers and through research networks. Evidence from websites, newspapers, and magazines was excluded.

Table 1.

Study-Related Inclusion Criteria.

Study design	Population	Intervention	Comparator	Outcome
a. Evidence of a quantitative nature. b. Production studies. c. Studies relying on naturalistic observations or experimentally controlled conditions.	a. Newborns from birth up to and including the third month of life. b. Healthy newborns, defined as born at term with a normal birth weight, normal APGAR scores, no complications, and a minimal risk of developing complications. c. Monolingual acquisition of any language.	No explicit intervention. The phenomenon of interest is the vocalisations of human newborns, including both cry and non-cry vocalisations.	The language the newborns are acquiring must be stated. Studies of any language typology, and of single or multiple language groups, may be included.	a. Any method of measuring or analysing pitch. b. The unit of analysis may vary according to each study’s working definition of a vocalisation. c. Analysis methods may be manual or automated.

Note. APGAR – Appearance, Pulse, Grimace, Activity, Respiration.

Information Sources

To ensure a comprehensive literature search, multiple sources were consulted, as shown in Figure 1. The primary source was electronic databases, supplemented by other sources, including grey literature, reference list searching, and author contact.

Figure 1.

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram with identification, screening, and inclusion results (diagram format from Page et al., 2021).

First, five databases were selected with the assistance of a librarian to ensure the representation of peer-reviewed evidence from Medicine, the Social Sciences, and Linguistics. These databases were PubMed (accessed via the National Library of Medicine), Scopus (accessed via Elsevier), Web of Science (accessed via Clarivate), PsycINFO (accessed via Ovid), and Linguistics and Language Behavior Abstracts (accessed via ProQuest). The database search yielded a total of n = 28,277 records. After manual screening of the titles and/or abstracts, n = 51 studies were selected for full-text retrieval. Following the full-text assessment, n = 42 studies were excluded for one or more of the following reasons: unspecified language background (n = 29), lack of focus on pitch (n = 10), age range exceeding that of interest (n = 10), absence of a production experiment (n = 1), or inclusion of a newborn with health conditions (n = 1).

In addition to the database searches, grey literature was searched via Google Scholar and OpenGrey (accessed via the DANS EASY data archive) to identify relevant unpublished studies. Although grey-literature hits were not recorded during the identification phase, one study from this source ultimately met the inclusion criteria and was included in the review. Furthermore, the reference lists of the studies retained after full-text screening (n = 15) were searched. This yielded n = 994 records, of which n = 2 studies were included in the final selection after screening.

Finally, the corresponding authors of the included studies were contacted to request unpublished or related published studies within the scope of the review (n = 15 authors were contacted; no additional studies were identified). Authors of studies that met all inclusion criteria except for reporting the language background of the participating newborns were also contacted to ascertain whether this information was available (n = 19). This inquiry yielded n = 10 records, of which n = 6 studies were included in the final selection after screening.

As shown in Figure 1, a total of n = 18 studies met the inclusion criteria, comprising n = 9 from the electronic database search, n = 1 from grey literature, n = 2 from reference lists, and n = 6 from author inquiries.

Search

A search string was designed after a scoping search of the literature in May 2023 using the broad query (newborn* AND (vocalisation* OR cry*) AND acoustic*) in Scopus and Google Scholar. This query was chosen to identify studies investigating the acoustic properties of newborn vocalisations and/or cries, and it provided an overview of the terminology commonly used in the field, which formed the starting point for selecting search terms. The search terms were organised using the PICO framework (Eriksen & Frandsen, 2018) and expanded using the Medical Subject Headings (MeSH) terms generator. This resulted in the maximal search string²: (newborn* OR infant* OR neonate* OR baby) AND (vocalisation* OR vocalization* OR cry* OR sound*) AND (acoustic* OR pitch OR ‘acoustic analysis’ OR melody). All databases were searched by title and abstract using this search string, with its syntax adapted to the query format of each database. This process was conducted between June and October 2023.
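The Boolean logic of the maximal search string, with synonyms combined by OR within each PICO concept and the concept blocks combined by AND, can be sketched as follows. This is an illustrative Python sketch, not the tool used in the review; the concept labels are our own, and writing the phrase ‘acoustic analysis’ in double quotes is an assumption, since phrase, truncation, and field-tag syntax differ between databases.

```python
# Concept groups of the maximal search string, organised by PICO concept.
# Double quotes for the phrase search are an assumption; each database
# has its own syntax for phrases, truncation, and title/abstract fields.
CONCEPTS = {
    "population": ["newborn*", "infant*", "neonate*", "baby"],
    "phenomenon of interest": ["vocalisation*", "vocalization*", "cry*", "sound*"],
    "outcome": ["acoustic*", "pitch", '"acoustic analysis"', "melody"],
}


def build_query(concepts: dict[str, list[str]]) -> str:
    """OR the synonyms within each concept, then AND the concept blocks together."""
    blocks = ["(" + " OR ".join(terms) + ")" for terms in concepts.values()]
    return " AND ".join(blocks)


if __name__ == "__main__":
    print(build_query(CONCEPTS))
```

Keeping the concepts as separate lists makes it straightforward to add MeSH-derived synonyms to one concept without touching the others.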

Selection of Sources of Evidence

Evidence selection occurred in three consecutive phases: a reliability verification step, title and abstract screening, and lastly, full-text screening. During all of the phases, discrepancies between researchers were discussed and reconciled through consultation of the inclusion criteria.

Firstly, the reliability of the search process and inclusion criteria was verified. The first author and a research assistant independently searched the PubMed database and screened the results by title and abstract, each recording their decisions in an Excel sheet using three categories: ‘include’, ‘exclude’, and ‘uncertain’. Both searches yielded the same number of hits (n = 1,927), but inter-rater agreement for the ‘include’ category was only 17%. Discussion revealed that the assistant had placed all hits that did not state critical information, such as the age or language exposure of the newborns, in the ‘uncertain’ category, whereas the first author had placed these hits in the ‘include’ category so that the full text could be checked for this information. To reconcile this difference, the ‘uncertain’ category was discarded, and all hits that potentially fit the scope of the review but lacked information on one of the PICO concepts were placed in the ‘include’ category so that the full text could be checked for the missing information. This decision substantially improved inter-rater agreement, to 87%, indicating that the search process and inclusion criteria were reliable.
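The text does not state how percent agreement was computed. The sketch below assumes two simple definitions: overall agreement as the share of records that both raters coded identically, and category agreement as the share of records placed in ‘include’ by either rater that both placed there. The screening decisions are hypothetical, and the function names are our own.

```python
# Hypothetical screening decisions for five records (not data from the review).
RATER_A = ["include", "exclude", "uncertain", "include", "exclude"]
RATER_B = ["include", "exclude", "include", "uncertain", "exclude"]


def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Percentage of records that both raters placed in the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same, non-empty set of records")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


def category_agreement(rater_a: list[str], rater_b: list[str], category: str = "include") -> float:
    """Of the records either rater placed in `category`, the percentage both placed there."""
    either = [(a, b) for a, b in zip(rater_a, rater_b) if category in (a, b)]
    if not either:
        return 100.0
    both = sum(a == b == category for a, b in either)
    return 100 * both / len(either)


def merge_uncertain(decisions: list[str]) -> list[str]:
    """Reconciliation step: recode every 'uncertain' decision as 'include'."""
    return ["include" if d == "uncertain" else d for d in decisions]
```

In this toy example, overall agreement is 60% and agreement on ‘include’ is about 33%; after both raters’ ‘uncertain’ decisions are merged into ‘include’, the raters agree on every record. The improvement arises because here the disagreements stem from one rater using ‘uncertain’ where the other used ‘include’.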

Secondly, the title and abstract screening of all databases was conducted by the first author, and studies to be included for a full-text review were recorded in an Excel sheet. The research assistant conducted a blind cross-check on this list, and an inter-rater agreement of 87% was obtained.

Thirdly, the full-text screening was conducted by the first author, and studies eligible to be included in the systematic review were recorded in an Excel sheet. The research assistant randomly selected 50% of the studies and conducted a blind cross-check, which yielded a 100% inter-rater agreement.

Data Charting Process

Data charting was done manually by the first author using a custom data charting tool that combined elements of Gabrieli et al.’s (2019) checklist for ensuring the replicability of infant cry studies with the JBI standardised data extraction tool for quantitative evidence (Tufanaru et al., 2017). The following data items were charted and aggregated in an Excel sheet:

a. Metadata: Title, author, year, country

b. Study design: Research questions and aims, dependent variable, independent variable, comparator groups, control groups, cross-sectional versus longitudinal nature

c. Participant information: Number of participants, age of participants, sex of participants, health status of participants, language(s) exposed to, number of vocalisations per participant, additional information about the participant, additional information about the caregiver

d. Data collection information: Type of vocalisation, the context wherein vocalisations were produced, recording environment, duration of data collection session(s), frequency of data collection sessions, infant positioning, type of microphone used, microphone-to-mouth distance

e. Data analysis information: Software/hardware used, signal preprocessing procedures, exclusions, number of vocalisations analysed, pitch parameter analysed, unit of analysis, pitch settings, statistical analysis method(s)

f. Study outcome information: Numerical outcome variables, statistical outcome, inferences made

Data Synthesis

A data-driven, narrative synthesis was conducted on each of the charted data items outlined above using Excel, RStudio (Posit team, 2023), and the ggplot2 package (Wickham, 2016). This involved assigning a posteriori codes to categorise the data set, counting the frequency of each code, and summarising trends in the data set through narratives and figures. The numerical outcome data were coded according to the pitch parameter they described, and all data points describing the same pitch parameter were then aggregated in visual illustrations by language group, age group, and vocalisation type. These illustrations were used to make qualitative cross-linguistic comparisons, as well as comparisons between vocalisation types and age groups. The synthesis was carried out by the authors through frequent discussions and iterative exchanges to ensure that the charted data were synthesised as accurately and objectively as possible.

Results

A total of 18 studies, published between 2002 and 2023, were included in the review. Seventeen of them were peer-reviewed scientific publications, and one was a doctoral dissertation (references for the included studies are marked with an asterisk (*) in the reference list). Figure 1 shows a PRISMA flow diagram (Page et al., 2021) illustrating the number of studies identified, screened, and included from each source. The results are structured in two sections: first, a summary of the key characteristics of the included studies, and second, a report of the pitch parameters analysed across studies and their respective outcomes.

Study Characteristics of Included Studies

The key study characteristics are summarised in Table 2. Eleven languages were included in the review, with German occurring the most frequently (German n = 12; French n = 3; Mandarin Chinese n = 2; Lamnso³ n = 2; Swedish n = 1; Italian n = 1; Arabic n = 1; American English n = 1; Australian English n = 1; Brazilian Portuguese n = 1; Japanese n = 1). Within this language sample, there were two tone languages (Mandarin Chinese, Lamnso) and two lexical pitch accent languages (Swedish, Japanese), whilst the remaining languages were considered ‘intonation only’ languages. Eleven studies described the acoustic features of early vocalisations of a single language, while six studies made comparisons between two languages, and one study made comparisons between three languages.

Table 2.

Key Study Characteristics of Studies Included in the Review.

Study	Language	Age range, sampling nature	Number of participants	Vocalisation type	Number of vocalisations analysed	Pitch parameter(s) analysed
Armbrüster et al., 2021	German	1–4 months, longitudinal (monthly)	12	Cry	3,114	MI
Baeck & de Souza, 2007	Brazilian Portuguese	1–6 months, longitudinal (bi-weekly)	30	Cry	350 cry recordings, exact number of cries not specified	mean
Gratier & Devouche, 2011	French	3 months, cross-sectional	20	Non-cry	321	shape
Gregory, 2013	Australian English	1–6 months, longitudinal (monthly)	4	All	7,517	mean
Gustafson et al., 2017	Mandarin Chinese; American English	1 month, cross-sectional	33 (22 English, 11 Chinese)	Cry	497	mean; SD; max; span; max pos.
Kottmann et al., 2023	German	0–12 months, longitudinal (monthly)	10	All	9,237 (1,141 cry; 4,667 non-cry)	MCI
Lind & Wermke, 2002	German	0–3 months, longitudinal (daily)	1	Cry	279	mean
Mampe et al., 2009	French; German	2–5 days, cross-sectional	60 (30 French, 30 German)	Cry	1,254	shape
Manfredi et al., 2019	French; Italian; Arabic	1–3 days, cross-sectional	47 (2 Arabic, 17 French, 28 Italian)	Cry	7,500	mean; SD; shape
Prochnow et al., 2019	German; Swedish	1–5 days, cross-sectional	131 (79 German, 52 Swedish)	Cry	4,702	MCI
Shinya et al., 2017	Japanese	First day of life, cross-sectional	107 (33 term, 77 preterm)	Cry	3,578	span; MCI
Wermke & Robb, 2008	German	1–7 days, cross-sectional	131	Cry	1,350	mean; max; min
Wermke et al., 2002	German	2 months, cross-sectional	6 (3 twin sets)	Cry	136	mean; SD
Wermke et al., 2007	German	1–4 months, longitudinal (monthly)	34	Cry	10,862	MCI
Wermke et al., 2013	German; Lamnso	3 months, cross-sectional	33 (14 German; 19 Lamnso)	Non-cry	808	mean; span; MCI
Wermke et al., 2016	German; Lamnso	4 days, cross-sectional	42 (21 German, 21 Lamnso)	Cry	1,002	mean; SD; span; fluct.
Wermke et al., 2017	German; Mandarin Chinese	1–5 days, cross-sectional	102 (55 German, 47 Chinese)	Cry	5,239	mean; SD; min; span; fluct.
Wermke et al., 2021	German	2–4 days, cross-sectional	74	Cry	1,251	mean; SD; max; min; span

Note. mean = mean pitch; SD = standard deviation of mean pitch; max = maximum pitch; min = minimum pitch; span = pitch span; fluct. = pitch fluctuation defined as the ‘measure of the mean pitch variation between five consecutive time sections of equal length per cry utterance’; MCI = melody complexity index; MI = melodic interval; shape = pitch contour shape; max pos. = position of maximum pitch.
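The language frequencies reported for the sample, and the split into single-, two-, and three-language studies, can be re-derived from Table 2 with a simple frequency count. In this sketch the study-to-language mapping is transcribed from Table 2, with Mandarin entries written uniformly as ‘Mandarin Chinese’.

```python
from collections import Counter

# Languages per study, transcribed from Table 2.
STUDY_LANGUAGES = {
    "Armbrüster et al., 2021": ["German"],
    "Baeck & de Souza, 2007": ["Brazilian Portuguese"],
    "Gratier & Devouche, 2011": ["French"],
    "Gregory, 2013": ["Australian English"],
    "Gustafson et al., 2017": ["Mandarin Chinese", "American English"],
    "Kottmann et al., 2023": ["German"],
    "Lind & Wermke, 2002": ["German"],
    "Mampe et al., 2009": ["French", "German"],
    "Manfredi et al., 2019": ["French", "Italian", "Arabic"],
    "Prochnow et al., 2019": ["German", "Swedish"],
    "Shinya et al., 2017": ["Japanese"],
    "Wermke & Robb, 2008": ["German"],
    "Wermke et al., 2002": ["German"],
    "Wermke et al., 2007": ["German"],
    "Wermke et al., 2013": ["German", "Lamnso"],
    "Wermke et al., 2016": ["German", "Lamnso"],
    "Wermke et al., 2017": ["German", "Mandarin Chinese"],
    "Wermke et al., 2021": ["German"],
}

# Number of studies that include each language.
language_counts = Counter(lang for langs in STUDY_LANGUAGES.values() for lang in langs)

# Number of studies comparing one, two, or three languages.
design_counts = Counter(len(langs) for langs in STUDY_LANGUAGES.values())

if __name__ == "__main__":
    print(language_counts.most_common())
    print(sorted(design_counts.items()))
```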

With regard to the age of participating newborns, most studies were conducted cross-sectionally within the first week of life (n = 8), followed by longitudinal studies covering the first 3 months of life (n = 6), and cross-sectional studies at 3 months of age (n = 2), at 2 months of age (n = 1), and within the first month of life (n = 1). Although some of the included studies covered age ranges exceeding 3 months, only data up to and including the 3rd month were included in the review. The number of participants ranged from 1 to 131, with an average of 49 participants per study.

In terms of vocalisations, all studies focused on spontaneous vocalisations of healthy newborns, and three categories of vocalisations were represented: cries were the most common (n = 14), followed by non-cries (n = 2) and all vocalisation types (n = 2). Cries were mostly recorded in hunger contexts before regular feeding times or during typical interactions such as diaper changes or spontaneous fussing episodes; no cries were pain-induced. Non-cries were recorded during unstructured face-to-face interaction with caregivers. Recordings were made in medical environments such as maternity wards (n = 8), home environments (n = 5), or both (n = 5). During recordings, infants were lying supine, for example on a dresser, blanket, or in a bassinet (n = 6); propped up in a reclining seat (n = 2); or on the mother’s lap (n = 1). In most studies (n = 10), however, the infant’s positioning was not specified. The number of vocalisations analysed ranged from 136 to 10,862, with an average of 3,450 per study (excluding Baeck & de Souza, 2007, who did not specify the number of analysed vocalisations).
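As a check on the reported averages, the following sketch recomputes them from the participant and vocalisation counts transcribed from Table 2. Baeck & de Souza (2007) is excluded from the vocalisation average because its number of analysed vocalisations is unspecified.

```python
from statistics import mean

# (participants, vocalisations analysed) per study, transcribed from Table 2.
# None marks the study whose number of analysed vocalisations is unspecified.
TABLE_2_COUNTS = {
    "Armbrüster et al., 2021": (12, 3114),
    "Baeck & de Souza, 2007": (30, None),
    "Gratier & Devouche, 2011": (20, 321),
    "Gregory, 2013": (4, 7517),
    "Gustafson et al., 2017": (33, 497),
    "Kottmann et al., 2023": (10, 9237),
    "Lind & Wermke, 2002": (1, 279),
    "Mampe et al., 2009": (60, 1254),
    "Manfredi et al., 2019": (47, 7500),
    "Prochnow et al., 2019": (131, 4702),
    "Shinya et al., 2017": (107, 3578),
    "Wermke & Robb, 2008": (131, 1350),
    "Wermke et al., 2002": (6, 136),
    "Wermke et al., 2007": (34, 10862),
    "Wermke et al., 2013": (33, 808),
    "Wermke et al., 2016": (42, 1002),
    "Wermke et al., 2017": (102, 5239),
    "Wermke et al., 2021": (74, 1251),
}

participants = [p for p, _ in TABLE_2_COUNTS.values()]
vocalisations = [v for _, v in TABLE_2_COUNTS.values() if v is not None]

mean_participants = mean(participants)    # about 48.7, reported as 49
mean_vocalisations = mean(vocalisations)  # about 3,449.8, reported as 3,450

if __name__ == "__main__":
    print(round(mean_participants), round(mean_vocalisations))
```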

Pitch Parameters Analysed and Outcomes⁴

Across studies, we identified ten different parameters related to the pitch patterns of early vocalisations, each describing a different feature of the pitch contour. No single parameter was analysed in all studies, and no study reported all ten parameters (Table 3). To facilitate a succinct synthesis, the definitions, analysis criteria, and outcomes⁵ of the ten parameters are presented in three groups (in sections ‘Static Pitch Variation Parameters’ to ‘Pitch Contour Shape Parameters’), each describing comparable features of the pitch contour. The three groups are:

Table 3.

Comparison of Pitch Parameters Analysed Across Studies Describing Pitch Patterns of Newborn Vocalisations with ‘x’ Indicating That a Parameter Was Analysed.

	Static pitch variation						Pitch contour complexity		Pitch contour shape
Study	mean	SD	max	min	span	fluct.	MCI	MI	shape	max pos.
Armbrüster et al., 2021								x
Baeck & de Souza, 2007	x	x
Gratier & Devouche, 2011									x
Gregory, 2013	x
Gustafson et al., 2017	x	x	x		x				x	x
Kottmann et al., 2023							x
Lind & Wermke, 2002	x	x
Mampe et al., 2009									x	x
Manfredi et al., 2019	x	x							x	x
Prochnow et al., 2019							x
Shinya et al., 2017	x		x	x	x		x
Wermke & Robb, 2008	x	x	x	x
Wermke et al., 2002	x	x
Wermke et al., 2007							x
Wermke et al., 2013	x	x			x		x
Wermke et al., 2016	x	x			x	x
Wermke et al., 2017	x	x		x	x	x
Wermke et al., 2021	x	x	x	x	x
Total	12	10	4	4	6	2	5	1	4	3

Note. mean = mean pitch; SD = standard deviation of mean pitch; max = maximum pitch; min = minimum pitch; span = pitch span; fluct. = pitch fluctuation; MCI = melody complexity index; MI = melodic interval; shape = pitch contour shape; max pos. = position of maximum pitch.

static pitch variation parameters: describe the magnitude of pitch variation within a vocalisation (mean pitch, standard deviation of mean pitch, maximum pitch, minimum pitch, pitch span, and pitch fluctuation, defined as the ‘measure of the mean pitch variation between five consecutive time sections of equal length per cry utterance’)

pitch contour complexity parameters: describe the number of pitch modulations within a vocalisation (melody complexity index [MCI] and melodic interval [MI])

pitch contour shape parameters: describe the shape of the pitch contour of a vocalisation categorically (pitch contour shape and position of maximum pitch)
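For the pitch contour shape group, the position of maximum pitch can be expressed as the relative time at which the F0 maximum occurs. The sketch below assumes an F0 track sampled at equal time steps with unvoiced frames already removed. The threshold rule that maps peak position to a ‘falling’, ‘rise-fall’, or ‘rising’ label, and its cut-offs, are hypothetical and are not the categorisation scheme of any included study.

```python
def max_position(f0: list[float]) -> float:
    """Relative time of the F0 maximum: 0.0 = vocalisation onset, 1.0 = offset."""
    if len(f0) < 2:
        raise ValueError("Need at least two F0 samples")
    # Index of the first occurrence of the highest F0 value.
    peak = max(range(len(f0)), key=f0.__getitem__)
    return peak / (len(f0) - 1)


def contour_label(f0: list[float], early: float = 1 / 3, late: float = 2 / 3) -> str:
    """Hypothetical rule: early peak -> 'falling', late peak -> 'rising', else 'rise-fall'."""
    position = max_position(f0)
    if position <= early:
        return "falling"
    if position >= late:
        return "rising"
    return "rise-fall"


if __name__ == "__main__":
    # Hypothetical F0 tracks in Hz.
    print(contour_label([500.0, 450.0, 420.0, 400.0, 380.0]))
    print(contour_label([400.0, 480.0, 500.0, 470.0, 420.0]))
```

A purely peak-based rule cannot distinguish, for example, a flat contour from a falling one; real categorisation schemes typically consider the whole contour.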

Table 3 presents a comparison of the pitch parameters analysed across studies and how these parameters were grouped.
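The Total row of Table 3, and the number of studies drawing on each parameter group, follow from a simple frequency count over the table. The sketch below transcribes Table 3 into Python sets using the table’s own abbreviations; the group memberships follow the three parameter groups defined in this section.

```python
from collections import Counter

# Parameters analysed per study, transcribed from Table 3.
TABLE_3 = {
    "Armbrüster et al., 2021": {"MI"},
    "Baeck & de Souza, 2007": {"mean", "SD"},
    "Gratier & Devouche, 2011": {"shape"},
    "Gregory, 2013": {"mean"},
    "Gustafson et al., 2017": {"mean", "SD", "max", "span", "shape", "max pos."},
    "Kottmann et al., 2023": {"MCI"},
    "Lind & Wermke, 2002": {"mean", "SD"},
    "Mampe et al., 2009": {"shape", "max pos."},
    "Manfredi et al., 2019": {"mean", "SD", "shape", "max pos."},
    "Prochnow et al., 2019": {"MCI"},
    "Shinya et al., 2017": {"mean", "max", "min", "span", "MCI"},
    "Wermke & Robb, 2008": {"mean", "SD", "max", "min"},
    "Wermke et al., 2002": {"mean", "SD"},
    "Wermke et al., 2007": {"MCI"},
    "Wermke et al., 2013": {"mean", "SD", "span", "MCI"},
    "Wermke et al., 2016": {"mean", "SD", "span", "fluct."},
    "Wermke et al., 2017": {"mean", "SD", "min", "span", "fluct."},
    "Wermke et al., 2021": {"mean", "SD", "max", "min", "span"},
}

GROUPS = {
    "static pitch variation": {"mean", "SD", "max", "min", "span", "fluct."},
    "pitch contour complexity": {"MCI", "MI"},
    "pitch contour shape": {"shape", "max pos."},
}

# "Total" row: number of studies analysing each parameter.
totals = Counter(param for params in TABLE_3.values() for param in params)

# Number of studies using at least one parameter from each group.
studies_per_group = {
    group: sum(1 for params in TABLE_3.values() if params & members)
    for group, members in GROUPS.items()
}
```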

Static Pitch Variation Parameters

This group comprises six parameters that describe the pitch patterns of newborn vocalisations as numerical measurements based on static time points on the pitch contour, thereby capturing the magnitude of pitch variation: mean pitch (n = 12), standard deviation of mean pitch (n = 10), maximum pitch (n = 4), minimum pitch (n = 4), pitch span (n = 6), and pitch fluctuation (n = 2). Static pitch variation measures were the most common means of describing the pitch patterns of newborn vocalisations, with 12 studies relying on one or more of these parameters. They were also well distributed across the languages represented (except for pitch fluctuation).
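The quoted definition of pitch fluctuation leaves the exact computation open. One plausible reading, assumed here and not confirmed by the included studies, is to split the F0 track of a cry into five consecutive sections of equal length, take the mean F0 of each section, and average the absolute differences between neighbouring section means. The example track is hypothetical.

```python
from statistics import fmean

# Hypothetical F0 track of one cry (Hz), equally spaced in time.
EXAMPLE_F0 = [400.0, 400.0, 420.0, 420.0, 440.0, 440.0, 420.0, 420.0, 400.0, 400.0]


def section_means(f0: list[float], n_sections: int = 5) -> list[float]:
    """Mean F0 of each of n consecutive sections of (near-)equal length."""
    if len(f0) < n_sections:
        raise ValueError("Need at least one F0 sample per section")
    # Integer boundaries; every section is non-empty when len(f0) >= n_sections.
    bounds = [i * len(f0) // n_sections for i in range(n_sections + 1)]
    return [fmean(f0[bounds[i]:bounds[i + 1]]) for i in range(n_sections)]


def pitch_fluctuation(f0: list[float], n_sections: int = 5) -> float:
    """Assumed reading: mean absolute change in section mean F0 between neighbouring sections."""
    means = section_means(f0, n_sections)
    return fmean(abs(b - a) for a, b in zip(means, means[1:]))


if __name__ == "__main__":
    print(section_means(EXAMPLE_F0))
    print(pitch_fluctuation(EXAMPLE_F0))
```

For the example track, the section means rise from 400 to 440 Hz and fall back, so each neighbouring pair differs by 20 Hz and the fluctuation value is 20 Hz.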

Mean Pitch and Standard Deviation

The definitions of mean pitch (the arithmetic average of all pitch values of a vocalisation) and standard deviation (the standard deviation of the mean pitch of a vocalisation) were applied uniformly across studies, which supports the validity of cross-linguistic comparisons. Figure 2 presents a scatter plot of the mean pitch and its standard deviation for vocalisations, categorised by language, age group, and vocalisation type. For mean pitch, the primary studies reported no significant group differences. For standard deviation, significant cross-linguistic differences were reported for cry vocalisations in the first week (Wermke et al., 2016, 2017) and in the first month of life (Gustafson et al., 2017). In the first week of life, Wermke et al. (2016) reported that Lamnso-exposed newborns produced higher standard deviations, and thus displayed larger pitch variation, than German-exposed newborns. Similarly, Wermke et al. (2017) reported that Chinese-exposed newborns displayed larger pitch variation than German-exposed newborns. In the first month of life, Gustafson et al. (2017) found that American English-exposed newborns displayed larger pitch variation than Chinese-exposed newborns, although the authors noted that this difference fell within the range expected by chance alone.

Figure 2.

Mean pitch and standard deviation of newborn vocalisations per language, age group, and vocalisation type.

Maximum Pitch, Minimum Pitch, and Pitch Span

The definitions of maximum pitch (the highest pitch value of a vocalisation), minimum pitch (the lowest pitch value of a vocalisation), and pitch span (maximum minus minimum pitch of a vocalisation) were applied uniformly across studies, which supports the validity of cross-linguistic comparisons. Figure 3 displays a scatter plot of these parameters per language and age group (all vocalisations were cries). For maximum pitch, the primary studies reported no significant group differences. For minimum pitch, Wermke et al. (2017) reported a significant group difference between the cries of German- and Chinese-acquiring newborns in the first week of life, with Chinese-acquiring newborns producing lower minimum pitch values. For pitch span, Wermke et al. (2017) also found that Chinese-acquiring newborns produced broader pitch ranges than their German counterparts. Similarly, Wermke et al. (2016) reported that Lamnso-exposed newborns produced significantly broader pitch ranges than German-exposed newborns.

Figure 3.

Maximum pitch, minimum pitch, and pitch span of newborn cries per language and age group.
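To make the static parameters concrete, the following Python sketch computes them for a single vocalisation from a list of F0 values in Hz. It is a minimal illustration under stated assumptions, not the procedure of any reviewed study: the function name is hypothetical, unvoiced frames are assumed to have been removed beforehand, and the standard deviation is computed as a sample standard deviation, whereas the primary studies may have used other conventions.

```python
import statistics


def static_pitch_parameters(f0_hz):
    """Summarise one vocalisation's voiced F0 values (in Hz).

    Hypothetical helper: assumes unvoiced frames were removed beforehand and
    uses the sample standard deviation (other conventions are possible).
    """
    if len(f0_hz) < 2:
        raise ValueError("need at least two voiced pitch values")
    highest, lowest = max(f0_hz), min(f0_hz)
    return {
        "mean": statistics.mean(f0_hz),   # arithmetic average of all values
        "sd": statistics.stdev(f0_hz),    # spread of values around the mean
        "max": highest,
        "min": lowest,
        "span": highest - lowest,         # pitch span in Hz
    }


# A short, invented rise-fall contour sampled at equal intervals.
params = static_pitch_parameters([400, 450, 500, 450, 400])
print(params["mean"], params["sd"], params["span"])
```

Because every parameter is derived from the same list of values, the sketch also shows why these measures are easy to define uniformly across studies: none of them depends on segmentation thresholds.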

Pitch Fluctuation

Pitch fluctuation was defined as the ‘measure of the mean pitch variation between five consecutive time sections of equal length per cry utterance’ by both studies that employed this parameter (Wermke et al., 2016, 2017). These two studies reported the following pitch fluctuation values for cry utterances in the first week of life: 29 Hz (German), 31 Hz (German), 33 Hz (Mandarin Chinese), and 39 Hz (Lamnso). Wermke et al. (2016) reported a significant difference in the pitch fluctuation values of German and Lamnso newborns, with Lamnso-exposed newborns showing higher values and thus greater pitch variation.
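The quoted definition leaves the exact computation open. One plausible reading, sketched below in Python purely as an assumption, divides the voiced F0 contour into five consecutive sections of (near-)equal length, takes the mean F0 of each section, and averages the absolute differences between consecutive section means. The function name and this interpretation are hypothetical and may differ from the procedure used by Wermke et al. (2016, 2017).

```python
import statistics


def pitch_fluctuation(f0_hz, n_sections=5):
    """Hypothetical reading of 'pitch fluctuation' (in Hz).

    Splits the contour into n_sections consecutive sections of (near-)equal
    length, averages F0 within each section, and returns the mean absolute
    difference between consecutive section means.
    """
    if len(f0_hz) < n_sections:
        raise ValueError("contour has fewer samples than sections")
    # Integer section boundaries; every section gets at least one sample.
    bounds = [i * len(f0_hz) // n_sections for i in range(n_sections + 1)]
    section_means = [
        statistics.mean(f0_hz[bounds[i]:bounds[i + 1]]) for i in range(n_sections)
    ]
    steps = [abs(b - a) for a, b in zip(section_means, section_means[1:])]
    return statistics.mean(steps)


# Invented contour: section means 300, 320, 340, 320, 300 Hz.
contour = [300, 300, 320, 320, 340, 340, 320, 320, 300, 300]
print(pitch_fluctuation(contour))
```

Unlike pitch span, this measure depends on how the utterance is divided in time, so two studies could report different values for the same cry if they sectioned it differently.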

Pitch Contour Complexity Parameters

This group involved two parameters describing the complexity of a newborn vocalisation, as determined by the number of arcs or peaks within the pitch contour, thereby describing the number of pitch modulations. This type of pitch analysis was less common (n = 6) than describing static pitch features and covered little linguistic diversity, as five of the six studies using it were conducted in Germany.

Melodic Complexity Index

The melody complexity index (MCI) parameter expressed the share of vocalisations with complex melodies (versus simple melodies) among all the vocalisations. It was calculated by firstly classifying melodies as simple (single-arc) or complex (multiple-arc), and then calculating an index using the formula:

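\[
\mathrm{MCI} = \frac{N_{\text{complex}}}{N_{\text{complex}} + N_{\text{simple}}}
\]

where $N_{\text{complex}}$ and $N_{\text{simple}}$ are the numbers of vocalisations with complex and simple melodies, respectively.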

A higher index thus indicates greater melodic complexity. Figure 4 shows examples of a simple and a complex cry melody, reproduced from Prochnow et al. (2019).

Figure 4.

Waveform and spectrogram of cries with a simple melody (single arc) versus a complex melody (two arcs) reproduced from Prochnow et al. (2019).
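The index itself is a simple proportion. The Python sketch below assumes that each vocalisation has already been assigned a number of melody arcs (one arc for a simple melody, two or more for a complex melody); the function name is hypothetical, and excluding vocalisations without any detected arc is an assumption rather than a documented rule.

```python
def melody_complexity_index(arc_counts):
    """MCI = complex / (complex + simple).

    Hypothetical helper: each entry is the number of melody arcs detected in
    one vocalisation (1 = simple melody, 2 or more = complex melody);
    vocalisations with no detected arc are excluded by assumption.
    """
    simple = sum(1 for n in arc_counts if n == 1)
    complex_ = sum(1 for n in arc_counts if n >= 2)
    if simple + complex_ == 0:
        raise ValueError("no vocalisation with at least one arc")
    return complex_ / (simple + complex_)


# Five invented cries: three single-arc and two multiple-arc melodies.
print(melody_complexity_index([1, 2, 1, 3, 1]))  # 0.4
```

Note that the index only records whether a melody has more than one arc, so a cry with two arcs and a cry with five arcs contribute equally.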

This general criterion was applied uniformly across studies (n = 5), although subtle differences existed in the specific thresholds used to determine when an arc occurred, which may compromise the validity of inter-study comparisons. Regarding duration, an arc was defined as being longer than 150 ms (Prochnow et al., 2019; Shinya et al., 2017; Wermke et al., 2007, 2013) or longer than 300 ms (Kottmann et al., 2023). Regarding pitch, an arc had to exhibit a frequency amplitude of at least 3 semitones (Prochnow et al., 2019; Wermke et al., 2007), at least 2 semitones (Shinya et al., 2017; Wermke et al., 2013), or at least 2 semitones for cries and 1.5 semitones for non-cries (Kottmann et al., 2023). Figure 5 displays a scatter plot of the MCI of newborn vocalisations per language, age group, and vocalisation type. Prochnow et al. (2019) reported that Swedish-exposed newborns produced cries with a significantly higher MCI than German-exposed newborns, and inferred that Swedish newborns cry with more complex melodies than their German counterparts as early as the first week of life.

Figure 5.

Melody complexity index of newborn vocalisations per language, age group, and vocalisation type.
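The consequences of these threshold differences can be illustrated with a small Python sketch. It converts a pitch excursion into semitones (12 × log2 of the frequency ratio) and checks one invented rise-fall segment against three of the criterion sets listed above. The helper names, and the assumption that the excursion is measured from the segment's lowest F0 to its peak, are illustrative only.

```python
import math


def semitones(f_high_hz, f_low_hz):
    """Distance between two frequencies in semitones: 12 * log2(ratio)."""
    return 12 * math.log2(f_high_hz / f_low_hz)


def qualifies_as_arc(duration_ms, peak_hz, base_hz, min_duration_ms, min_semitones):
    """Hypothetical check of one rise-fall segment against an arc criterion.

    Assumes the excursion is measured from the segment's lowest F0 (base_hz)
    to its peak.
    """
    long_enough = duration_ms > min_duration_ms
    large_enough = semitones(peak_hz, base_hz) >= min_semitones
    return long_enough and large_enough


# Criterion sets for cries, as summarised in the text: (min duration ms, min semitones).
CRITERIA = {
    "> 150 ms, >= 3 st": (150, 3.0),
    "> 150 ms, >= 2 st": (150, 2.0),
    "> 300 ms, >= 2 st": (300, 2.0),
}

# One invented segment: 200 ms long, rising from 400 Hz to 460 Hz (about 2.4 st).
for label, (min_ms, min_st) in CRITERIA.items():
    print(label, qualifies_as_arc(200, 460, 400, min_ms, min_st))
```

Under these assumptions the same segment counts as an arc under only one of the three criterion sets, failing the 3-semitone threshold in one case and the 300 ms duration threshold in the other, which would in turn change whether the cry is classified as simple or complex.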

Complexity of Melodic Intervals

The complexity of melodic intervals (MI) was employed as a measure only by Armbrüster et al. (2021), who described the probability that German-exposed newborns produced complex MI in their cry vocalisations from the first to the twelfth week of life. An MI was defined as an event consisting of three elements: a first plateau, that is, a relatively stable tone lasting ⩾50 ms with frequency variation of less than ± a quarter tone; a transition in which pitch glides from the first plateau to a higher or lower plateau; and that subsequent plateau (Armbrüster et al., 2021). A single MI consists of one such event, whereas a complex MI is a combination of two or more single intervals. The probability of German-exposed newborns producing a complex MI in cry vocalisations increased consistently, rising from 30.2% in the first week of life to 59.5% in the twelfth week. This finding indicates a gradual and continuous development in the ability to produce more complex cry melodies.
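The plateau-transition-plateau structure of an MI can be sketched as a simple segmentation. The Python example below is a simplified, hypothetical procedure rather than the algorithm of Armbrüster et al. (2021): it assumes equally spaced F0 samples, treats a plateau as a run of samples that stays within ± a quarter tone (0.5 semitone) of the run's first sample for at least 50 ms, and counts one single interval for each transition between successive plateaus, so that two or more intervals constitute a complex MI. All names are illustrative.

```python
import math

QUARTER_TONE_ST = 0.5  # a quarter tone is half a semitone


def find_plateaus(f0_hz, frame_ms, min_ms=50):
    """Return (start, end) index pairs of plateaus.

    Hypothetical segmentation: a plateau is a run of equally spaced F0 samples
    that stays within +/- a quarter tone of the run's first sample and lasts
    at least min_ms. Shorter runs (e.g. glides) are discarded as transitions.
    """
    plateaus = []
    start = 0
    for i in range(1, len(f0_hz) + 1):
        run_ends = i == len(f0_hz) or abs(
            12 * math.log2(f0_hz[i] / f0_hz[start])
        ) >= QUARTER_TONE_ST
        if run_ends:
            if (i - start) * frame_ms >= min_ms:
                plateaus.append((start, i))
            start = i
    return plateaus


def count_single_intervals(f0_hz, frame_ms):
    """One single MI per transition between successive plateaus (assumption)."""
    return max(len(find_plateaus(f0_hz, frame_ms)) - 1, 0)


def has_complex_interval(f0_hz, frame_ms):
    """A complex MI combines two or more single intervals."""
    return count_single_intervals(f0_hz, frame_ms) >= 2


# Invented contours sampled every 10 ms.
one_step = [400] * 6 + [415, 430, 445] + [450] * 6   # plateau, glide, plateau
two_steps = [400] * 6 + [450] * 6 + [400] * 6        # up, then down again
print(count_single_intervals(one_step, 10), has_complex_interval(two_steps, 10))
```

In this sketch the short glide samples never last 50 ms and are therefore treated as transitions rather than plateaus; a real implementation would need explicit rules for where a glide ends and the next plateau begins.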

Pitch Contour Shape Parameters

This group of parameters described the shape of the pitch contour of newborn vocalisations using categorical descriptors, for example, rises, falls, and plateaus. This type of pitch pattern analysis was the least common amongst the included studies (n = 4), although it covered a linguistically diverse range of studies. The possible pitch contour shapes and the criteria used to determine them varied considerably across studies, as shown in Table 4. Studies differed in the number of possible shapes, in the labels used to describe pitch contour shapes, and in their reliance on duration, peak alignment, and/or pitch change criteria to categorise shapes. This substantial variation undermines the validity of inter-study comparisons; its implications are explored in the ‘Discussion’ section.

Table 4.

Pitch Contour Shape Analysis Criteria.

Reference	Criteria
Mampe et al., 2009	Classification of shapes based on the temporal alignment of the pitch peak of simple cries containing single rising-falling melodies. Melody arcs were normalised in duration to 1 s, and the normalised time corresponding to the maximum pitch of each melody arc was determined to assign one of two contour shapes: falling contours (values < 0.5 s) and rising contours (values > 0.5 s).
Gustafson et al., 2017	Replicated the method of Mampe et al. (2009) of analysing the temporal alignment of the pitch peak of a vocalisation.
Manfredi et al., 2019	Automated shape classification using the BioVoice software. Allows both automated and perceptual classification of the pitch contour shape among 12 basic melodic shapes: Plateau, Rising, Falling, Symmetric, Complex, Low-Up, Up-Low, Frequency Step, Double, Unstructured, Not a Cry, and Other. The software follows a flowchart procedure based on duration and pitch criteria as illustrated in detail in Figure 3 in Manfredi et al. (2019).
Gratier & Devouche, 2011	Classification of shapes based on pitch plots using Praat software. Allows classification of 8 melodic shapes: unitonal, rising, falling, bell shape, U shape, sinusoidal, complex, and uncodable. The detailed pitch criteria used to assign shapes are available in Table 1 in Gratier and Devouche (2011).

The most common method for determining whether a pitch contour was rising or falling was to locate the maximum pitch within the contour (Gustafson et al., 2017; Mampe et al., 2009; Manfredi et al., 2019). If the maximum pitch occurred in the first half of the vocalisation, the contour was categorised as a fall, whereas a maximum pitch in the latter half constituted a rise. In other words, the temporal alignment of the pitch peak was used to determine the shape of the pitch contour, with an early peak categorised as a fall and a late peak as a rise. Figure 6 illustrates this method, showing where the maximum pitch occurs in newborn cries per language and age group and how this relates to the description of pitch contour shape. The validity of the assumption that the temporal alignment of a pitch peak maps onto a rising or falling contour shape is examined further in the ‘Discussion’ section.

Figure 6.

Position of maximum pitch in 1-second time-normalised cry contours per language and age group.
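The peak-alignment convention can be expressed in a few lines of Python. The sketch below assumes equally spaced F0 samples, so that time normalisation to 1 s reduces to dividing the index of the maximum by the index of the last sample; the function names are hypothetical, and a peak exactly at the midpoint is left unclassified because the rule described above does not cover it.

```python
def normalised_peak_position(f0_hz):
    """Time of the F0 maximum on a contour normalised to 1 s (0 = onset, 1 = offset).

    Assumes equally spaced F0 samples; with ties, the first maximum is used.
    """
    if len(f0_hz) < 2:
        raise ValueError("need at least two pitch samples")
    peak_index = max(range(len(f0_hz)), key=lambda i: f0_hz[i])
    return peak_index / (len(f0_hz) - 1)


def peak_alignment_label(f0_hz):
    """Early peak -> 'falling', late peak -> 'rising' (peak-alignment convention)."""
    position = normalised_peak_position(f0_hz)
    if position < 0.5:
        return "falling"
    if position > 0.5:
        return "rising"
    return "unclassified"  # a peak exactly at the midpoint is not covered by the rule


print(peak_alignment_label([400, 480, 460, 440, 420]))  # falling
print(peak_alignment_label([400, 420, 440, 480, 450]))  # rising
```

Both example contours rise and then fall; the labels differ only because the peak is early in one and late in the other, which is precisely the assumption questioned in the ‘Discussion’ section.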

Mampe et al. (2009) found that German-acquiring newborns’ cries peaked significantly earlier than those of French-acquiring newborns, and inferred from this that German-acquiring newborns cry with falling pitch contours, whereas French-acquiring newborns cry with rising pitch contours, as early as the first week of life. Gustafson et al. (2017) studied American English- and Mandarin Chinese-exposed newborns, replicating the method of Mampe et al. (2009) in terms of the dependent variable examined and the analysis criteria applied. However, they modified the statistical modelling by nesting cries within participants to account for individual variation in infant cries. Using this multilevel model, they found no significant group differences between the cries of American English- and Mandarin Chinese-exposed newborns.

Figure 7 displays the pitch contour shapes produced per language as a percentage of all vocalisations produced by newborns. The bar graph was created by charting each shape label and percentage as reported in the primary studies. The four studies in Figure 7 differ considerably in how they classified shapes and in their terminology, which makes commonalities and differences difficult to identify. Descriptions of the shape categories can be found in the primary studies listed in Table 4.

Figure 7.

Categorical descriptors of pitch contour shape per language and vocalisation type.

Discussion

The current review study asked whether the use of pitch in newborns’ vocalisations is influenced by prenatal exposure to the prosody of their native language. It aimed to systematically describe the study characteristics and outcomes of existing evidence, thereby capturing the converging and diverging trends within the defined research field to answer the research question. The discussion will first provide a structured response to the outlined research question, followed by motivated recommendations for future research.

Is the Use of Pitch in Newborns’ Vocalisations Influenced by Prenatal Exposure to the Prosody of Their Native Language?

The notion that newborns can produce salient elements of the prosodic patterns of their native language based on prenatal exposure has been supported by some studies, whereas other studies have explicitly disagreed with or implicitly contradicted it. Within the current sample of 18 studies, evidence for cross-linguistic differences in the pitch patterns of early vocalisations is both sparse and inconsistent. This suggests that prenatal exposure to the prosodic features of the native language may not significantly shape early speech production. This finding aligns with earlier work by DePaolis et al. (2008), who concluded that language-specific pitch use tends to emerge only once word production has begun, as this requires a certain level of linguistic sensitivity. In the sections that follow, we systematically discuss the outcomes for each of the pitch parameters studied to illustrate the extent and nature of the evidence.

Static Pitch Variation Parameters

Some studies suggest that newborns with prenatal exposure to a tone language exhibit significantly more pitch variation in their early vocalisations than their peers exposed to a non-tone language in utero. Such differences were predominantly demonstrated for the standard deviation of mean pitch (Wermke et al., 2016, 2017) and the pitch span (Wermke et al., 2013, 2016, 2017) of cries. For example, in the first week of life, German-exposed newborns were found to cry with narrower pitch spans and smaller standard deviations in their mean pitch than Chinese-exposed (Wermke et al., 2017) and Lamnso-exposed newborns (Wermke et al., 2016). In the third month of life, Lamnso-exposed newborns also produced non-cry vocalisations with a broader pitch span than German-exposed newborns (Wermke et al., 2013).

However, opposing results were put forward by Gustafson et al. (2017), who found that during the first month of life, American English-exposed newborns displayed larger standard deviations in their mean pitch during crying than Chinese-exposed newborns. Qualitatively, the American English-exposed newborns also displayed broader pitch spans during crying than the Chinese group. A further notable qualitative difference was observed in the third month of life, where Lamnso-exposed newborns produced non-cry vocalisations with less pitch variability (as measured by the standard deviation of mean pitch) than their German peers (Wermke et al., 2013). We thus conclude that mixed evidence exists for cross-linguistic differences in pitch span and standard deviation of mean pitch between newborns with prenatal exposure to tone versus non-tone languages.

With regard to pitch fluctuation, German-exposed newborns produced cries with lower pitch fluctuation values than Lamnso-exposed newborns, although this was reported by only a single study (Wermke et al., 2016). Similarly, a single study reported that Chinese-exposed newborns produced cries with a significantly lower minimum pitch than German-exposed newborns, which the authors attributed to frequent exposure to Tone 3, characterised by a low-dipping pitch (Wermke et al., 2017). This interpretation can, however, be challenged, as there is insufficient empirical evidence that Chinese-listening foetuses are exposed to Tone 3 more frequently than to other tones. No cross-linguistic differences were observed for mean pitch or maximum pitch.

Furthermore, the idea that newborns exposed prenatally to a tone language produce more pitch variation than those exposed to a non-tone language may be reconsidered in light of findings that neural tracking of speech develops earlier at the phrase or utterance level than at the word or syllable level (Ortiz Barajas et al., 2021). This suggests that the foetal brain may not yet register finer-grained pitch changes, such as lexical tones, as reliably as broader prosodic contours.

In sum, support for cross-linguistic differences in static pitch variation measures is both limited and inconsistent.

Pitch Contour Complexity Parameters

It has been claimed that newborns exposed to more ‘musical’ languages produce more complex pitch contours than newborns exposed to languages with fewer ‘musical’ elements (Prochnow et al., 2019). Specifically, it was found that Swedish-exposed newborns produced significantly more complex cry melodies than German-exposed peers in the first week of life. Swedish was described as a ‘musical’ language and melodically more complex than German due to the presence of two lexical tones (referred to as ‘musical elements’), which occur in addition to word stress, thereby causing more frequent pitch modulation than languages without lexical pitch accents. Their finding thus supports the notion that newborns can retain and produce salient prosodic features of their native language based on prenatal exposure.

Following the reasoning of Prochnow et al. (2019), both lexical pitch-accent languages and tone languages would count as ‘musical’. The claim that newborns exposed to such languages vocalise with greater pitch contour complexity than those exposed to less ‘musical’ non-tonal languages is, however, called into question by additional studies on early vocalisations. For example, based on qualitative comparisons, Lamnso-exposed newborns produced non-cry vocalisations with less complex melodies than German-exposed newborns in the third month of life (Wermke et al., 2013). Given the tonal nature of Lamnso and following the reasoning of Prochnow et al. (2019), Lamnso would be assumed to be melodically more complex than German, so this finding contradicts the initial claim. Furthermore, the melodic complexity of Japanese newborns’ cries was qualitatively comparable to that of German newborns in the first week of life (Prochnow et al., 2019; Shinya et al., 2014). As Japanese is classified as a lexical pitch-accent language, like Swedish, the initial claim would predict that Japanese newborns produce more complex pitch contours than German newborns, which was not the case.

Overall, the notion that lexical pitch-accent or tone languages are melodically more complex than intonation languages may be debated for several reasons. First, languages can vary vastly in the density with which pitch is modulated as the building blocks of sentence-level prosody, regardless of whether they possess lexical tones (i.e. changes in pitch patterns at the syllable level used to distinguish word meaning), lexical pitch accents (i.e. changes in pitch patterns at the word level that can change the meaning of a word or make words distinct from each other), or non-lexical pitch accents (i.e. changes in pitch patterns that convey non-lexical meanings at the sentence level, such as contrast). For example, German possesses six non-lexical pitch accents (Grice et al., 2005), whereas Stockholm Swedish possesses two lexical pitch accents (Myrberg & Riad, 2015). Second, non-lexical pitch accents are typically assigned to certain words only, whereas lexical tones and lexical pitch accents are present in every word of a given utterance. However, research on neural tracking of acoustic changes in speech suggests that the ability to track changes at the phrase/utterance level develops earlier than at the word and syllable level (Ortiz Barajas et al., 2021). This raises the question of whether the more frequent pitch movements associated with lexical tones and lexical pitch accents can be registered by the foetal brain. Finally, besides the number of discrete pitch patterns and their distribution within a given utterance, the pitch range that speakers typically use can also differ between languages. For example, the southern variety of British English tends to have a larger pitch range than Dutch (De Pijper, 1983; Willems, 1982), German, and Spanish (Andreeva et al., 2014; Estebas-Vilaplana, 2014). This may, in turn, influence foetuses’ perception of melodic complexity in speech.

Thus, we conclude that the available studies do not converge in demonstrating cross-linguistic differences in the melodic complexity of newborn vocalisations.

Pitch Contour Shape Parameters

It has been argued that newborns with different language backgrounds cry with differing pitch contour shapes as early as the first week of life. This was first claimed by Mampe et al. (2009), who found that German newborns produced mostly falling pitch contours, whereas French newborns produced mostly rising pitch contours. It was also suggested by Manfredi et al. (2019), who reported above-chance level discrimination between the cries of Arabic, French, and Italian newborns using a machine classification task.

This claim can be debated, starting with the analysis criteria used to categorise pitch contour shapes. Both Mampe et al. (2009) and Manfredi et al. (2019) classified pitch contours as rising or falling based on the temporal alignment of the pitch peak in newborn cries, as outlined in section ‘Pitch Contour Shape Parameters’. This method essentially distinguishes between early and late peak alignments in contours of similar shapes. Bell-shaped contours skewed to the left are classified as falls, while those skewed to the right are classified as rises. This raises the question of whether peak alignment is a valid cue for determining pitch contour shape, particularly in terms of directionality, in newborn cries. To the best of our knowledge, no published evidence within the field of prosody exists to show that listeners perceive early- and late-aligned bell-shaped contours in speech as falls and rises, respectively. Existing research only indicates that peak alignment may carry different connotations in some languages (Pierrehumbert & Steele, 1989). Therefore, the practice of classifying newborn cries as rises or falls based on peak alignment lacks empirical support.

The initial claim by Mampe et al. (2009) has further been challenged by Gustafson et al. (2017) on the grounds of the statistical analysis conducted. To demonstrate the importance of accounting for individual differences in cry sounds, Gustafson et al. (2017) conducted two analyses: first, treating each cry as an independently sampled unit using t-tests, as Mampe et al. (2009) had done, without accounting for individual differences; and second, treating each newborn as an independent unit, with cries nested within individual newborns, by means of linear mixed-effects modelling that factored in individual differences. While the first analysis revealed significant group differences and replicated Mampe et al.’s (2009) finding of notable differences in peak alignment among newborns from different language backgrounds, the second analysis did not. Gustafson et al. (2017) concluded that researchers risk obtaining spurious significant findings when they do not account for the variance associated with individual newborns in their statistical modelling.
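Why nesting matters can be illustrated with the Kish design effect, a standard survey-sampling approximation; it is not the analysis that Gustafson et al. (2017) reported. For equally sized clusters, the design effect is 1 + (m - 1) × ICC, where m is the number of cries per infant and ICC is the intraclass correlation of cries within an infant; dividing the total number of cries by this factor gives an approximate effective sample size. The numbers of infants and cries and the ICC values in the Python sketch below are hypothetical.

```python
def design_effect(cries_per_infant, icc):
    """Kish design effect for equally sized clusters: 1 + (m - 1) * ICC."""
    if not 0 <= icc <= 1:
        raise ValueError("ICC must lie in [0, 1]")
    return 1 + (cries_per_infant - 1) * icc


def effective_sample_size(n_infants, cries_per_infant, icc):
    """Approximate number of independent observations the nested cries are worth."""
    total_cries = n_infants * cries_per_infant
    return total_cries / design_effect(cries_per_infant, icc)


# Hypothetical study: 30 infants, 20 cries each, under three ICC values.
for icc in (0.0, 0.2, 0.5):
    print(icc, round(effective_sample_size(30, 20, icc), 1))
```

With an ICC of 0.5, for instance, 600 cries carry roughly the information of 57 independent observations, so a t-test that treats all 600 cries as independent can greatly overstate the precision of a group difference.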

Finally, the outcome that French newborns mostly cry with rising pitch contour shapes is further brought into question if considered alongside other studies on the pitch contour shapes of early vocalisations of French newborns. For example, Manfredi et al. (2019) reported only 4% of cry vocalisations in the first week of life as rising, and Gratier and Devouche (2011) classified merely 16% of non-cry vocalisations in the third month of life as rising. Rather, plateau, complex, and bell-shaped vocalisations were thought to dominate the vocal landscape of French newborns in the first 3 months of life. There is thus little convergence amongst available studies to support cross-linguistic differences in the pitch contour shape of early vocalisations.

It is vital to point out that the criteria used to classify pitch contour shape were highly variable across studies, which limits the validity of inter-study comparisons. First, the criteria differed on a large scale, relying on pitch change, duration, and/or peak alignment cues to categorise shapes. For example, Mampe et al. (2009) relied on peak alignment and Gratier and Devouche (2011) on pitch change, whereas Manfredi et al. (2019) relied on all three cues. Second, the criteria also differed on a small scale in the specific thresholds used within each cue category. For example, Mampe et al. (2009) classified ‘rising’ contours as those with a pitch peak occurring after 0.5 s in the 1-s time-normalised contour (thus in the last 50% of the vocalisation), whereas Manfredi et al. (2019) categorised ‘rises’ as those with a pitch peak in the last 40% of the vocalisation. If these respective criteria are applied to a cry whose pitch peak falls in the 50% to 60% region of the vocalisation, the same cry will be categorised as rising by the criteria of Mampe et al. (2009) but as symmetric by those of Manfredi et al. (2019). Furthermore, a comparison of the analysis criteria outlined in Table 4 shows that two problematic scenarios can arise. On the one hand, pitch contours with similar shapes may be assigned different labels; for example, ‘plateau’ in Manfredi et al. (2019) and ‘unitonal’ in Gratier and Devouche (2011) describe a similar shape. On the other hand, different pitch contours may be assigned the same label; for example, ‘rising’ contours or ‘rises’ have different meanings in Mampe et al. (2009), Manfredi et al. (2019), and Gratier and Devouche (2011).
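The boundary example above can be reproduced with a short Python sketch. The first rule follows the peak-alignment criterion attributed to Mampe et al. (2009); the second is a deliberately simplified, hypothetical stand-in for the BioVoice procedure of Manfredi et al. (2019), which additionally uses duration and pitch criteria. Treating a peak in the first 40% as ‘falling’ in the second rule is an assumption made for illustration.

```python
def label_peak_alignment(peak_position):
    """Rule attributed to Mampe et al. (2009): peak after 0.5 -> rising, before -> falling."""
    if peak_position > 0.5:
        return "rising"
    if peak_position < 0.5:
        return "falling"
    return "unclassified"


def label_simplified_biovoice(peak_position):
    """Hypothetical simplification, not the BioVoice flowchart itself.

    Rises need a peak in the last 40 % of the contour; treating a peak in the
    first 40 % as falling is an assumption; anything in between is symmetric.
    """
    if peak_position >= 0.6:
        return "rising"
    if peak_position <= 0.4:
        return "falling"
    return "symmetric"


# Normalised peak positions (0 = onset, 1 = offset) for three invented cries.
for position in (0.30, 0.55, 0.70):
    print(position, label_peak_alignment(position), label_simplified_biovoice(position))
```

The two rules agree at 0.30 and 0.70 but disagree at 0.55, so the proportion of ‘rising’ cries reported for a sample can depend on the classification scheme as much as on the cries themselves.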

Based on the reviewed pitch parameters, we conclude that there is both limited and mixed evidence for cross-linguistic differences in the pitch patterns of early vocalisations within the current sample of 18 studies. This conclusion can be interpreted in two ways. First, as demonstrated by this review, the available studies are still too limited in number and diversity to reliably and validly demonstrate potential cross-linguistic differences. Consequently, the current review does not yield a definite positive answer to the research question. Second, from a more speculative perspective, early vocal development during the first 3 months of life appears to exhibit a degree of universality. This suggests that prenatal exposure to native-language prosody may not significantly influence the prosodic patterns of early vocalisations, implying that early production skills may lag behind perceptual skills in prosodic development.

Although not a primary goal of the review, brief observations can be made about the influence of vocalisation type and age on the prosodic characteristics of newborns’ vocalisations in the first 3 months of life. First, there was neither a consistent increase nor a consistent decrease in mean pitch with age, a point often debated in the literature (see Chapter Seven in Gregory [2013] for a review). Second, cries were observed to have higher mean pitch values (ranging from 378 Hz to 476 Hz) than non-cries (ranging from 318 Hz to 355 Hz), whereas the mean pitch values reported for all vocalisation types combined largely fell between these two ranges (ranging from 347 Hz to 381 Hz). Lastly, a clear developmental trend was observed across studies with German-exposed newborns, with the melody complexity of early vocalisations consistently increasing from birth to the third month of life (Armbrüster et al., 2021; Kottmann et al., 2023; Prochnow et al., 2019; Wermke et al., 2007, 2013).

Recommendations for Future Research

We make the following data-driven recommendations based on the findings of this review.

Representative Data

The representativeness of languages and vocalisation types studied should be well-considered, and the assumptions underlying related research questions or hypotheses should be supported by empirical evidence.

With regard to languages, German dominated the available studies to a notable extent, and comparisons were restricted to non-tone languages versus tone and lexical pitch-accent languages. Wermke et al. (2016) argued that the latter group relies on pitch changes across smaller units (syllable or word level), which may result in faster and more frequent pitch modulation, whereas in non-tone languages, pitch operates across larger units (phrase level), which may result in slower and less frequent pitch changes. Building on this, the question of whether newborns reflect such fundamental differences in their early vocalisations has been explored (Prochnow et al., 2019; Wermke et al., 2013, 2016, 2017). While this question opens up promising research avenues, future studies should go beyond merely comparing broad prosodic typologies such as tone and non-tone languages. Instead, comparisons should be based on empirically supported differences between languages, such as variations in the density of lexical tones, lexical pitch accents, and/or non-lexical pitch accents, and should take into account that the foetal brain may not be able to register frequent changes in pitch movement in lexical tones and lexical pitch accents (Menn et al., 2023).

With regard to vocalisation types, newborn cries dominate the available studies, and greater focus should be placed on non-cry vocalisations. Given research showing that newborns in the first 3 months of life produce significantly more speech-like sounds than cries (Oller et al., 2021), the available evidence is not yet truly representative of the vocal repertoire of this age range. This recommendation is particularly important for studies aiming to develop screening or diagnostic tools for clinical purposes based on the prosodic features of early vocalisations.

Purposeful Selection of Pitch Parameters

A further recommendation is to purposefully select distinct pitch parameters to describe the pitch patterns of early vocalisations in an attempt to tease apart how prosodic organisation occurs within production in this early developmental stage. As demonstrated in this review, multiple pitch parameters may be employed to describe the pitch patterns of early vocalisations. It should, however, be acknowledged that each parameter describes a unique characteristic of the pitch contour, and therefore, being explicit about the rationale for selecting a specific parameter is beneficial. For example, whereas some studies speculated that newborns acquiring tone languages will produce vocalisations with larger pitch variation (magnitude of change) than their non-tone peers, other studies speculated that they will produce more complex melodies (number of changes). These two parameters, however, describe distinct characteristics of the pitch contour that may each be influenced by prenatal exposure to a newborn’s native language in a unique way.
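The distinction between magnitude and number of changes can be shown with two invented contours. In the Python sketch below, a single large arc has a wide span in semitones but only one peak, whereas a shallow oscillating contour has a narrow span but several peaks, so a study measuring only one of these properties would characterise the two contours very differently. The contour values and helper names are hypothetical, and counting interior local maxima is only a crude proxy for the arc criteria used in the reviewed studies.

```python
import math


def span_semitones(f0_hz):
    """Magnitude of change: pitch span in semitones."""
    return 12 * math.log2(max(f0_hz) / min(f0_hz))


def count_peaks(f0_hz):
    """Number of changes: interior local maxima, a crude proxy for arcs."""
    return sum(
        1 for before, here, after in zip(f0_hz, f0_hz[1:], f0_hz[2:])
        if here > before and here >= after
    )


single_arc = [400, 500, 600, 500, 400]             # one wide arc
oscillating = [400, 450, 400, 450, 400, 450, 400]  # several shallow arcs
for name, contour in (("single arc", single_arc), ("oscillating", oscillating)):
    print(name, round(span_semitones(contour), 1), count_peaks(contour))
```

Here the single arc scores higher on span and lower on peak count than the oscillating contour, which is why a hypothesis about ‘more pitch variation’ should state which of the two properties it predicts.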

Coherency Across Studies

Improving coherence amongst research studies concerning the terminology and analysis criteria used should be prioritised. With regard to terminology, multiple terms are used interchangeably in the defined literature field, such as acoustic patterns, prosodic patterns, pitch patterns, intonation patterns, and melodic patterns. While terms are often used flexibly for readability, and consensus on their precise definitions may not always be established, it is advisable to provide at least a working definition for each chosen term. This is important because subtle differences in meaning can exist across the terminology, which may affect interpretation. Based on the search conducted during the development of the search string, the following trends exist: acoustic patterns typically refer to a broad array of physical properties such as the pitch, intensity, duration, and timbre of any given sound wave. Prosodic patterns typically refer to a subset of physical properties (pitch, intensity, and duration) but are specifically related to language and communication. Pitch patterns describe the variation in the frequency of any given sound wave, while intonation patterns specifically refer to frequency variations across phrases and utterances. The term pitch is used in both perception and production research and is used more widely than its measurable counterpart, fundamental frequency. The terms ‘melody’ and ‘melodic pattern’ were not well-defined and typically entailed analysis of pitch cues rather than duration or intensity cues.

With regard to analysis criteria, the criteria for static pitch variation measures such as mean pitch and pitch span were consistent across studies, and those for pitch contour complexity showed small-scale inconsistencies. The criteria for categorising pitch contour shapes, however, were highly variable, which raises the question of what makes a shape a shape. Such variability is an obstacle to inter-study comparisons, and how pitch change, duration, and peak alignment cues should be used to categorise pitch contour shapes has yet to be standardised. Addressing this question is complex; a starting point may be to establish perceptually plausible categories based on acoustic cues, in order to determine which cues matter for the categorisation of pitch contour shapes. Variation in analysis criteria is not inherently negative and may, in fact, be necessary to best reveal cross-linguistic differences or to accommodate specific data characteristics; however, such choices must be supported by transparent and well-reasoned methodological justifications.
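How strongly shape labels depend on analysis criteria can be illustrated with a deliberately simple, hypothetical rule-based classifier in Python. It labels a contour by comparing its start, end, peak, and trough (in semitones relative to the first F0 value) against a minimum pitch-change threshold, and it separately reports where the peak falls within the contour as a crude peak alignment cue. The thresholds, rules, labels, and F0 values are assumptions for illustration only, not the criteria used by any study in this review.

```python
import math

def to_semitones(f0_hz):
    """Express each F0 value in semitones relative to the contour's first value."""
    return [12 * math.log2(f / f0_hz[0]) for f in f0_hz]

def classify_shape(f0_hz, threshold_st):
    """Hypothetical rule-based shape label.

    threshold_st is the minimum pitch change (in semitones) that counts
    as a meaningful rise or fall; rules are checked in a fixed order.
    """
    st = to_semitones(f0_hz)
    start, end, peak, trough = st[0], st[-1], max(st), min(st)
    if peak - start >= threshold_st and peak - end >= threshold_st:
        return "rise-fall"
    if start - trough >= threshold_st and end - trough >= threshold_st:
        return "fall-rise"
    if end - start >= threshold_st:
        return "rising"
    if start - end >= threshold_st:
        return "falling"
    return "flat"

def peak_alignment(f0_hz):
    """Relative position of the F0 peak: 0.0 = contour start, 1.0 = contour end."""
    return f0_hz.index(max(f0_hz)) / (len(f0_hz) - 1)

contour = [400, 430, 470, 460, 445]  # synthetic F0 samples in Hz
for threshold in (0.5, 1.5, 3.0):
    print(threshold, classify_shape(contour, threshold))
print("peak alignment:", peak_alignment(contour))
```

With these synthetic values, the same contour is labelled rise-fall at a 0.5-semitone threshold, rising at 1.5 semitones, and flat at 3.0 semitones, which illustrates why the threshold, and any duration or peak alignment criteria, need to be reported explicitly.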

Conclusion

This systematic review synthesised evidence from 18 quantitative production studies that analysed the pitch patterns of the cry and non-cry vocalisations of healthy newborns aged between birth and 3 months with monolingual prenatal exposure. Within this sample, the available evidence for cross-linguistic differences in the pitch patterns of early vocalisations is sparse and inconsistent. Therefore, this review does not find a clear, language-specific role of prenatal language exposure in shaping the pitch patterns of early vocalisations. It should be acknowledged that only a limited number of studies (n = 18) were included in the review and that these studies were limited in the diversity of languages and vocalisation types represented. Furthermore, the studies analysed the pitch patterns of early vocalisations in various ways, using diverse analysis criteria. Rather than offering definitive support, this review provides a critical reassessment of existing claims, underscoring the scarcity of robust evidence for an influence of prenatal language exposure on the use of pitch in newborns’ vocalisations. Finally, data-driven recommendations are offered to help make future research more representative, purposeful, and coherent.

Supplemental Material


Supplemental material, sj-docx-1-fla-10.1177_01427237251372203 for The Influence of Prenatal Language Exposure on the Use of Pitch in Newborns’ Vocalisations: A Systematic Review by Elanie van Niekerk, Caroline Junge and Aoju Chen in First Language

Footnotes

Acknowledgements

We would like to express our gratitude to Marlijne Bouwmeester for her assistance in searching, screening, and selecting evidence for inclusion in our review. We would also like to thank Alissa Vavinova for conducting the ‘prosody-only’ search, which helped us to determine the validity of our search string.

ORCID iDs

Elanie van Niekerk

Aoju Chen

Ethical Considerations

Informed consent is not required as the review relies on the synthesis of existing evidence and does not involve human participants.

Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Author Contributions

Elanie van Niekerk: Conceptualisation; Data curation; Formal analysis; Methodology; Project administration; Visualisation; Writing—original draft; Writing—review & editing.

Caroline Junge: Conceptualisation; Resources; Supervision; Writing—review & editing.

Aoju Chen: Conceptualisation; Funding acquisition; Resources; Supervision; Writing—review & editing.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a Dutch Research Council VICI grant (VI.C.201.109) to Aoju Chen.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The numerical data extracted from the included studies are available as a supplementary document to this article.


Notes

References

Andreeva, Demenko, Wolska, Möbius, Zimmerer, Jügler, Oleskowicz-Popiel, & Trouvain (2014). Comparison of pitch range and pitch variation in Slavic and Germanic languages. In Campbell, Gibbon, & Hirst (Eds.), Proceedings of the International Conference on Speech Prosody (pp. 776–780). International Speech Communication Association. https://doi.org/10.21437/speechprosody.2014-143

*Armbrüster, Mende, Gelbrich, Wermke, Götz, & Wermke (2021). Musical intervals in infants’ spontaneous crying over the first 4 months of life. Folia Phoniatrica et Logopaedica, 73(5), 401–412.

*Baeck, H. E., & de Souza, M. N. (2007). A longitudinal study of the fundamental frequency of hunger cries along the first 6 months of healthy babies. Journal of Voice, 21(5), 551–559.

Cruttenden (1997). Intonation (2nd ed., pp. 1–7). Cambridge University Press.

DePaolis, R. A., Vihman, M. M., & Kunnari (2008). Prosody in production at the onset of word use: A cross-linguistic study. Journal of Phonetics, 36(2), 406–422. https://doi.org/10.1016/j.wocn.2008.01.003

De Pijper, J. R. (1983). Modeling British English intonation. Foris.

Eriksen, M. B., & Frandsen, T. F. (2018). The impact of patient, intervention, comparison, outcome (PICO) as a search strategy tool on literature search quality: A systematic review. Journal of the Medical Library Association: JMLA, 106(4), 420.

Estebas-Vilaplana (2014). The evaluation of intonation: Pitch range differences in English and in Spanish. In Thompson & Alba-Juez (Eds.), Evaluation in context (pp. 179–194). John Benjamins Publishing Company. https://doi.org/10.1075/gest.8.3.02str

Esteve-Gibert & Prieto (2013). Prosody signals the emergence of intentional communication in the first year of life: Evidence from Catalan-babbling infants. Journal of Child Language, 40(5), 919–944. https://doi.org/10.1017/S0305000912000337

Gabrieli, Scapin, Bornstein, M. H., & Esposito (2019). Are cry studies replicable? An analysis of participants, procedures, and methods adopted and reported in studies of infant cries. Acoustics, 1(4), 866–883.

Gervain, Christophe, & Mazuka (2021). Prosodic bootstrapping. In Gussenhoven & Chen (Eds.), The Oxford handbook of language prosody (pp. 563–573). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780198832232.013.36

Gervain (2015). Plasticity in early language acquisition: The effects of prenatal and early childhood experience. Current Opinion in Neurobiology, 35, 13–20.

Gervain (2018). The role of prenatal experience in language development. Current Opinion in Behavioral Sciences, 21, 62–67.

Ghio, Cara, & Tettamanti (2021). The prenatal brain readiness for speech processing: A review on foetal development of auditory and primordial language networks. Neuroscience & Biobehavioral Reviews, 128, 709–719.

*Gratier & Devouche (2011). Imitation and repetition of prosodic contour in vocal interaction at 3 months. Developmental Psychology, 47(1), 67.

*Gregory, A. M. (2013). Laryngeal aspects of infant language acquisition (Doctoral dissertation, La Trobe University).

Grice, Baumann, & Benzmüller (2005). German intonation in autosegmental-metrical phonology. In S.-A. Jun (Ed.), Prosodic typology: The phonology of intonation and phrasing (pp. 55–83). Oxford University Press.

Gussenhoven & Chen (Eds.). (2021). The Oxford handbook of language prosody. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780198832232.001.0001

*Gustafson, G. E., Sanborn, S. M., Lin, H. C., & Green, J. A. (2017). Newborns’ cries are unique to individuals (but not to language environment). Infancy, 22(6), 736–747.

Khalilzad & Tadj (2023). Using CCA-fused cepstral features in a deep learning-based cry diagnostic system for detecting an ensemble of pathologies in newborns. Diagnostics (Basel), 13(5), 879. https://doi.org/10.3390/diagnostics13050879

*Kottmann, Wanner, & Wermke (2023). Fundamental frequency contour (melody) of infant vocalizations across the first year. Folia Phoniatrica et Logopaedica, 75(3), 177–187.

*Lind & Wermke (2002). Development of the vocal fundamental frequency of spontaneous cries during the first 3 months. International Journal of Pediatric Otorhinolaryngology, 64(2), 97–104.

Lingle, Wyman, M. T., Kotrba, Teichroeb, L. J., & Romanow, C. A. (2012). What makes a cry a cry? A review of infant distress vocalizations. Current Zoology, 58(5), 698–726.

Long, H. L., Ramsay, Griebel, Bene, E. R., Bowman, D. D., Burkhardt-Reed, M. M., & Oller, D. K. (2022). Perspectives on the origin of language: Infants vocalize most during independent vocal play but produce their most speech-like vocalizations during turn taking. PLoS One, 17(12), e0279395. https://doi.org/10.1371/journal.pone.0279395

*Mampe, Friederici, A. D., Christophe, & Wermke (2009). Newborns’ cry melody is shaped by their native language. Current Biology, 19(23), 1994–1997.

*Manfredi, Viellevoye, Orlandi, Torres-García, Pieraccini, & Reyes-García, C. A. (2019). Automated analysis of newborn cry: Relationships between melodic shapes and native language. Biomedical Signal Processing and Control, 53, 101561.

Manigault, A. W., Sheinkopf, S. J., Carter, B. S., Check, Helderman, Hofheimer, J. A., McGowan, E. C., Neal, C. R., O’Shea, Pastyrnak, Smith, L. M., Everson, T. M., Marsit, C. J., Dansereau, L. M., DellaGrotta, S. A., & Lester, B. M. (2023). Acoustic cry characteristics in preterm infants and developmental and behavioral outcomes at 2 years of age. JAMA Network Open, 6(2), e2254151. https://doi.org/10.1001/jamanetworkopen.2022.54151

Menn, K. H., Männel, & Meyer (2023). Phonological acquisition depends on the timing of speech sounds: Deconvolution EEG modeling across the first five years. Science Advances, 9(44), eadh2560. https://doi.org/10.1126/sciadv.adh2560

Myrberg & Riad (2015). The prosodic hierarchy of Swedish. Nordic Journal of Linguistics, 38(2), 115–147.

Nallet & Gervain (2021). Neurodevelopmental preparedness for language in the neonatal brain. Annual Review of Developmental Psychology, 3(1), 41–58.

Oller, D. K., Ramsay, Bene, Long, H. L., & Griebel (2021). Protophones, the precursors to speech, dominate the human infant vocal landscape. Philosophical Transactions of the Royal Society B: Biological Sciences, 376(1836), 20200255. https://doi.org/10.1098/rstb.2020.0255

Ortiz Barajas, M. C., Guevara, & Gervain (2021). The origins and development of speech envelope tracking during the first months of life. Developmental Cognitive Neuroscience, 48, 100915. https://doi.org/10.1016/j.dcn.2021.100915

Ortiz Barajas, M. C., & Gervain (2021). The role of prenatal experience and basic auditory mechanisms in the development of language. In M. D. Sera & M. Koenig (Eds.), Minnesota symposia on child psychology: Human communication: Origins, mechanisms, and functions (Vol. 40, pp. 88–112). Wiley.

Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, Hoffmann, T. C., Mulrow, C. D., Shamseer, Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, Glanville, Grimshaw, J. M., Hróbjartsson, Lalu, M. M., Loder, E. W., Mayo-Wilson, McDonald, ... Moher (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 372, n71. https://doi.org/10.1136/bmj.n71

Pierrehumbert, J. B., & Steele, S. A. (1989). Categories of tonal alignment in English. Phonetica, 46(4), 181–196. https://doi.org/10.1159/000261817

Pisanski, Bryant, G. A., Cornec, Anikin, & Reby (2022). Form follows function in human nonverbal vocalisations. Ethology Ecology & Evolution, 34(3), 303–321.

Posit team. (2023). RStudio: Integrated development environment for R. Posit Software, PBC. http://www.posit.co/

*Prochnow, Erlandsson, Hesse, & Wermke (2019). Does a ‘musical’ mother tongue influence cry melodies? A comparative study of Swedish and German newborns. Musicae Scientiae, 23(2), 143–156.

*Shinya, Kawai, Niwa, Imafuku, & Myowa (2017). Fundamental frequency variation of neonatal spontaneous crying predicts language acquisition in preterm and term infants. Frontiers in Psychology, 8, 291209.

Tricco, A. C., Lillie, Zarin, O’Brien, K. K., Colquhoun, Levac, Moher, Peters, M. D. J., Horsley, Weeks, Hempel, Alexander, P. E., McGowan, Garritty, Lewin, Godfrey, C. M., Macdonald, M. T., Stewart, L. A., Kellam, & Straus, S. E. (2018). PRISMA extension for scoping reviews (PRISMA-ScR): Checklist and explanation. Annals of Internal Medicine, 169(6), 467–473. https://doi.org/10.7326/M18-0850

Tufanaru, Munn, Aromataris, Campbell, & Hopp (2017). Chapter 3: Systematic reviews of effectiveness. In Aromataris & Munn (Eds.), JBI Reviewer’s Manual (pp. 72–134). JBI. https://reviewersmanual.joannabriggs.org/ https://doi.org/10.46658/JBIRM-17-03

Vermillet, A. Q., Tølbøll, Litsis Mizan, Skewes, J. C., & Parsons, C. E. (2022). Crying in the first 12 months of life: A systematic review and meta-analysis of cross-country parent-reported data and modeling of the ‘cry curve’. Child Development, 93(4), 1201–1222.

*Wermke & Robb, M. P. (2008). Fundamental frequency of neonatal crying: Does body size matter? Journal of Voice, 24(4), 388–394.

*Wermke, Cebulla, Salinger, Ross, Wirbelauer, & Shehata-Dieler (2021). Cry features of healthy neonates who passed their newborn hearing screening vs. those who did not. International Journal of Pediatric Otorhinolaryngology, 144, 110689.

*Wermke, Leising, & Stellzig-Eisenhauer (2007). Relation of melody complexity in infants’ cries to language outcome in the second year of life: A longitudinal study. Clinical Linguistics & Phonetics, 21(11–12), 961–973.

*Wermke, Mende, Manfredi, & Bruscaglioni (2002). Developmental aspects of infant’s cry melody and formants. Medical Engineering & Physics, 24(7–8), 501–514.

*Wermke, Pachtner, Lamm, Voit, Hain, Kärtner, & Keller (2013). Acoustic properties of comfort sounds of 3-month-old Cameroonian (Nso) and German infants. Speech, Language and Hearing, 16(3), 149–162.

*Wermke, Ruan, Yun, Dobnig, Stephan, Wermke, Chang, Liu, Hesse, & Shu (2017). Fundamental frequency variation in crying of Mandarin and German neonates. Journal of Voice, 31(2), 255.e25–255.e33.

Wermke, Teiser, Yovsi, R. D., Kohlenberg, P. J., Wermke, Robb, Keller, & Lamm (2016). Fundamental frequency variation within neonatal crying: Does ambient language matter? Speech, Language and Hearing, 19(4), 211–217.

Wermke (2015). Neonatal crying behaviors. In J. D. Wright (Ed.), International encyclopedia of the social and behavioral sciences (2nd ed., Vol. 16, pp. 475–478). Elsevier.

Whalen, D. H., Levitt, A. G., & Wang (1991). Intonational differences between the reduplicative babbling of French- and English-learning infants. Journal of Child Language, 18(3), 501–516. https://doi.org/10.1017/S0305000900011226

Wickham (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag. https://ggplot2.tidyverse.org

Willems (1982). English intonation from a Dutch point of view: An experimental-phonetic investigation of English intonation produced by Dutch native speakers. Intercontinental Graphics.

