Abstract
Background
Autism spectrum disorder has been associated with atypical voice characteristics and prosody. In the scientific literature, four different aspects of atypical speech production in autism spectrum disorder have been highlighted: voice quality together with the prosodic aspects pitch, duration and intensity. Studies of prosody in autism spectrum disorder have almost exclusively used perceptual methods. Recently, some studies have used acoustic analyses. In these studies, it has been pointed out that the acoustic differences found are not necessarily perceived as atypical by listeners, which is why it is important to let listeners evaluate perceptual correlates of acoustic findings. The aims of this study were to use both perceptual and acoustic analyses to study prosodic production in children with autism spectrum disorder and to examine if voice and speech characteristics could be used as clinical markers for autism spectrum disorder.
Method
Eleven children within the normal range of intelligence diagnosed with autism spectrum disorder and 11 children with typical development participated. Every child was recorded telling a story elicited with the Expression, Reception and Recall of Narrative Instrument. One-minute excerpts were extracted from the audio files, creating the material used in both the perceptual ratings and the acoustic analysis. An evaluation procedure, partly based on a standardized voice evaluation procedure developed for clinical practice in Sweden, was designed for the perceptual judgments and analysis. To capture critical prosodic variables, aspects of prosody based on characteristic features of Swedish prosody, prosodic features known to cause Swedish children with language impairment particular problems and current research on prosodic impairments in children with autism were used as rating variables. The acoustic analysis was based on the four variables fundamental frequency (fo) average, fo range, fo variation and speech rate, together with the language production-related variable number of words per utterance.
Results
In the acoustic analysis, no differences were found with regard to fo-related variables or speech rate. However, the children in the autism spectrum disorder group produced significantly more words per utterance than the typically developing children. The perceptual analysis showed no differences between the groups. Only three children with autism spectrum disorder were correctly identified as such. The narrative ability of these children, according to scores on the narrative assessment profile, was poorer than that of the other eight children. They were also more atypical in fluency and in speech rate. Given the small sample, the results should be interpreted with caution.
Conclusions and implications
The only difference in prosodic production discovered in the acoustic analysis, namely that children with autism spectrum disorder used more words per utterance than the children in the comparison group, was not detected in the perceptual assessment. This implies that it was not perceived as atypical by expert listeners. The results indicate difficulties in using voice and speech characteristics as markers of autism spectrum disorder in clinical settings. The correct identification of some of the children with autism spectrum disorder also indicates that some of these children have a prosodic production that, in combination with a limited ability to tell stories, is sufficiently ‘atypical’ to be perceived.
Introduction
Autism spectrum disorder (ASD) has been associated with atypical voice characteristics and prosody ever since Kanner (1943) and Asperger (1944) published the first systematic studies of ASD. These descriptions were followed by others during the next decades (e.g. Goldfarb, Braunstein, & Lorge, 1956; Pronovost, Wakstein, & Wakstein, 1966; Simmons & Baltaxe, 1975).
Despite the consistent descriptions in the first studies from Kanner and Asperger onwards, the findings from more recent research on prosody and autism are contradictory and difficult to interpret. It is unclear whether this stems from methodological problems or from the heterogeneity among individuals with ASD (McCann & Peppé, 2003). Peppé, McCann, Gibbon, O’Hare and Rutherford (2007) found, for example, that expressive prosody varied greatly within a group with ASD. At the same time, all individuals exhibited at least one atypical prosodic trait when compared to typically developing children (TDC), while Grossman, Bemis, Skwerer, and Tager-Flusberg (2010) suggested that individuals with ASD did not seem to have specific difficulties in productive prosody. Different methods as well as differences in cognitive and linguistic level among the participants make it difficult to compare the opposing results. There are indications that the sometimes conflicting results might partly be explained by the heterogeneous symptoms that characterize autism.
In particular, there are four different aspects of atypical speech production in ASD reported in the scientific literature: voice quality together with the prosodic aspects pitch, duration and intensity (Fusaroli, Lambrechts, Bang, Bowler, & Gaigg, 2017; Titze, 1994). One of the most common descriptions is that the speech of individuals with ASD is monotonous, including atypical pitch and pitch variation (Baltaxe, Simmons, & Zee, 1984; Fay & Schuler, 1980; Goldfarb, Goldfarb, Braunstein, & Scholl, 1972; Kaland, Swerts, & Krahmer, 2013; Paccia & Curcio, 1982; Pronovost et al., 1966). There are also descriptions indicating atypical voice characteristics, where children with ASD are described as having a hoarse or harsh voice (Baltaxe, 1981; Pronovost et al., 1966; Sheinkopf, Mundy, Oller, & Steffens, 2000), with a hypernasal resonance (Shriberg, Paul, McSweeny, Klin, Cohen, & Volkmar, 2001). Goldfarb et al. (1972), Simmons and Baltaxe (1975) and Baltaxe (1981) described the speech as being too slow or too quick. Others have reported the speech as being too loud or too quiet, and sometimes shifting between these two extremes (Goldfarb et al., 1972; Pronovost et al., 1966; Shriberg et al., 2001; Shriberg, Paul, Black, & van Santen, 2011).
Different standardized diagnostic instruments also include atypical prosody as part of the diagnosis, e.g. the Autism Diagnostic Interview (ADI; Lord, Rutter, & Le Couteur, 1994; Rutter, Le Couteur, & Lord, 2003) and the Autism Diagnostic Observation Schedule (ADOS; Lord, Rutter, DiLavore, & Risi, 2000; Nadig & Shaw, 2012), with a focus on the person’s use of prosody to express a certain content rather than on prosodic deviation in, e.g., pitch.
Studies of prosody in ASD have almost exclusively used perceptual methods, but in recent years, some studies have used acoustic analyses (e.g. Diehl, Watson, Bennetto, McDonough, & Gunlogson, 2009; Nadig & Shaw, 2012; Nakai, Takashima, Takiguchi, & Takada, 2014). Acoustic analysis suggests that individuals with ASD do not seem to have specific difficulties in productive prosody (Grossman et al., 2010; Kaland et al., 2013). However, based on the children’s longer expressions, Grossman et al. (2010) suggested that children with high functioning autism (HFA) have an atypical prosody production in natural settings. Kaland et al. (2013) found that adults with ASD had a narrower pitch range and were more monotonous in their speech, and Nakai et al. (2014) found more monotonous speech in their school-aged children with ASD, with the degree of monotonous speech being related to the degree of social interaction. It has also been pointed out that the acoustic differences found are not necessarily perceived as atypical by listeners, which is why it is important to let listeners evaluate perceptual correlates to acoustic findings (Diehl & Paul, 2013).
With reference to the research above, the purpose of this study was, first, to examine prosodic characteristics in an objective acoustic analysis in school-aged children diagnosed with ASD within the normal range of intelligence compared to TDC. As mentioned earlier, prosody is used as a characteristic of the disorder in clinical settings through the diagnostic instruments ADI and ADOS. Since prosody is considered important in such work, a second purpose was to explore whether a perceptual assessment made by experienced speech and language pathologists (SLPs) could capture any differences in productive prosody, rated as deviances from productive prosody in the typical population. Third, as another way of studying the usability of voice and speech characteristics as clinical markers for ASD, we examined to what extent group membership, i.e. an ASD diagnosis or not, could be predicted from the ratings of productive prosody.
Method
Participants
Descriptive characteristics of the children in the two groups.
ASD: autism spectrum disorder; M/F: quotient male/female; SD: standard deviation.
Materials
As measures of language comprehension, the TROG-2 (Bishop, 2003) and the vocabulary subtest from WISC IV (Wechsler, 2003) were used. Scores from TROG-2 were converted to linguistic age in years. Based on voice-recorded speech elicited by means of the Expression, Reception and Recall of Narrative Instrument (ERRNI; Bishop, 2004), language production and narrative ability were assessed using the narrative assessment profile (NAP; Bliss, McCabe, & Miranda, 1998). In the ERRNI, the child is asked to look closely at a picture sequence consisting of 15 pictures and, as a second step, to produce a spoken story from the sequence, i.e. using visual cues. There are two different stories and for this study, the so-called Beach story was used. No prompts other than encouraging sounds to make the child continue are allowed. The test includes a ‘warming-up’ picture to introduce the child to the task. All narratives were then transcribed using the Codes for the Human Analysis of Transcriptions (CHAT) format (https://talkbank.org/manuals/CHAT.pdf). The recordings were made with a video camera, Japan Victor Company (JVC), Everio GZ-MG335, hard disc camcorder, in quiet rooms at the CAP clinic. In the case of the children in the comparison group, the recordings were made in their schools, in quiet rooms.
Data analyses
Perceptual analysis
The authors of the original master’s thesis in speech and language pathology (Dotevall & Lendt, 2014) developed an evaluation procedure specifically designed for the perceptual judgments and analysis. Parts of the procedure were based on a standardized voice evaluation procedure developed for clinical practice in Sweden (the Stockholm Voice Evaluation Approach (SVEA), Hammarberg, 2000). Swedish is often referred to as a pitch accent language and is regarded as having a relatively complex prosody compared to English (Cruttenden, 1997). For example, pitch accent may constitute a discriminative feature between two segmentally identical words, like ‘stegen’ /′ste:gɛn/ (Eng. the steps) and ‘stegen’ /'ste:gɛn/ (Eng. the ladder). To capture critical prosodic variables, we included aspects of prosody partly based on characteristic features of Swedish prosody (Bruce, 2012) as well as prosodic features known to cause Swedish children with language impairment particular problems (Samuelsson & Nettelbladt, 2004). The selection of variables was checked against current research on prosodic impairments in children with autism (Fusaroli et al., 2017; McCann & Peppé, 2003; Titze, 1994), resulting in the variables pitch, intonation, speech rate, intensity, length of utterance, timbre, nasality and fluency. The ratings varied from 1 to 2, 3 or 4 depending on the question, with a higher number indicating a greater deviation from TDC. The SLPs filled in a protocol with the eight rating scales, marking the intended figure. Each scale was introduced with a written instruction. Only the end points were assigned a name. For timbre and fluency, the scale was no deviation, small, medium and large, i.e. from 1 to 4. Nasality was rated as existing (2) or not (1). We also asked the SLPs to rate the ‘overall impression’, trying to disregard details in the other variables. Overall impression of prosody was rated as adequate, atypical or very atypical.
As a last question, the SLPs were asked to state if they thought that the child had an ASD-diagnosis or not.
The rating scale was pre-tested by two experienced SLPs, both specialists in voice. These SLPs did not take further part in the project. Excerpts of one minute each were extracted from the audio files, creating the material per child to be rated. The mean length of the produced stories was 139 seconds and the extracted sequences were identified as 30 seconds before and 30 seconds after the midpoint of the recordings, with some adjustments to avoid cutting the recording in the middle of an utterance. The ratings were made by three expert listeners, all three SLP specialists in voice. These experts were selected because of their experience of making perceptual analyses of voices, of children’s speech, of being used to making consensus judgements and of their experience with the actual type of rating scales. They were informed verbally and with a written instruction of the purpose of the study but had no knowledge of the children. First, the recorded voice of a child, the same one-minute excerpt that was used for the acoustic analysis, was assessed individually by each rater, using their experience of children’s speech in their ratings. The raters were allowed to repeat each recording as many times as they wanted but were not allowed to go back to an earlier recording. Thereafter, the ratings were discussed by the three raters until a consensus judgment was reached, in this phase without repetitions of the recordings. The speech excerpts were presented randomly, based on group belonging and gender.
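The excerpt selection described above can be illustrated with a minimal sketch. It assumes only the rule stated in the text (30 seconds on either side of the recording midpoint); the function name `excerpt_window` and the clamping of short recordings are illustrative assumptions, and the manual adjustment to utterance boundaries is not modelled.

```python
def excerpt_window(duration_s: float, half_width_s: float = 30.0) -> tuple[float, float]:
    """Return (start, end) in seconds of a window centred on the recording midpoint.

    Hypothetical helper: the window is clamped to the recording, so a story
    shorter than 2 * half_width_s is returned in full (an assumption, not a
    rule stated in the study).
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    midpoint = duration_s / 2.0
    start = max(0.0, midpoint - half_width_s)
    end = min(duration_s, midpoint + half_width_s)
    return start, end


# A recording of the mean story length reported in the text (139 seconds).
start, end = excerpt_window(139.0)
print(start, end)  # 39.5 99.5
```

In practice the start and end points would then be moved by hand to the nearest utterance boundary, as the text describes.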
Acoustic analysis
The audio signals were extracted from the video recordings by means of VideoLan Creator (VLC) Media Player (VideoLan Organization, n.d.). Sequences relevant for the current study were identified and converted from stereo to mono format in Audacity®, version 2.0.5. Transcription and annotation were then performed in Wavesurfer (Sjölander & Beskow, 2000), by the two aforementioned SLP students as part of their master’s thesis. The children’s utterances were identified with visual support from both waveforms and spectrograms, and orthographically transcribed. Hence, an utterance here is operationally defined as speech produced by the child and delimited by audibly and visually confirmed silence. Based on this manual annotation of utterances, relevant speech material was automatically extracted. The same one-minute sequences that were used for the perceptual analyses were then selected for further acoustic analyses.
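As a rough illustration of the variable number of words per utterance, the sketch below computes the mean word count over a list of orthographically transcribed utterances. It assumes that each list element corresponds to one silence-delimited utterance and that words are separated by whitespace; the function name and the example utterances are invented for illustration and are not taken from the study material.

```python
def mean_words_per_utterance(utterances: list[str]) -> float:
    """Mean number of whitespace-separated words per non-empty utterance.

    Assumption: one list element per silence-delimited utterance; the
    tokenisation rule (split on whitespace) is illustrative only.
    """
    counts = [len(u.split()) for u in utterances if u.strip()]
    if not counts:
        raise ValueError("no non-empty utterances")
    return sum(counts) / len(counts)


# Invented example utterances (6, 2 and 4 words).
example = [
    "then they went to the beach",
    "they swam",
    "the dog ran away",
]
print(mean_words_per_utterance(example))  # 4.0
```

Because the utterance is delimited by silence rather than by syntax, the same transcript could yield a different value under a sentence-like definition of utterance.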
The selection of variables included for acoustic analysis was motivated by previous reports indicating these features as being atypical in individuals with ASD. Whenever possible, acoustic variables correlating with the selected perceptual components were included. Hence, fo
Fundamental frequency (fo) was extracted by means of Praat (Boersma, 2001), using default settings. Apart from the fo
The calculation of
Following the recommendation by Hubbard and Trauner (2007),
Acoustically and perceptually analysed components.
Consonant-to-vowel transitions per second.
Statistical analyses
For the statistical analyses, International Business Machines Statistical Package for Social Sciences (IBM SPSS) Statistics (ver. 23) was used. To minimize the risk of Type I errors, comparisons were made with one-way analysis of variance (p ≤ .05) instead of t tests (Hinkle, Wiersma, & Jurs, 2003). For non-parametric analyses, the Mann-Whitney U test was used.
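For readers unfamiliar with the Mann-Whitney U test, the following sketch shows how the U statistic can be computed from two small samples using midranks for tied values. It is a plain standard-library illustration, not the SPSS procedure used in the study; it returns only the smaller of the two U values and no p value, and the example ratings are hypothetical.

```python
def mann_whitney_u(x: list[float], y: list[float]) -> float:
    """Return the smaller Mann-Whitney U statistic, using midranks for ties."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    pooled = sorted(x + y)
    # Assign each distinct value the mean of the ranks it occupies (midrank).
    ranks: dict[float, float] = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(ranks[v] for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    u_y = n1 * n2 - u_x
    return min(u_x, u_y)


# Hypothetical ordinal ratings for two small groups; ties produce a half-integer U.
print(mann_whitney_u([3, 4, 4], [1, 2, 2, 3]))  # 0.5
```

With ordinal rating scales ties are common, which is why U values that are not whole numbers can occur.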
Because of small sample size, effect sizes, where applicable, were computed by means of Cohen’s d.
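A minimal sketch of an effect size computation is given below. The text does not state which variant of Cohen’s d was used, so the pooled-standard-deviation form for two independent groups is assumed here; the function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation.

    Assumption: pooled-SD variant with sample variances (n - 1 denominator).
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Hypothetical values: means 4 and 3, both sample variances 4, so d = 1 / 2.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

By convention, values around 0.2, 0.5 and 0.8 are often read as small, medium and large effects, although such thresholds should be applied cautiously with samples of this size.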
Results
Acoustic analysis.
ASD: autism spectrum disorder; SD: standard deviation.
Perceptual analysis.
ASD: autism spectrum disorder.
Note: Numbers within parenthesis show the end points of the rating scale, higher number higher deviation.
The third question regarded the ability to judge, with the perceptual ratings of productive prosody as a basis, if each child had an ASD-diagnosis or not. No one in the comparison group was judged to have an ASD-diagnosis. A correct judgment was made only for three of the children with an ASD-diagnosis, i.e. only three of the 11 children with an ASD-diagnosis were identified as such.
To examine if there was any difference among the children in the ASD-group that could explain correct identification or not, nonparametric statistical tests were performed. The analyses showed that the three children identified by raters as having ASD performed worse on NAP (Md = 13 versus Md = 15, U = 3.00, p = .033) and were more atypical in fluency (Md = 4 versus Md = 2, U = .500, p = .015) and speech rate (Md = 3 versus Md = 2, U = .000, p = .009) than the eight children who were not correctly identified as having ASD.
Discussion
In this study, both perceptual and acoustic analyses were used to study prosodic production in children with ASD and to examine if voice and speech characteristics could be used as clinical markers for ASD. Although there were significant differences between the study groups on measures of language, the only statistically significant difference in speech production between the groups was found in the acoustic analysis: children with ASD used more words per utterance than the children in the comparison group. Interestingly, this variable may be considered the variable most closely reflecting language production. The other acoustic variables were not sensitive to any group difference. While the finding of longer utterances, in terms of the number of words, is in line with similar findings reported by Hubbard and Trauner (2007), it contradicts other reports (e.g. Park, Yelland, Taffe, & Gray, 2012; Thurber & Tager-Flusberg, 1993), where children with ASD have been found to produce shorter utterances compared to typically developing peers. Few studies have combined acoustic analysis with more natural settings such as elicited narratives. Results based on narrative tasks, as in the present study, on conversational speech in more interactive settings (as in Park et al., 2012), or on more controlled story completion tasks (as in Hubbard & Trauner, 2007) are not necessarily comparable. Moreover, the divergent results may also be explained by different definitions of what constitutes an utterance. In the present study, an utterance was defined as speech delimited by silence, whereas in most previous studies, the definition of utterance is more like a sentence, i.e. a unit defined by its syntactic, semantic and/or pragmatic features.
However, if this difference can explain the divergent results, it also implies that the only statistically significant difference in the present study would disappear if our definition of utterance were changed. Additional factors that may explain the seemingly contradictory results are differences in participant characteristics between the children in the present study and those participating in the other studies, e.g. intellectual level or gender.
As discussed by Grossman et al. (2010), although expressive speech of individuals with ASD may not be perceived as being inaccurate, it may be quantifiably different from the expressive speech of other individuals, for example, when analysed acoustically. These differences may contribute to the verbal expressions being perceived as atypical. The perceptual analyses in the present study revealed only slight differences between the children with ASD and the children in the comparison group, corroborating the previously demonstrated finding (see e.g. McCann & Peppé, 2003) that perceived distinctive atypical prosodic and vocal characteristics of individuals with ASD are difficult to quantify operationally. In spite of the difference in the acoustic analysis in this study, the SLPs did not perceive the prosodic characteristics in the ASD group as atypical. This could mean that the differences were not sufficiently large to affect the children's prosodic production in a natural setting, a result similar to that of Diehl and Paul (2013). In addition, the excerpts to be assessed by the SLPs might have been too limited to get a sufficient picture of a possible prosodic deviance.
The third research question concerned the ability to predict the ASD diagnosis with the perceptual ratings of produced prosody as a basis. The expert listeners did not judge any child in the comparison group as having ASD. However, only three children with ASD were correctly judged as children with ASD, while the other children with an ASD diagnosis (n = 8) were judged as TDC. This suggests that not all children with ASD have a generally atypical produced prosody different enough to be discovered in a natural setting (McCann & Peppé, 2003). However, it also suggests that there are children with ASD whose prosodic and vocal expression is distinctive or atypical enough to be registered in a perceptual assessment. It is also worth noting that none of the children who were judged to have ASD had a comorbid diagnosis.
In the statistical analyses of the differences between the children who were correctly judged as having ASD and those who were wrongly judged to be typically developing, three statistically significant differences were found. The three children who were correctly identified as having ASD performed worse on the NAP and were more atypical in fluency and speech rate than the other eight children in the ASD group. The latter two differences confirm the results of Grossman et al. (2010) and suggest that the expert listeners noticed some aspects of atypical prosodic production together with a limited ability to retell a story. The difference in NAP score highlights one important question, namely whether children with autism can use prosody to tell stories in a manner that increases the possibility for the listener to correctly understand the narrative or not. There seems to be something in the narratives that correctly attracts the attention of the raters when making the judgement. This suggestion is in line with the results in a study on children with language impairment where a subgroup of children with pragmatic problems showed prosodic deviance at the discourse level, assessed in brief conversational and narrative tasks (Samuelsson & Nettelbladt, 2004). It is possible that such features, for example, reflected as atypical placement or expression of word stress, would have been captured with acoustic measures not included in the present work. Diehl et al. (2009) found a larger prosodic within-participant variation in narratives of children with ASD than in those of TDC. They also found a relation to clinical judgment of impaired communication. However, as in our study, there were no statistically significant group differences in average fo, indicating that something other than pitch variation is needed to explain the within-participant variation.
To further study prosodic production in children with ASD within normal range of intelligence, it would be interesting e.g. to have the child express different feelings by different intonation only, i.e. in line with the symptomatology of ASD according to DSM-5 (American Psychiatric Association, 2013).
Limitations
Although all recordings took place in quiet rooms, the quality of the recordings was not optimized for acoustic analysis. However, these conditions reflect common clinical conditions and contribute to ecological validity. Moreover, as the same audio files were used both for the perceptual and for the acoustic analysis, conditions were kept constant to ensure a mirroring analysis. The methodological choice to restrict the analysis to one-minute stretches of speech, although longer samples may have given listeners more cues in their rating decisions, is also assumed to reflect realistic clinical conditions. The conclusions should be interpreted cautiously since there were only 11 children in each group and only three raters. However, the effect sizes for the acoustic variables corroborate the conclusions.
The gender imbalance between the ASD group and the TD group, with a higher ratio of boys to girls in the ASD group, warrants careful interpretation of fo averages. The risk cannot be eliminated that a lower fo level in boys compared to girls at these ages (Baken & Orlikoff, 2000) may have obscured a potential difference between the groups. However, Fusaroli et al. (2017) found in a recently published review that only two (Filipe, Frota, Castro, & Vicente, 2014; Sharda et al., 2010) out of 16 studies investigating pitch mean reported a significant group difference, with a higher pitch mean in the ASD groups. In seven of the 14 studies with nonsignificant results, the groups were matched for gender, while there was no gender match in the only two reporting higher pitch mean in the ASD group. Based on these findings (Fusaroli et al., 2017), we conclude that the lack of difference in pitch average between groups in our study cannot be explained solely by gender imbalance.
Rather than being a limitation, it is also worth noting that the study concerns children with ASD within normal range of intelligence, a growing percentage of the total group of the population with an ASD diagnosis (Idring et al., 2015).
Implications
The acoustic analysis showed that children with ASD differed in speech production, using more words per utterance, compared to typically developing peers matched for chronological age. However, this difference was not discovered in the perceptual assessment by three experienced SLPs specialised in voice, and as such was not perceived as atypical by listeners. The listeners’ difficulty in detecting differences between the children in the ASD and TDC groups is further reflected in their judgment of diagnosis, where only three of the children with ASD were correctly judged as children with ASD. However, the fact that three children with ASD were correctly identified also suggests that some children with ASD have an atypical prosodic production which can be perceived, but only at an individual level and, in the case of the children in this study, in combination with a limited ability to tell stories. The results indicate that it is difficult even for specialized SLPs to detect and use voice and speech characteristics as clinical markers of ASD in clinical settings. This finding should be studied more thoroughly in relation to productive prosody, narrative ability and theory of mind (McCann, Peppé, Gibbon, O’Hare, & Rutherford, 2007).
Footnotes
Acknowledgements
The authors express their gratitude to the children who participated in the study and their parents. We also thank the students of speech and language pathology Nadia Chouaiby, MarieHelene Dotevall, Louise Lendt, Helena Moreau, Louise Mucchiano and Rebecca Rindhagen who collected the data.
Ethical statement
The study was approved by the Regional Ethical Review Board in Lund, Sweden and informed consent was collected in written form from parents and from all participating children.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
