Abstract
Background:
Subtle progressive changes in speech motor function and cognition begin prior to diagnosis of Huntington’s disease (HD).
Objective:
To determine the nature of listener-rated speech differences in premanifest and early-stage HD (i.e., PreHD and EarlyHD), compared to neurologically healthy controls.
Methods:
We administered a speech battery to 60 adults (16 people with PreHD, 14 with EarlyHD, and 30 neurologically healthy controls) and a cognitive test of processing speed and visual attention, the Symbol Digit Modalities Test (SDMT), to participants with HD. Voice recordings were rated by expert listeners for perceptual speech features and analyzed acoustically.
Results:
Listeners perceived subtle differences in the speech of PreHD compared to controls, including abnormal pitch level and speech rate, reduced loudness and loudness inflection, altered voice quality, hypernasality, imprecise articulation, and reduced naturalness of speech. Listeners detected abnormal speech rate in PreHD compared to healthy speakers on a reading task, which correlated with slower speech rate from acoustic analysis and a lower cognitive performance score. In early-stage HD, continuous speech was characterized by longer pauses, a higher proportion of silence, and slower rate.
Conclusion:
Differences in speech and voice features are detectable in PreHD by expert listeners and align with some acoustically-derived objective speech measures. Slower speech rate in PreHD suggests altered oral motor control and/or subtle cognitive deficits that begin prior to diagnosis. Speakers with EarlyHD exhibited more silences compared to the PreHD and control groups, raising the possibility of a link between speech and cognition that is not yet well characterized in HD.
INTRODUCTION
Huntington’s disease (HD) is a hereditary neurodegenerative disorder characterized by motor dysfunction, cognitive decline, and psychiatric disturbances [1, 2]. Dysarthria is common in people with HD [3]. Changes in speech can be detected prior to a clinical diagnosis of HD, but they are thought to be subtle and so far have only been detected using computational acoustic analysis software [4–6]. Objective acoustic variables can measure slight deviations in speech performance [7]. Whether speech differences in the premanifest phase of HD (PreHD) can be detected by human listeners has not yet been studied [8]. Given that voice and speech production is multi-dimensional, a more complete account of vocal function requires a combination of subjective auditory-perceptual evaluation (i.e., listener ratings) and objective acoustic analysis [9–11].
An auditory-perceptual assessment is a formalized method used by expert listeners (i.e., speech pathologists) to describe the nature and degree of deviation of a speaker’s speech and voice quality from expected norms. Perceptual reports of symptomatic HD indicate deficits in the phonatory, prosodic, respiratory, and resonance subsystems of speech production [3, 12–15]. Findings regarding the nature and degree of speech impairment at each symptomatic HD stage (i.e., early, middle, late HD) are inconsistent because study designs and methods differ [8]. Auditory-perceptual evaluation is commonly used to assess voice clinically and to document voice disorders, but the subjective nature of perceived voice quality can limit inter- and intra-listener reliability [11]. Auditory-perceptual assessment is therefore often accompanied by computational acoustic speech analysis when evaluating voice samples.
The acoustic properties and perceptual features of speech in HD have been examined alongside other clinical measures such as cognition [4, 16]. Deterioration in cognitive functions, most commonly processing speed and attention, has been recognized as a reliable clinical sign of disease onset in PreHD [17–20]. Irregularity in vocal fold vibration patterns correlates with lower cognitive scores derived from subtests in the Unified Huntington’s Disease Rating Scale (UHDRS) [21], including the phonemic verbal fluency test, Symbol Digit Modalities Test (SDMT), Stroop color, Stroop word, and Stroop interference subtests [22]. There is, however, no information on how these cognitive scores relate to speech measures that require relatively greater cognitive-linguistic processing (e.g., speech rate and pauses).
The primary goal of this study was to determine whether subtle speech changes in PreHD were audible to expert listeners, by comparing speech in PreHD to healthy controls and people with EarlyHD. Our analysis included perceptual domains, rated by listeners, specifically articulation, voice quality, resonance, prosody, naturalness, and intelligibility, as well as speech-timing measures, which we analyzed acoustically. Finally, we examined the relationship between speech and cognitive performances.
MATERIALS AND METHODS
Subjects
Sixty participants were recruited (30 people with the HD CAG expansion and 30 age-matched healthy controls). Potential controls were excluded if they reported a history of neurological, speech, or language disorder. Potential HD participants were excluded if they presented with: any neurological disease other than HD; clinical symptoms other than those resulting from HD; a history of communication disorder; a history of alcohol or drug abuse; or a history of learning disability and/or intellectual impairment. Two groups of participants with the HD CAG expansion (16 people with PreHD and 14 people with EarlyHD) were recruited from Monash University Huntington’s Disease Registry, St George’s Health Service Huntington’s Disease Clinic and Huntington’s Victoria, Australia. Because average age differed between the PreHD and EarlyHD groups, we divided the overall control sample into two age-matched groups (Control-A and Control-B). For CAG-expanded participants, we computed disease burden scores (DBS) using the formula age × (CAG repeat length − 35.5), which correlates with pathological progression and striatal damage [23] (see Table 1 for clinical characteristics of HD participants).
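The DBS formula above can be illustrated with a minimal Python sketch. The function name and the example participant are hypothetical and are not taken from the study data.

```python
def disease_burden_score(age_years: float, cag_repeats: int) -> float:
    """Disease burden score as defined in the text: age x (CAG repeat length - 35.5)."""
    return age_years * (cag_repeats - 35.5)


# Hypothetical participant (not from the study): 42 years old with 42 CAG repeats.
example_dbs = disease_burden_score(42, 42)
print(example_dbs)  # 273.0
```

Under this formula, the score rises with both age and CAG repeat length, so two participants of the same age can carry different burden scores.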
Table 1. Characteristics of participants with PreHD and EarlyHD
ISCED, International Standard Classification of Education; DBS, disease burden score; UHDRS-TMS, Unified Huntington’s Disease Rating Scale total motor score; SD, standard deviation; DCL, diagnostic confidence level.
The PreHD group consisted of people who were genetically confirmed to have the HD CAG expansion (39 repeats or more), but had not yet displayed motor signs sufficient to warrant a clinical diagnosis of HD. Diagnostic confidence levels (DCLs) are based on administration of the Unified Huntington’s Disease Rating Scale (UHDRS) and reflect the confidence a clinician has in diagnosing HD in a particular participant (0 = no abnormalities; 1 = non-specific motor abnormalities; 2 = motor abnormalities that may be signs of HD (50–89% confidence); 3 = motor abnormalities that are likely signs of HD (90–98% confidence); and 4 = motor abnormalities that are unequivocal signs of HD) [21]. Classification of PreHD was based on DCL ≤ 3. The PreHD group was age-matched to 16 healthy participants without neurological disorders in the Control-A group (age range 25 to 72 years, mean = 42.38, SD = 12.09, male = 44%) for comparison (Table 1). The PreHD and Control-A groups were similar in age (p = 0.98). The EarlyHD group included 14 symptomatic participants with a DCL of 4 as assessed by a neurologist specialized in movement disorders. The EarlyHD group was age-matched to 14 neurologically healthy individuals in the Control-B group (age range 41 to 71 years, mean = 56.29, SD = 10.32, male = 64%) (Table 1). The EarlyHD and Control-B groups were also similar in age (p = 0.95). The International Standard Classification of Education (ISCED) was used to compare educational levels and related qualification levels between participant groups (PreHD: mean = 4.08, SE = 0.31; EarlyHD: mean = 3.44, SE = 0.33; Control: mean = 4.36, SE = 0.25) [24]. There was no significant difference in mean ISCED values between participant groups.
Materials and stimuli
A formalized speech battery was administered to each participant individually in a quiet setting. All participants were asked to (i) sustain the vowel sound /a/, (ii) read aloud a phonetically balanced passage, the Grandfather passage [25], and (iii) produce a monologue. The order of the speech tasks was consistent across all participants. The three speech tasks and corresponding acoustic measures have known stability, reliability and sensitivity with repeated exposure and application of the stimuli (i.e., practice effects) [26]. Speech samples were recorded using a laptop computer (PC) (Hewlett-Packard, Palo Alto, CA) with basic factory settings and a Sennheiser PC 135 USB unidirectional head-mounted microphone (Sennheiser Communications, Solrød Strand, Denmark) (minimum sensitivity of –38 dB and a frequency range of 80 Hz–15 kHz), which was positioned at a 45° angle, 8 cm from the mouth [27]. All data were sampled at 44.1 kHz with 16-bit quantization. Data were recorded and segmented using Audacity software (version 1.2.6). In addition to perceptual evaluation of speech and voice, we used an automated script to conduct acoustic analysis, including speech-timing measures for the reading and monologue tasks (see Table 2 for descriptions of speech-timing measures). The automated script identified the silence intensity contour using a modified version of previously published techniques [28, 29]. Pause sections shorter than 15 ms were classified as speech and concatenated with the adjacent speech sections. Speech sections shorter than 30 ms were classified as pauses and concatenated with the adjacent pauses [29]. We also removed interjections (i.e., throat clearing, fillers) as part of the pre-processing of speech samples. Detailed explanations of the specific acoustic methodologies adopted in this study can be found in previously published papers [26, 28–30].
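The two duration thresholds described above can be sketched in Python as follows. This is an illustrative sketch only, not the automated script used in the study: the segment representation, the function names, the order in which the two rules are applied (pauses first, then speech), and the definition of percent silence as pause time divided by total time are all assumptions.

```python
from typing import List, Tuple

# A segment is (label, duration in ms); label is "speech" or "pause".
Segment = Tuple[str, float]

MIN_PAUSE_MS = 15.0   # pauses shorter than this are treated as speech
MIN_SPEECH_MS = 30.0  # speech shorter than this is treated as a pause


def merge_adjacent(segments: List[Segment]) -> List[Segment]:
    """Concatenate neighbouring segments that share the same label."""
    merged: List[Segment] = []
    for label, duration in segments:
        if merged and merged[-1][0] == label:
            merged[-1] = (label, merged[-1][1] + duration)
        else:
            merged.append((label, duration))
    return merged


def relabel_short(segments: List[Segment], label: str,
                  min_ms: float, new_label: str) -> List[Segment]:
    """Relabel segments of `label` shorter than `min_ms`, then merge neighbours."""
    relabeled = [
        (new_label if (lab == label and dur < min_ms) else lab, dur)
        for lab, dur in segments
    ]
    return merge_adjacent(relabeled)


def smooth_segments(segments: List[Segment]) -> List[Segment]:
    """Apply the pause rule first, then the speech rule (assumed order)."""
    step1 = relabel_short(merge_adjacent(segments), "pause", MIN_PAUSE_MS, "speech")
    return relabel_short(step1, "speech", MIN_SPEECH_MS, "pause")


def percent_silence(segments: List[Segment]) -> float:
    """Pause time as a percentage of total time (assumed definition)."""
    total = sum(dur for _, dur in segments)
    silence = sum(dur for lab, dur in segments if lab == "pause")
    return 100.0 * silence / total if total else 0.0


# Hypothetical raw segmentation, durations in ms.
raw = [("speech", 200), ("pause", 10), ("speech", 150), ("pause", 300),
       ("speech", 20), ("pause", 120), ("speech", 400)]
smoothed = smooth_segments(raw)
print(smoothed)  # [('speech', 360), ('pause', 440), ('speech', 400)]
```

In this hypothetical example, the 10 ms pause is absorbed into the surrounding speech and the 20 ms speech burst is absorbed into the surrounding pause, leaving three segments.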
Table 2. Descriptions of speech-timing measures in connected speech tasks
Blinded expert listener (perceptual) ratings of speech
Three speech-language pathologists perceptually rated speech samples from the sustained vowel and reading tasks. These raters were practicing speech pathologists with Masters qualifications and/or PhDs and more than 8 years of clinical experience. The raters were blinded to participants’ group, clinical status, and age, and to the other raters’ scores. Raters evaluated speech samples using a five-point scale modified from the Mayo Dysarthria Rating Scale (0, normal; 1, subclinical; 2, mild; 3, moderate; 4, severe impairment) on 22 perceptual speech features spanning the pitch, loudness, voice quality, resonance, prosody and articulation sub-systems [31, 32]. Summative measures of intelligibility (ability to be understood) and naturalness (deviation from healthy norm) were also rated using the same scale. Each participant’s speech was given a rating based on all their speech samples. Where there was disagreement between raters, a consensus was reached through discussion.
Cognitive assessment
All participants with the expanded HD gene completed the oral version of the Symbol Digit Modalities Test (SDMT) [33] for an assessment of attention, executive function, visual scanning, tracking, and processing speed. Participants were given a 90-second limit to say, as rapidly as possible, the number corresponding to each of a series of abstract symbols, as per the coding key. Scoring was based on the number of correct responses out of 110.
Statistical analysis
The degree of agreement between raters was calculated using a Two-Way Random Consistency intraclass correlation coefficient, ICC(2,1). Listener-based perceptual speech ratings were analyzed using the nonparametric gamma coefficient, which tested the association between group (PreHD, EarlyHD, Control-A, Control-B) and judgements of speech impairment severity (normal, sub-clinical, mild, moderate, severe). We used the Spearman rank method to examine the degree of association between perceptual ratings and acoustic outcomes.
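As an illustration of the gamma coefficient, the sketch below computes the Goodman-Kruskal gamma from concordant and discordant pairs, gamma = (C − D) / (C + D), with tied pairs ignored. It is a minimal pure-Python sketch rather than the SPSS procedure used in the study, and the group codes and ratings are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def goodman_kruskal_gamma(x: Sequence[float], y: Sequence[float]) -> float:
    """Gamma = (C - D) / (C + D), where C and D count concordant and
    discordant pairs; pairs tied on either variable are ignored."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        # Happens, for example, when every rating is identical.
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical data: group coded 0 = control, 1 = PreHD;
# rating on the 0-4 severity scale described in the text.
group = [0, 0, 0, 1, 1, 1]
rating = [0, 1, 1, 0, 1, 2]
gamma = goodman_kruskal_gamma(group, rating)
print(round(gamma, 3))  # 0.333
```

A positive gamma in this coding means that higher severity ratings tend to occur in the PreHD group. The coefficient is undefined when ratings do not vary, which this sketch reports as an error.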
For the statistical analysis of acoustic data, we used Levene’s test to assess equality of variances between groups in their acoustic speech outcomes. Levene’s test indicated equal variances between groups. A one-way analysis of variance (ANOVA) was then used to evaluate speech-timing differences in three pairwise group comparisons (PreHD vs. Control-A, EarlyHD vs. Control-B, PreHD vs. EarlyHD). All statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS Statistics 26).
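The two test statistics named above can be sketched in pure Python. This is a minimal illustration, not the SPSS implementation: it computes only the F and W statistics (p-values would additionally require the F distribution), it assumes the mean-centred form of Levene’s test, and the speech-rate values are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def levene_w(groups: Sequence[Sequence[float]]) -> float:
    """Levene's W (mean-centred, an assumption here): the one-way ANOVA F
    computed on absolute deviations from each group's own mean."""
    deviations: List[List[float]] = [
        [abs(v - mean(g)) for v in g] for g in groups
    ]
    return one_way_anova_f(deviations)


# Hypothetical speech rates in syllables per second (not study data).
control_rates = [4.0, 5.0, 6.0]
prehd_rates = [2.0, 3.0, 4.0]

f_stat = one_way_anova_f([control_rates, prehd_rates])
w_stat = levene_w([control_rates, prehd_rates])
print(f_stat)  # 6.0
```

In this toy example the two groups have identical spread, so W is zero, while the group means differ and F is 6.0. With SciPy available, `scipy.stats.f_oneway` and `scipy.stats.levene(..., center="mean")` return the same statistics together with p-values.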
RESULTS
Auditory-perceptual profile of speech in PreHD and EarlyHD
Agreement between raters was 72.2% (ICC = 0.72). Perceptually, speakers in the PreHD group exhibited abnormal pitch level (p = 0.007), reduced loudness (p = 0.007), dysphonic voice quality (p < 0.05), hypernasality (p = 0.007), abnormal speech rate (p < 0.001), imprecise articulation (p = 0.007), and reduced naturalness of speech (p = 0.004) compared to their age-matched control group (Table 3). Most of these speech deviations were perceived at the sub-clinical level rather than as clinically dysarthric (i.e., mild to severe). Speakers in the EarlyHD group demonstrated mild-to-moderate dysarthric features in their speech, including unchanged or gradual reduction in loudness (p < 0.01), dysphonic voice quality such as harsh, hoarse, breathy, and strained voice (p < 0.05), abnormal prosody (p < 0.01), and articulatory breakdowns and speech-sound distortions (p ≤ 0.001) compared to their matched control group (Table 3). Overall, speech in EarlyHD was less intelligible (p = 0.005) and less natural (p < 0.001) compared to the PreHD and healthy control groups (see Fig. 1). Table 3 shows statistical comparison outcomes and Supplementary Table 1 presents raw data on listener-based speech ratings.
Table 3. Comparison of perceptual characteristics between HD and control groups
a Statistical significance could not be calculated because the perceptual ratings were the same (constant) across groups; the groups therefore did not differ on these features. *p < 0.05; **p ≤ 0.01; ***p ≤ 0.001 at the α = 0.05 level.

Fig. 1. Frequency of speech deficits (in percentages) across PreHD, EarlyHD, and control participants based on perceptual assessment. a Participants from Control-A and Control-B were combined in Fig. 1 (n = 30). Significant difference between PreHD and control groups: *p < 0.05; **p ≤ 0.01; ***p ≤ 0.001. Significant difference between EarlyHD and control groups: *p < 0.05; **p ≤ 0.01; ***p ≤ 0.001.
Computational acoustic analysis indicated that speech rate on the reading task differed significantly between the PreHD group and the healthy control group. The effect of group on speech rate suggested that the reduced rate of speech was related to disease stage (p < 0.001) (Table 4). In addition to reduced speech rate, people with EarlyHD demonstrated longer pauses and a higher proportion of silence compared to the PreHD and control groups on the reading and monologue tasks (p < 0.05) (Table 4). The PreHD group performed better than the EarlyHD group on the oral SDMT (p < 0.001). Faster speech rate on the reading task correlated with higher oral SDMT score (PreHD: r = 0.72, p = 0.01; EarlyHD: r = 0.54, p = 0.04). The relationships between perceptual and acoustic features of speech timing, and between SDMT and acoustic speech outcomes, are described in Supplementary Tables 2 and 3.
Table 4. Group comparisons on speech-timing measures in reading and monologue tasks
MD, mean difference; SE, standard error; 95% CI, 95% confidence interval. a PreHD group: age range 25 to 71 (mean = 42.06, SD = 11.50). Control-A group: age range 25 to 72 (mean = 42.38, SD = 12.09). b EarlyHD group: age range 43 to 73 (mean = 56.86, SD = 10.2). Control-B group: age range 41 to 71 (mean = 56.29, SD = 10.32).
DISCUSSION
Subtle but perceptible differences in speech were detected by expert listeners in PreHD. Most of the speech differences detected in PreHD were not severe enough to warrant a diagnosis of dysarthria and were considered subclinical. Subclinical differences were observed across the phonation, resonance, and prosody speech domains. Perceptually, speech in PreHD was characterized by subtle reductions and less variation in pitch and loudness, abnormal speech rate, changes in voice quality, hypernasality, and imprecise articulation. Subjective observations of slowed speech rate were in line with objective speech-timing outcomes from acoustic analytics.
Speech in EarlyHD was characterized by sub-clinical to mild deficits across most perceptual features of speech production, including loudness, voice quality, resonance, articulation, speech intelligibility, and naturalness. Subjective observations of slowed speech were in line with objective speech-timing outcomes from acoustic analytics. On the monologue task, the EarlyHD group also demonstrated a higher percentage of silence compared to the control group. Acoustic findings in our study were consistent with results from a previous study investigating speech rate and silence ratio on a reading task [34]. It is notable that some perceptual speech features were rated as more severely impacted in the PreHD group compared to the EarlyHD group (e.g., reduced loudness, abnormal pitch level). In these instances, the variability in severity ratings may reflect individual differences in the HD population. It may also indicate that different domains of performance progress differently in participants with PreHD and EarlyHD. For example, motor deficits and decline in daily living skills are present in participants with EarlyHD, but speech might not have progressed in step with these other symptoms or domains in some participants. Overall, findings from the current study showed that the occurrence of dysarthric features was much lower in the PreHD group than in the EarlyHD group.
The link between cognition and speech-timing in HD
Slower speech rate in participants with the HD gene expansion is associated with performance on the SDMT, a common cognitive measure known for its sensitivity in HD, suggesting that speech rate declines in parallel with cognitive decline in HD. Another study, investigating people with multiple sclerosis, has also reported a similar association between cognitive functions and variance in speech timing using the Minimal Assessment of Cognitive Function [35]. Slower speech rate is also observed in prodromal idiopathic rapid eye movement (REM) sleep behavior disorder, which is associated with a higher risk of cognitive impairment [36]. In the HD population, PreHD can manifest in deficits of executive function, involving working memory and cognitive control [19, 37, 38]. Data from the PREDICT-HD study suggest that several cognitive measures track disease progression in PreHD [37]. Speech production involves various cognitive-linguistic processes such as lexical (word) processing, syntactic (grammar) processing, and phoneme (speech sound) encoding [39, 40]. Cognitive decline, specifically in cognitive processes theorized to be involved in linguistic planning and formulation such as verbal working memory and attention [41–43], may compromise speech-timing performance in speakers with PreHD and EarlyHD.
Limitations
Future research is needed in HD to distinguish how speech production relates independently to the motor and cognitive changes occurring in HD. The procedure for eliciting responses in the oral SDMT presents a challenge for individuals with a real or potential speech impairment, as they are required to respond orally to a timed test, which in turn is likely to influence their performance [44, 45]. Given that performance on oral cognitive tasks is unlikely to be independent of oral motor control, the precise impact of cognition versus oral motor execution on cognitive task performance requires further delineation. Future studies should mitigate the impact of oral motor control in neuropsychological testing and design a more comprehensive test battery that measures cognitive domains beyond the attention and processing speed captured by the oral SDMT, such as executive function. Other cognitive tools that capture both accuracy and speed of performance might be beneficial, because accuracy should not be affected by oral motor control or by fine motor control of the upper limbs. Future studies could also evaluate the relationship between speech and cognition in healthy controls, which could present differently than in the HD population.
Another limitation of the study relates to the potential effect of mental health disorders on speech rate. Depression is one of the most common psychiatric symptoms in HD and it can occur in the premanifest phase of HD [46–48]. Individuals with depression and anxiety have been shown to display slower speech rate and increased pauses [49], yet the prevalence of depression was not measured in this study.
CONCLUSION
The present study provides a comprehensive listener-based description of speech and voice characteristics of individuals with PreHD. Early changes in speech are detectable by expert listeners and these subtle differences could present potential treatment targets. Early speech intervention may include compensatory approaches such as communication partner training or implementation of communication strategies for people with PreHD. Treatments may aim to maintain or improve intelligibility and naturalness of speech, and to maximize the effectiveness of the individual’s communication. Our findings of audible, premanifest speech changes also highlight the sensitivity of some cognitive-linguistic processes and/or motor speech control to disease stage. Researchers and clinicians may become more aware of specific audible signs in people with PreHD and consider speech as a multidimensional marker of disease. Future investigations of the link between cognition and speech production may provide insight into the etiology of speech symptoms in PreHD, differentiating deficits arising from cognitive-linguistic processes from those arising from motor speech constraints.
Footnotes
ACKNOWLEDGMENTS
Prof Vogel is funded by a National Health and Medical Research Council, Australia, Dementia Fellowship (#1135683). Prof Julie Stout is funded by a National Health and Medical Research Council, Australia, Investigator Grant (#1173472).
CONFLICT OF INTEREST
Prof. Vogel is the Chief Science Officer of Redenlab, a speech biomarker company. No other authors have any conflicts of interest to report.
