Abstract
Keywords
Abbreviations
VSA, vowel space area; F1, first formant frequency; F2, second formant frequency.
INTRODUCTION
Subthalamic nucleus deep brain stimulation (STN-DBS) has the potential to drastically improve motor functions, motor complications, non-motor symptoms, and quality of life (QOL) in Parkinson’s disease (PD) patients [1, 2]. However, previous studies reported a higher incidence of dysarthria (low intelligibility) in patients treated with STN-DBS than in those treated medically [1, 3–7]. Stuttering [8], excessive vocal fold closure, and respiratory over-drive [9] were reported after STN-DBS. Furthermore, stimulation at high voltage (4.0 V) [10], at higher frequencies or increased amplitude [11], or at high amplitude [12] significantly reduced speech intelligibility. Stimulation-induced dysarthria may be caused by current diffusion to the corticobulbar fibers [6, 13–15]. Other studies attribute dysarthria to current diffusion to structures located medially to the subthalamic nucleus (e.g., cerebellothalamic fibers, medial zona incerta, and pre-lemniscal radiations) [12, 16–19]. Tripoliti et al. [17] demonstrated that predictive factors for deterioration of speech intelligibility following STN-DBS were lower preoperative speech intelligibility, being on-medication, long PD duration, and medially-placed active contacts in the left hemisphere.
We previously demonstrated distinct phenotypes of speech and voice disorders in PD patients after STN-DBS, using auditory-perceptual assessment [3]. Factor analysis and subsequent cluster analysis classified PD patients with STN-DBS into five clusters according to their speech and voice disorder phenotypes: relatively good speech and voice function type; stuttering type; breathy voice type; strained voice type; and spastic dysarthria type. We also investigated voice features of PD patients after STN-DBS using acoustic analysis with a multi-dimensional voice program and laryngoscopic physiological analysis [4]. The characteristic voice and laryngeal findings in the STN-DBS group compared with the medical-therapy-alone group were: (1) more widespread voice impairment, particularly in females; (2) poorer voice-related QOL; (3) a worse degree of voiceless (DUV) and strained voice; and (4) abnormal laryngeal muscle contraction. In particular, the laryngoscopic analysis showed that PD patients treated with STN-DBS had a higher incidence of incomplete glottal closure, hyperadduction of the false vocal folds, anteroposterior hypercompression, and asymmetrical glottal movement compared with medically-treated PD patients [6]. This suggests that STN-DBS potentially induces a dynamic change in the laryngeal/voice process, thereby diversifying or increasing the complexity of the pattern of speech and voice disorders in PD patients. Furthermore, parkinsonian speech is also thought to include multidimensional impairment in speech processes (i.e., respiration, resonance, articulation, and prosody) [20]. In the present study, we focused on acoustic aspects of the articulation process. One way to determine articulation function is to acoustically analyze the formant frequency.
Vowels are primarily produced by steady positions of the mouth/jaw, tongue, and lips. These articulators configure the resonating cavities of the vocal tract, which amplify certain frequency bands in the vibration of the vocal folds. These enhanced frequency bands are called “formants”. Formants define individual vowels by their distinct peaks of acoustic energy. Previous studies have noted that the most relevant formant values for vowel perception and production are the first two formant frequencies [21, 22]. Formant frequency values can change in a predictable way as a function of the articulation organs and of the changes in the three-dimensional configuration of the vocal tract that result from the articulation organ range. The first formant frequency (F1) is influenced by lowering the mandible (aperture: mouth opening), controlled by the jaw and tongue position. F1 increases when the tongue is lowered and/or the jaw is moved downward (e.g., vowel /a/). The second formant frequency (F2) mostly reflects the back-and-forth position of the tongue. F2 increases when the tongue moves forward (e.g., vowel /i/) and decreases as the tongue moves backward (e.g., vowels /a/ and /u/) [22]. Therefore, formant analysis may help in evaluating the actual distance of mouth/jaw and tongue movements within the articulation organ range [22, 23]. We used the F1 and F2 formants of the target vowels /a/, /i/, and /u/ to assess vowel articulation. These three vowels are often referred to as the corner vowels. The F1/F2 area of the three vowels used in the present analysis is called the “vowel triangle” and has been used as the value for the articulation organ range in previous studies. The F1 and F2 values of the three vowels form the apexes of a triangle representing the vowel space area (VSA). Of all the vowels, the three vowels /a/, /i/, and /u/ produce the largest VSA. The VSA is expressed in Hz².
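For orientation, the area of the vowel triangle can be written with the standard formula for the area of a triangle given its three vertices in the F1/F2 plane. The expression below is a sketch under that assumption. The subscripted symbols (e.g., F1 of /a/ written as F1_a) are notation introduced here for illustration, and the formula actually applied in this study is the one cited in the Methods.

```latex
\mathrm{VSA} = \frac{1}{2}\,\left|\, F1_{i}\,(F2_{a} - F2_{u}) + F1_{a}\,(F2_{u} - F2_{i}) + F1_{u}\,(F2_{i} - F2_{a}) \,\right|
```

Because each term is a product of two frequencies, the result carries the unit Hz².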
Some studies assessed articulation organ range using the VSA [5, 24–27]. Formant values are sensitive indicators of the articulation function of patients with speech disorders and are frequently used to measure speech treatment effects and articulation impairments in patients with PD [5, 24–28]. McRae et al. [29] and Tjaden et al. [30] reported that vowel articulation is important for speech intelligibility. Reduced acoustic distinctiveness of vowels has been reported in studies of dysarthric speakers, including patients with PD.
Previous reports demonstrated that STN-DBS may ameliorate articulation in patients with PD, based on acoustic-formant analyses [5, 31]. However, these studies included limited numbers of patients, and no study has assessed the correlations between perceptual assessment and formant values. Therefore, the present study aimed to investigate articulation with a relatively large number of PD patients in both on- and off-stimulation conditions, by assessing formant frequency values using acoustic analysis combined with the auditory-perceptual assessment.
MATERIALS AND METHODS
Participants
Inclusion criteria for this study were as follows: 1) PD diagnosis based on the United Kingdom Parkinson’s Disease Society Brain Bank criteria [32]; 2) no further neurological diseases; 3) follow-up period of ≥6 months after subthalamic implantation; 4) Japanese as the native language; 5) absence of severe cognitive impairment or psychiatric disorders that may hinder speech assessment; 6) bilateral STN implantation at the Department of Neurosurgery, Nagoya University Hospital; and 7) agreement to undergo the on- and off-stimulation assessment. As shown in the paper that outlined the basic algorithm for programming STN-DBS [33], the stabilization period in our institution is around 6 months. Therefore, we set a minimum of 6 months after surgery as an inclusion criterion. A board-certified neurologist (TT) evaluated all PD patients using the Unified Parkinson’s Disease Rating Scale (UPDRS). A skilled, certified speech-language-hearing therapist (SLHT; YT) performed the speech analyses and cognitive function examinations using the Mini-Mental State Examination (MMSE) and the Montreal Cognitive Assessment (MoCA).
We identified 158 consecutive patients who underwent bilateral STN implantation from 2005 to 2014. Of those, 35 patients refused the on- and off-stimulation assessment, 27 did not visit our hospital regularly at the time of this study, six were deceased, and six refused to participate. Patients with severe cognitive impairments (n = 9), severe psychiatric disorders (n = 4), further neurological diseases (n = 7), and severe speech disorders, such as an extremely low voice volume that hampered acoustic analysis (n = 8), were also excluded from the study. Finally, 56 patients (21 males, 35 females) participated in the present study as the STN-DBS group. Moreover, we recruited 41 PD patients (15 males, 26 females) treated only with medication, who were matched for age, disease duration, motor function, cognitive function, and gender (medical-therapy-alone group). No patients were treated with Lee Silverman Voice Treatment (LSVT®); 8 patients (STN-DBS group, n = 6; medical-therapy-alone group, n = 2) were treated with general speech therapy or pacing boards. The levodopa equivalent daily dose (LEDD) was calculated based on the formula described by Tomlinson et al. [34] (Table 1). All participating patients were taking antiparkinsonian medication. Participants were assessed in the on-state under continued medication. Differences in speech function between the on- and off-stimulation conditions were evaluated in all STN-DBS group participants. For the acoustic and auditory-perceptual analyses, speech samples were first recorded in the on-stimulation condition and then 30 min after stopping stimulation. The DBS parameters in the STN-DBS group are shown in Table 2.
Written informed consent was obtained from all participants. This study adhered to the Ethics Guidelines for Epidemiological Studies endorsed by the Japanese government and was approved by the Ethical Committee of Nagoya University Graduate School of Medicine.
Speech analyses
Speech analyses using acoustic and auditory-perceptual measurements were performed for all participants. Speech samples were recorded in a sound-treated room and digitized using a voice recorder (ICD-SX813; Sony, Tokyo, Japan) at a sampling rate of 44.1 kHz with 16-bit quantization. A microphone (ECM-MS907; Sony) was positioned to maintain a constant mouth-to-microphone distance of 15 cm during speech recording. Recorded speech samples were subsequently used for the perceptual and acoustic analyses.
Participants were asked to phonate the vowels /a/, /i/, and /u/, sustaining their habitual pitch and loudness for ≥5 s. The digital voice recorder was coupled with a computerized speech lab system (Model 4400; KayPentax, Lincoln Park, NJ, USA), and we used the Multi-Speech (KayPentax) program for acoustic formant analyses. For capture, a 3 s interval was taken from the mid-portion of phonation, excluding the first and last 25 ms of phonation. Vowel phonation of 3 s was easily performed by PD patients and was sufficiently long for reliable acoustic analysis. For the analyses of formant frequency, periodic voice signals were extracted using voice period marks, and 3 s formant values (F1 and F2) were automatically derived from the extracted periodic voice signals using the formant history of the Multi-Speech program. It should be noted that the formant values (F1 and F2) of the respective vowels were derived as median values. We observed abnormally and extremely high formant values (i.e., outliers) in some patients. To eliminate the influence of these outliers, we employed median values instead of mean values. Vocal tract lengths and voice pitches differ between the genders, corresponding to gender differences in formant values [22]. Therefore, we categorized male and female patients separately for formant frequency acoustic analyses. The VSA was calculated based on the formula reported by Skodda et al. [26].
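The median-based derivation and the triangle-area step can be sketched in a few lines of Python. This is an illustrative sketch only, not the Multi-Speech procedure used in the study: the function names and the per-frame formant values are hypothetical, and the area is computed with the standard three-vertex triangle formula, which is assumed here to correspond to the cited VSA formula.

```python
from statistics import median


def median_formants(f1_track, f2_track):
    """Return (F1, F2) in Hz as the medians of per-frame formant tracks."""
    return median(f1_track), median(f2_track)


def vowel_space_area(a, i, u):
    """Triangle area in Hz^2 from the (F1, F2) pairs of /a/, /i/, and /u/."""
    (f1a, f2a), (f1i, f2i), (f1u, f2u) = a, i, u
    return abs(f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a)) / 2.0


# Hypothetical per-frame tracks for /a/ (not study data); 2400 Hz is an outlier.
f1_a = [700, 710, 705, 2400, 695]
f2_a = [1200, 1190, 1210, 1205, 1195]

# The median ignores the single outlying frame, whereas the mean would not.
a = median_formants(f1_a, f2_a)

# Hypothetical median formants for /i/ and /u/ (assumed values).
i = (300, 2300)
u = (350, 1300)

vsa = vowel_space_area(a, i, u)
print(a, vsa)
```

With these assumed values the outlying 2400 Hz frame leaves the F1 estimate for /a/ unchanged at 705 Hz, while the arithmetic mean of the same track would be pulled above 1000 Hz.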
Statistical analysis
We used SPSS version 18 (PASW Statistics for Windows, Version 18.0; SPSS Inc., Chicago, IL, USA) for the statistical analyses. Age, disease duration, motor function, cognitive function, and maximum phonation time were compared between the medical-therapy-alone and STN-DBS groups using the Kruskal–Wallis test followed by the Steel–Dwass multiple comparison test, which was performed in R (http://www.r-project.org/). The acoustic and auditory-perceptual measurements of the STN-DBS and medical-therapy-alone groups were compared using Mann–Whitney U-tests. Correlations between acoustic and perceptual measurements were assessed using Spearman’s rank correlation coefficients. Changes in parameters between the on- and off-stimulation conditions were compared using the Wilcoxon signed-rank test. P values of <0.05 were considered statistically significant.
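To make the correlation step concrete, the following Python sketch computes Spearman’s rank correlation coefficient as the Pearson correlation of rank-transformed data, with tied values given their average rank. It is not the SPSS procedure used in the study; the function names and the example values are hypothetical, and no p value is computed.

```python
def average_ranks(values):
    """Assign 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors.

    Assumes equal-length inputs in which neither variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    var_x = sum((p - mx) ** 2 for p in rx)
    var_y = sum((q - my) ** 2 for q in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical VSA values (Hz^2) and perceptual ratings (not study data).
vsa = [150000, 210000, 180000, 260000, 120000]
rating = [3, 2, 2, 1, 4]
print(round(spearman_rho(vsa, rating), 3))
```

Working on ranks rather than raw values makes the coefficient insensitive to the scale of the VSA and suitable for ordinal perceptual ratings.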
RESULTS
Acoustic parameters by formant frequency
The F1 value of the vowel /a/ (males, p < 0.05; females, p < 0.05) and the F2 value of the vowel /i/ (males, p < 0.05; females, p < 0.01) were significantly higher, while the F2 value of the vowel /u/ (males, p < 0.05; females, p < 0.05) was significantly lower in the STN-DBS group in the on-stimulation condition than in the medical-therapy-alone group (Fig. 1). However, the first two formant frequencies of the three vowels were not significantly different between the medical-therapy-alone group and the STN-DBS group in the off-stimulation condition. The VSA of the STN-DBS group in the on-stimulation condition was significantly larger than that of the medical-therapy-alone group (medical-therapy-alone group: males, 151959.6 ± 56960.5 Hz²; females, 471883.1 ± 190475.5 Hz² vs. STN-DBS group in the on-stimulation condition: males, 224699.6 ± 83567.5 Hz², p < 0.05; females, 750788.1 ± 295913.5 Hz², p < 0.001).
Moreover, the formant-based acoustic parameters generally worsened in the off-stimulation condition compared with the on-stimulation condition in the STN-DBS group of both genders. The F1 value of the vowel /a/ (males, p < 0.01; females, p < 0.001) and the F2 value of the vowel /i/ (males, p < 0.01; females, p < 0.01) were significantly lower, while the F2 value of the vowel /u/ (males, p < 0.01; females, p < 0.001) was significantly higher in the off- than in the on-stimulation condition in the STN-DBS group of both genders. The VSA in the off-stimulation condition was significantly reduced compared with that in the on-stimulation condition (off-stimulation condition: males, 164400.3 ± 82757.7 Hz², p < 0.001; females, 512507.4 ± 258126.5 Hz², p < 0.001). The VSA of the STN-DBS group in the off-stimulation condition was not significantly different from that of the medical-therapy-alone group (males, p = 0.67; females, p = 0.44). In the individual analysis, 89.3% (50/56) of the STN-DBS group showed a larger VSA in the on- than in the off-stimulation condition, across both genders. By contrast, the VSA of the remaining patients (6/56) was smaller in the on- than in the off-stimulation condition. We compared the improvement group (50/56) and the reduced group (6/56) with regard to patient backgrounds, but we did not find any significant differences in motor function, cognitive function, disease duration, or the parameters of electrical stimulation between the two groups.
Auditory-perceptual parameters
In the STN-DBS group, the majority of the auditory-perceptual parameters (speech intelligibility, naturalness, imprecise consonants, abnormal rate, variable rate, excess loudness variation, and variable pitch) were significantly worse compared with those of the medical-therapy-alone group (Table 3).
Relationship between the VSA determined by acoustic analysis and auditory-perceptual measurements in the STN-DBS group
In the STN-DBS group, the VSA determined by vowel acoustic analysis correlated significantly with intelligibility and naturalness, as measured by auditory-perceptual assessment using the AMSD, in the off-stimulation condition (males: intelligibility, r = –0.60, p < 0.01; naturalness, r = –0.69, p < 0.01; females: intelligibility, r = –0.41, p < 0.05; naturalness, r = –0.45, p < 0.01). However, neither intelligibility nor naturalness showed any significant correlation with the VSA in the on-stimulation condition (males: intelligibility, r = –0.33, p = 0.15; naturalness, r = –0.27, p = 0.24; females: intelligibility, r = 0.10, p = 0.59; naturalness, r = 0.04, p = 0.85).
DISCUSSION
In the present study, we characterized the influence of STN-DBS on vowel articulation assessed by acoustic and auditory-perceptual analyses in PD patients. We recruited a larger sample than those used in previous studies [5, 24], and demonstrated that STN-DBS improved maximal articulation range in both female and male patients. This indicates that STN-DBS potentially improves articulation movement. We found that formant-related articulation values significantly differed between males and females [22]. Our previous acoustic analysis demonstrated that STN-DBS induced more widespread vocal changes in females than in males [4]. As there may be gender-based changes in stimulation-related articulation function, we assessed both genders and presented the data separately. Our results showed no significant correlation between VSA and speech intelligibility in the on-stimulation condition. In contrast, there was a significant correlation between VSA and speech intelligibility in the off-stimulation condition.
Martel Sauvageau et al. [5] reported that the mean VSA value in eight PD patients treated with STN-DBS was larger in the on- than in the off-stimulation condition, which is consistent with our result. Gentil et al. [36] used load-sensitive devices and reported that articulation, assessed by force measurements and force rise time of the tongue and lips, was improved in PD patients who had undergone STN-DBS. In medically treated PD patients, hypokinesia may be responsible for the articulation characteristics of dysarthria, resulting in a smaller articulation working space for vowels as compared with healthy speakers [37]. There is some evidence that the formant frequency-related VSA of PD patients is smaller than that of healthy speakers [28, 38–40]. Therefore, STN-DBS may improve hypokinesia of articulation structures including the mouth/jaw and tongue, leading to better movement of these structures. However, our individual analyses showed that VSA changes after stopping stimulation were heterogeneous. Dromey et al. [24] showed similar results, demonstrating that the formant-related VSA was enlarged in the on- compared with the off-stimulation condition in two of six PD patients treated with STN-DBS. These results may be explained by the diversity of stimulation settings, electrode positions, and clinical backgrounds of PD patients.
There was a significant correlation between VSA and intelligibility in the off-stimulation condition in both genders in our study. In previous studies, improvement in speech intelligibility was accompanied by enlarged VSA after speech rehabilitation [25, 41]. Moreover, the VSA value became larger when PD patients were asked to speak clearly [42]. These results support the correlation between speech intelligibility and the VSA in PD patients. However, in our study, the VSA was not significantly correlated with intelligibility in the on-stimulation condition in the STN-DBS group. STN-DBS is reported to cause dysarthria [12–19], abnormal laryngeal muscle contraction [4, 6], and respiratory over-drive [9], which may be attributed to current diffusion to structures surrounding the STN, such as the corticobulbar and cerebellothalamic fibers. These factors, together with the articulation organ range, may have contributed to the discrepancy between VSA and speech intelligibility in the on-stimulation condition.
When interpreting our results, several limitations must be considered. First, participants were assessed only in the on-state under continued medication. Therefore, the impact of medication on articulation was not assessed with acoustic and perceptual analyses. Second, the STN-DBS group was evaluated in the off-stimulation condition 30 min after stopping stimulation. We must admit that larger stimulation-related changes may be obtained with a longer off-stimulation period. However, we have confirmed substantial changes of speech and voice after stopping stimulation in our previous studies [3, 4]. To minimize the effect of fatigue and the L-dopa cycle, we chose to assess the patients in the on-stimulation condition and 30 min after stopping stimulation. Further studies are needed to determine the most appropriate period to evaluate articulation functions for the off-stimulation condition. Third, VSA-based analyses of formant frequencies do not directly estimate the actual distance of mouth and tongue movement. Yunusova et al. [43] reported that the relationships between F2 range and tongue position were weaker in ALS patients. Weismer et al. [44] reported that the VSA does not provide straightforward information on movement range, speed, or any other derivative of motion. In contrast, other studies in PD patients reported that the VSA reflected the working space of mouth/jaw and tongue movements [45, 46]. Skodda et al. suggested that reduced VSA may be a marker for disease progression in PD [28]. Therefore, the reasons for the inconsistency in the above-mentioned reports should be further investigated. Finally, we evaluated the VSA in sustained vowel tasks. Studies by Dromey et al. [24] and Martel Sauvageau et al. [5] extracted the pronunciation of /a/, /i/, and /u/ from speech tasks. The differences between speech and sustained-vowel tasks may affect the results of the relationship between the VSA and speech intelligibility.
The production of sustained vowels is not natural speech but hyperarticulation. However, with speech tasks, the results of formant analysis may be affected by differences in speech rate, articulatory strength, loudness of voice, and the consonants that precede and follow vowels. Importantly, with speech tasks, there is a risk that results would differ depending on the linguistic area (e.g., Japanese/English). In contrast, sustained vowel production provides a relatively stable opportunity to examine the acoustic characteristics of the vowels. Therefore, we measured the pure sustained vowel to prevent our results from being affected by these factors. Further research investigating differences in VSA across vowel and speech tasks is required.
In conclusion, STN stimulation may improve movement of articulation structures including the mouth/jaw and tongue. On the other hand, STN-DBS may induce dysarthria [12–19], abnormal laryngeal muscle contraction [4, 6], and respiratory over-drive [9]. These may explain the discrepancy between VSA and speech intelligibility in the on-stimulation condition. We also demonstrated that VSA may be useful in assessing the impact of STN-DBS on articulation. The advantage of acoustic analyses is that the assessment is noninvasive and technically easy as compared with other devices such as X-ray, electropalatography, surface electromyogram, and magnetic devices. A meta-analysis reported that dysarthria was one of the most frequent adverse effects following STN stimulation [1]. Therefore, we should consider a balance between maximizing the effect on motor function and minimizing stimulation-induced adverse effects by carefully assessing patients’ speech and voice. In future, novel devices [47–49] and new programming techniques [50] that adjust the area and direction of stimulation may have the potential to resolve these critical problems. Future studies are needed to examine these possibilities and to elucidate relevant mechanisms.
CONFLICT OF INTEREST
The authors have no conflict of interest to report.
RELEVANT CONFLICTS OF INTEREST/FINANCIAL DISCLOSURES
Nothing to report.
FUNDING SOURCES FOR STUDY
Nothing to report.
