Abstract
We investigate the potential association between leucine-rich repeat kinase 2 (LRRK2) mutations and voice. Sustained phonations (‘aaah’ sounds) were recorded from 7 individuals with LRRK2-associated Parkinson’s disease (PD), 17 participants with idiopathic PD (iPD), 20 non-manifesting LRRK2-mutation carriers, 25 related non-carriers, and 26 controls. In distinguishing LRRK2-associated PD from iPD, the mean sensitivity was 95.4% (SD 17.8%) and the mean specificity was 89.6% (SD 26.5%). Voice features for non-manifesting carriers, related non-carriers, and controls were much less discriminatory. Vocal deficits in LRRK2-associated PD may differ from those in iPD. These preliminary results warrant longitudinal analyses and replication in larger cohorts.
INTRODUCTION
Voice impairment may be one of the earliest motor indicators of idiopathic Parkinson’s disease (iPD) [1], and is typically characterized by breathiness, roughness, reduced loudness, and vocal tremor [1–4]. It is estimated that between 70 and 90% of people with PD (PWP) experience vocal impairment [2, 6], and nearly one-third of PWP report voice-related problems as one of their main disease-related limitations [6].
Past work has demonstrated that objective measures of vocal impairment can be used to distinguish participants with iPD from controls with high accuracy (mean sensitivity and mean specificity > 90%) [7–15]. The extent of vocal dysfunction has also been shown to be associated with disease severity [3, 17]. Moreover, for symptom monitoring, voice-based measures have been used to accurately replicate both the motor and total Unified Parkinson’s Disease Rating Scale (UPDRS) assessment (within 2 points of the clinicians’ estimate) [18]. Recently, abnormalities in speech production have also been reported in participants with idiopathic rapid eye movement (REM) sleep behavior disorder [19, 20]. These findings encourage further investigation of voice analysis as a reliable, non-invasive, and scalable tool that may also identify prodromal PD.
Leucine-rich repeat kinase 2 (LRRK2) mutations are the most common cause of genetically-determined PD [21]. The opportunity to intervene with disease-modifying therapy early in the neurodegenerative process makes identifying the prodromal state important. To investigate the presence of voice abnormalities in populations at increased risk of developing PD, we analyzed voice-based measures in multiplex families carrying a mutation in the gene for LRRK2. The goals of this pilot study were twofold. First, we aimed to determine whether voice can be used to discriminate participants with LRRK2-associated PD from those with idiopathic PD. Second, we examined whether voice differs between non-manifesting carriers of a LRRK2 mutation and either related non-manifesting non-carriers or unrelated healthy controls.
METHODS
Study participants
Probands with LRRK2 mutations were identified at Toronto Western Hospital and all available blood relatives were invited to participate. iPD patients and healthy individuals (devoid of any neurologic disease or family history of PD) were recruited at Toronto Western Hospital. iPD was defined as PD, clinically diagnosed by a movement disorder specialist, in the absence of a family history of the disease in a first- or second-degree relative. Seven participants with LRRK2-PD (p.G2019S (5) or L1795F (2)), 17 participants with iPD, 20 non-manifesting carriers of LRRK2 mutations (p.G2019S (18), L1795F (2)), 25 related non-manifesting non-carriers, and 26 healthy controls were recruited. The presence or absence of a LRRK2 mutation was evaluated in all participants as described previously [22]. In the non-manifesting carrier group, the likelihood of prodromal disease being present was determined [23]. The study was approved by the University Health Network Research Ethics Board and informed consent was obtained from all participants.
Data acquisition
We obtained two audio recordings of sustained vocal phonation from each participant during a study visit at the Toronto Western Hospital using a USB-powered microphone (Logitech, model 980186-0403) positioned on a stable surface ∼2 inches from the participant’s mouth. Recordings were collected using Audacity software (Version 2.0.3) in a quiet room. Participants with iPD were evaluated in the ON medication state. Each participant was instructed to “Take a deep breath and then let out a single ‘aaah’ sound for as long as you can.” Each recording was sampled at 44.1 kHz and stored as a de-identified digital audio file (.wav format).
Data processing
Identification of the longest usable segment of sustained phonation for each recording was performed manually. Recordings were discarded from the analysis if they were too noisy or if the phonation duration was shorter than two seconds. For each recording, we extracted 292 summary measures (also referred to as features or dysphonia measures) that have been used for analyzing voice, including in PD [10, 24]. Details regarding these features are provided in the Supplementary Material.
Statistical analysis
We identified 3 pairwise comparisons of interest: (1) LRRK2-PD versus iPD, (2) non-manifesting carriers versus related non-manifesting non-carriers, and (3) non-manifesting carriers versus healthy controls. For each pairwise comparison, salient features were identified using 5 feature selection algorithms, which help enhance the explanatory power of the analysis by removing redundant and less informative features [25–29]. Each of the 5 algorithms produced its own feature ranking. To obtain a single ranking of the most salient features to be used for group comparison, we used a majority voting scheme. Pairwise comparisons were performed using random forests, a highly nonlinear statistical machine learning algorithm used to separate generic feature data into several different classes [30]. Discrimination accuracy was evaluated using a 10-fold cross-validation (CV) scheme (with 100 repetitions for statistical confidence). This scheme helps assess the generalizability of the discrimination results to similar, but previously unseen, data and has been used in previous studies on voice analysis in PD [10, 18]. Data were balanced in each cross-validation repetition to eliminate differences in group sample size. The statistical significance level was set to p = 0.05. Statistical analysis of the voice recordings was performed using Matlab® software (version 2016b). Details of the feature extraction, feature selection, and validation procedures are provided in the Supplementary Material.
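The majority voting step can be illustrated with a minimal sketch. The exact voting rule used in this study is described in the Supplementary Material and is not reproduced here. The sketch assumes a simple rule: each feature receives one vote from every ranking that places it within its own top k, and ties are broken by mean rank position. The function name `vote_on_rankings`, the tie-breaking rule, and the toy rankings are illustrative assumptions, not the study's implementation.

```python
from collections import defaultdict


def vote_on_rankings(rankings, top_k):
    """Combine several feature rankings into a single list of top_k features.

    rankings: list of lists, each ordered from most to least salient.
    A feature earns one vote from every ranking that places it in that
    ranking's own top_k. Ties are broken by lower mean rank position,
    then alphabetically (an assumed, illustrative rule).
    """
    votes = defaultdict(int)
    positions = defaultdict(list)
    for ranking in rankings:
        for pos, feature in enumerate(ranking):
            positions[feature].append(pos)
            if pos < top_k:
                votes[feature] += 1

    def sort_key(feature):
        mean_pos = sum(positions[feature]) / len(positions[feature])
        return (-votes[feature], mean_pos, feature)

    return sorted(positions, key=sort_key)[:top_k]


# Toy rankings from five hypothetical selectors (made-up order, not study data).
rankings = [
    ["GNE", "entropy", "HNR", "shimmer", "MFCC"],
    ["entropy", "GNE", "shimmer", "HNR", "MFCC"],
    ["GNE", "HNR", "entropy", "MFCC", "shimmer"],
    ["entropy", "GNE", "MFCC", "HNR", "shimmer"],
    ["HNR", "GNE", "entropy", "shimmer", "MFCC"],
]
selected = vote_on_rankings(rankings, top_k=2)
print(selected)  # ['GNE', 'entropy']
```

In this toy example GNE appears in the top 2 of all five rankings and entropy in three of them, so these two features are retained.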
RESULTS
LRRK2-PD vs idiopathic PD
On average, LRRK2-PD participants were older and had a longer disease duration than participants with iPD (Table 1). However, UPDRSIII (motor UPDRS) scores were similar between the two groups. Two recordings were collected from each participant; however, 3 LRRK2-PD and 2 iPD voice recordings were discarded as they were too noisy for reliable computation of features. Accuracies for distinguishing LRRK2-PD from iPD are reported in Table 1 and were computed using the 10 most salient features. Including more features in the classifier (random forest) improved the discrimination accuracy only marginally (Supplementary Figure 2). In discriminating participants using 11 LRRK2-PD (n = 7 individuals) and 32 iPD voice recordings (n = 17 individuals), the mean sensitivity was 95.4% (standard deviation (SD) 17.8%) and the mean specificity was 89.6% (SD 26.5%). Results were very similar for males and females (Table 1): in discriminating recordings from female participants, the mean sensitivity was 99.4% (SD 7.1%) and the mean specificity was 85.7% (SD 34.1%), whereas for male participants, the mean sensitivity was 100% (SD 0%) and the mean specificity was 88.9% (SD 31.6%). Stratification of data based on sex resulted in too few recordings to adequately fit the random forest classifier, which reduces the reliability of analysis and inference, particularly for LRRK2-PD (n = 7). Statistically significant differences were observed between LRRK2-PD and iPD voice features (see Fig. 1, in which 2 salient features show clear separation between the groups, and Supplementary Figure 1). Details regarding the most salient features are provided in Supplementary Table 3.
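For readers less familiar with these accuracy measures, the sketch below shows how a mean and SD of sensitivity and specificity across validation folds could be computed. It assumes that LRRK2-PD is coded as the positive class (label 1) and that the sample SD is used; neither choice is stated in the text. The fold labels and predictions are made-up toy values, and the function names are hypothetical.

```python
from statistics import mean, stdev


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for one validation fold.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Either value is None when the fold holds no recordings of that class.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec


def summarize(values):
    """Mean and sample SD over folds, skipping folds where the value is undefined."""
    defined = [v for v in values if v is not None]
    return mean(defined), stdev(defined)


# Toy folds as (true labels, predicted labels); made-up values, not study data.
folds = [
    ([1, 1, 0, 0], [1, 1, 0, 0]),
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 1, 0, 0], [1, 1, 1, 0]),
    ([1, 1, 0, 0], [1, 1, 0, 0]),
]
per_fold = [sensitivity_specificity(t, p) for t, p in folds]
sens_mean, sens_sd = summarize([s for s, _ in per_fold])
spec_mean, spec_sd = summarize([c for _, c in per_fold])
print(sens_mean, sens_sd)  # 0.875 0.25
print(spec_mean, spec_sd)  # 0.875 0.25
```

With only two positive recordings per fold, the per-fold sensitivity in this toy setting can take only the values 0, 0.5, or 1, so a single misclassification produces a large SD. Small validation folds of this kind are one plausible contributor to wide SDs.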
Table 1. Descriptive statistics by group (A), comparison of clinical and demographic characteristics (B), and discrimination accuracy for pairwise comparisons (C)
The table is presented in three sections (A-C). Section A presents descriptive statistics for the five clinical groups (1. LRRK2-PD, 2. iPD, 3. NMC, 4. RNC, and 5. healthy controls). Section B compares the descriptive statistics. Age was analyzed using an unpaired t-test, whereas disease duration and UPDRSIII total were compared using the Mann–Whitney U test. p values < 0.05 are highlighted in bold italic text. Section C presents the out-of-sample discrimination accuracy for the three priority pairwise comparisons (1. LRRK2-PD vs iPD, 2. NMC vs RNC, and 3. NMC vs healthy controls) using a 10-fold cross-validation (CV) scheme (with 100 repetitions), employing only the 10 most salient voice features. The scheme involved repeatedly splitting the data into a training set (90% of the total observations) and a validation set (the remaining 10% of the observations). The mean sensitivity and mean specificity across CV repetitions are presented (with standard deviation in brackets). The data were balanced to account for differences in group sample size. Accuracies were computed using all recordings, and separately for subgroup analysis using data stratified by sex. Abbreviations used: iPD, idiopathic Parkinson’s disease; LRRK2-PD, LRRK2-associated Parkinson’s disease; NMC, non-manifesting carriers; RNC, related non-carriers; SD, standard deviation; UPDRSIII, Movement Disorder Society Unified Parkinson’s Disease Rating Scale part 3; n.a., not applicable.
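The repeated, balanced 10-fold splitting described above can be sketched as follows. How the data were balanced is not specified in the main text, so the sketch assumes random undersampling of the larger group to the size of the smaller one in each repetition. The function name `balanced_kfold_splits`, the seed, and the balancing rule are illustrative assumptions. Only the index bookkeeping is shown; fitting the classifier on each training set is omitted.

```python
import random


def balanced_kfold_splits(labels, n_folds=10, n_repeats=100, seed=0):
    """Yield (train_indices, validation_indices) for repeated balanced k-fold CV.

    In each repetition every class is randomly undersampled to the size of
    the smallest class (assumed balancing rule), the retained indices are
    shuffled, and each fold in turn serves as the validation set.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())

    for _ in range(n_repeats):
        kept = []
        for idxs in by_class.values():
            kept.extend(rng.sample(idxs, n_min))
        rng.shuffle(kept)
        for fold in range(n_folds):
            val = kept[fold::n_folds]          # roughly 10% of retained data
            val_set = set(val)
            train = [i for i in kept if i not in val_set]
            yield train, val


# Recording counts from the LRRK2-PD vs iPD comparison: 11 and 32 recordings.
labels = [1] * 11 + [0] * 32
splits = list(balanced_kfold_splits(labels, n_folds=10, n_repeats=2, seed=1))
print(len(splits))                          # 20 (10 folds x 2 repetitions)
print(len(splits[0][0]), len(splits[0][1]))  # 19 3
```

Note that this sketch splits at the level of recordings, so two recordings from the same participant may fall on opposite sides of a split.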

Fig. 1. Scatterplots and boxplots of salient features for the three pairwise comparisons: LRRK2-PD versus idiopathic PD (iPD) (A and B), non-manifesting carriers (NMC) versus related non-carriers (RNC) (C and D), and NMC versus healthy controls (E and F). Panel A shows two highly discriminatory features that help differentiate LRRK2-PD from iPD: Entropy (computed after wavelet decomposition; quantifies the extent of randomness in a signal) and Glottis to Noise Excitation (GNE; degree of signal strength versus noise resulting from incomplete vocal fold closure). Both features were significantly different between the two groups (p < 0.001, denoted by ***) (Panel B). Panel C plots two salient features that help discriminate NMC from RNC: Harmonic to Noise Ratio (HNR; signal to noise ratio) and median shimmer (roughness in voice). HNR was significantly different between the two groups (p < 0.01, denoted by **), whereas shimmer was similar between NMC and RNC (p > 0.05) (Panel D), which indicates that the two cohorts are less different (as reflected in the discrimination accuracies reported in Table 1). Panel E shows two salient features that discriminate NMC from healthy controls: Mel Frequency Cepstral Coefficient (MFCC; quantifies vocal fold dynamics depending on properties of the articulators) and median shimmer. Panel F shows that MFCC and shimmer were significantly different between the two groups. Salient features were identified separately for each pairwise comparison, using five different feature selection algorithms. The plots were generated using all usable voice recordings. The p values reported above were computed using the nonparametric two-sided Kolmogorov-Smirnov (KS) test.
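The two-sample KS test compares the empirical cumulative distribution functions (ECDFs) of a feature in two groups; its statistic D is the largest vertical gap between the two ECDFs. The sketch below computes only D, using made-up toy values rather than study data. The corresponding p value would normally be obtained from an established routine (for example, `scipy.stats.ks_2samp` in Python); the function name `ks_statistic` is hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the two empirical CDFs,
    evaluated at every observed value in either sample.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for value in sorted(set(a) | set(b)):
        ecdf_a = bisect_right(a, value) / len(a)  # fraction of a <= value
        ecdf_b = bisect_right(b, value) / len(b)  # fraction of b <= value
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


# Toy feature values for two groups (made up for illustration).
group_1 = [0.1, 0.2, 0.3, 0.4]
group_2 = [0.35, 0.5, 0.6, 0.7]
d_stat = ks_statistic(group_1, group_2)
print(d_stat)  # 0.75
```

D ranges from 0 (identical ECDFs) to 1 (no overlap between the two samples).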
Accuracies were also computed using leave-one-subject-out (LOSO) CV [31]. Using LOSO CV, the mean sensitivity was 83.7% and mean specificity was 88.5% in discriminating LRRK2-PD from iPD (see Supplementary Table 2).
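A minimal sketch of LOSO splitting is given below: all recordings from one participant are held out for validation while the remaining participants' recordings form the training set, so no individual contributes to both sets. The function name `loso_splits` and the participant identifiers are hypothetical.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, validation_indices).

    subject_ids gives the participant identifier of each recording.
    Each participant is held out exactly once, together with all of
    that participant's recordings.
    """
    for subject in sorted(set(subject_ids)):
        val = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, val


# Five recordings from three hypothetical participants.
subject_ids = ["s1", "s1", "s2", "s3", "s3"]
loso = list(loso_splits(subject_ids))
for subject, train, val in loso:
    print(subject, train, val)
# s1 [2, 3, 4] [0, 1]
# s2 [0, 1, 3, 4] [2]
# s3 [0, 1, 2] [3, 4]
```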
We performed additional analysis whereby non-manifesting LRRK2 carriers and individuals with LRRK2-associated PD were treated as belonging to the same clinical group. This resulted in a larger sample of LRRK2 carriers (n = 27) which helped improve statistical power. The rationale of this analysis was to investigate if vocal deficits in LRRK2 carriers (both manifesting and non-manifesting) were different from iPD (for details, see the Supplementary Analysis).
Non-manifesting carriers (NMC) versus Related Non-carriers (RNC) and Controls
Participants from the three groups were of similar age (Table 1). NMC had a higher UPDRSIII score compared to both healthy controls and RNC. Statistical analyses were performed using 39 NMC recordings (n = 20), 48 RNC recordings (n = 25), and 47 control recordings (n = 26). In discriminating NMC from RNC, the mean sensitivity was 74.9% (SD 24.0%) and the mean specificity was 78.0% (SD 23.3%). In discriminating NMC from unrelated healthy controls, the mean sensitivity was 75.7% (SD 24.3%) and the mean specificity was 81.8% (SD 20.4%). Scatterplots of the most salient features for these pairwise comparisons do not show a readily visible separation between the groups (Fig. 1). The voice features for NMC, RNC, and healthy controls were therefore much less discriminatory than those for LRRK2-PD and iPD.
DISCUSSION
Our preliminary analyses found statistically significant differences between LRRK2-PD and iPD (p < 0.01) in features extracted from sustained phonations (Fig. 1 and Supplementary Figure 1). The differences in the features were less pronounced when non-manifesting carriers were compared with related non-carriers and healthy controls. Thus, voice could potentially be used as a non-invasive and inexpensive biomarker for identifying a LRRK2 mutation in PD participants, but seems to be less promising as a potential marker for the prodromal phase of LRRK2-PD.
A limitation of this study is the small sample size, particularly for participants with LRRK2-PD. We investigated and verified that unique participant identity was not a confounder (see the Supplementary Analysis) [32]. LRRK2-PD participants were older than iPD participants and we cannot rule out presbyphonia as a potential confounder [33]. However, including age as a covariate in the classification model did not improve classification accuracy, indicating that if presbyphonia exists in this cohort its effects may be negligible. Moreover, the predictive accuracy obtained using machine learning algorithms and multiple features does not lead to ready etiologically-relevant explanations for why voice impairment might be discriminatory in this context [34]. This hinders our ability to make inferences regarding underlying pathophysiological changes associated with an impaired voice in PD.
We find that statistical analysis of sustained phonations helps discriminate LRRK2-PD from iPD. The findings of this study add to the growing evidence supporting clinical and pathological differences between LRRK2-PD and iPD, whereby differences in both motor and nonmotor features (including heart rate variability, tremor, gait, and olfactory identification) have been previously reported [35–40]. To the best of our knowledge, this is the first proof-of-concept study that investigates potential vocal deficits in manifesting and non-manifesting LRRK2 carriers. These results warrant further investigation into the potential of using voice for the delineation of PD subtypes in larger cohorts.
CONFLICT OF INTEREST
SA is employed by the University of Oxford and has been funded by Parkinson’s UK.
