Abstract
Background:
Parkinson’s disease (PD) is a chronic, disabling neurodegenerative disorder.
Objective:
To predict a future diagnosis of PD using questionnaires and simple non-invasive clinical tests.
Methods:
Participants in the prospective Kuakini Honolulu-Asia Aging Study (HAAS) were evaluated biennially between 1995 and 2017 by PD experts using standard diagnostic criteria. Autopsies were sought on all deaths. We input simple clinical and risk factor variables into an ensemble-tree based machine learning algorithm and derived models to predict the probability of developing PD. We also investigated relationships of predictive models and neuropathologic features such as nigral neuron density.
Results:
The study sample included 292 subjects, 25 of whom developed PD within 3 years and 41 by 5 years. Of 251 subjects not diagnosed with PD, 116 (46%) underwent autopsy. Light Gradient Boosting Machine modeling of 12 predictors correctly classified a high proportion of individuals who developed PD within 3 years (area under the curve (AUC) 0.82, 95% CI 0.76–0.89) or 5 years (AUC 0.77, 95% CI 0.71–0.84). A large proportion of controls who were misclassified as PD had Lewy pathology at autopsy, including 79% of those who died within 3 years. PD probability estimates correlated inversely with nigral neuron density, and the correlation was strongest in autopsies conducted within 3 years of index date (r = –0.57, p < 0.01).
Conclusion:
Machine learning can identify persons likely to develop PD during the prodromal period using questionnaires and simple non-invasive tests. Correlation with neuropathology suggests that true model accuracy may be considerably higher than estimates based solely on clinical diagnosis.
INTRODUCTION
Parkinson’s disease (PD) is a chronic, progressive and disabling neurodegenerative disorder beginning in mid to late life [1, 2]. Classical motor features result primarily from degeneration of dopaminergic neurons in the substantia nigra pars compacta (SNpc), and include rest tremor, slowness and paucity of movement, rigidity, impaired balance and autonomic symptoms. PD is now well-recognized to be a systemic disease [1, 2] with widespread intraneuronal accumulations of aggregated phosphorylated alpha-synuclein protein (Lewy bodies and Lewy neurites) and associated symptomatology detected throughout the spinal cord and autonomic nervous system, myenteric plexus and gut, olfactory system, visual system, pancreas and skin [3–9]. Peripheral pathology likely precedes central pathology [10].
There is no diagnostic test for PD, and motor signs and symptoms required to meet diagnostic criteria manifest only after extensive loss of striatal dopamine [1, 12]. By the time of diagnosis, 50–80% of nigral dopaminergic neurons are dead or dying [13]. Therapeutic agents designed to slow disease progression may be less effective at this late stage, and indeed more than three decades of clinical trials targeting early PD have failed to identify a disease-modifying intervention. A prodromal period lasting many years precedes the onset of motor parkinsonism [14]. Interventions to delay the onset of parkinsonism could be implemented during this long pathologic evolution if persons at risk for developing PD could be identified with confidence.
Only a few prior studies have collected a broad range of prodromal features and risk factors prospectively, and the correlation between these risk factors and Lewy pathology at autopsy is understudied [15–18]. Importantly, incidental Lewy pathology has been detected in up to 20% of otherwise clinically normal individuals, and is thought by many to reflect early PD [19, 20]. We hypothesized that it is possible to identify those at risk of PD and Lewy body pathology using machine learning modeling of data obtained by questionnaire and simple clinical tests conducted during medical examination of participants in the prospective population-based Kuakini Honolulu-Asia Aging Study (HAAS) [21].
MATERIALS AND METHODS
Study cohort
The Kuakini Honolulu Heart Program (HHP) was established as a prospective cohort study in 1965 with enrollment of 8,006 Japanese-American men born 1900–1919. The original goals were to examine rates and risk factors for heart disease and stroke [22]. In 1991, with establishment of the Kuakini HAAS, the focus shifted to neurodegenerative diseases including PD. Environmental, lifestyle, and physical characteristics were ascertained in 1991 and at follow-up exams every 2–3 years through 2012. Detailed case finding methods have been published [23, 24]. Briefly, during the course of follow-up, all subjects were questioned about a diagnosis of PD and the use of PD medications by structured interview. Study participants received further screening by a technician trained in the recognition of the clinical symptoms of parkinsonism. Those with a history or sign of parkinsonism were referred to a study neurologist for a comprehensive neurologic examination and application of standard diagnostic criteria for PD [25]. For the current study, the exam conducted during 1994–1996 (Exam 5) was set as the index date. Individuals diagnosed with PD at or before the index date were excluded. Cohort members were followed until the earlier of death or 2012.
Neuropathological evaluation
Autopsies have been sought on all Kuakini HAAS deaths since 1991 and obtained on about 20%. Full neuropathological methods are reported in Petrovitch et al. [16]. Briefly, examinations of multiple brain regions were performed by neuropathologists unaware of clinical diagnoses. Formalin-fixed hematoxylin and eosin stained sections of the midbrain were prepared at the level of the exit of the third cranial nerve and mid-pons at the level of the locus coeruleus. Lewy bodies were identified by microscopic evaluation of single sections through the substantia nigra and locus coeruleus. If Lewy bodies were found in any region, then alpha-synuclein immunohistochemistry was also performed on sections of anterior cingulate, insula, frontal, temporal, and parietal lobes, and entorhinal cortex. Cortical Lewy bodies in these regions were then quantified [16]. SNpc neuron counts were determined as previously described for dorsomedial, ventromedial, dorsolateral, and ventrolateral quadrants, and neuron density expressed as neurons/mm² [26].
Case/control definitions
Final diagnosis of incident PD ascertained after the index date was determined by neurological exam and PD experts using published diagnostic criteria [23, 24]. Participants who did not manifest signs of parkinsonism or dementia were classified as controls. Among controls who went to autopsy, some were found incidentally to have nigral Lewy pathology. Thus, controls comprised three mutually exclusive groups: 1) autopsied with incidental Lewy pathology (iLB-Yes), 2) autopsied with no incidental Lewy pathology (iLB-No), and 3) not autopsied (iLB-Unknown). In order to best investigate the relationship of predictive models with Lewy pathology we included all PD cases (n = 58) and all iLB-Yes controls (n = 84), and age-matched them to iLB-No (n = 32) and iLB-Unknown (n = 135) controls at approximately a 1 : 1 ratio.
Clinical variables
Clinical variables analyzed in the current study were collected during follow-up exams at or prior to the index date (Exam 5, 1994–1996). For variables collected at multiple exams, we selected the measurement temporally closest to the index date. Variables from Exam 5 included age, simple reaction time [27] and choice reaction time (both measured using a computerized reaction time test [28] and modeled as continuous variables in seconds [29, 30]), and olfactory discrimination (assessed with the Brief Smell Identification Test (BSIT) [31]; total score, range 0–12 [27]). Variables from Exam 4 (1991–1993) included body mass index (BMI; kg/m²) [32], smoking history (never, past, or current [33]), excessive daytime sleepiness (response to the question “are you sleepy most of the day” [23, 27]), bowel movement frequency (defined as < every other day, every other day, once per day, 2–3 times per day, or > 3 times per day [34], modeled as an ordinal variable), and cognitive impairment (evaluated using the Cognitive Abilities Screening Instrument (CASI), with total score modeled as a continuous variable [35]). We also incorporated three variables from the mid-life 1967–1968 HHP exam: presence of hypertension (systolic blood pressure ≥140 mm Hg, diastolic blood pressure ≥90 mm Hg, or taking antihypertensive medication [27, 33]), self-reported history of a head injury with loss of consciousness [36], and daily average coffee consumption (ounces per day, analyzed as a continuous variable [33]).
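As an illustration of how two of these predictors could be coded before modeling, the following Python sketch applies the hypertension definition given above and maps bowel movement frequency to an ordinal scale. The function names and the integer codes 0–4 are assumptions made for this example; the text specifies only the category order, not the numeric coding used in the study.

```python
# Category order follows the text; the integer codes 0-4 are an
# illustrative assumption, not the study's documented coding.
BOWEL_FREQ_LEVELS = [
    "< every other day",
    "every other day",
    "once per day",
    "2-3 times per day",
    "> 3 times per day",
]
BOWEL_FREQ_CODE = {label: code for code, label in enumerate(BOWEL_FREQ_LEVELS)}


def has_hypertension(systolic, diastolic, on_antihypertensive):
    """Mid-life hypertension: SBP >= 140, DBP >= 90, or on medication."""
    return bool(systolic >= 140 or diastolic >= 90 or on_antihypertensive)


def encode_bowel_frequency(label):
    """Return the ordinal code, or None when the answer is missing."""
    return BOWEL_FREQ_CODE.get(label)
```

Returning None for an unanswered item leaves the value missing rather than imputing it, which matches the decision described below to let the classifier handle missing data.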
Machine learning classifier of a future diagnosis of PD
We implemented Light Gradient Boosting Machine (LGBM) as a classifier, a decision tree-based ensemble method that iteratively builds decision trees with the main goal of reducing classification (or prediction) error from the previous step. LGBM consists of individual shallow decision trees that avoid overfitting problems [37, 38]. In a classification task, LGBM produces a value between 0 and 1 representing the probability of belonging to each class. These values may be further transformed to predicted class labels by processing via a threshold (or cutoff) value. We did not implement a separate missing data imputation since LGBM can automatically handle missing data. We implemented a five-fold cross-validation strategy to avoid overfitting models. A Bayesian optimization algorithm was used for hyperparameter tuning [39] to find optimal values of parameters such as number of trees, tree depth, learning rate, and boosting rate.
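The conversion from predicted probability to class label can be sketched in a few lines of Python. The function name and the default cutoff of 0.5 are assumptions for illustration; the cut-points actually examined in the study are those reported with Figure 1.

```python
def to_class_labels(probabilities, threshold=0.5):
    """Label a subject 1 (predicted PD) when probability >= threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical predicted probabilities for four subjects.
predicted = [0.08, 0.35, 0.50, 0.91]
labels_default = to_class_labels(predicted)
labels_strict = to_class_labels(predicted, threshold=0.6)
```

Raising the threshold trades sensitivity for specificity: fewer subjects are labeled as PD, so fewer controls are misclassified but more true cases are missed.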
We built several LGBM models to predict PD using different sets of controls. In Model 1, we considered all participants without a clinical diagnosis of PD as controls. Model 2 excluded controls who had Lewy bodies at autopsy (iLB-Yes). In Model 3, we excluded both iLB-Yes controls and controls who did not have an autopsy (iLB-Unknown) and included only controls known to be free of Lewy bodies at autopsy (iLB-No). Finally, we generated a Model 4, in which we re-annotated iLB-Yes controls as cases for model development. To avoid any circularity, iLB-Yes controls were excluded from assessments of Model 4 classification accuracy. To ensure that models have practical applicability to future studies of disease-modifying interventions, primary analyses used two separate prediction windows, 3 and 5 years from index date, to model both short- and mid-term predictors of PD risk.
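The four case/control definitions can be made explicit with a small Python sketch. The function names and the dictionary representation are hypothetical, and the handling of iLB-Unknown controls in Model 4 (retained as controls) is our reading of the description rather than a documented detail of the original code.

```python
def build_training_labels(groups, model):
    """Return {subject_index: label} for Model 1-4 (1 = case, 0 = control).

    groups: list of "PD", "iLB-Yes", "iLB-No", or "iLB-Unknown".
    Subjects excluded from a model are omitted from the result.
    """
    labels = {}
    for i, group in enumerate(groups):
        if group == "PD":
            labels[i] = 1
        elif group == "iLB-No":
            labels[i] = 0
        elif group == "iLB-Unknown":
            if model in (1, 2, 4):      # Model 3 excludes non-autopsied controls
                labels[i] = 0
        elif group == "iLB-Yes":
            if model == 1:              # all non-PD participants are controls
                labels[i] = 0
            elif model == 4:            # re-annotated as cases for training
                labels[i] = 1
            # Models 2 and 3 exclude iLB-Yes controls
        else:
            raise ValueError(f"unknown group: {group!r}")
    return labels


def evaluation_indices(groups, model):
    """Subjects used to score a model; Model 4 drops iLB-Yes to avoid circularity."""
    labels = build_training_labels(groups, model)
    if model == 4:
        return [i for i in labels if groups[i] != "iLB-Yes"]
    return list(labels)
```

For example, with one subject from each group, Model 3 trains on the PD case and the iLB-No control only, while Model 4 trains on all four subjects but is scored on three.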
Overfitting is a common problem when using machine learning. It occurs when trained machine learning models learn the patterns that are specific to the training sample, rather than learning the patterns representing the input-outcome relationship. Overfitting manifests as high accuracy in training data but poor performance in testing. We implemented a 5-fold cross-validation strategy to ensure generalizability within our study cohort. At each run of the 5-fold cross-validation, we randomly selected 10% of the training data to be used as a validation set for early stopping to minimize overfitting. To avoid information leak during the five steps of the cross-validation, we built five different models from scratch, independent of the parameters of the models developed in other steps. We assessed classification models using various performance metrics including sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC) statistics. We additionally analyzed the correlation of predictive models with neuron density. We performed LGBM’s default variable importance analyses to rank variables based on their contribution to the predictions, which calculates an average “gain” value (relative importance) of the corresponding variable to the model. A higher “gain” value of a variable shows its importance over other variables of lesser gain value [40]. We further extended the variable importance analyses to estimate the independent magnitude and direction of effect of the predictors on the risk for PD. This was done by quantifying the average change (with 95% confidence interval) in predicted risk for PD corresponding to a one standard deviation increase in continuous predictors or a one unit increase in ordinal categorical variables.
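The data partitioning described above can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions: the function name, the fixed random seed, and the use of simple (unstratified) random folds are ours, since the text does not say whether folds were stratified by outcome. In each fold a new model would be fitted on the training indices, stopped early on the validation indices, and scored on the test indices.

```python
import random


def five_fold_splits(n_samples, n_folds=5, val_fraction=0.10, seed=0):
    """Return one (train, validation, test) tuple of index lists per fold."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Deal the shuffled indices into n_folds nearly equal test folds.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    splits = []
    for k, test in enumerate(folds):
        train_pool = [i for j, fold in enumerate(folds) if j != k for i in fold]
        rng.shuffle(train_pool)
        # Hold out about 10% of the training data for early stopping only.
        n_val = max(1, round(val_fraction * len(train_pool)))
        splits.append((train_pool[n_val:], train_pool[:n_val], test))
    return splits


# 292 matches the sample size reported in the abstract.
splits = five_fold_splits(292)
```

Because the validation subset is carved out of the training folds, the held-out test fold never influences fitting or early stopping, which is the information leak the procedure is meant to prevent.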
All analyses were carried out using the Python programming language.
The study was approved by the Institutional Review Boards of Loyola University Chicago (LU 212399), the University of California-San Francisco, and the Kuakini Medical Center.
RESULTS
The analytic sample consisted of 309 Kuakini HAAS participants with complete data and who did not have a PD diagnosis at the index date (Exam 5, 1994–1996). A total of 58 individuals were diagnosed with incident PD. Among these, 25 were diagnosed with PD within 3 years, and an additional 16 were diagnosed within 5 years of the index date. Eleven of 41 participants who developed PD within 5 years had autopsies, all of whom had Lewy body pathology. Among 251 clinically-defined controls, we included 84 with Lewy body pathology at autopsy (iLB-Yes), 32 without Lewy body pathology (iLB-No), and 135 without an autopsy (iLB-Unknown). Cohort characteristics are summarized in Table 1. For most predictor variables, values were most extreme among those who developed PD within 3 years of the index date, and values for iLB-Yes were intermediate between iLB-No and PD. Relative to all controls, cases had significantly lower bowel movement frequency and BSIT olfaction scores, and greater daytime sleepiness. Relative to controls without LB pathology, controls with LB pathology scored lower on the CASI and the BSIT.
Study Sample Characteristics
*Significantly different (p < 0.05) between all cases and all controls. **Significantly different (p < 0.05) between LB-Yes and LB-No controls.
Predicting a future diagnosis of PD within 5 years
Five-fold cross-validation accuracies for predicting a diagnosis of PD within 3 or 5 years after index date using several different control subgroups are presented in Table 2. The majority of misclassification in Model 1 occurred among controls with Lewy pathology (iLB-Yes), 38 (45%) of whom were classified by the model as PD. As noted in the Methods, we used this information to generate a Model 4, in which we re-annotated these individuals as cases for model development. Model 4 yielded the best accuracy for predicting future PD with AUC and 95% CIs of 0.82 (0.76–0.89) for a 3-year prediction window and 0.77 (0.71–0.84) for a 5-year prediction window. iLB-Yes controls were not included when calculating the AUCs for this model so as to avoid any circularity. Figure 1 depicts AUCs and examples of the sensitivity and specificity for predicting a clinical diagnosis of PD for several cut-points.
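For readers who wish to reproduce these metrics from predicted probabilities, the following Python sketch computes the AUC as the probability that a randomly chosen case receives a higher score than a randomly chosen control, along with sensitivity and specificity at a given cut-point. The function names and toy data are illustrative assumptions; the method used to obtain the 95% confidence intervals is not described in the text and is not reproduced here.

```python
def auc_score(y_true, y_score):
    """AUC via pairwise comparison of cases and controls (ties count 0.5)."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(y_true, y_score, threshold):
    """Sensitivity and specificity when score >= threshold is called PD."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: three cases followed by four controls.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.1]
auc = auc_score(y_true, y_score)
sens, spec = sensitivity_specificity(y_true, y_score, threshold=0.5)
```

The pairwise formulation is quadratic in sample size, which is acceptable for a cohort of a few hundred subjects.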
Machine Learning Models for Prediction of Incident Clinical PD
*We also ran a model using LB-Yes patients as cases and obtained an AUC of 0.63 (0.56–0.70) for the 3-year prediction window and an AUC of 0.61 (0.55–0.68) for the 5-year prediction window. **Controls with LB at autopsy (LB-Yes) were reannotated as PD for model development but excluded from tests of model performance. When we implemented Model 1, among 84 LB-Yes controls, 38 were classified as cases (PD) and 46 as controls. Using this evidence, in Model 4, we rebuilt a model by using these 38 as cases and 46 as controls. In addition, to compare the robustness of Model 4 with Model 3, we further excluded LB-Unknown patients in the AUC calculation and obtained an AUC of 0.91 (0.82–0.99) for the 3-year prediction window and 0.80 (0.70–0.90) for the 5-year prediction window.

ROC curves of Model 4 for 3-year (left) (AUC 0.82) and 5-year (right) (AUC 0.77) prediction windows.
Although our main goal in this study was to develop models to identify patients at risk for being clinically diagnosed with PD within a specified time frame (3 or 5 years), as a sensitivity analysis, we also repeated Model 4 including 17 additional individuals who were diagnosed with PD beyond 5 years of index date (median 8 years, range 6–17). As expected, the classification accuracy was lower, with an AUC of 0.70 (95% CI 0.64–0.77).
Censor-time based subgroup analysis for controls
Because the time from index date to autopsy was as long as 14 years for some participants, we calculated prediction performance for different time intervals until autopsy. For iLB-Yes controls whose autopsy was within 3 years of the index date, our model classified 79% as PD. This declined to 67% of iLB-Yes controls autopsied within 4 years, 55% within 5 years, and 40% of those autopsied > 7 years after index date. Thus, those with incidental Lewy pathology who were identified closer to the index date were more likely to be classified as PD.
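The subgroup calculation can be expressed as a short Python function. The function name, the inclusive "within k years" boundary, the 0.5 cutoff, and the toy records are all assumptions for illustration and do not reproduce study data.

```python
def fraction_classified_pd(records, max_years, threshold=0.5):
    """Share of iLB-Yes controls autopsied within max_years who are labeled PD.

    records: list of (years_from_index_to_autopsy, predicted_probability).
    Returns None when no autopsy falls inside the window.
    """
    selected = [p for years, p in records if years <= max_years]
    if not selected:
        return None
    return sum(1 for p in selected if p >= threshold) / len(selected)


# Hypothetical iLB-Yes controls: (years to autopsy, predicted PD probability).
toy_records = [(1.0, 0.8), (2.5, 0.6), (2.9, 0.3), (4.0, 0.7), (6.0, 0.2), (9.0, 0.1)]
within_3 = fraction_classified_pd(toy_records, max_years=3)
within_5 = fraction_classified_pd(toy_records, max_years=5)
within_10 = fraction_classified_pd(toy_records, max_years=10)
```

Note that the windows are cumulative, so each wider window includes the subjects from the narrower ones.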
Correlation of predicted PD risk probability with neuron density
Neuron densities and their correlations with predicted 5-year PD risk probability are shown in Table 3. Age at autopsy was not significantly correlated with any of the neuron density variables (all |correlations| < 0.1; data not shown). As expected, neuron density was highest in controls without LBs and lowest in PD cases. The classification scores correlated inversely with neuron density in all SNpc quadrants, with ventromedial neuron density being most strongly correlated to predicted PD risk. This negative correlation suggested that a greater predicted probability of PD is associated with lower nigral neuron density at death. As above, since autopsies were performed after a variable number of years following index date, we further investigated how the correlation of predicted PD risk and ventromedial neuron density varied by the time since the index date. As shown in Fig. 2, correlations were stronger for autopsies performed closer to the index date.
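The time-restricted correlation analysis can be sketched in Python as below. We assume a Pearson correlation coefficient here; the text reports r values without naming the coefficient, so this choice, the function names, and the toy records are illustrative only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_within(records, max_years):
    """Correlate predicted risk with neuron density for autopsies within max_years.

    records: list of (years_to_autopsy, predicted_risk, neuron_density).
    """
    subset = [(risk, density) for years, risk, density in records if years <= max_years]
    if len(subset) < 2:
        raise ValueError("need at least two autopsies in the window")
    risks, densities = zip(*subset)
    return pearson_r(risks, densities)


# Hypothetical records: (years to autopsy, predicted risk, neurons/mm^2).
toy_records = [(1, 0.9, 10.0), (2, 0.5, 20.0), (3, 0.1, 30.0), (8, 0.6, 25.0), (10, 0.2, 12.0)]
r_within_3 = correlation_within(toy_records, max_years=3)
r_all = correlation_within(toy_records, max_years=14)
```

In this toy example the inverse correlation is weaker when later autopsies are included, mirroring the pattern described for the cohort.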
SNpc neuron densities and correlations with estimated PD risk

Correlation between predicted PD risk and ventromedial neuron density is stronger closer to index date.
Variable importance analysis
Figure 3 depicts the relative contributions of each variable to LGBM Model 4 at 3 and 5 years after index date. The reaction time variables were most important, followed by olfaction score and BMI. Figure 4 depicts the magnitude, precision (95% CI) and direction (inverse or direct relationship) of independent contributions to the classification model, in which each point with associated bar represents the mean change in predicted PD risk with 95% CI when the variable value was artificially increased by 1 standard deviation for continuous variables and by 1 unit for categorical variables. Most variables contributed in the expected direction with the exception of CASI score and age. The inverse directionality of age may reflect interaction between age and other model variables and/or may be due to the narrow age range of the study cohort. For example, slower reaction time is likely to be a stronger predictor of future PD in younger individuals.
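The perturbation analysis underlying Figure 4 can be illustrated with a minimal Python sketch: one predictor is shifted in every subject while the others are held fixed, and the change in predicted risk is averaged. The linear `toy_predict` function is a hypothetical stand-in for a fitted LGBM model, the use of the population standard deviation is an assumption, and the 95% confidence interval is omitted for brevity.

```python
from statistics import mean, pstdev


def mean_risk_change(predict, rows, column, step):
    """Average change in predicted risk when one column is increased by step."""
    deltas = []
    for row in rows:
        shifted = list(row)
        shifted[column] += step
        deltas.append(predict(shifted) - predict(row))
    return mean(deltas)


def toy_predict(row):
    """Hypothetical risk model: slower reaction time raises risk, better smell lowers it."""
    reaction_time, smell_score = row
    risk = 0.2 + 0.5 * reaction_time - 0.02 * smell_score
    return min(1.0, max(0.0, risk))


# Toy subjects: [reaction time in seconds, smell score (0-12)].
rows = [[0.30, 10], [0.40, 8], [0.50, 6]]
sd_reaction_time = pstdev([row[0] for row in rows])
effect_reaction_time = mean_risk_change(toy_predict, rows, column=0, step=sd_reaction_time)
effect_smell = mean_risk_change(toy_predict, rows, column=1, step=1)
```

A positive mean change indicates a direct relationship with predicted risk and a negative change an inverse one; with a tree-based model the per-subject changes would generally differ, which is what gives the confidence interval its width.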

Variable Importance for predicting PD within 3 (Top) and 5 years (Bottom). The length of the bar depicts the relative importance.

Independent direction and magnitude of effect for 3-year (Top) and 5-year (Bottom) prediction windows.
DISCUSSION
Identifying PD in its earliest stages, before significant motor symptoms manifest, may be essential for the development and implementation of disease modifying interventions. However, current criteria for prodromal PD have been validated in only a handful of studies [41–43], and performance has varied among populations [44]. The most specific prodromal indicators, such as dopamine transporter imaging or ultrasonography, can be costly and invasive, while others such as REM sleep behavior disorder are relatively rare in the general population and definitive diagnosis requires polysomnography. In the current study, we applied machine learning techniques to accurately classify persons at risk for developing PD. Because our models relied exclusively on non-invasive and inexpensive tests, many of which could be implemented remotely, such as in an online or mobile phone-based assessment, and historical variables easily determined by self-report, this approach could be efficiently implemented in large, targeted populations.
To our knowledge, this is the first time that post-mortem pathologic findings have been combined with the clinical diagnosis of PD to explore model classification performance. We have also used these pathologic data to tune model performance and maximize classification accuracy. Remarkably, 45% of clinical controls found to have nigral Lewy bodies at autopsy were classified as having PD by our initial model. Because incidental Lewy bodies likely reflect early-stage PD [25, 46], we propose that the model is in fact correctly classifying people with prodromal PD, though we cannot rule out the possibility of another evolving neurodegenerative synucleinopathy. Further supporting this interpretation, model prediction probabilities correlated significantly with lower nigral neuron density, and correlations were strongest in those whose autopsies were within three years of the index date. Similarly, the proportion of iLB-Yes controls classified as PD was highest for those with the shortest time from index date until autopsy. Although we obtained classification AUCs of 0.82 and 0.77 at 3 and 5 years after the index date, several of the variables included in our model were collected years, and in some cases nearly three decades, before the index date. The most important predictors (simple reaction time, choice reaction time, and olfactory discrimination) were all collected at the index date. Prediction accuracy would likely have been considerably higher had all variables been collected closer to the index date, as has been previously observed in this cohort for olfactory dysfunction [47].
The International Parkinson’s Disease and Movement Disorder Society (IPMDS) has identified 23 individual factors in a proposed research definition of prodromal PD [14]. While many of those features were not assessed here, remarkably, the variables with the highest importance for predicting PD in this machine learning derived model based on clinical and pathological outcomes parallel many of those assigned greater importance in the IPMDS model, represented as higher likelihoods [14]. The variables with greatest importance for predicting PD within 3 or 5 years in our model were two quantitative motor tests (simple and choice reaction time). Abnormal quantitative motor tests are also a feature in the IPMDS criteria, but with only moderately strong likelihood ratios. Impaired smell recognition and increasing age are recognized as important in both models. BMI, among the top four most important factors in this model, may be a surrogate for diabetes mellitus and physical inactivity, two IPMDS criteria. Hypertension is the only variable identified in this model that is not represented in the IPMDS criteria. In fact, orthostatic hypotension is strongly weighted in the IPMDS criteria. This may reflect the fact that our model included hypertension in midlife, determined more than 25 years before index date.
Our study has some limitations. Most importantly, the Kuakini HAAS cohort consists entirely of Japanese-American men in Hawaii, and with a mean age over 80 at the index date, they are substantially older than study populations likely to be enrolled in disease-modifying therapeutic trials. Thus, although our modeling approach may be widely applicable, our model weightings are not likely to be generalizable outside of this population. Additionally, as noted above, predictor variables were collected at differing timepoints before the index date. Although we would expect this to have biased our models toward the null, it nonetheless further hinders generalizability. Further, although none of the individuals defined here as controls (with or without incidental Lewy pathology) had clinical evidence of parkinsonism or dementia, we did not consider other types of neuropathology in these analytic models. Finally, although we implemented a comprehensive cross-validation strategy, our sample size was relatively small and variable coefficients imprecise.
Machine learning techniques may provide opportunities to identify individuals during prodromal PD [48], as well as to predict disease progression [49]. In prior work, we successfully implemented a machine learning approach to detect PD autonomic features prior to diagnosis using a single lead of a standard 10-s 12-lead electrocardiogram [50]. Karabayir et al. implemented an LGBM algorithm to accurately classify PD using data generated from a simple speech test [51]. Although some have criticized machine learning methods as non-intuitive, the development of compact models using a small number of clinical variables increases their utility and potentially their portability across healthcare settings and systems [52].
Advances in digital technology are increasingly being applied in the assessment of health outcomes [53]. Many of the variables with highest predictive value in this model, as well as our prior finding associating reduced heart rate variability and future risk of PD [50], can now be determined using personal technologies such as online computerized testing, mobile phone applications or wrist-worn sensors [53]. In the future, machine learning algorithms such as those reported here may be effectively combined with self-reported health measurements and digital assessments to develop an efficient, low-cost method for population screening and prospective monitoring of those with prodromal PD.
Investigators wishing to test our model in other study populations can access the source code at https://github.com/akbilgic/AI_PD_ClinicalModel.
ACKNOWLEDGMENTS
We thank the Michael J. Fox Foundation for supporting this research (MJFF Grant ID 17267, PI Akbilgic). This work was also supported by the National Institute on Aging and National Institute of General Medical Sciences.
We also would like to extend our thanks to the Kuakini Medical Center in Honolulu, HI for providing Kuakini HAAS data, and to the HAAS study participants.
CONFLICT OF INTEREST
The authors have no conflict of interest to report.
