Abstract
Objective
This study evaluates machine learning (ML) approaches for insomnia classification using physiological and behavioral data from wearable devices. SHAP analysis identifies key predictors, highlighting the relationship between sleep disturbances and digital phenotypes and emphasizing clinical plausibility as a criterion for model selection.
Methods
Three hundred thirty-eight participants (249 with insomnia and 89 controls) aged 19–70 years were instructed to wear Fitbit Inspire 3 devices for 4 weeks to record heart rate, activity, and sleep metrics. Insomnia classification was based on Insomnia Severity Index scores (≥8 insomnia and ≤7 controls). Filter- and wrapper-based feature-selection methods were applied to the 120 extracted features. Multiple ML algorithms were evaluated using five-fold cross-validation, with the clinical plausibility of the feature relationships explicitly considered in the final model selection.
Results
The LightGBM model trained on 60 ANOVA-selected features achieved the highest performance (F1 score = 0.868 ± 0.027). The key predictive features identified by SHAP analysis included delayed acrophase of the heart rate cosinor rhythm, higher self-reported stress, and maximum heart rate, all of which aligned with sleep-wake physiology. However, several features exhibited patterns that contradicted established clinical expectations, highlighting the disconnect between statistical optimization and clinical utility.
Conclusion
Machine learning models trained on wearable data can effectively classify insomnia. SHAP analysis suggested that altered activity patterns reflect sleep disturbance, while also highlighting the necessity for further clinical validation. Clinical plausibility must be integrated as a fundamental criterion in model development, to ensure clinically trustworthy ML applications in sleep medicine.
Keywords
Introduction
Insomnia affects approximately 10–30% of the population; however, clinical diagnosis relies primarily on subjective self-report instruments that are prone to recall bias.1–3 Although wearable technologies and digital phenotyping offer objective approaches for sleep assessment, the translation of high-performing machine learning (ML) models into clinical practice remains challenging owing to interpretability issues.4,5
Recent advances in wearable artificial intelligence (AI)-powered solutions have demonstrated promising results in the detection of sleep disorders.6–8 Although growing emphasis on explainable AI in healthcare has focused primarily on technical interpretability methods, few studies have examined whether ML-identified patterns align with clinical knowledge or how contradictory findings influence model selection.9–11 This gap is particularly concerning in sleep medicine, where complex interactions between behavioral, physiological, and psychological factors require nuanced clinical interpretations.
While predictive performance remains a central benchmark in machine learning research, the clinical deployment of AI systems requires evaluation criteria that extend beyond statistical accuracy. Prior work has emphasized that machine learning systems trained on large observational datasets may generate patterns that appear statistically valid yet lack clinical validity. When model optimization targets a specific performance metric, models may exploit dataset-specific artifacts or latent confounders that improve predictive performance but do not reflect meaningful physiological relationships.12 Moreover, in the context of healthcare machine learning, excessive optimization may encourage models to capture spurious associations that fail to generalize across populations or clinical contexts.13
When models provide recommendations based on counterintuitive patterns, clinicians may lose confidence in AI-assisted decision making, potentially leading to inappropriate interventions or limiting the adoption of beneficial technologies.14,15 In high-stakes healthcare environments, understanding the rationale behind AI-driven decisions is essential for maintaining clinical trust and ensuring patient safety.16 In addition, prior studies have highlighted the risk of a “false sense of certainty” in AI-assisted clinical decision-making, where clinicians may inadvertently rely on model outputs despite underlying inaccuracies or spurious patterns, underscoring the importance of incorporating clinically grounded validation criteria beyond predictive performance alone.17 These concerns have been increasingly reflected in emerging regulatory and methodological frameworks for trustworthy medical AI. Regulatory guidance from the U.S. Food and Drug Administration for AI/machine learning-based Software as a Medical Device emphasizes transparency, including the logic or explainability of model outputs.18 Similarly, the World Health Organization has highlighted explainability as a core ethical principle for AI in health.19 Reporting guidelines such as TRIPOD+AI recommend transparent reporting of interpretability methods and their validation, along with conventional performance measures.20
In this study, we adopt this perspective by explicitly considering clinical plausibility as an additional model-selection criterion when developing machine learning models for insomnia classification using wearable-derived digital phenotypes. Rather than selecting models solely based on predictive performance, we examine whether the relationships identified by each model align with established clinical and physiological knowledge about sleep, circadian rhythms, and behavioral activity patterns.
This study explored the integration of ML models for insomnia classification using wearable-derived digital phenotypes, with explicit consideration of clinical plausibility. Our objectives were (1) to develop and evaluate ML models for insomnia classification using wearable-derived digital phenotypes and (2) to demonstrate the necessity of integrating clinical plausibility as a fundamental criterion in model selection, beyond predictive performance alone, to ensure clinically trustworthy applications.
Methods
Participants and data collection
Study design
This single-center prospective observational study was conducted at Korea University Anam Hospital, Seoul, Republic of Korea, between January 2023 and July 2024. This study is part of an ongoing research program employing a deep phenotyping approach to comprehensively characterize insomnia, conducted from March 2023 to October 2024 (registered at the Clinical Research Information Service: KCT0009175; protocol published in Lee et al., Front. Psychiatry 2025). 21
Study population
A total of 338 participants aged 19–70 years were recruited from Korea University Anam Hospital between January 2023 and July 2024 as part of the ongoing CRIS-registered study (KCT0009175). Based on Insomnia Severity Index (ISI) scores, 249 participants (73.67%) were classified as the insomnia group (ISI ≥8) and 89 participants (26.33%) as controls (ISI ≤7). In this study, using a lower ISI cutoff score of ≥8 to define the insomnia group allows identification of individuals with clinically relevant insomnia symptoms including subthreshold insomnia that may precede the development of chronic insomnia. 2 Individuals with intellectual disabilities, organic brain damage, schizophrenia spectrum disorder, ongoing sleep disorder treatment, or those without a smartphone were excluded.
Data collection
Data collection was conducted over 4 weeks using three sources: wearable device monitoring, smartphone-based ecological momentary assessment, and self-reported case report forms (CRFs). Upon enrollment, the participants completed structured CRFs and provided demographic data, family history, current illnesses, and sleep characteristics, and ISI scores were calculated.
Participants continuously wore a wearable device (Fitbit Inspire 3, Fitbit Inc., USA), which passively recorded heart rate every 5 seconds and step count, moving distance, and exercise time every 5 minutes. Daily sleep metrics included total sleep time, REM/light/deep sleep time, sleep onset/offset times, wake after sleep onset episodes, and sleep efficiency. Data were segmented into weekdays/holidays, daytime (8:00 AM to 6:00 PM), and bedtime (6:00 PM to 8:00 AM) periods.
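The segmentation rule described above can be illustrated with a minimal sketch. The function name `label_segment` and the `holidays` argument are hypothetical, and two points are assumptions not specified in the text: Saturdays and Sundays are treated as holidays, and an overnight bedtime sample is assigned to the calendar date of its own timestamp.

```python
from datetime import datetime


def label_segment(ts: datetime, holidays: frozenset = frozenset()) -> tuple[str, str]:
    """Return (day_type, period) for a single timestamp."""
    # Assumption: weekends and any explicitly listed dates count as holidays.
    is_holiday = ts.weekday() >= 5 or ts.date() in holidays
    day_type = "holiday" if is_holiday else "week"
    # Daytime is 08:00 (inclusive) to 18:00 (exclusive); all other hours are bedtime.
    period = "daytime" if 8 <= ts.hour < 18 else "bedtime"
    return day_type, period


monday_morning = label_segment(datetime(2023, 3, 6, 9, 0))
saturday_night = label_segment(datetime(2023, 3, 11, 23, 30))
```

Per-segment descriptive statistics can then be computed by grouping samples on the returned pair.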
In addition, participants installed a custom smartphone application ‘SOMDAY’ (Lumanlab Inc, Seoul, Republic of Korea) developed specifically for this study to capture subjective daily lifestyle data, complementing objective wearable data. Daily lifestyle factors, including caffeine and alcohol intake, stress levels, and total nap time, were reported using SOMDAY.8,22,23
This study was conducted in accordance with the principles of the Declaration of Helsinki. All procedures were reviewed and approved by the Institutional Review Board (IRB) of Korea University Anam Hospital (No. 2022AN0587). All participants provided written informed consent at the beginning of the study, following a clear explanation of the study’s purpose, procedures, potential risks and benefits, data-handling procedures, and the voluntary nature of participation.
Data preprocessing
The heart rate data were used to compute descriptive metrics (maximum, minimum, mean, variance, and standard deviation) for each time segment. Cosinor analysis within 72-h intervals estimated the following circadian rhythm parameters: amplitude, acrophase, MESOR, and goodness of fit. 24 Exercise intensity was calculated based on the heart rate relative to the maximum heart rate and classified according to established criteria.
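For readers unfamiliar with cosinor analysis, the following is a minimal single-component sketch, assuming a fixed 24-h period and an ordinary least-squares fit; it is not necessarily the implementation used in this study. The function name `fit_cosinor` is hypothetical, the acrophase is expressed as the clock time of the fitted peak (in hours), and goodness of fit is represented here by R².

```python
import numpy as np


def fit_cosinor(t_hours, values, period=24.0):
    """Least-squares fit of y = MESOR + A * cos(2*pi*(t - acrophase) / period)."""
    t = np.asarray(t_hours, dtype=float)
    y = np.asarray(values, dtype=float)
    omega = 2.0 * np.pi / period
    # Linearized form: y = M + b*cos(omega*t) + c*sin(omega*t)
    design = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    mesor, b, c = coef
    amplitude = float(np.hypot(b, c))
    # Time of the fitted peak, wrapped into [0, period)
    acrophase_h = float((np.arctan2(c, b) / omega) % period)
    residual = y - design @ coef
    r_squared = 1.0 - float(np.sum(residual**2) / np.sum((y - y.mean()) ** 2))
    return {"mesor": float(mesor), "amplitude": amplitude,
            "acrophase_h": acrophase_h, "r_squared": r_squared}


# Synthetic 72-h window sampled every 5 minutes, with a peak at 15:00.
t = np.arange(0, 72, 5 / 60)
hr = 70 + 10 * np.cos(2 * np.pi * (t - 15) / 24)
fit = fit_cosinor(t, hr)
```

In this parameterization, a later `acrophase_h` corresponds to the delayed acrophase discussed in the Results.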
Activity levels were analyzed using step count and moving distance with basic descriptive metrics (maximum, minimum, mean, variance, and standard deviation). Nonparametric circadian features including interdaily stability (IS), intradaily variability (IV), relative amplitude (RA), and mean activity during the least active 5 h (L5) and most active 10 h (M10) captured patterns of daily rest-activity rhythms. 11 Cosinor analysis was also performed using the step count data. 11
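The nonparametric rest-activity features can be sketched from their commonly used definitions. This is an illustrative sketch that assumes hourly-binned step counts covering whole days and computes L5 and M10 on the average 24-h profile with wrap-around windows; the function name `nonparametric_rhythm` is hypothetical, and the binning used in this study may differ.

```python
import numpy as np


def nonparametric_rhythm(hourly_activity, bins_per_day=24):
    """Compute IS, IV, L5, M10, and RA from an hourly activity series."""
    x = np.asarray(hourly_activity, dtype=float)
    n = x.size - x.size % bins_per_day  # keep whole days only
    x = x[:n]
    grand_mean = x.mean()
    total_ss = np.sum((x - grand_mean) ** 2)  # zero for a constant series
    profile = x.reshape(-1, bins_per_day).mean(axis=0)  # average 24-h profile

    # Interdaily stability: variance of the average profile relative to total variance
    is_value = n * np.sum((profile - grand_mean) ** 2) / (bins_per_day * total_ss)
    # Intradaily variability: hour-to-hour changes relative to total variance
    iv_value = n * np.sum(np.diff(x) ** 2) / ((n - 1) * total_ss)

    wrapped = np.concatenate([profile, profile])  # allow windows that cross midnight

    def window_means(width):
        return np.array([wrapped[i:i + width].mean() for i in range(bins_per_day)])

    l5 = window_means(5).min()    # least active 5 consecutive hours
    m10 = window_means(10).max()  # most active 10 consecutive hours
    ra = (m10 - l5) / (m10 + l5)
    return {"IS": float(is_value), "IV": float(iv_value),
            "L5": float(l5), "M10": float(m10), "RA": float(ra)}


# Toy example: seven identical days, active from 08:00 to 22:00.
day = np.array([0.0] * 8 + [100.0] * 14 + [0.0] * 2)
features = nonparametric_rhythm(np.tile(day, 7))
```

A perfectly repeating daily pattern yields IS = 1, whereas fragmented activity increases IV.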
This process yielded a total of 120 features (for a complete list, see Supplementary Material 1). Missing values were primarily due to device non-wear or signal disconnection, and the feature-wise missingness rates are provided in Supplementary Material 2.
To address missing values, a sensitivity analysis was performed comparing four imputation strategies: groupwise mean imputation, MissForest, multivariate imputation by chained equations (MICE), and k-nearest neighbors (KNN). The MissForest algorithm was selected as the primary method because it captures complex, non-linear interactions between variables without requiring prior distributional assumptions. Furthermore, we avoided mean imputation to prevent potential data leakage. Imputation was performed separately within strata defined by ISI score to preserve potential differences in activity patterns across sleep disturbance severity. The performance of different imputation methods was evaluated using a baseline logistic regression model.
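A comparison of this kind can be sketched with scikit-learn. The sketch below is an approximation under stated assumptions: `IterativeImputer` with a random-forest estimator is used as a stand-in for MissForest, `IterativeImputer` with its default estimator as a stand-in for MICE, and the helper names `make_imputers` and `impute_within_groups` are hypothetical. How the strata interact with the training/validation split is not specified in the text and is not modeled here.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer


def make_imputers(seed=0):
    """Factories for the four strategies (stand-ins, not the study's exact code)."""
    return {
        "mean": lambda: SimpleImputer(strategy="mean"),
        "missforest_like": lambda: IterativeImputer(
            estimator=RandomForestRegressor(n_estimators=20, random_state=seed),
            max_iter=3, random_state=seed),
        "mice_like": lambda: IterativeImputer(max_iter=10, random_state=seed),
        "knn": lambda: KNNImputer(n_neighbors=5),
    }


def impute_within_groups(X, groups, factory):
    """Fit a fresh imputer on each stratum and fill only that stratum's rows."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    out = X.copy()
    for g in np.unique(groups):
        rows = groups == g
        out[rows] = factory().fit_transform(X[rows])
    return out


# Toy data: 60 rows, 4 features, two strata, at most one missing value per row.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(60, 4))
X_missing = X_full.copy()
for i in range(0, 60, 3):
    X_missing[i, i % 4] = np.nan
strata = np.repeat([0, 1], 30)
imputed = {name: impute_within_groups(X_missing, strata, factory)
           for name, factory in make_imputers().items()}
```

Each imputed matrix can then be passed to the same baseline classifier to compare downstream performance.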
Model construction
Feature selection
Python software, version 3.12.3 was used for all analyses. To examine the impact of different feature selection strategies on model performance and interpretability, both filter and wrapper methods were applied. Three statistical filtering methods were applied to score and rank the features, including mutual information, ANOVA, and chi-squared statistics. The models were trained on feature subset sizes ranging from k=1 to k=120, based on the top-k ranked features from each filter method. A model-specific wrapper approach was implemented using the Optuna framework. This method utilized categorical suggestions to determine the inclusion or exclusion of each feature, directly optimizing the model’s objective function based on the resulting feature subsets. 25
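The top-k filter sweep can be illustrated as follows. This is a minimal sketch on synthetic data, assuming the ANOVA F-test filter (`f_classif`) and a logistic regression classifier as a placeholder for the models compared in this study; the function name `sweep_top_k` is hypothetical. Placing the selector inside the pipeline means that features are re-ranked within each training fold, which is one way to avoid information leaking from held-out folds.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def sweep_top_k(X, y, k_values, seed=0):
    """Mean cross-validated F1 score for each number of top-ranked features."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = {}
    for k in k_values:
        model = make_pipeline(
            StandardScaler(),
            SelectKBest(score_func=f_classif, k=k),  # ANOVA F-test ranking
            LogisticRegression(max_iter=1000),
        )
        scores[k] = cross_val_score(model, X, y, cv=cv, scoring="f1").mean()
    return scores


# Synthetic stand-in with the same shape and approximate class balance as the cohort.
X, y = make_classification(n_samples=338, n_features=120, n_informative=10,
                           weights=[0.26, 0.74], random_state=0)
scores = sweep_top_k(X, y, k_values=[1, 10, 30, 60, 120])
best_k = max(scores, key=scores.get)
```

The full analysis would sweep every k from 1 to 120 and repeat the procedure for each filter method and classifier.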
Machine learning models
Five supervised ML algorithms were compared: logistic regression, Random Forest, XGBoost, LightGBM, and Support Vector Machine.
Model training and validation
Model training was performed using the Python packages scikit-learn v.1.6.1, XGBoost v.2.1.4, LightGBM v.4.6.0, and SHAP v.0.48.0. To perform a binary classification of the insomnia and normal sleep groups, the dataset was split into a training set (70%) and an external validation set (30%). Hyperparameter tuning and feature selection parameters were optimized simultaneously using the Optuna optimization framework. The Tree-structured Parzen Estimator (TPE) sampler was used to explore the search space over 300 trials per model.
Statistical analysis
Performance was assessed using accuracy, precision, recall, and F1 score, with the F1 score serving as the primary selection metric owing to class imbalance. The models were evaluated using stratified five-fold cross-validation repeated ten times, with metrics averaged across folds and repetitions. The Wilcoxon signed-rank test was used to compare the F1 score of the best-performing model against those of the alternative models.26 Discrimination was further assessed using the Area Under the Receiver Operating Characteristic Curve (AUC-ROC). Calibration of the model was evaluated using the Brier score and calibration curves.
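The evaluation scheme can be sketched as follows, assuming synthetic data and two placeholder classifiers rather than the models compared in this study. Both models are scored on identical splits so that the 50 fold-level F1 scores are paired before the Wilcoxon signed-rank test is applied.

```python
from scipy.stats import wilcoxon
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with roughly the cohort's class balance.
X, y = make_classification(n_samples=338, n_features=20, n_informative=5,
                           weights=[0.26, 0.74], random_state=0)

# Stratified five-fold cross-validation repeated ten times (50 scores per model).
# A fixed random_state gives both models the same sequence of splits.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
f1_a = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="f1")
f1_b = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv, scoring="f1")

summary = {"model_a": (f1_a.mean(), f1_a.std()), "model_b": (f1_b.mean(), f1_b.std())}
statistic, p_value = wilcoxon(f1_a, f1_b)  # paired comparison of fold-level F1
```

Because folds from repeated cross-validation share training data, the resulting p-value is best read as descriptive.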
Model interpretability assessment
The interpretability of the models was assessed using SHapley Additive exPlanations (SHAP) values and summary plots. Clinical plausibility of the feature relationships was explicitly evaluated and integrated into the final model selection decisions, prioritizing alignment with established clinical knowledge alongside statistical performance.
Figure 1. Schematic diagram of the study method.
Results
Sensitivity analysis of imputation methods
The sensitivity analysis revealed that both Mean Imputation and MissForest yielded comparable predictive performance, with MissForest showing a slight advantage in overall accuracy. Although the F1-score for mean imputation (0.763) was marginally higher than that of MissForest (0.761), the difference was negligible. We prioritized MissForest for the final model development due to its superior ability to maintain the multivariate distribution and data integrity. The full results are presented in Supplementary Material 3.
Model performance
Optimal number of selected features and model performance metrics for each model trained with different feature selection methods.
k, number of selected features; ANOVA, analysis of variance.

(a) The AUC-ROC and (b) the calibration curve of the best performing ANOVA-filtered LightGBM model.
Clinical interpretability assessment
The SHAP summary plot of the best-performing LightGBM model (Figure 3) showed that the most influential feature was HR_CR_acrophase: a delayed acrophase of the heart rate cosinor rhythm increased the predicted probability of insomnia. Higher self-reported stress also contributed to the prediction of insomnia, whereas lower stress did not show a clear association. Higher maximum daytime heart rates on both weekdays and holidays were associated with a lower risk of insomnia. Details of the interpretation of the SHAP analysis are available in Supplementary Material 5.
SHAP summary plot of the ANOVA-filtered LightGBM model (k=60), demonstrating the best F1 score. The top 20 features with the highest SHAP values.
While most heart rate related features aligned with clinical knowledge and were clinically interpretable, several features directly related to sleep contradicted previously known patterns. Higher intradaily variability (IV_week, IV_holiday) was associated with reduced insomnia risk, opposing the established understanding that elevated IV reflects circadian disruption linked to greater sleep disturbances. 27 The sleep efficiency paradox, in which insomnia patients showed better efficiency than controls, raises concerns about relying on wearable-derived metrics without clinical validation. 28
These counterintuitive patterns suggest that the model may be capturing systematic biases or artifacts inherent in wearable-derived features rather than true physiological mechanisms. Although the model demonstrated acceptable predictive performance, its outputs may be misleading if interpreted without caution.
To further validate the stability of feature importance, we conducted a sensitivity analysis by comparing the top 5 SHAP values across the 20 best-performing models. Feature consistency analysis revealed that HR_CR_acrophase and self-reported stress were consistently identified as top predictors in 15 (75%) and 16 (80%) models, respectively. This high level of consensus underscores that circadian rhythm phase shifts and psychological stress are robust predictors of insomnia, independent of model architecture or hyperparameter configurations. In contrast, sleep_efficiency_week appeared in only 9 (45%) models. This relatively low consistency, combined with the observed “sleep efficiency paradox,” suggests that while sleep efficiency is a relevant factor, its predictive value may be more susceptible to model-specific biases or inherent measurement noise in wearable-derived sleep metrics. The SHAP summary plots of all models are presented in Supplementary Material 6.
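The consistency count described above amounts to tallying how often each feature appears among the top five by importance. A minimal sketch follows, assuming each model is summarized by a mapping from feature name to mean absolute SHAP value; the function name `top_k_frequency`, the importance values, and the names `feature_3` to `feature_5` are toy placeholders, not results from this study.

```python
from collections import Counter


def top_k_frequency(importances_per_model, k=5):
    """Fraction of models in which each feature ranks among the top k."""
    counts = Counter()
    for importances in importances_per_model:
        top = sorted(importances, key=importances.get, reverse=True)[:k]
        counts.update(top)
    n_models = len(importances_per_model)
    return {feature: count / n_models for feature, count in counts.items()}


# Two toy models (hypothetical mean |SHAP| values).
model_1 = {"HR_CR_acrophase": 0.9, "stress": 0.8, "feature_3": 0.5,
           "feature_4": 0.4, "feature_5": 0.3, "sleep_efficiency_week": 0.1}
model_2 = {"HR_CR_acrophase": 0.9, "stress": 0.8, "feature_3": 0.5,
           "feature_4": 0.4, "feature_5": 0.3, "sleep_efficiency_week": 0.6}
frequency = top_k_frequency([model_1, model_2], k=5)
```

Applied to 20 models, a feature ranked in the top five by 15 of them would receive a frequency of 0.75.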
Final model selection: Integration of clinical reasoning
The final selection of the best performing model integrated the following clinical reasoning: (1) adequate performance (F1 score = 0.868 ± 0.027), (2) interpretable relationships allowing clinical evaluation, and (3) transparency enabling informed clinician decision-making when predictions contradict expectations.
Discussion
The evaluation of symptoms and diagnosis in clinical psychiatry largely depends on patients’ self-reported symptoms. By utilizing digital phenotyping in psychiatry, many limitations of traditional clinical evaluation and diagnosis can potentially be addressed. In this study, we developed insomnia severity classification algorithms using automatically recorded passive data. This underscores the potential of wearable device data, digital phenotyping, and machine learning in providing a more reliable and scalable solution for insomnia classification.
The interpretable predictors recovered by our model can be situated within the two canonical pathophysiological frameworks of insomnia. The two-process model of sleep regulation describes sleep as the alignment of a homeostatic sleep drive (Process S), which accumulates during wakefulness, with a circadian arousal–sleep propensity rhythm (Process C). 29 Insomnia is increasingly understood as a disorder in which Process S–C alignment is disrupted and superimposed on a state of chronic 24-hour hyperarousal involving cognitive, emotional, and autonomic dimensions. 30 Notably, the most robust predictors identified in this study by SHAP analysis — delayed heart rate circadian acrophase, self-reported stress, and elevated tonic heart rate — map directly onto these two axes. This convergence suggests that the model, despite being trained without any mechanistic constraint, has recapitulated the principal physiological signatures of insomnia in a fully data-driven manner.
The leading predictor in our model, delayed acrophase of the heart rate cosinor rhythm, is best interpreted as a marker of circadian phase delay in autonomic nervous system activity. Heart rate exhibits a strong circadian rhythm driven by sympathetic–parasympathetic balance, with peak sympathetic activity typically occurring during the active phase and progressive sympathetic withdrawal during the rest phase. 31 A delay in this acrophase implies that sympathetic dominance persists into the late evening and early sleep period — a state physiologically incompatible with the rapid sleep onset and consolidated nocturnal sleep expected under intact Process C. This pattern is consistent with longstanding observations of evening-type chronotype, weakened nocturnal parasympathetic tone, and circadian misalignment in insomnia, and it provides physiological grounding for the predictive value of HR_CR_acrophase.32,33 Importantly, this interpretation positions the feature not as a statistical correlate but as a wearable-accessible proxy of Process C dysregulation, a property that is particularly attractive for longitudinal monitoring in real-world settings.
A second cluster of robust predictors — higher self-reported stress, elevated daytime minimum heart rate, and blunted maximum daytime heart rate — coheres tightly with the hyperarousal model of insomnia. Hyperarousal is characterized by sustained activation of the hypothalamic–pituitary–adrenal axis and the sympathetic nervous system across the 24-hour cycle, manifesting as elevated tonic heart rate, reduced heart rate variability, heightened emotional reactivity to daily stressors, and impaired downregulation of arousal in response to environmental cues. 34 The combination of an elevated minimum heart rate (indicating tonic sympathetic dominance) and a relatively suppressed maximum heart rate (indicating a diminished autonomic dynamic range and blunted reactivity) is a recognizable signature of this state. Coupled with the strong predictive contribution of subjective stress, these features suggest that the model is sensitive to both the autonomic and the psychological correlates of hyperarousal — factors that are widely regarded as precipitating and perpetuating in chronic insomnia. 35 The simultaneous identification of circadian phase delay and hyperarousal-related autonomic features therefore positions the model not as a black-box classifier of subjective sleep complaint, but as a tool whose decisions are anchored in the canonical pathophysiology of insomnia.
Several features, in contrast, exhibited associations that contradict established clinical expectations and warrant a more cautious interpretation. The “sleep efficiency paradox,” in which higher wearable-derived sleep efficiency was associated with greater predicted insomnia risk, is unlikely to reflect a genuine physiological phenomenon. It is most plausibly explained by two convergent factors. First, consumer-grade wearable devices systematically over-detect sleep during periods of motionless quiet wakefulness, a well-documented limitation of actigraphy-based sleep estimation;36 this leads to inflated sleep efficiency in patients whose hyperarousal manifests as cognitively active but motorically still wakefulness. Second, this pattern is reminiscent of paradoxical insomnia, a recognized clinical subtype in which patients report severe sleep disturbance despite relatively preserved objective sleep parameters;37,38 if such cases are present in our cohort, the model may be detecting a phenotype in which subjective complaint and objective wearable measurement diverge. The paradoxical inverse association between intradaily variability and insomnia risk admits a complementary interpretation grounded in Process S: healthy individuals may engage in a more diverse mix of weekend and holiday activities (social interaction, exercise, outdoor activity), naturally producing high intradaily variability, whereas individuals with insomnia, who often experience daytime fatigue and reduced motivation, may show monotonous, sedentary patterns that depress intradaily variability despite worse sleep at night. The paradoxical relative amplitude finding likely extends the same dynamic.
None of these patterns dismiss the underlying clinical concern; rather, they highlight that wearable-derived sleep and activity metrics are noisy proxies of the constructs they are intended to capture, and that their predictive use is most defensible when paired with mechanistic interpretation rather than treated as direct clinical readouts.
Taken together, these mechanistic and paradoxical findings inform the clinical translation of wearable-based machine learning for insomnia. The discrimination–calibration profile of our model — strong recall-driven F1, moderate AUC-ROC, and a Brier score only modestly above the no-skill reference implied by the marked class imbalance — is consistent with a screening-stage decision-support role rather than autonomous diagnosis. Such a role is well aligned with the mechanistically interpretable predictors we have identified: features that explicitly index Process C dysregulation and autonomic hyperarousal can be communicated to clinicians, cross-checked against subjective report and clinical history, and integrated into existing diagnostic workflows, whereas features that behave paradoxically (sleep efficiency, intradaily variability, relative amplitude) should be flagged for clinical scrutiny rather than acted upon directly. This stratified handling of model outputs — privileging predictors with established mechanistic anchoring while contextualizing wearable artifacts as artifacts — exemplifies the clinical plausibility verification stage of a structured validation framework for healthcare AI and offers a concrete template for translating wearable-derived ML into sleep-medicine practice.
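The no-skill Brier reference mentioned above can be worked out from the class proportions. The sketch below assumes that the evaluation data have the same prevalence as the full cohort (249 of 338 participants with insomnia), which holds approximately under a stratified split. A classifier that always predicts the prevalence p attains a Brier score of p(1 − p).

```python
n_insomnia, n_control = 249, 89
prevalence = n_insomnia / (n_insomnia + n_control)

# Always predicting p: squared error is (1 - p)^2 for cases and p^2 for controls.
no_skill_brier = prevalence * (1 - prevalence) ** 2 + (1 - prevalence) * prevalence ** 2

# Algebraically this simplifies to p * (1 - p).
closed_form = prevalence * (1 - prevalence)
```

With these counts the reference is approximately 0.194, lower than the 0.25 that applies to balanced classes, so Brier scores under this class imbalance are more informative when read against this baseline.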
Our findings argue for a validation framework that extends beyond conventional performance metrics. Building on recent clinical AI implementation guidelines,39 we frame healthcare AI evaluation as a three-stage process: (i) data integrity, meaning representative, unbiased datasets reflecting the target clinical population; (ii) statistical performance, meaning conventional metrics including accuracy, sensitivity, specificity, discrimination, and calibration across patient subgroups; and (iii) clinical plausibility verification, meaning an explicit assessment of whether identified patterns align with established physiological knowledge. This approach directly addresses the “implementation gap” that has limited real-world deployment of healthcare AI,40 and is consistent with the multidimensional criteria advocated by the British Standard BS30440 and the European ITFoC consortium’s seven-step framework.39,41,42 Our SHAP-based plausibility assessment operationalizes the third stage in a sleep-medicine context: it simultaneously surfaced biologically coherent predictors (delayed heart rate circadian acrophase, self-reported stress, daytime maximum heart rate) and paradoxical patterns (intradaily variability, sleep efficiency, relative amplitude). This dual outcome illustrates that high statistical performance, however quantified, is an incomplete signal of clinical readiness, and that the third stage is operational rather than aspirational.
This study has certain limitations that warrant consideration. The single-site Korean cohort used in the study may limit the generalizability to diverse populations with different sleep patterns and cultural contexts. The 4-week observation period provides only a cross-sectional snapshot of behavioral patterns and potentially misses seasonal variations or long-term trends relevant to chronic insomnia assessment. Our focus on clinical plausibility, while essential for trust and safety of the findings, may undervalue novel biological patterns that contradict the current understanding but may represent genuine discoveries. The challenge lies in distinguishing between statistical artifacts and previously unrecognized biological relationships, a distinction that requires careful clinical validation and mechanistic understanding. Future studies should employ comprehensive sensitivity analyses that compare multiple imputation strategies to ensure robust findings across different missing data scenarios.
Future research should prioritize prospective multisite validation studies employing randomized controlled trial designs with clinical outcomes as the primary endpoints. These studies must evaluate model generalizability across diverse populations and healthcare settings to ensure equitable AI deployment and identify potential algorithmic biases. Methodological advances should focus on developing interpretability-aware modeling frameworks that integrate automated feature selection with clinical knowledge constraints. The development of standardized evaluation metrics able to quantify clinical plausibility along with statistical performance represents a crucial next step in incorporating expert clinical judgment and alignment with established pathophysiological mechanisms. Finally, collaboration with regulatory agencies is essential to establish standardized clinical validation requirements for the application of AI in sleep medicine.
Conclusion
In this study, we developed insomnia classification algorithms using automatically recorded passive data. This underscores the potential of wearable device data, digital phenotyping, and machine learning in providing a more reliable and scalable solution. This study contributes to the digital health field by emphasizing clinical trustworthiness in machine learning models. Models can achieve high predictive performance while exhibiting patterns that contradict clinical knowledge, which may compromise physician trust and patient safety. Healthcare AI must therefore support clinical decision-making and patient care rather than merely achieve statistical benchmarks. Future ML applications in sleep medicine must prioritize the integration of clinical knowledge and interpretability to ensure that advances in AI translate into safe, effective, and widely adopted clinical applications.
Supplemental material
Supplemental material - Original research article heart rate circadian phase and hyperarousal as wearable digital phenotyping of insomnia: An interpretable machine learning study
Supplemental material for Original research article heart rate circadian phase and hyperarousal as wearable digital phenotyping of insomnia: An interpretable machine learning study by Minji Kim, Seojin Yun, Hyungju Kim, Emma Matsushita
Footnotes
Author contributions
Conceptualization: MK, SY, CHC.
Data curation: HK, SPP, CHC.
Formal analysis: MK, SY, HK, EM, CHC.
Funding acquisition: SPP, CHC.
Investigation: MK, SY, HK, EM, JWY, SK, SPP, HJL, TC, CHC.
Methodology: MK, SY, HK, EM, TC, CHC.
Project administration: SPP, HJL, TC, CHC.
Resources: CHC.
Software: MK, SY, HK, TC, CHC.
Supervision: SK, SPP, HJL, TC, CHC.
Validation: MK, SY, HK, CHC.
Visualization: MK, SY, HK, CHC.
Writing – original draft: MK, SY, HK, CHC.
Writing – review & editing: MK, SY, HK, CHC.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by grants from the National Research Foundation (NRF) of Korea (grant number: NRF-2021R1A5A8032895, NRF-2022M3C1B6080866, and RS-2026-25471696).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The datasets generated and/or analyzed in this study are available from the corresponding author upon reasonable request.
Trial Registration
Clinical Research Information Service (CRIS) KCT0009175 (Registration date: Feb-15-2024) (https://cris.nih.go.kr/cris/search/detailSearch.do?search_lang=E&focus=reset_12&search_page =M&pageSize=10&page=undefned&seq=26133&status=5&seq_group=26133).
Declaration of AI use
Generative AI (Google Gemini, OpenAI ChatGPT) was utilized solely for linguistic polishing and code refinement to enhance the presentation of the results. The final manuscript and all computational outputs were critically reviewed and validated by the authors, who maintain complete accountability for the integrity of the work.
Supplemental material
Supplemental material for this article is available online.
Appendix
References
