Abstract
Objective
Fatigue is a critical indicator in modern health management, and efficient, accurate methods for predicting fatigue levels using wearable devices have garnered increasing attention. Although recent advancements have enabled non-invasive cortisol measurement via wearable sensors, it remains unclear how effectively cortisol, in combination with other physiological biomarkers, predicts fatigue. Therefore, this study aimed to evaluate the effectiveness of a multimodal machine learning model that integrates cortisol levels and heart rate variability (HRV) for fatigue prediction.
Methods
Data from 336 participants who completed the Fatigue Severity Scale (FSS) were analyzed. Missing data mechanisms for cortisol were examined, and multivariate imputation by chained equations (MICE) was applied. A TabNet deep-learning model was used to predict low and high fatigue levels based on HRV and cortisol data.
Results
The model using only HRV variables achieved a test AUC of 0.774, whereas the model incorporating both HRV and cortisol levels achieved 0.741, indicating a minimal overall performance difference. Feature importance analysis revealed that, in the cortisol-included model, predictions relied on a limited set of features. When feature selection was applied to this model, a reduced set of three variables (age, cortisol, and log-transformed very low frequency power) achieved comparable predictive performance (AUC = 0.759).
Conclusion
This study demonstrated that a fatigue prediction model based on cortisol and HRV can maintain significant predictive power with a reduced number of variables. These findings suggest the potential for practical implementation in wearable devices, enabling accurate fatigue monitoring while minimizing sensor count and computational burden.
Keywords
Introduction
Fatigue is a prevalent concern in modern society that significantly affects both physiological and psychological well-being.1,2 Chronic fatigue, distinct from transient tiredness, is associated with a persistent reduction in quality of life and productivity and is frequently comorbid with conditions, such as depression, sleep disorders, and immune dysfunction.3–8 These associations underscore the critical need for early detection and effective management of fatigue. Although self-report instruments, such as the Fatigue Severity Scale (FSS), are widely used due to their practicality, their reliance on subjective input introduces limitations in reliability and precision.7,9,10 Consequently, there has been a growing interest in developing objective and quantifiable methods for fatigue assessment based on physiological data. 11
Heart rate variability (HRV), an established indicator of autonomic nervous system activity, has emerged as a valuable physiological marker in fatigue research. 12 Specific HRV indices, such as very low frequency (VLF) and low-to-high frequency ratio (LF/HF), have been linked to chronic fatigue symptoms and autonomic dysregulation,8,10,13 which are hallmarks of fatigue pathology rather than general stress responses. However, the predictive utility of HRV is limited due to its sensitivity to individual physiological differences and environmental factors.7,10,12 Furthermore, employing HRV with basic statistical or binary classification approaches may not adequately capture the complexity of fatigue dynamics.
Although HRV primarily reflects cumulative autonomic activity rather than immediate physiological responses, it may provide a novel diagnostic perspective for fatigue, distinct from conventional hospital-based static HRV assessments, particularly when continuously measured and analyzed as time-series data using wearable technologies. Accordingly, the main objective of this study is to evaluate fatigue, not general stress, using these physiological markers. However, two major challenges hinder the implementation of HRV analyses using wearable devices. First, real-world feasibility is often limited by hardware constraints and the lack of robust algorithms capable of handling signal variability. Second, although blood-based biomarkers, such as complete blood count (CBC) and thyroid function tests, are commonly used to assess fatigue in clinical settings, they are impractical for continuous monitoring in nonclinical environments. 14 Among these biomarkers, cortisol—a key hormone regulated by the hypothalamic-pituitary-adrenal (HPA) axis—has gained attention for its central role in stress physiology and its increasing accessibility through non-invasive measurement techniques. Recent advances in sensor technology have enabled cortisol detection in sweat or saliva via wearable devices, thereby presenting new opportunities for integrated, real-time fatigue monitoring. 12 For example, platforms like MyWear have successfully integrated multimodal wearable sensors with machine learning (ML) algorithms for accurate real-time stress and HRV monitoring across diverse environments. 6
Researchers have proposed the integration of HRV with additional physiological markers—particularly endocrine biomarkers, such as cortisol—as a strategy to overcome these limitations. While HRV reflects autonomic regulation, cortisol captures the endocrine dimensions of the stress response, making these biomarkers complementary. 12 The convergence of wearable technology and non-invasive biosensing allows for the simultaneous monitoring of both HRV and cortisol in everyday environments, facilitating comprehensive fatigue assessment. 13
ML techniques offer substantial advantages for modeling complex nonlinear physiological signals, such as HRV. 15 ML models, particularly when leveraging multimodal data, can flexibly capture intricate feature interactions and account for interindividual variability, thereby overcoming the limitations of traditional statistical approaches. 16 The use of ML models is particularly beneficial for integrating heterogeneous data sources, such as physiological signals and laboratory biomarkers, to achieve more accurate and personalized fatigue predictions. 17
Wearable health technologies have revolutionized the collection of continuous physiological data, including HRV, activity levels, and stress markers, thereby enabling remote patient monitoring (RPM) and wellness applications. 18 Although wearable-derived HRV has demonstrated validated efficacy in stress prediction and fatigue monitoring, 19 studies integrating HRV and cortisol for fatigue prediction are still limited. Furthermore, current research employing HRV-cortisol integrated models faces methodological constraints, including small sample sizes and limited application of analytical approaches such as ML techniques.6–8,20
Considering these limitations, the present study aimed to develop a predictive model for fatigue by integrating HRV and blood cortisol data collected in a clinical setting. Using machine-learning techniques optimized for multimodal physiological inputs, we assessed whether a compact set of features could reliably classify fatigue severity, thereby supporting the potential for practical fatigue monitoring via wearable technologies.
Methods
Participants
We analyzed data from a total of 336 patients (191 males and 145 females) who underwent both self-reported assessments and HRV measurements during routine health examinations at the Health Promotion Center of Seoul National University Bundang Hospital in Korea. The study population of this ambispective cohort study consisted of two cohorts: a retrospective cohort of 236 patients identified from routine health examination records, and a prospective cohort of 100 patients who were newly recruited with written informed consent. This study was approved by the Institutional Review Board (IRB) of Seoul National University Bundang Hospital (IRB No. B-2302-810-301).
Assessments
Data collection followed a standardized protocol across all participants. The FSS was administered within 7 days prior to the clinical examination to capture recent fatigue status. On the examination day, venous blood samples were collected before 10:00 a.m. to control for cortisol's circadian variation, followed by anthropometric measurements and HRV assessment. HRV data were acquired using 3-minute short-term recordings with the Medicore SA-3000P system (Medicore Co., Ltd., Korea) according to the manufacturer's validated protocol.
In this study, features were extracted based on demographic variables, including age and sex, as well as HRV indices related to autonomic nervous system activity and stress. Indicators related to stress and autonomic nervous system function included autonomic nervous system activity, autonomic balance, stress resistance, stress index, and cardiac stability. A total of 26 HRV features were automatically extracted by the measurement device and categorized as follows: heart rate-based indicators included average heart rate, standard deviation of normal-to-normal intervals (SDNN), root mean square of successive differences (RMSSD), approximate entropy (ApEn), sympathetic reactivity difference (SRD), and total sympathetic reactivity difference (TSRD). These variables were used to quantify the temporal stability and variability of the heart rhythm. Frequency-domain features and their logarithmic transformations included total power (TP), VLF, LF, HF, normalized low frequency (LF norm), normalized high frequency (HF norm), LF/HF, power spectrum index (PSI), and their logarithmic values TP(ln), VLF(ln), LF(ln), and HF(ln).
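The time-domain indices named above follow standard definitions, and a minimal sketch can make them concrete. In the study these values were extracted automatically by the measurement device; the functions and the NN-interval values below are illustrative assumptions, not the device's algorithm or study data.

```python
import math
import statistics

def sdnn(nn_ms):
    """Standard deviation of normal-to-normal (NN) intervals, in ms."""
    return statistics.stdev(nn_ms)  # sample standard deviation (n - 1)

def rmssd(nn_ms):
    """Root mean square of successive NN-interval differences, in ms."""
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def ln_power(power_ms2):
    """Natural-log transform applied to a spectral power value (ms^2)."""
    return math.log(power_ms2)

# Hypothetical NN intervals (ms); not data from the study.
nn_intervals = [800, 810, 790, 805, 795]

print(round(sdnn(nn_intervals), 2))
print(round(rmssd(nn_intervals), 2))
print(round(ln_power(1200.0), 3))  # hypothetical VLF power
```

Whether the device uses the sample (n - 1) or population (n) form of the standard deviation is not stated in the text; the sample form is assumed here.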
The FSS is a widely used nine-item self-report questionnaire designed to evaluate the severity and impact of fatigue on daily functioning over the past week. Each item was rated on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree), and the final score was calculated as the average of the nine items. Higher scores indicated more severe fatigue. The FSS has demonstrated excellent internal consistency (Cronbach's α = 0.93) and test–retest reliability, and it has been validated across various clinical populations. In this study, the FSS was used to quantify subjective fatigue levels among the participants due to its ease of administration and robust psychometric properties. 21
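As an illustration of the scoring rule described above, the following sketch averages nine item ratings and applies the ≥ 4.0 cut-off used in this study for binarization. The function names and the example responses are hypothetical.

```python
def fss_score(item_ratings):
    """Mean of the nine FSS items, each rated from 1 to 7."""
    if len(item_ratings) != 9:
        raise ValueError("FSS requires exactly nine item ratings")
    if any(r < 1 or r > 7 for r in item_ratings):
        raise ValueError("each rating must be between 1 and 7")
    return sum(item_ratings) / 9

def fatigue_label(score, cutoff=4.0):
    """Return 1 (high fatigue) when score >= cutoff, else 0 (low fatigue)."""
    return 1 if score >= cutoff else 0

ratings = [5, 4, 6, 3, 4, 5, 4, 4, 5]  # hypothetical responses
score = fss_score(ratings)
label = fatigue_label(score)
print(round(score, 2), label)
```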
Data processing
Participants were classified into two groups based on the presence or absence of cortisol values: the missing group (n = 108) and the non-missing group (n = 228). We then compared the distribution of relevant variables between these groups to evaluate statistical differences. An FSS score ≥ 4.0 is the most widely accepted cut-off, indicating clinically significant fatigue. The FSS scores were binarized using a predefined cut-off point (0 = low fatigue, 1 = high fatigue), and the association between cortisol missingness and fatigue classification was assessed using the chi-square test. The results showed a statistically significant relationship between the two variables (p < 0.001), indicating a nonrandom pattern of missingness.
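The association test described above can be sketched for a 2 × 2 table of cortisol missingness by fatigue class. The cell counts below are invented for illustration, because the study's cell counts are not reported in this section, and the Pearson statistic is computed without a continuity correction, which is an assumption about the test settings.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts, NOT the study's cell counts:
# rows = cortisol missing / not missing, columns = low / high fatigue.
toy_table = [[30, 10], [20, 40]]
stat, p = chi_square_2x2(toy_table)
print(round(stat, 3), p < 0.001)
```

Statistical libraries may apply a continuity correction to 2 × 2 tables by default, so their output can differ slightly from this uncorrected version.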
Missing data handling
Among the 336 participants, cortisol data were missing for 108 individuals. To minimize the impact of missing data on model performance, we first examined the underlying mechanism of missingness. The normality of continuous variables was assessed using the Shapiro–Wilk test and Q–Q plots; because normality assumptions were violated, non-parametric Mann–Whitney U tests were applied. Significant differences (p < 0.05) were observed between the cortisol-missing and non-missing groups in several demographic and physiological variables; specific details are provided in the Supplementary Material. Little's MCAR test and the score-based test by Wang et al. 22 indicated a Missing Not At Random (MNAR) mechanism. Therefore, the missing cortisol values were imputed using multivariate imputation by chained equations (MICE). The 336 samples with completed imputations were used to train the TabNet model, which predicted binary fatigue status based on 25 variables, including demographic, autonomic nervous system, stress-related, and heart rate-based indicators (Supplementary Table S1, Supplementary Figure S1).
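To illustrate the idea behind chained-equation imputation, the sketch below performs stochastic regression imputation, the basic step that MICE repeats for each incomplete variable. It is a deliberate simplification: a single predictor is used, whereas the study's imputation model presumably drew on many variables, and all names and values are hypothetical.

```python
import random
import statistics

def impute_by_regression(target, predictor, seed=0):
    """Fill None entries of `target` from a linear fit on `predictor`.

    A random residual is added to each prediction so that imputed values
    keep roughly the spread of the observed ones (stochastic imputation).
    """
    pairs = [(x, y) for x, y in zip(predictor, target) if y is not None]
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in pairs]
    resid_sd = statistics.stdev(residuals)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [
        y if y is not None else intercept + slope * x + rng.gauss(0.0, resid_sd)
        for x, y in zip(predictor, target)
    ]

# Hypothetical values: age (years) and cortisol (arbitrary units); None = missing.
age = [34, 45, 52, 61, 29, 48, 57, 40]
cortisol = [12.1, 10.4, None, 8.9, 13.0, None, 9.5, 11.2]
completed = impute_by_regression(cortisol, age)
print([round(v, 2) for v in completed])
```

Observed values are returned unchanged; only the missing entries are replaced. A full MICE procedure would cycle through every incomplete variable several times and typically produce multiple imputed datasets.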
Statistical analysis
Statistical analyses were conducted to compare baseline characteristics between groups. The normality of continuous variables was examined using the Shapiro–Wilk test and Q–Q plots. Because normality assumptions were violated, the Mann–Whitney U test was used for continuous variables and the chi-square test for categorical variables. A p value < 0.05 was considered statistically significant. All statistical analyses were performed using Python (version 3.10).
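For readers unfamiliar with the Mann–Whitney U test, a minimal standard-library sketch is given below. It uses average ranks for ties and a normal approximation without tie or continuity correction; these settings are assumptions, since the exact options used in the analysis are not stated, and the example values are invented.

```python
import math

def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Ties receive average ranks; no tie or continuity correction is applied
    to the variance, so p-values are approximate for small or tied samples.
    """
    combined = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value

# Hypothetical ages in two groups; not study data.
u_stat, p_value = mann_whitney_u([41, 52, 47, 60, 55], [38, 44, 36, 49, 40])
print(u_stat, round(p_value, 3))
```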
Reporting guideline
This study adhered to the TRIPOD-AI (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis—Artificial Intelligence) reporting guideline. 23 A completed TRIPOD-AI checklist is provided in the Supplementary Material.
Machine learning training
In this study, we developed a TabNet-based binary classification model to predict fatigue status, defined by a binarized FSS, using a comprehensive set of features, including demographic variables, stress and autonomic nervous system indicators, heart rate-based metrics, frequency-domain features and their logarithmic transformations, and cortisol levels.
Model training was conducted using stratified five-fold cross-validation, with validation and test AUC values serving as the main evaluation metrics. Depending on the experimental settings, the imputed cortisol variable was either included or excluded, resulting in three experimental conditions: (1) Model without Cortisol: a model trained using only 25 HRV-based physiological features, excluding cortisol. (2) Model with Cortisol: a model including the imputed cortisol variable obtained through MICE. (3) Feature Selection Model: a model with a reduced set of input variables selected to minimize model complexity while maintaining predictive performance. The interpretation of the models was facilitated by employing SHapley Additive Explanations (SHAP) values and permutation importance analyses to assess the contribution of each feature to model predictions quantitatively.
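The stratified five-fold scheme can be sketched as follows. This is one simple way to build stratified folds, assumed for illustration rather than taken from the study's code; the class sizes match the low- and high-fatigue counts reported in the Results, and everything else is hypothetical.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class proportions similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for fold_number in range(k):
        test_idx = sorted(folds[fold_number])
        train_idx = sorted(
            i for f, fold in enumerate(folds) if f != fold_number for i in fold
        )
        yield train_idx, test_idx

# 180 low-fatigue and 156 high-fatigue labels; the ordering is arbitrary.
labels = [0] * 180 + [1] * 156
splits = list(stratified_k_fold(labels, k=5))
for train_idx, test_idx in splits:
    high = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), high)
```

Each sample appears in exactly one test fold, and every fold preserves the approximate low/high ratio of the full dataset.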
Results
Demographic features
Among the 336 participants, 180 were classified as having low fatigue (FSS < 4.0), while 156 were classified as having high fatigue (FSS ≥ 4.0) based on the binarized FSS. The baseline demographic and physiological characteristics of the participants, including age, sex, body mass index, antihypertensive and antidiabetic medication use, and autonomic and HRV features, were compared between the two fatigue groups and are summarized in Table 1.
Table 1. Summary of participant characteristics: low vs. high fatigue.
Table 1 summarizes the demographic and physiological characteristics of participants stratified by fatigue status (low vs. high fatigue). A total of 180 participants were classified as low fatigue and 156 as high fatigue based on the binarized FSS score. Among the variables examined, age and average heart rate showed statistically significant differences between the two groups (p < 0.05), with the low fatigue group being older and having a lower average heart rate. No significant differences were observed in cortisol level, sex distribution, or other autonomic, stress-related, and frequency-domain indicators.
Abbreviations used in feature names: Sdnn (ms) = standard deviation of normal-to-normal intervals; Psi = power spectrum index; Tp (ms2) = total power; Vlf (ms2) = very low frequency; Lf (ms2) = low frequency; Hf (ms2) = high frequency; Lf/Hf (ratio) = ratio of low to high frequency power; LfNorm (%) = normalized low frequency; HfNorm (%) = normalized high frequency; Rmssd (ms) = root mean square of successive differences; Apen (unitless) = approximate entropy; Srd = sympathetic reactivity difference; Tsrd = total sympathetic reactivity difference; Tp (ln, ms2) = log-transformed total power; Vlf (ln, ms2), Lf (ln, ms2), Hf (ln, ms2) = log-transformed frequency domain features.
Model performance
Model performance was compared based on the validation and test AUC values (Table 2). The Model without Cortisol achieved an AUC of 0.774, whereas the Model with Cortisol, which included the imputed cortisol variable, yielded an AUC of 0.741. The feature selection model, which utilized only three variables—age, cortisol, and VLF (ln)—demonstrated an AUC of 0.76 while maintaining competitive performance with a minimal feature set. The ROC curves for each model condition are shown in Figure 1 (see also Supplementary Table S2, Figure S2).
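Because AUC is the metric used throughout, a short sketch of its definition may help: it equals the probability that a randomly chosen high-fatigue case receives a higher predicted score than a randomly chosen low-fatigue case, with ties counted as one half. The labels and scores below are hypothetical, not model outputs from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

y_true = [0, 0, 1, 1, 0, 1]                # hypothetical labels
y_score = [0.2, 0.4, 0.35, 0.8, 0.1, 0.7]  # hypothetical predicted probabilities
auc = roc_auc(y_true, y_score)
print(round(auc, 3))
```

The pairwise loop is quadratic in sample size, which is acceptable for a cohort of a few hundred participants; rank-based formulas give the same value more efficiently.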

Figure 1. Receiver operating characteristic (ROC) curves for each model condition. (A) The Model with Cortisol, which included the imputed cortisol variable, yielded an AUC of 0.74. (B) The Model without Cortisol, which excluded the cortisol variable, achieved an AUC of 0.77. (C) The Feature Selection Model, which used only age, cortisol, and VLF (ln), demonstrated an AUC of 0.76. All models showed moderate discriminative performance, with minimal differences across experimental conditions.
Table 2. AUC comparison across models.
Validation and test AUCs across three model configurations. Despite using only three input variables, the feature selection model achieved a performance comparable to the full-feature models.
Feature importance
SHAP analysis revealed that, in the Model with Cortisol, the impact of top-ranked features was clearly distinguishable, whereas lower-ranked features exhibited distributions tightly clustered around zero (Figure 2). The wide spread of SHAP values among the top-ranked features, as opposed to the tight clustering of lower-ranked features, indicates that a few key variables, particularly cortisol, play a dominant role in the model's predictions.

Figure 2. SHAP summary plots for feature importance in different model settings. (A) In the Model with Cortisol, which included the imputed cortisol variable, a small number of top features, including cortisol and age, exhibited wide SHAP value dispersion, indicating a strong influence on model predictions. Lower-ranked features showed values tightly clustered around zero. (B) In the Model without Cortisol, the overall distribution of SHAP values was more uniform, and no single feature showed dominant predictive power. These findings support the role of cortisol as a key contributor and justify its inclusion in the final feature set.
Forward feature selection further confirmed that high predictive performance could be maintained with a minimal set of variables, supporting the importance of cortisol as an informative feature. Based on this analysis, a final lightweight model was constructed using only three features: age, cortisol level, and VLF (ln) (Supplementary Figure S3).
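The greedy logic of forward feature selection can be sketched as below. The stopping rule (stop when no candidate improves the score) and the scoring function are assumptions made for illustration; in practice the score would be a cross-validated AUC from the trained model, and the per-feature gains shown here are invented rather than estimated in the study.

```python
def forward_select(candidates, score_fn, max_features):
    """Greedy forward selection.

    At each step, add the candidate that most improves score_fn(selected);
    stop when max_features is reached or no candidate improves the score.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        step_score, step_feature = max(scored)
        if step_score <= best_score:
            break
        selected.append(step_feature)
        remaining.remove(step_feature)
        best_score = step_score
    return selected, best_score

# Toy stand-in for a model-based score: made-up additive gains,
# NOT values estimated in the study.
toy_gain = {"age": 0.12, "cortisol": 0.10, "vlf_ln": 0.03, "sdnn": 0.0, "rmssd": 0.0}

def toy_score(features):
    return sum(toy_gain[f] for f in features)

chosen, score = forward_select(list(toy_gain), toy_score, max_features=3)
print(chosen)
```

With a real model, each call to the scoring function retrains and evaluates on the candidate feature subset, so the procedure costs roughly the number of candidates multiplied by the number of selected features in model fits.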
Discussion
In this study, we developed and evaluated ML models for fatigue prediction by integrating HRV and cortisol data collected from a cohort of 336 participants. Participants were stratified into low- and high-fatigue groups according to the FSS, 9 and subsequent feature comparisons revealed statistically significant differences in age and heart rate. In contrast, no significant group differences in cortisol levels were observed. Notably, a feature selection model incorporating age, cortisol, and VLF (ln) achieved a robust balance of predictive performance (AUC = 0.76), closely approximating the best-performing Model without Cortisol (AUC = 0.77) while utilizing only three features. SHAP analysis 24 further substantiated the pivotal role of cortisol in prediction, even in the absence of group-level statistical differences.
Beyond the overall predictive performance, the analysis yielded important insights into model stability and generalizability. The feature selection model that excluded cortisol exhibited the largest discrepancy between the validation and test AUCs, suggesting potential overfitting. In contrast, including cortisol resulted in more consistent AUCs across datasets, indicating superior generalization. 12 Moreover, the model incorporating cortisol achieved the most balanced outcomes across precision, recall, and F1 scores. In the absence of cortisol, the precision remained high, but the recall was substantially lower, leading to a greater number of missed positive cases. This finding underscores the practical importance of cortisol in fatigue detection, particularly in applications where false negatives may lead to overlooked fatigue-related risks. 14
Table 3 summarizes previous HRV-based studies, outlining their clinical associations.
Table 3. Summary of HRV-based studies investigating clinical symptom associations.
HRV: heart rate variability; HF: high-frequency power; LF: low-frequency power; LF/HF: ratio of low- to high-frequency power; SDNN: standard deviation of normal-to-normal intervals; RMSSD: root mean square of successive differences; VLF: very low-frequency power; HR: heart rate; SampEn: sample entropy; PPG: photoplethysmography.
Although models utilizing only age as a predictor yielded reasonably good results, age is a static characteristic that cannot capture real-time physiological responses. In contrast, cortisol levels reflect acute stress-related changes and offer greater sensitivity to short-term physiological variations, which are particularly pertinent for real-time fatigue alert systems. 25 Furthermore, the cortisol-based model achieved near-optimal performance with only three features, rendering it well-suited for lightweight real-time applications in wearable environments.7,20,26
Despite its moderate predictive accuracy and relatively limited dataset, the present study represents a meaningful conceptual advance in the non-invasive prediction of fatigue using biomarkers compatible with wearable technologies. 11 The significance of this research extends beyond current performance metrics, as it demonstrates the feasibility of predicting fatigue using a minimal set of input features through machine-learning models. 27
Another strength of this study is its rigorous approach to handling missing data. In real-world RPM environments, data loss is an inevitable challenge owing to device limitations, sensor errors, and connectivity issues. 28 Through the use of MICE, 29 the modeling approach maintained robust performance despite incomplete datasets, thereby demonstrating the practical viability of fatigue prediction under imperfect data conditions, a scenario frequently encountered in wearable-based health monitoring systems. 30
This study has several limitations. First, the temporal alignment of assessments differed across measures. The FSS captures subjective fatigue experienced over the preceding 7 days, while cortisol and HRV measurements reflect physiological status at a single time point on the examination day. Although FSS administration occurred within 7 days prior to biomarker collection, this temporal discordance may have introduced measurement error and potentially attenuated the observed associations between subjective fatigue and physiological markers. The cross-sectional nature of our analysis precludes causal inference, and the temporal misalignment further limits interpretability of the observed relationships. Second, the moderate sample size (n = 336) and single-center design may limit generalizability to broader populations. Third, we utilized only morning cortisol measurements, which may not capture the full dynamics of HPA axis dysregulation associated with chronic fatigue. Future studies should prioritize synchronized data collection protocols, larger multi-site cohorts, and multiple cortisol sampling timepoints to address these limitations.
A parallel can be drawn with the recent advancements in continuous glucose monitoring (CGM). Metwally et al. demonstrated that CGM data, when combined with ML, could accurately classify metabolic subphenotypes of type 2 diabetes—including muscle insulin resistance and β-cell dysfunction—with AUCs up to 0.95, using at-home oral glucose tolerance tests. 31 This approach has transformed complex hospital-based metabolic assessments into scalable real-world monitoring tools.
Analogously, integrating HRV and cortisol data for fatigue prediction in this study may facilitate more precise identification and stratification of fatigue syndromes, such as chronic fatigue syndrome, which currently lacks definitive diagnostic criteria and reliable biomarkers. 2 As future research incorporates larger datasets and more stable time-series data, the predictive performance of such algorithms is expected to improve. 32 This study contributes to the development of scalable, personalized fatigue monitoring systems utilizing RPM technologies. 33
Conclusion
This study demonstrated that fatigue levels can be effectively predicted using a minimal set of physiological features, specifically age, HRV-derived VLF (ln), and blood cortisol, through ML techniques. Despite the moderate sample size, the inherent limitations of cross-sectional data, and the restricted assessment of comorbid illnesses, the model achieved stable performance, suggesting the feasibility of developing lightweight, scalable fatigue monitoring systems for real-world applications. The integration of HRV and cortisol offers a promising foundation for personalized fatigue assessment tools, particularly in wearable or remote health monitoring environments. Further studies using larger longitudinal datasets are warranted to validate and enhance the clinical utility of these predictive models.
Supplemental Material
sj-docx-1-dhj-10.1177_20552076251395570 - Supplemental material for Machine learning-based fatigue classification using heart rate variability and cortisol: A multimodal approach to wearable health monitoring
Supplemental material, sj-docx-1-dhj-10.1177_20552076251395570 for Machine learning-based fatigue classification using heart rate variability and cortisol: A multimodal approach to wearable health monitoring by Joung Eun Kim, Na Hyeon Kim, Soo Kyung Choi, Ji-Yoon Lee, Keehyuck Lee and Jong Soo Han in DIGITAL HEALTH
Additional supplemental files for the same article: sj-png-2-dhj-10.1177_20552076251395570, sj-png-3-dhj-10.1177_20552076251395570, sj-png-4-dhj-10.1177_20552076251395570, and sj-pdf-5-dhj-10.1177_20552076251395570.
Footnotes
Acknowledgments
The authors would like to thank all contributors who participated in the study. No additional acknowledgments to declare.
Ethical considerations
This study was approved by the Institutional Review Board (IRB) of Seoul National University Bundang Hospital (IRB No. B-2302-810-301).
Consent to participate
This study was approved by the Institutional Review Board (IRB) of Seoul National University Bundang Hospital (IRB No. B-2302-810-301). For the retrospective cohort (n = 236), the requirement for written informed consent was waived. For the prospective cohort (n = 100), written informed consent was obtained from all participants prior to enrollment.
Consent for publication
Written consent for publication of anonymized data was obtained from all participants in the prospective cohort. For the retrospective cohort, consent for publication was waived by the IRB.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Industrial Technology Innovation Program (Project Number: 20020423), funded by the Ministry of Trade, Industry and Energy (MOTIE), Republic of Korea [Grant ID: 501100003052].
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
The raw data will be made available by the corresponding authors upon request.
AI tool disclosure
The authors declared that no AI-assisted technologies were used in the writing, editing, data analysis, or figure generation of this manuscript.
Supplemental Material
All supplemental material mentioned in the text is available in the online version of the journal.
References
