Abstract
Heart rate variability (HRV) is an effective tool for objectively evaluating physiological stress indices in psychological states. This study aimed to develop multiple linear regression equations to predict HRV variables using physical characteristics, body composition, and heart rate (HR) variables (eg, sex, age, height, weight, body mass index, fat-free mass, percent body fat, resting HR, maximal HR, and HR reserve) in Korean adults. Six hundred eighty adults (male, n = 236, female, n = 444) participated in this study. Multiple linear regression equations for estimating the HRV variables were developed using a stepwise technique. The coefficient of determination of the regression equations for the time-domain variables was high (SDNN: adjusted R² = 73.6%, P < .001; RMSSD: adjusted R² = 84.0%, P < .001; NN50: adjusted R² = 98.0%, P < .001; pNN50: adjusted R² = 99.5%, P < .001). The coefficient of determination of the regression equations for the frequency-domain variables was high except for VLF (TP: adjusted R² = 75.0%, P < .001; LF: adjusted R² = 77.6%, P < .001; VLF: adjusted R² = 30.1%, P < .001; HF: adjusted R² = 71.3%, P < .001). Healthcare professionals, researchers, and the general public can use the HRV prediction equations to quickly evaluate psychological condition.
Keywords
The function of the autonomic nervous system related to stress can be evaluated using a heart rate variability (HRV) test.
It has been confirmed that HRV can be predicted using physical characteristic variables without using expensive equipment to measure HRV in clinical trials or in daily life.
HRV is an effective tool for objectively evaluating psychological health and can be used as a physiological stress index.
Introduction
In modern societies worldwide, people experience considerable stress. Over the past 100 years, research on stress has attracted attention from many fields, including physiology, chemistry, medicine, genetics, neuroscience, endocrinology, epidemiology, psychology, and psychiatry, reflecting structural complexity both theoretically and biologically. 1 Based on a relatively new perception of the effects of various types of stress (eg, acute and chronic) on physical and mental health, public health researchers have explored the effects of stress on the health of people.2,3 High-intensity or continuous stress can cause physiological dysfunction in the human body, causing chronic diseases such as obesity, allergies, and metabolic syndrome, as well as psychophysiological diseases such as depression, insomnia, and chronic fatigue.4,5 Recently, stress associated with the COVID-19 pandemic has negatively affected mental health.6-8
Physical response to stress can be explained by a normal adaptation mechanism that maintains homeostasis in the body. 1 Mental tension and stress are often accompanied by various physical symptoms, mainly those related to the autonomic nervous system (ANS).9,10 Cardiovascular and nervous system diseases may occur due to abnormal reactions of the ANS,11,12 and the function of the ANS related to stress can be evaluated using a heart rate variability (HRV) test. 13 HRV refers to the degree to which the interval between the beats of the heart rate (HR) changes over time. 14 The evaluation of HRV is known to be a significant indicator of the clinical interaction of sympathetic and parasympathetic nerves, even with a short record of tens of seconds, providing important information for diagnostic evaluation.15,16 Over the past few decades, HRV has been a useful indicator in the field of clinical diagnosis. 17 The HR of healthy people changes according to the interaction of the sympathetic and parasympathetic nerves, which are branches of the ANS, and the HR interval changes slightly and continuously, even at rest, reflecting the physiological mechanism that maintains homeostasis. 18 Since sympathetic nerve activation increases the risk of heart disease and parasympathetic nerves are known to protect heart function, it is necessary to manage stress and reduce the excessive activity of sympathetic nerves caused by mental stress. 18 HRV is a useful tool for objectively evaluating psychological health and can be used as a physiological stress index.19-22
With recent developments in science and technology, the range of HRV utilization has expanded to mobile and wearable devices, and the measurement time required for analysis has emerged as a major problem. 23 When measuring HRV as a diagnostic test item in a hospital for medical purposes, a one-time measurement is required, so there are relatively few restrictions on the measurement time. However, in the case of measuring HRV for health care purposes in daily life, maintaining a stable state for at least 5 min can cause measurement hassle and fatigue, resulting in reduced usability. This study aimed to develop multiple linear regression equations to predict HRV variables using physical characteristics, body composition, and HR variables.
Material and Methods
The regression analysis used in this study is a quantitative research method for modeling and analyzing multiple variables; the analysis includes one dependent variable and one or more independent variables.
Participants
This study was conducted in 680 adults (male: n = 236, female: n = 444) who had no conditions that could affect the measurement variables (eg, weight changes for at least 3 months, depression, cardiovascular disease, diabetes). All participants were fully informed about the study and signed the IRB-approved informed consent form before proceeding with the study. The participants were assigned to a development set and a validation set at a ratio of 7:3 using Bernoulli trials. 24 Approximately 70% of the data (total: n = 476, male: n = 165, female: n = 311) was used to develop the HRV estimation equations with sex, age, height, weight, body mass index (BMI), fat-free mass (FFM), percent body fat (%BF), resting HR, maximal heart rate (HRmax), and heart rate reserve (HRR), and approximately 30% of the data (total: n = 204, male: n = 71, female: n = 133) was used for the validity test. 24 The power test was performed using G*Power 3.1.9.2 (Franz Faul, University of Kiel, Kiel, Germany) with 2 tails, H1 ρ² of 0.3, H0 ρ² of zero, a significance level of .05 (α = .05), power of .9, and 22 predictors for all statistical tests. 24 G*Power showed that 95 participants provided sufficient power for this study. The participant characteristics are presented in Table 1.
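The 7:3 assignment by Bernoulli trials can be illustrated with a minimal sketch. The function name `bernoulli_split`, the seed, and the participant IDs are assumptions made for illustration; the source does not describe the software used for the assignment.

```python
import random

def bernoulli_split(participant_ids, p_development=0.7, seed=0):
    """Assign each ID to the development set with probability p_development."""
    rng = random.Random(seed)  # fixed seed (assumed) so the split is reproducible
    development, validation = [], []
    for pid in participant_ids:
        # One independent Bernoulli trial per participant
        if rng.random() < p_development:
            development.append(pid)
        else:
            validation.append(pid)
    return development, validation

ids = list(range(1, 681))  # 680 hypothetical participant IDs
development, validation = bernoulli_split(ids)
print(len(development), len(validation))
```

Because each assignment is an independent trial, the group sizes only approximate the 7:3 ratio and vary with the seed.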
Characteristics of the Study Population.
Note. Values are expressed as mean ± SD.
BMI = body mass index; FFM = fat-free mass; %BF = percent body fat; HR = heart rate; HRmax = maximal heart rate; HRR = heart rate reserve; SDNN = standard deviation of the NN interval; RMSSD = square root of the mean of the sum of the square of differences between adjacent NN intervals; NN50 = number of interval differences of successive NN intervals greater than 50 ms; pNN50 = proportion derived by dividing NN50 by the total number of NN interval; TP = total power; VLF = very low-frequency power (0.00-0.04 Hz); LF = low-frequency power (0.04-0.15 Hz); HF = high-frequency power (0.15-0.40 Hz).
Measurements
Independent variables
Height was measured to the nearest 0.1 cm using a stadiometer (BSM 330, Inbody, Seoul, Korea). Weight, FFM, and %BF were measured using bioelectrical impedance analysis (Inbody 770; Inbody, Seoul, Korea). All participants fasted overnight prior to anthropometric and body composition measurements. BMI was calculated by dividing the weight (kg) by the height squared (m²). Resting HR was measured using V800 (Polar Electro OY, Kempele, Finland). HRmax and HRR were calculated for each individual using the Karvonen et al 25 formula (HRmax: 220 − age; HRR: HRmax − resting HR). 25 HRR is the difference between the HRmax during maximum exercise and the resting HR. 26 The larger the difference, the higher the reserves and the wider the range of exercise. In a preliminary study, independent variables formed by combining HR, HRmax, and HRR in various ways were used in the HRV prediction equation. 27 These independent variables were verified to predict HRV. Therefore, various independent variables were calculated using resting HR, HRmax, and HRR (HR + HRmax, HR + HRR, HRmax + HRR, HR + HRmax + HRR, HRR − HR, HRmax − HRR, HR + HRmax − HRR, HRmax + HRR − HR, HRmax/HR, HR/HRmax, HRR/HR, and HR/HRR). 27
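The derived predictors can be sketched as follows. The function names `bmi` and `hr_predictors`, the dictionary keys, and the example participant are hypothetical; only the formulas (BMI, HRmax = 220 − age, HRR = HRmax − resting HR, and the listed combinations) come from the text.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2

def hr_predictors(age, resting_hr):
    """Build the HR-based predictors listed in the text for one participant."""
    hr = resting_hr
    hr_max = 220 - age   # age-predicted maximal HR
    hrr = hr_max - hr    # heart rate reserve
    return {
        "HR": hr,
        "HRmax": hr_max,
        "HRR": hrr,
        "HR + HRmax": hr + hr_max,
        "HR + HRR": hr + hrr,
        "HRmax + HRR": hr_max + hrr,
        "HR + HRmax + HRR": hr + hr_max + hrr,
        "HRR - HR": hrr - hr,
        "HRmax - HRR": hr_max - hrr,
        "HR + HRmax - HRR": hr + hr_max - hrr,
        "HRmax + HRR - HR": hr_max + hrr - hr,
        "HRmax/HR": hr_max / hr,
        "HR/HRmax": hr / hr_max,
        "HRR/HR": hrr / hr,
        "HR/HRR": hr / hrr,
    }

example = hr_predictors(age=30, resting_hr=60)  # hypothetical participant
print(example["HRmax"], example["HRR"], round(bmi(70, 175), 2))
```

Under these definitions some combinations are algebraically tied to the base variables; for example, HRmax − HRR always equals resting HR.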
Heart rate variability
HRV was measured after participants arrived at the laboratory and rested for 30 min. The R-R interval data were recorded using a V800 with a Polar H10 chest strap. 27 Unfiltered R-R interval data recorded by the V800 were exported from the Polar Flow web service as a time-delimited CSV file. 27 For the calculation of HRV variables, the same final 5-min segment of R-R intervals was selected from the total 10 min of each corrected V800 recording without outlier data. 27 These selected segments were analyzed using Kubios HRV (version 3.3.1) for time- and frequency-domain components.27,28 Time-domain measures are the easiest to calculate. 27 Several parameters can be calculated: SDNN is the standard deviation of the NN intervals; RMSSD is the root mean square of successive differences of NN intervals; NN50 is the number of interval differences of successive NN intervals greater than 50 ms; and pNN50 is the number of successive differences of intervals that differ by more than 50 ms, expressed as a percentage of the total number of successive differences of intervals.27,29,30 The frequency-domain analysis identified the effect of the sympathetic and parasympathetic paths of the ANS on HRV. 27 Total power (TP) is a short-term estimate of the total power of the power spectral density in the frequency range of 0.00 to 0.40 Hz. 27 Additionally, 3 spectral components were calculated: very low-frequency (VLF: 0.00-0.04 Hz), low-frequency (LF: 0.04-0.15 Hz), and high-frequency (HF: 0.15-0.40 Hz). 27
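The time-domain definitions above can be written out as a short sketch. This is not the Kubios HRV implementation used in the study; the function name `time_domain_hrv`, the use of the sample standard deviation (n − 1) for SDNN, and the example intervals are assumptions.

```python
import math

def time_domain_hrv(nn_ms):
    """Compute SDNN, RMSSD, NN50, and pNN50 from NN intervals in milliseconds."""
    n = len(nn_ms)
    mean_nn = sum(nn_ms) / n
    # SDNN: standard deviation of the NN intervals (sample SD, assumed)
    sdnn = math.sqrt(sum((x - mean_nn) ** 2 for x in nn_ms) / (n - 1))
    # Successive differences between adjacent NN intervals
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    # RMSSD: root mean square of the successive differences
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # NN50: count of successive differences larger than 50 ms in absolute value
    nn50 = sum(1 for d in diffs if abs(d) > 50)
    # pNN50: NN50 as a percentage of all successive differences
    pnn50 = 100.0 * nn50 / len(diffs)
    return {"meanNN": mean_nn, "SDNN": sdnn, "RMSSD": rmssd,
            "NN50": nn50, "pNN50": pnn50}

nn_example = [800, 860, 790, 810, 900]  # hypothetical NN intervals (ms)
hrv = time_domain_hrv(nn_example)
print(hrv)
```

A real 5-min segment contains several hundred intervals; the five-value series is only there to make the arithmetic easy to follow.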
Statistical Analysis
Means and standard deviations were calculated for all measured variables. 27 The distribution normality of all outcome variables was verified using the Kolmogorov–Smirnov test. 27 To perform a multiple linear regression analysis, we verified whether the independent variables had explanatory power by checking the β value, which is each independent variable’s regression coefficient. 27 Multiple linear regression analysis using the stepwise technique was used to predict HRV variables (mean R-R, SDNN, RMSSD, NN50, pNN50, TP, VLF, LF, and HF) using sex, age, height, weight, BMI, FFM, %BF, resting HR, HRmax, HRR, HR + HRmax, HR + HRR, HRmax + HRR, HR + HRmax + HRR, HRR − HR, HRmax − HRR, HR + HRmax − HRR, HRmax + HRR − HR, HRmax/HR, HR/HRmax, HRR/HR, and HR/HRR. 27 We checked the basic assumptions of linear regression: linearity, independence, continuity, normality, homoscedasticity, absence of autocorrelation, and absence of outliers. 31 Outlier data in the multiple linear regression equation were identified and deleted when the absolute value of the studentized residual (SRE) was ≥2. 24 Residual analysis was conducted to evaluate the appropriateness of the multiple linear regression equation. The normality of the residuals was assessed using a P-P plot. The homoscedasticity of the residuals was confirmed using a scatter plot of the residuals of the final regression equation.
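The outlier rule (|SRE| ≥ 2) can be illustrated with a minimal sketch for a single predictor. The study used stepwise multiple regression in SPSS; the one-predictor simplification, the function name `studentized_residuals`, and the data below are assumptions chosen so the example needs only the standard library.

```python
import math

def studentized_residuals(x, y):
    """Internally studentized residuals for a one-predictor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    # Leverage of each observation in simple linear regression
    leverage = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1.0 - h)) for e, h in zip(residuals, leverage)]

resting_hr = [55, 60, 65, 70, 75, 80, 85, 90]  # hypothetical predictor values
sdnn = [62, 58, 55, 51, 90, 44, 40, 37]        # hypothetical outcome; 90 is unusual
sre = studentized_residuals(resting_hr, sdnn)
flagged = [i for i, r in enumerate(sre) if abs(r) >= 2]  # threshold from the text
print(flagged)
```

After flagged rows are removed, the model would be refitted on the remaining data.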
In the validation test, the predicted values of the HRV variables were calculated using the regression equations, and the mean error, standard error of estimation (SEE), root mean squared error (RMSE), and mean absolute error (MAE) were calculated using Formulas 1, 2, 3, and 4. 24 In addition, a two-tailed Pearson correlation analysis was performed to estimate the correlation between the independent and the measured HRV variables (Table 2) and the correlation between the measured HRV and the predicted HRV variables. 24 Statistical Package for the Social Sciences (SPSS) version 26.0 (IBM Corporation, Armonk, NY, USA) was used for statistical analyses, and the level of significance was set at .05. All figures were prepared in GraphPad Prism version 7.0 (GraphPad Software Inc, California, USA, https://graphpad-prism.software.informer.com/7.0/) and SPSS version 26.0.
Correlation Between Independent Variables and HRV Variables for Estimating a Multiple Linear Regression Model.
Note. Significant correlation between measured HRV parameters and independent variables, * P < .05, ** P < .01. BMI: body mass index; FFM: fat-free mass; %BF: percent body fat; HR: heart rate; HRmax: maximal heart rate; HRR: heart rate reserve; SDNN: standard deviation of the NN interval; RMSSD: square root of the mean of the sum of the square of differences between adjacent NN intervals; NN50: number of interval differences of successive NN intervals greater than 50 ms; pNN50: proportion derived by dividing NN50 by the total number of NN interval; TP: total power; VLF: very low-frequency power (0.00-0.04 Hz); LF: low-frequency power (0.04-0.15 Hz); HF: high-frequency power (0.15-0.40 Hz).
Formula 1. The calculation formula for the mean error
Formula 2. The calculation formula for the standard errors of estimation
Formula 3. The calculation formula for the root mean squared error
Formula 4. The calculation formula for the mean absolute error
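The bodies of Formulas 1 to 4 are not reproduced in this text, so the sketch below uses commonly used definitions as an assumption: mean error as the average of predicted minus measured, SEE with n − 2 in the denominator, RMSE with n, and MAE as the average absolute difference. The source reports mean error as a percentage, and the normalization it used is not recoverable here, so the sketch returns it in raw units. All function names and data are hypothetical; a Pearson correlation helper is included because the validation also correlates measured with predicted values.

```python
import math

def validation_metrics(measured, predicted):
    """Mean error, SEE, RMSE, and MAE under the assumed definitions."""
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    sse = sum(e * e for e in errors)
    return {
        "mean_error": sum(errors) / n,         # signed bias, raw units
        "SEE": math.sqrt(sse / (n - 2)),       # assumed n - 2 denominator
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
    }

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

measured_sdnn = [40, 50, 60, 70]   # hypothetical measured values (ms)
predicted_sdnn = [42, 47, 63, 66]  # hypothetical predicted values (ms)
metrics = validation_metrics(measured_sdnn, predicted_sdnn)
r = pearson_r(measured_sdnn, predicted_sdnn)
print(metrics, round(r, 3))
```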
Results
Performance Evaluation of Regression Equations
The detailed results of the multiple linear regression analysis of HRV variables are shown in Table 3. The performance of the mean R-R regression equation was 84.3%, and the SEE was 52.14 ms (F = 2546.057, P < .001). Furthermore, the performance of the SDNN regression equation was 18.9%, and the SEE was 20.73 ms (F = 37.929, P < .001). In addition, the performance of the RMSSD regression equation was 18.2%, and the SEE was 27.03 ms (F = 27.488, P < .001). The performance of the NN50 regression equation was 41.9%, and the SEE was 38.52 ms (F = 49.875, P < .001). Moreover, the performance of the pNN50 regression equation was 48.4%, and the SEE was 11.65% (F = 90.135, P < .001). The performance of the TP regression equation was 20.2%, and the SEE was 1572.62 ms² (F = 31.039, P < .001). Furthermore, the performance of the VLF regression equation was 16.9%, and the SEE was 170.97 ms² (F = 17.096, P < .001). The performance of the LF regression equation was 19.7%, and the SEE was 759.37 ms² (F = 24.255, P < .001). Finally, the performance of the HF regression equation was 23.6%, and the SEE was 875.95 ms² (F = 37.705, P < .001).
Performance Evaluation of Regression Models and Regression Equations of HRV Variables.
Note. Sex: 1 = male, 2 = female.
BMI = body mass index; FFM = fat-free mass; %BF = percent body fat; HR = heart rate; HRmax = maximal heart rate; HRR = heart rate reserve; SDNN = standard deviation of the NN interval; RMSSD = square root of the mean of the sum of the square of differences between adjacent NN intervals; NN50 = number of interval differences of successive NN intervals greater than 50 ms; pNN50 = proportion derived by dividing NN50 by the total number of NN interval; TP = total power; VLF = very low-frequency power (0.00-0.04 Hz); LF = low-frequency power (0.04-0.15 Hz); HF = high-frequency power (0.15-0.40 Hz).
Significant difference, P < .05.
Performance Evaluation of Regression Equations Without Outlier Data
The detailed results of the multiple linear regression analysis of HRV variables without outlier data are shown in Table 4. The performance of the mean R-R regression equation (SRE 18: n = 365) was 100%, and the SEE was .0003 ms (F = 51 929 947 102 190.61, P < .001). Furthermore, the performance of the SDNN regression equation (SRE 16: n = 353) was 73.6%, and the SEE was 5.91 ms (F = 165.546, P < .001). Also, the performance of the RMSSD regression equation (SRE 12: n = 339) was 84.0%, and the SEE was 5.08 ms (F = 254.322, P < .001). In addition, the performance of the NN50 regression equation (SRE 37: n = 95) was 98.0%, and the SEE was 0.56 ms (F = 1541.369, P < .001). The performance of the pNN50 regression equation (SRE 38: n = 105) was 99.5%, and the SEE was 0.18% (F = 4001.579, P < .001). The performance of the TP regression equation (SRE 19: n = 275) was 75.0%, and the SEE was 165.35 ms² (F = 165.173, P < .001). Further, the performance of the VLF regression equation (SRE 14: n = 261) was 30.1%, and the SEE was 10.92 ms² (F = 112.757, P < .001). Moreover, the performance of the LF regression equation (SRE 20: n = 269) was 77.6%, and the SEE was 79.69 ms² (F = 155.864, P < .001). Finally, the performance of the HF regression equation (SRE 19: n = 282) was 71.3%, and the SEE was 73.30 ms² (F = 140.816, P < .001). The residual analysis results for the normality and homoscedasticity of the multiple linear regression equations are presented in Figures 1 and 2.
Performance Evaluation of Regression Models and Regression Equations of HRV Variables Without Outlier Data.
Note. Sex: 1 = male; 2 = female.
BMI = body mass index; FFM = fat-free mass; %BF = percent body fat; HR = heart rate; HRmax = maximal heart rate; HRR = heart rate reserve; SDNN = standard deviation of the NN interval; RMSSD = square root of the mean of the sum of the square of differences between adjacent NN intervals; NN50 = number of interval differences of successive NN intervals greater than 50 ms; pNN50 = proportion derived by dividing NN50 by the total number of NN interval; TP = total power; VLF = very low-frequency power (0.00-0.04 Hz); LF = low-frequency power (0.04-0.15 Hz); HF = high-frequency power (0.15-0.40 Hz).
Significant difference, P < .05.

Normal P-P plot of regression standardized residual: (A) SDNN, (B) RMSSD, (C) NN50, (D) pNN50, (E) TP, (F) VLF, (G) LF, and (H) HF.

Test of homoscedasticity of residuals (scatter plot for homoscedasticity).
Validity of Multiple Linear Regression Equations
The validity of the developed regression equations was evaluated using data not included in the multiple linear regression analyses. 31 Across the regression equations of the HRV variables, the mean error ranged from −80.60% to 4.07% (SDNN: −5.37%, RMSSD: −11.24%, NN50: 4.07%, pNN50: −24.76%, TP: −20.96%, VLF: −13.32%, LF: −40.57%, and HF: −80.60%), and the SEE was higher than that of the developed regression equations (Table 5).
Validity of Estimating the Multiple Linear Regression Model.
SEE = standard error of estimation; RMSE = root mean squared error; MAE = mean absolute error; SDNN = standard deviation of the NN interval; RMSSD = square root of the mean of the sum of the square of differences between adjacent NN intervals; NN50 = number of interval differences of successive NN intervals greater than 50 ms; pNN50 = proportion derived by dividing NN50 by the total number of NN interval; TP = total power; VLF = very low-frequency power (0.00-0.04 Hz); LF = low-frequency power (0.04-0.15 Hz); HF = high-frequency power (0.15-0.40 Hz).
Relationship Between Measured and Predicted HRV Variables
Figure 3 shows the relationship between the measured and predicted HRV variables. The measured time-domain variables were positively related to predicted SDNN (r = .743, P < .01), predicted RMSSD (r = .790, P < .01), predicted NN50 (r = .605, P < .01), and predicted pNN50 (r = .709, P < .01). Furthermore, a positive relationship was found between the measured frequency-domain variables and predicted TP (r = .719, P < .01), predicted VLF (r = .266, P < .01), predicted LF (r = .623, P < .01), and predicted HF (r = .741, P < .01).

Relationship between measured and predicted HRV variables: (A) SDNN, (B) RMSSD, (C) NN50, (D) pNN50, (E) TP, (F) VLF, (G) LF, and (H) HF.
Discussion
Stress is considered an inherent part of everyday life in modern society, and its management requires fast and accurate evaluation. HRV analysis is a clinical tool that can easily evaluate stress by evaluating overall ANS health. With advances in industrial technology, wearable devices that evaluate stress by measuring HRV have been developed, such as mobile phone applications, wristbands, and smartwatches. However, as wearable technology equipment is expensive, it is necessary to develop an economical and simple measurement method for daily use. Recent meta-analyses reported that many studies have been conducted to evaluate stress as a reliable mental health index in clinical settings. 13 If the HR is monotonous and regular, the HRV shows a stable low value. In addition, low HRV negatively affects the regulation and homeostasis of the ANS function, which reduces the ability to cope with stressors in the body. 13 Therefore, it is possible to evaluate the ANS in various clinical settings by measuring HRV as a noninvasive electrocardiographic method.32,33
Tsuji et al 34 reported that average HR and age were significant determinants, accounting for 37% to 51% of the variance in 2-h SDNN, LF, and HF. 3 Also, HR in males and females of all ages was inversely correlated with the time- and frequency-domain variables of HRV. 34 HR accounted for 12.5% to 22.6% of the variation in 2-h SDNN, LF, and HF. 34 In this study, multiple linear regression equations were developed to estimate HRV variables using physical characteristics, body composition, and HR variables without clinical measurement. The coefficient of determination of the HRV variables in the newly developed multiple linear regression equations was high (adjusted R²: 71.3%-99.5%), except for VLF.
The time- and frequency-domain indices of HRV variables are standard clinical parameters. 13 Time-domain analysis involves quantifying the values of inter-beat intervals recorded continuously at any given point in time or the R-R interval recorded at a specific time (within 5-10 min).35,36 Among the time-domain variables of HRV, SDNN is the simplest to analyze. SDNN decreases when HRV is low and the heartbeat is regular. Therefore, SDNN has been evaluated as an indicator of physiological resilience to stress. 13 The performance of the SDNN multiple linear regression equation estimated in our study was 74.0% (R²) and 73.6% (adjusted R²). Among the time-domain variables of HRV, RMSSD, NN50, and pNN50 are derived from the differences between adjacent NN intervals. The NN intervals depend on the beat-to-beat change affected by the parasympathetic nervous system (PNS).37,38 Our study showed that the multiple linear regression equation performance for the HRV variables evaluating PNS activity was high (RMSSD: R² = 84.3%, adjusted R² = 84.0%; NN50: R² = 98.1%, adjusted R² = 98.0%; pNN50: R² = 99.5%, adjusted R² = 99.5%).
The frequency-domain method estimates the distribution of absolute and relative power as a frequency component. 39 The TP variable in the frequency domain evaluates the overall ANS functionality, similar to the SDNN variable in the time domain. The performance of the TP regression equation estimated in our study was 75.4% (R²) and 75.0% (adjusted R²). Among the frequency-domain variables, the 2 main components representing ANS activity are LF and HF, which reflect the activity of the sympathetic nervous system and PNS.40-42 Previous studies have shown that sympathetic nerve activity increases with decreasing HF, increasing LF, and increasing LF/HF ratio in situations causing negative emotions and stress and that HF increases when positive emotions are induced, increasing the control ability of parasympathetic nerves.43-47 In clinical practice, LF is regarded as an indicator of sympathetic-to-parasympathetic autonomic balance, although this remains controversial. Our study showed that the multiple linear regression equation performance for the HRV variables evaluating ANS activity was moderate to high, except for VLF (LF: R² = 78.1%, adjusted R² = 77.6%; VLF: R² = 30.3%, adjusted R² = 30.1%; HF: R² = 71.8%, adjusted R² = 71.3%). Furthermore, the correlation between the measured and predicted HRV variables was moderate to high and statistically significant, except for VLF (time-domain variables: r = .605-.790, P < .01; frequency-domain variables: r = .266-.741, P < .01).
Conclusion
This study demonstrated that HRV variables (time and frequency domains) in adults could be explained by sex, age, height, weight, BMI, FFM, %BF, HR, HRmax, and HRR. The performance of the multiple linear regression equations developed for the time- and frequency-domain variables, excluding VLF, was very high (adjusted R²: 71.3%-99.5%). Therefore, it has been confirmed that HRV can be predicted using physical characteristics, body composition, and HR variables without using expensive equipment to measure HRV in clinical trials or in daily life.
Limitations and Suggestions
This study had some limitations. Nonlinear regions among HRV variables require higher computational complexity. 18 Also, researchers should ensure the accurate estimation of high-frequency spectra in short-time recordings, low signal variability, and low signal-to-noise ratios when using nonlinear methods. 18 For this reason, the nonlinear region was not considered in this study. HRV varies depending on the posture in which it is measured.48-51 In our study, HRV measurements were performed in the supine position at rest. Therefore, this study’s results should be interpreted considering the measurement posture. Checking for multicollinearity is an important step in multiple linear regression analysis. Multicollinearity is problematic when the variance inflation factor (VIF) is 10 or higher. However, this study’s purpose was to develop high-performance predictive equations that predict HRV variables well. Therefore, we developed the prediction equations without considering multicollinearity.
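As a rough illustration of the VIF threshold mentioned above, the sketch below computes VIF for the special case of two predictors, where VIF = 1 / (1 − r²) and r is the Pearson correlation between them. The function name `vif_two_predictors` and the data are hypothetical, and the study itself did not report VIF values; with more than two predictors, each VIF comes from regressing one predictor on all the others.

```python
import math

def vif_two_predictors(x1, x2):
    """Variance inflation factor when a model has exactly two predictors."""
    mean1 = sum(x1) / len(x1)
    mean2 = sum(x2) / len(x2)
    s12 = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    s11 = sum((a - mean1) ** 2 for a in x1)
    s22 = sum((b - mean2) ** 2 for b in x2)
    r_squared = (s12 * s12) / (s11 * s22)
    if r_squared >= 1.0:
        return math.inf  # perfectly collinear predictors
    return 1.0 / (1.0 - r_squared)

x1 = [1, 2, 3, 4, 5]  # hypothetical predictor values
x2 = [2, 1, 4, 3, 5]  # hypothetical predictor values
vif = vif_two_predictors(x1, x2)
print(round(vif, 2), vif >= 10)  # compare with the threshold of 10 from the text
```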
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Culture, Sports and Tourism R&D Program through the Korea Creative Content Agency grant funded by the Ministry of Culture, Sports and Tourism in 2020 (Project Name: Development of customized smart fitness service to support the personal life span, Project Number: SR202006002, Contribution Rate: 50%). This research was funded by Grant No. PJ016144022021 from the Rural Development Administration of Korea.
Ethics
The study was conducted in accordance with the guidelines of the Declaration of Helsinki and was approved by the Institutional Review Board of Konkuk University (7001355-201903-HR-305).
