Abstract
OBJECTIVE
Poor sleep quality is thought to be a contributor to medical student stress. The authors evaluated the effect of high and low periods of academic stress on sleep quality and quantity in first-year medical students.
METHODS
A group of 25 students in their first year of medical school were provided Fitbit Charge 3 activity trackers for continual use and were surveyed at 4 intervals to assess stress level, sleep quantity, and sleep quality. Fitbit data were collected through the Fitbit mobile app and uploaded to the Fitabase (Small Steps Labs, LLC) server. Data collection times were scheduled around the academic exam schedule. Weeks in which testing occurred were identified as high-stress periods. Results from assessments were compared to nontesting periods of low stress.
RESULTS
During stressful periods, students slept an average of one hour less per 24-h period, took more naps, and reported poorer sleep quality than during the low-stress periods. No significant change was seen in the 4 surveyed intervals in sleep efficiency or sleep stages.
CONCLUSION
Students slept less and had poorer quality sleep in their main sleep event during stressful periods but attempted to compensate with increased napping and weekend catchup sleep. The objective Fitbit activity tracker data were consistent with and validated the self-reported survey data. Activity trackers could potentially be used to optimize the efficiency and quality of both student napping and main sleep events as one component of a stress reduction program for medical students.
Introduction
The mental health of medical students has been an ongoing concern for decades, with high rates of burnout, 1 depression,2,3 suicidal ideation,4,5 substance abuse, 6 and other responses to the high levels of stress inherent in the medical education of physicians. 7 Many factors have been associated with burnout including poor sleep, 7 long work and study hours, 8 and high stress. 9 There are many different definitions of burnout, but as it relates to medical training, the most relevant definition is “a syndrome resulting from work-related stress characterized by emotional exhaustion, feelings of cynicism and detachment toward patients (depersonalization), and a low sense of personal accomplishment.” 9 More than one-third (35.2%) of medical students experience symptoms of burnout and are at increased odds of experiencing depressive symptoms, a statistic that only shows minimal improvement when compared to residents and fellows. 9 More specifically, more than half of medical students experience burnout during their medical education, with symptoms persisting well after medical school. 1 Stress and symptoms of burnout can affect many aspects of a student's life, including academic performance. 10 Many medical students report symptoms of burnout and poor sleep, and students who sleep less than 6 h nightly experience a higher risk of adverse effects. 11 As sleep quality and stress are interrelated, with a cycle that can occur with increased stress leading to poor sleep and poor sleep leading to more stress, we sought a better method of quantifying sleep to understand the effect of stress on our population of medical students with the goal of understanding the sleep component of burnout.
New commercial models of activity trackers are showing promise in the accuracy of sleep measurements and in cost effectiveness compared with the more traditional trackers used in research. Many studies have used Fitbit (Fitbit, Inc.) devices in particular for measuring sleep, as they have the advantages of cost effectiveness, ease of data collection, and measurement of activity. 12 In a study by Cook et al, the Fitbit Alta HR was worn by a population of patients affected by central disorders of hypersomnolence, and Fitbit sleep stage data were compared to polysomnography in multiple sleep latency tests. 12 That study showed a “high sensitivity (0.96 ± 0.02) and accuracy (0.90 ± 0.04), with moderate-to-low specificity (0.58 ± 0.16).” Fitbits have been used in a variety of projects, from measuring the effects of overnight pages on residents 13 to measuring the impact of duty hours and sleep for residents. 14 Fitbits have demonstrated “high intradevice reliability”; however, there are limitations, as both Fitbit and actigraph devices tend to overestimate the amount of sleep and sleep quality, so their use has only been recommended for “normative populations.”15,16
The Pittsburgh Sleep Quality Index (PSQI) is often used to track sleep quality and quantity. 17 A prior study in a large group of college students showed that “stress accounted for 24% of variance in the PSQI score, whereas exercise, alcohol and caffeine consumption, and consistency of sleep schedule were not significant predictors of sleep quality.” 18 There is a significant correlation between sleep quality and academic burnout, where poor sleep quality is strongly predictive of burnout. 19 Since most sleep quality studies have used the PSQI or similar survey instruments to gather self-reported (subjective) data reflective of personal perception, our hope was that the activity trackers would provide objective data that would correlate with, or at least complement, the survey data. The trackers would thus potentially become a more objective measure of sleep quality and, by proxy, academic burnout.
Methods
Participant Selection
Twenty-five first-year Doctor of Osteopathic Medicine students in Mississippi were randomly selected from students who expressed an interest in the project and met eligibility requirements. To be eligible, participants had to be at least 21 years of age; be in good academic standing; be free of any chronic illness or physical disability which would preclude normal adult daily activity; be free of any diagnosed sleep disorders; not be taking any medication that would potentially impact heart rate; and agree to wear a wrist-worn activity tracker at all times excluding during tracker cleaning or while bathing, showering, during medical school exams, or during other activities which might damage the tracker.
Once students were selected, they attended an onboarding session during which they were given information about study requirements, potential risks, and potential benefits of participation. They provided written informed consent, and activity trackers (Fitbit Charge 3, Fitbit, Inc.) were distributed. The study protocol was approved by the William Carey University Institutional Review Board on January 11, 2019 (protocol no. 2019-004) and renewed on March 25, 2022 (protocol no. 2022-042). Study participants were not incentivized beyond being allowed to keep the activity trackers at the conclusion of the study. Participants were asked to wear the activity trackers for a period of 6 months during academic year 2020 to 2021 at all hours of the day, excluding activities during which trackers were not allowed (school exams) and the activities outlined above that might damage the trackers. When users synced their device with the Fitbit app, Fitabase (Small Steps Labs, LLC) gained access to the data and aggregated them by device serial number so that they could be downloaded for de-identified analysis.
Several assessments and a demographic survey were administered via an online survey software platform (Qualtrics, 2020) at 4 time points. The time points coincided with 2 periods of low stress and 2 periods of high stress. These designations were based on the first-year students’ academic workload and exam schedule and were confirmed with the Perceived Medical School Stress (PMSS) Assessment. Data were extracted from a 9-day window that spanned each of these time periods. Daily data were considered invalid if the participant wore the device for less than 16 h in a 24-h period. Sleep-related variables collected by the activity tracker included time to fall asleep, total time asleep, and the percentage of time spent in light, deep, and REM sleep.
Instruments
The PSQI 17 is an indicator of adult sleep quality and patterns over the last month. It measures 7 domains of sleep including subjective sleep quality, sleep latency, sleep duration, habitual sleep efficiency, sleep disturbances, use of sleep medication, and daytime dysfunction. A global sum of 5 or greater indicates “poor” sleep and scores under 5 indicate “good” sleep. The PSQI has good internal consistency 20 and a reliability coefficient (Cronbach alpha) of .83 for its 7 components. 17 One item on the PSQI (Question 7) asks individuals how often they have taken prescription and/or over-the-counter medication as a sleep aid over the last 30 days. This item was used to validate the medication information provided on the demographic survey.
The 20-Item Short Form Survey (SF-20)21,22 was developed for the Rand Medical Outcomes Study, a longitudinal study of patients with chronic disease conditions. The instrument measures health across 6 domains including physical functioning, role functioning, social functioning, mental health, current health perceptions, and pain.
The PMSS Scale was also administered.23,24 This instrument assesses 4 areas of stress including medical school curriculum and environment, personal competence and endurance, social and recreational life, and finances by asking respondents to rate their level of agreement with 11 negatively-worded items on a 5-point Likert scale (0 = strongly disagree to 4 = strongly agree). Higher scores indicate increased stress.
Statistical Analysis
Data were analyzed with SPSS (IBM SPSS Statistics for Windows, Version 25, 2017). Categorical demographic variables were analyzed and reported with frequency and percent. Longitudinal variables were analyzed and reported as mean and standard deviation, or median and range as appropriate. Main sleep was defined as the longest sleep recorded by the activity tracker that provided sleep stage information in a 24-h period. Average sleep time was calculated by summing the main sleep and nap times for all days during each 9-day collection window that provided at least 16 h of data and dividing the total by the number of days of valid data available. A nap was defined as a sleep event within the 9-day collection period that lasted longer than 20 min and was shorter in duration than the main recorded sleep event. The percentage of subjects who napped was calculated by dividing the number of participants who had two or more nap events by the total number of participants enrolled in the study at the time of each data collection point. Average daily nap time was calculated by adding up the total time spent napping and dividing by the number of naps taken during each of the 9-day collection periods. Average percent of nap days was calculated by determining the number of days where a nap took place within the 9-day collection period and dividing by 9 days. Sleep efficiency was calculated by dividing the minutes asleep by the total number of minutes in bed less minutes after waking. The percentage of “poor” sleepers was determined by calculating the number of individuals with a PSQI score of 5 or more and dividing by the total number of participants at each time period. For data available across 4 time points, a 1-way ANOVA was performed with significant differences between groups determined by Bonferroni post hoc testing.
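The tracker-derived definitions above can be illustrated with a minimal sketch. It is not the authors' SPSS workflow: the `SleepEvent` and `Day` structures and their field names are hypothetical and do not reflect the Fitabase export schema, and "longest sleep" is assumed here to mean the most minutes asleep.

```python
from dataclasses import dataclass

MIN_WEAR_HOURS = 16   # a day is valid only with at least 16 h of wear (from the text)
MIN_NAP_MINUTES = 20  # a nap must last longer than 20 min (from the text)


@dataclass
class SleepEvent:
    """One sleep episode; field names are hypothetical."""
    minutes_asleep: float
    minutes_in_bed: float
    minutes_after_wakeup: float
    has_stages: bool  # True when light/deep/REM staging was reported


@dataclass
class Day:
    """One 24-h period of tracker data; structure is hypothetical."""
    wear_hours: float
    events: list


def main_sleep(day):
    """Longest staged sleep event of the day, or None if there is none."""
    staged = [e for e in day.events if e.has_stages]
    return max(staged, key=lambda e: e.minutes_asleep, default=None)


def naps(day):
    """Events longer than 20 min that are shorter than the main sleep."""
    main = main_sleep(day)
    if main is None:
        return []
    return [e for e in day.events
            if e is not main
            and MIN_NAP_MINUTES < e.minutes_asleep < main.minutes_asleep]


def average_sleep_hours(days):
    """Main sleep plus naps over valid days, divided by the valid-day count."""
    valid = [d for d in days if d.wear_hours >= MIN_WEAR_HOURS]
    if not valid:
        return None
    total_minutes = 0.0
    for d in valid:
        main = main_sleep(d)
        total_minutes += main.minutes_asleep if main else 0.0
        total_minutes += sum(n.minutes_asleep for n in naps(d))
    return total_minutes / len(valid) / 60.0


def sleep_efficiency(event):
    """Minutes asleep divided by (minutes in bed less minutes after waking)."""
    return event.minutes_asleep / (event.minutes_in_bed - event.minutes_after_wakeup)
```

In this sketch a day with less than 16 h of wear contributes neither to the numerator nor to the denominator, which matches the "number of days of valid data available" wording.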
For data available across 2 time points (one high stress time point and one low stress timepoint), the Wilcoxon signed rank test for related samples 25 was used to determine whether a significant difference existed between times of high stress and times of low stress for variables that were not normally distributed. The dependent t test was used to determine whether a significant difference existed between times of high stress and times of low stress for variables that were normally distributed. A correlation analysis was conducted to determine how strong the relationship was between activity tracker data reflecting total time asleep per night and self-reported data collected via the PSQI.
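As a rough illustration of the correlation analysis, the Pearson coefficient between tracker-recorded and self-reported sleep hours can be computed as below. This is a plain-Python sketch rather than the SPSS procedure used in the study, and the function name `pearson_r` is an assumption introduced for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Here `x` would hold each participant's tracker-recorded sleep time and `y` the corresponding PSQI self-report; any values passed in are the caller's own data.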
Results
Demographics of Study Population
Twenty-five participants, all first-year medical students, wore activity trackers (Fitbit Charge 3, Fitbit, Inc.) for 6 months. The median age of participants was 25.6 years and ranged from 22 to 29 years. Sixty percent of participants (n = 15) were Caucasian, and 40% (n = 10) were Asian. Among the 25 participants, 52.0% (n = 13) were male, and 48.0% (n = 12) were female.
Results of the sleep analysis are reported in Table 1. The average sleep time during the 2 periods of high stress as collected over a 24-h period by the activity tracker ranged from 5.39 h to 9.97 h with an average of 6.35 (±1.16) hours (Time 1), and from 5.28 h to 9.9 h with an average of 6.84 (±1.61) hours (Time 2). The average sleep time during the 2 periods of low stress ranged from 4.58 h to 9.86 h with an average of 7.65 (±1.61) hours (Time 3), and from 3.20 h to 12.09 h with an average of 7.34 (±2.23) hours (Time 4). There was a significant difference between the average sleep time across the 4 time periods (DF: 1, 11; F: 365.27, P < .001). The post hoc test indicated that average sleep time at Time 1 (high stress) was significantly different from Time 4 (low stress). Participants slept significantly longer during periods of low stress (P = .03).
Table 1. Differences in Sleep Behavior, Perceived Medical School Stress, and Overall Health Between Periods of High Stress and Periods of Low Stress.
Average main sleep time as recorded by the activity tracker (exclusive of time spent napping) was significantly correlated with average sleep time over the last 30 days as self-reported on the PSQI (Pearson Correlation 0.356, P < .001). Self-reported average sleep time during the 2 periods of high stress ranged from 4.0 to 8.0 h with an average of 6.48 (±1.16) (Time 1), and from 5.0 to 9.0 h with an average of 6.28 (±1.41) (Time 2). The average self-reported sleep time during the 2 periods of low stress ranged from 4.0 to 11.0 h with an average of 6.62 (±1.88) (Time 3), and from 5.0 to 11.0 h with an average of 8.12 (±1.73) (Time 4). There was a significant difference between the self-reported average sleep time across the 4 time periods (DF: 1, 21; F: 691.26, P < .001). The post hoc test indicated that average sleep time at Time 2 (high stress) was significantly different from Time 4 (low stress). Participants reported significantly longer sleep during periods of low stress (P = .004).
There were 18 participants for whom there were valid napping data during both a time of high stress and a time of low stress. During times of high stress, the median nap length was 130.5 min and ranged from 87.0 min to 473.8 min. During times of low stress, the median nap length was 102.7 min and ranged from 67.0 min to 521 min. Participants napped significantly longer during times of high stress (P < .048). The average percentage of participants who napped was also significantly different over the 4 time points (DF: 1, 11; F: 25.36, P < .0001). During times of high stress, the percentage of participants who napped was 68.00% during Time 1, and 72.20% during Time 2. During times of low stress, the percentage of participants who napped was 66.70% during Time 3, and 70.60% during Time 4. The post hoc test indicated that the percent who napped at Time 1 (high stress) was significantly different from the percent who napped at Time 4 (low stress). Significantly more participants napped during periods of high stress (P = .025). Of the 17 people who had valid data across all 4 time points, 9 (52.94%) were consistent nappers and napped across all 4 time points.
There was a significant difference in the percent of nap days between the 4 time points. The average percentage of nap days during the 2 periods of high stress was 14.20% (Time 1) and 18.76% (Time 2). The average percentage of nap days during the 2 periods of low stress was 30.21% (Time 3) and 25.56% (Time 4). There was a significant difference between the average percentage of nap days across the 4 time periods (DF: 1, 11; F: 25.36, P < .0001). The post hoc test indicated that the average percentage of nap days at Time 1 (high stress) was significantly different from Time 4 (low stress). Participants had a significantly greater percentage of nap days during periods of low stress (P = .041).
The results from the global scores on the PMSS indicate that there is a significant difference between the 4 time points (DF: 1, 12; F: 226.62, P < .0001). During the 2 periods of high stress, the average scores were 33.68 (Time 1) and 35.28 (Time 2). During times of low stress, the average scores were 29.95 (Time 3) and 29.94 (Time 4). The post hoc test indicated that the Time 2 score (period of high stress) was significantly different from the Time 3 score (period of low stress) (P = .005).
The scores for the PSQI were analyzed using 2 different methods. First, continuous global scores were compared across the 4 time points. During times of high stress, the average scores were 6.44 (Time 1) and 6.67 (Time 2). During times of low stress, the average scores were 4.48 (Time 3) and 4.53 (Time 4). There were significant differences across the 4 time points (DF: 1, 12; F: 133.26, P < .0001). The post hoc test showed that the average scores at Time 2 were significantly different from those at Time 4 (P = .044). This indicates that students had significantly higher scores on the PSQI during times of high stress. Higher scores indicate lower sleep quality.
The second method used to analyze the PSQI was to transform the continuous global score variable into a dichotomous variable that represented poor sleep. A score of 5 or more is indicative of poor sleep. During periods of high stress, the percentage of students who had scores indicative of poor sleep was 88.00% (Time 1) and 94.40% (Time 2). During periods of low stress, the percentage of students who had scores indicative of poor sleep was 85.70% (Time 3) and 41.20% (Time 4). There were significant differences between the 4 time periods (DF: 1, 12; F: 58.91, P < .0001). Post hoc tests indicated that the percentage of students with poor sleep at Time 1 (period of high stress) was significantly different from that at Time 4 (period of low stress) (P = .008) and that Time 2 (period of high stress) was significantly different from Time 4 (period of low stress) (P = .001).
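The dichotomization described here can be sketched in a few lines. The function names are assumptions introduced for illustration, and the cutoff of 5 or more follows the definition used in this study.

```python
POOR_SLEEP_CUTOFF = 5  # global PSQI score of 5 or more counts as poor sleep (from the text)


def is_poor_sleeper(psqi_global_score):
    """Dichotomize a continuous PSQI global score."""
    return psqi_global_score >= POOR_SLEEP_CUTOFF


def percent_poor_sleepers(psqi_global_scores):
    """Share of respondents at one time point whose score indicates poor sleep."""
    if not psqi_global_scores:
        raise ValueError("no scores supplied")
    poor = sum(1 for score in psqi_global_scores if is_poor_sleeper(score))
    return 100.0 * poor / len(psqi_global_scores)
```

Because the denominator is the number of respondents at each time point, percentages from different time points are not based on identical groups when participants have missing data.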
There is a significant relationship between napping and poor main sleep during times of high stress. This indicates that during stressful times, those that nap are significantly more likely to have poor main sleep quality (χ2 = 4.19, DF: 1, P < .04). This relationship does not hold true in times of low stress where students may be napping more from habit or enjoyment (appetitive napping).
The scores on the PMSS validated that Time 1 and Time 2 were higher stress time periods than Time 3 and Time 4. There was a significant relationship between PMSS score and poor sleep. For every one-unit increase in PMSS, the odds of poor sleep increased by a factor of 1.137 (P = .04) across all time points. When stratified by the expected category of stress, for every one-unit increase in PMSS, the odds of poor sleep increased by a factor of 1.258 (P = .04) during the 2 stressful time periods and by a factor of 1.923 (P = .05) during the 2 low-stress time periods.
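To make the per-unit odds figures easier to read, the following worked equation assumes they are odds ratios from a logistic regression model; the text does not name the model, so this is an interpretation rather than a statement of the authors' method. Under that assumption, odds ratios compound multiplicatively over a difference of k PMSS points, and the 5-point example is purely illustrative.

```latex
\[
\frac{\operatorname{odds}(\text{poor sleep} \mid \text{PMSS} = x + k)}
     {\operatorname{odds}(\text{poor sleep} \mid \text{PMSS} = x)}
  = \mathrm{OR}^{\,k},
\qquad \text{e.g. } \mathrm{OR} = 1.137,\; k = 5:\quad 1.137^{5} \approx 1.90 .
\]
```

Read this way, a 5-point higher PMSS score would correspond to roughly 1.9 times the odds of poor sleep across all time points.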
No significant differences were found with regard to sleep efficiency, or percentage of time spent in sleep stages (light, deep, and REM) across the 4 time points (Table 1). Additionally, no significant difference was found with regard to overall student general health nor at the subscale level as measured by the SF-20 across the 4 time points (Table 2).
Table 2. Results From SF-20 Over 4 Time Periods.
Discussion
This longitudinal pilot study of stress and sleep in first-year osteopathic medical students provided a comparison of subjective self-report data of PMSS and sleep quality to objective data recorded via wrist-worn activity trackers. Other studies have examined sleep quality among undergraduate students using wrist-worn activity trackers, 26 but have not done so longitudinally to determine whether stressful academic situations modify sleep patterns and quality in medical students.
This study has several limitations that should be considered. Firstly, this was a longitudinal pilot study with a small sample size (n = 25). Neither a sample size calculation nor a power analysis was performed for this study. All participants were in good health as evidenced by scores on the SF-20. Our study did not have sufficient power to detect significant differences in subscales of the SF-20 over the 4 time periods. If the study were repeated with a larger sample, or in a group of students who were more variable with regard to health, then different results are possible.
Approximately two-thirds of participants took naps regularly across the 4 time points. When the activity tracker records naps, it does not provide the stages of sleep for those events. The percentage of time spent in light, deep, and REM sleep is only recorded for main sleep events. We reported average sleep for each time point in Table 1 in 2 ways: one that included time spent napping as recorded by the activity tracker, and one that included only the main sleep event. The data in Table 1 that report the percentage of time spent in each stage of sleep reflect only the main sleep event. Data provided by Fitbit, Inc. 27 indicate that the majority of people spend about half of their sleep (50%-60%) in light sleep, approximately 10% to 15% in deep sleep, and the remaining time in REM sleep (20%-25%). They also provide normative data stratified on age and gender. The data from our study that were retrieved from Fitabase (Small Steps Labs, LLC) were de-identified and as such were not analyzed by age and gender. Generally, our participants fell within the acceptable ranges of the various sleep stages across all 4 time points. With such a high percentage of consistent nappers, it would be helpful if sleep quality indicators (sleep staging) were available for the time during which the students were napping, especially considering that a large portion of the secondary sleep events were only slightly shorter in duration than the main sleep. Had those data been available, they might have provided insight into the role naps play in overall sleep efficiency and into the quality of sleep achieved during naps.
Another potential limitation is the difference in data collection periods for the PSQI and the activity tracker data. The PSQI asks respondents to reflect on their sleep patterns and habits over the last 30 days. We aggregated data from the activity trackers over 9-day periods corresponding to what should have been either a more stressful time period or a less stressful time period. The difference in “look back” period may have encouraged respondents to report their sleep habits more generally for the PSQI than what was recorded from the activity trackers.
During times of high stress, n = 28 (65.1%) individuals reported their quality of sleep as good or very good. Based on a minimum standard of 6 h of sleep, 28 the Activity Tracker data reflect that only 37.2% (n = 16) are experiencing good or very good sleep per night. The percent agreement during times of high stress between perceived, self-reported quality of sleep, and objective activity tracker data is 43.8%. During times of low stress, 65.8% (n = 25) of individuals reported their sleep as good or very good. The Activity Tracker data during this time indicate that 81.6% (n = 31) are experiencing good or very good sleep per night. The percent agreement for times of low stress is 63.16%. These results indicate that perceived sleep quality is not in line with actual sleep quality as measured by the Activity Tracker when based on a minimum of 6 h of sleep. During periods of high stress, students are overestimating their sleep quality; during periods of low stress, students are underestimating their sleep quality.
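The comparison above can be sketched as a simple percent-agreement calculation. The text does not spell out the exact procedure, so this sketch assumes agreement means that the self-reported rating (good or very good versus not) matches the tracker classification (at least 6 h of sleep versus less) for the same observation; the function names are hypothetical.

```python
def tracker_good_sleep(hours_asleep, minimum_hours=6.0):
    """Classify tracker-recorded sleep as good when it meets the 6-h minimum."""
    return hours_asleep >= minimum_hours


def percent_agreement(self_reported_good, tracker_good):
    """Percentage of paired observations where both classifications match."""
    if len(self_reported_good) != len(tracker_good) or not self_reported_good:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(1 for a, b in zip(self_reported_good, tracker_good) if a == b)
    return 100.0 * matches / len(self_reported_good)
```

Percent agreement does not correct for agreement expected by chance, so it should be read as a descriptive summary rather than a formal reliability statistic.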
Insufficient sleep duration and sleep disorders like insomnia are highly prevalent in the general US population. 29 Previous studies have suggested that those who identify as someone who is of a racial or ethnic minority may be more likely to experience sleep patterns leading to adverse health outcomes and thus they may be more likely to experience health disparities. 30 Ethnic minority medical students have also reported significantly lower levels of sleep adequacy and sleep quantity and significantly higher levels of sleep somnolence than Caucasian students. 31 One of the limitations of our pilot cohort was the small size and lack of ethnic and racial diversity. Future sleep studies should seek to include minority populations to the greatest extent possible. Our sample is not fully representative of the racial make-up of osteopathic medical school students in general. The latest data (2018) from all Colleges of Osteopathic Medicine matriculants nationwide 32 showed that 48.6% were female, the mean age was 23 (down from 24 in 2017), and 9.6% of matriculants were underrepresented minorities (down from 10.3% in 2017). With regard to race and ethnicity, 61.0% were Caucasian, 23.8% were Asian, 6.6% were Hispanic/Latino(a), 3.2% were African American, 3.5% were 2 or more races, 0.2% were American Indian or Alaska Native, and 0.1% were Pacific Islander. Allopathic schools in 2019 reported matriculant demographics as being 54.2% female, having a median age of 23, 60.1% Caucasian, 25.5% Asian, 11.1% Hispanic/Latino(a), and 8.3% African American. 33
We did not attempt to correlate academic performance with sleep quality or stress. Results from studies conducted with medical school students in other countries have shown that the grade point average for poor sleepers was significantly lower than that of good sleepers. 34 Another study 35 found a significant relationship between sleep quality as defined by the Global PSQI score and self-reported academic performance (r = 0.26; P ≤ .001).
A recent study of allopathic medical students through all training years and multiple schools conducted by Ayala et al showed that even with the majority of students reporting main sleep length only slightly lower than the adult US population, most students have poor sleep quality. 31 This was confirmed in our study of first-year osteopathic medical students. In addition, this study has also demonstrated a noticeable pattern of increased daytime napping and extended weekend sleep to make up for lost sleep during times of stress. Although properly structured daytime napping is restorative, weekend recovery sleep does not have a positive metabolic effect on sleep-restricted individuals. 36
Conclusion
In conclusion, the percentage of first-year medical students who experience poor sleep quality is high. As might be expected, poor sleep was significantly more prevalent during times of high stress when compared to times of low stress. Perceived stress (self-reported on the PMSS) was significantly higher during times of high stress (Time 1 and Time 2) than during times of low stress (Time 3 and Time 4). Most students who napped showed a clear pattern of “catch up” sleep: they slept less during the week and had longer sleep sessions on the weekends.
There is an argument to be made that the students would benefit from education on healthy sleep habits. It would be interesting to see how sleep patterns changed after an intervention that included healthy sleep education. For instance, education could be provided on maximizing effectiveness of napping for sleep deficit recovery, refreshment, learning enhancement, 37 retention of information, teaching napping skills, 38 and discussing nap duration. 39 Maximizing effectiveness of main sleep using techniques such as cognitive behavioral therapy7,40 to cope with anxiety or stress could increase sleep efficiency; better quality sleep should lead to greater resilience against stress and burnout. Additionally, administrators could potentially use activity trackers as a way to track high-risk students; perhaps, a threshold could be set, and high-risk students could receive additional counseling or other form of stress reduction.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: William Carey University College of Osteopathic Medicine.
