Abstract
Objective
Sleep quality is a crucial concern, particularly among youth. The integration of health coaching with question-answering (QA) systems presents the potential to foster behavioural changes and enhance health outcomes. This study proposes a novel human-AI sleep coaching model, combining health coaching by peers and a QA system, and assesses its feasibility and efficacy in improving university students’ sleep quality.
Methods
In a four-week unblinded pilot randomised controlled trial, 59 university students (mean age: 21.9; 64% males) were randomly assigned to the intervention (health coaching and QA system; n = 30) or the control conditions (QA system; n = 29). Outcomes included efficacy of the intervention on sleep quality (Pittsburgh Sleep Quality Index; PSQI), objective and self-reported sleep measures (obtained from Fitbit and sleep diaries) and feasibility of the study procedures and the intervention.
Results
Analysis revealed no significant difference in sleep quality (PSQI) between the intervention and control groups (adjusted mean difference = −0.51, 95% CI: [−1.55, 0.77], p = 0.40). Compared with the control group, the intervention group demonstrated significant improvements in Fitbit measures of total sleep time (adjusted mean difference = 32.5 min, 95% CI: [5.9, 59.1], p = 0.02) and time in bed (adjusted mean difference = 32.3 min, 95% CI: [2.7, 61.9], p = 0.03), although differences in other sleep measures were not significant. Adherence was high, with the majority of the intervention group attending all health coaching sessions. Most participants completed baseline and post-intervention self-report measures, all diary entries, and consistently wore Fitbits during sleep.
Conclusions
The proposed model improved specific sleep measures among university students, and the study procedures and intervention proved feasible. Future research may extend the intervention period to see substantive sleep quality improvements.
Introduction
Sleep is a daily-living activity that plays a vital role in maintaining the health and well-being of individuals. 1 However, university students often encounter various stressors that can disrupt quality sleeping patterns. This includes emotional and academic demands, 2 new living arrangements, financial stress 3 and issues such as alcohol use 4 or stimulant abuse. 5 Despite compelling evidence highlighting the benefits of adequate sleep, 6 poor sleep quality is highly prevalent among university students. Many are at high risk for sleep disorders, 7 with over 60% of university students experiencing sleep disturbances. 2 Moreover, daytime sleepiness affects over half of university students compared to 36% of adolescents and adults. 8 The majority of students report having less than the recommended eight hours of sleep, 2 despite the National Sleep Foundation recommendations for young adults to sleep between seven and nine hours (although individual variability exists). 9 Young adults also often display poor sleep quality characteristics such as high levels of wake after sleep onset (WASO). 10
Poor sleep quality has been linked to various negative psychological consequences including depression, 11 anxiety 12 and loneliness. 13 It is also commonly associated with numerous chronic health problems such as heart disease, 14 high blood pressure, 15 stroke 16 and obesity. 17 These outcomes lead to further impacts on common sleep quality indicators, such as sleep efficiency (SE), 18 night-time sleep awakening frequency 19 and sleep latency (SL). 20 Furthermore, additional contributors to poor sleep behaviours include a lack of sleep education and sleep hygiene knowledge among university students.21–23 Given these prevalence rates and the adverse effects of poor sleep, there is a cogent need to develop effective interventions that promote behaviours improving sleep quality among university students.
Achieving behavioural change for healthy sleep quality is complex and challenging, with lack of knowledge often being a key contributor. 23 Understanding the importance of sleep health is the initial step, but the abundance of information can be overwhelming. To bridge this gap, question-answering (QA) systems have emerged as a tool to enhance knowledge and understanding on numerous topics by providing short and precise answers to questions posed in natural language. 24 This is achieved through natural language processing (NLP), a branch of artificial intelligence (AI) with rapid developments and vast applications using large language models (LLMs) for QA. Question-answering systems can hold an abundance of domain knowledge, and biomedical QA systems can be trained on evidence-based medical information to increase the accessibility of expert opinions. This mimics direct access to an expert by providing timely and accurate responses to users' queries, allowing them to access evidence-based information in real time. These QA systems have been applied in clinical decision support,25,26 medical examinations,27,28 consumer health questions 29 and to improve numerous health outcomes,24,25 including sleep outcomes in university settings. 30 However, despite the QA system's abundance of knowledge, providing information alone to patients in this form is unlikely to be sufficient to promote behavioural change. 31
Hence, health coaching, a person-centred approach, offers valuable human interaction that maximises an individual's potential, increases health knowledge and awareness and improves health outcomes in a sustainable and positive manner.32,33 This can empower patients for long-term behavioural change 34 through active learning, involvement in social support and problem-solving. 35 Health coaches are adept at developing personalised plans with behavioural change techniques tailored to the client's needs, and can synthesise evidence-based information to create actionable insights and goals. This approach has shown promise in intervening in various health conditions, leading to improvements in diet, 36 mental well-being, 37 physical activity 38 and sleep. 39 In university settings, health coaching implemented by peers has proven effective in promoting healthy lifestyles among students.40–42 Employing peer health coaches to intervene on health outcomes such as sleep thus serves as an effective, scalable and sustainable approach to intervention delivery.
Although QA systems can effectively provide information, behavioural interventions rely heavily on psychological principles for their success. Hence, to enhance these interventions, it is crucial to involve human health coaches who can incorporate evidence-based data obtained from the QA system and personalise the information to each participant's individual needs. Thus, AI solutions aiming to change individual behaviours are best complemented and overseen by human involvement. Question-answering systems have additional potential to augment peer coaching by providing extra information and insight for decision-making.43,44 Furthermore, they can assist in overcoming any knowledge gaps that may arise from coaches not being domain experts. By integrating QA systems, health coaches can access evidence-based and medically reviewed information for effective, accurate and comprehensive coaching. Therefore, we propose a human-AI symbiosis approach that combines the benefits of human health coaching (with weekly peer coaching interactions) with the advantages of AI (via the QA system) to facilitate improvements in sleep quality.
To our knowledge, there is no human-AI sleep coaching approach designed specifically to address sleep quality among university students. This study aimed to assess the feasibility and efficacy of the human-AI sleep coaching model to improve sleep quality among university students. The primary outcome assessed is change in sleep quality as measured by the Pittsburgh Sleep Quality Index (PSQI). Secondary outcomes include: (1) changes in objective and subjective measures of sleep as measured by Fitbit and sleep diaries; and (2) feasibility of study procedures to inform the design of a full-scale randomised controlled trial (RCT).
Methods
Study design
The study was a two-arm pilot RCT investigating the human-AI sleep coaching model to improve sleep quality among university students across four weeks. The study was conducted at a local university in Singapore, and ethical approval was obtained from the Institutional Review Board of Nanyang Technological University (IRB-2021-739). This study was not pre-registered. This study followed the reporting guidelines for pilot studies set by Consolidated Standards of Reporting Trials and a corresponding checklist is available in Supplemental Appendix 1.
Sleep QA system
The extractive Sleep QA system,45,46 developed by the study team, uses NLP techniques to answer sleep-related queries posed in a factoid manner (i.e., questions that elicit answers that can be succinctly expressed in short texts). 47 The system was built and trained on an expert-annotated dataset comprising over 7000 medically reviewed online articles related to sleep health to curate an extensive and evidence-based dataset. A data-centric approach was adopted in the development of the QA system to improve its accuracy by refining the identification of negative passages in retrieval fine-tuning, and by question reformulation (e.g., paraphrasing, back translation).
To use the system, users can enter their questions into the text box, and the system generates an immediate response. An online web interface was created to provide easy access to the QA system via computer, tablet or smartphone. It is important to note that the system is currently intended for research purposes only and should not be considered a substitute for professional healthcare or medical advice. It was designed as a supplementary tool for care, and all users were required to sign a medical disclaimer acknowledging this before accessing the system.
Intervention protocol
Control (QA system)
Participants in this group were advised to engage directly with the QA system (see Figure 1) on a weekly basis and were told they could access the interface on their preferred platform (e.g., phone, computer or tablet). They were given guidance to ask factoid questions relating to aspects of sleep (e.g., sleep quality, sleep hygiene).

Figure 1. Screenshot of the developed question-answering (QA) system.
Intervention (QA system and health coaching)
Participants in the intervention group engaged in a weekly 30 min health coaching session across the study period, conducted via an online synchronous text-based platform (Microsoft Teams) on their computer, tablet or smartphone. During the sessions, health coaches would initiate the weekly interaction with their assigned peers and ask them questions on various sleep aspects (e.g., duration, patterns, quality) while incorporating behaviour change techniques (e.g., motivational interviewing, goal setting) adjusted to their respective needs. Participants were also instructed to seek additional sleep advice by asking their coaches questions pertaining to sleep. When participants posed factoid questions, health coaches could retrieve answers from the QA system for additional evidence-based information to aid their response. If the answers and explanations provided by the system were accurate, coaches were advised to incorporate the information obtained into their coaching conversations. However, if the system's response was deemed incorrect or unhelpful, coaches were instructed to use their expertise and reliable sources (e.g., clinical guidelines) to answer participants’ questions and provide tailored coaching. Health coaches were current undergraduate and postgraduate students from various backgrounds (e.g., psychology, medicine, business) who had completed and passed the 8-week university health coaching course: ‘Health Coaching: Introduction to Being Cool & Sleep as a Power’, which covered how to apply coaching skills in areas such as sleep, mental health and exercise. Health coaches also received training on how to use the QA system in their coaching. Additionally, they attended weekly group supervision with professional health coaches to communicate concerns and receive advice for their respective coaching sessions.
Lastly, the health coaches had access to the study team's dedicated sleep specialists, providing them with a valuable resource for addressing any specific questions or concerns related to sleep.
Participant recruitment
Participants were recruited by sending university-wide emails and by approaching student organisations. Through a link provided in the email, participants were directed to a form on Qualtrics (an online survey platform), where they were provided with information including the study aim, objectives and participation requirements. Students who were interested in participating were able to register their interest and were subsequently screened for eligibility. Written informed consent was obtained from participants before the study commenced.
Participant eligibility
Participants were deemed eligible to participate if they met the following inclusion criteria: (a) were at least 21 years old (in compliance with Singapore's legal age of consent); (b) did not display depressive symptoms (as assessed by a score below 10 on the Patient Health Questionnaire-9 (PHQ-9)); (c) were a current student at this university; and (d) were not undergoing any treatment for sleep and/or mental disorders and were not under the care of a psychologist or psychiatrist, as our intervention was not tailored for severe sleep disorders (e.g., narcolepsy, parasomnias), which require more intensive treatments. Individuals were deemed ineligible to participate if, at screening, they: (a) indicated depressive symptoms (PHQ-9 > 10); (b) indicated suicidal ideation via Item 9 of the PHQ-9; (c) were not a current student at this university; or (d) were undergoing any treatment for sleep and/or mental disorders, or were under the care of a psychologist or psychiatrist. Participants who demonstrated depressive symptoms or suicidal ideation were referred to professional support services at the university's counselling centre.
Sample size
As a pilot study, this study aimed to inform the development of a future larger trial. Therefore, a formal sample size calculation was not conducted. Based on best practices for sample sizes of pilot studies, 30 participants per group is recommended. 48 As such, we planned to recruit 60 participants (30 per group).
Trial procedures
Participants meeting the eligibility criteria for study participation remotely provided written electronic informed consent and completed the baseline survey. Participants then scheduled an onboarding session for study briefing and were loaned a Fitbit Inspire 2 for data collection during the study period. Attendance at the onboarding session was mandatory for participants to proceed with the study procedure. Eligible participants were randomly assigned to the intervention group or the control group in a 1:1 ratio using a random number generator. Participants were not blinded to their allocated group and were notified of their group assignment during the onboarding session.
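A 1:1 allocation of this kind can be illustrated with a short sketch. The text states only that a random number generator was used; the shuffle-and-split scheme, the seed and the function name `allocate_1_to_1` below are assumptions made for illustration and are not the study team's actual procedure.

```python
import random


def allocate_1_to_1(participant_ids, seed=None):
    """Randomly split participants into two equal-sized arms.

    Hypothetical scheme: shuffle the IDs, then cut the list in half.
    """
    rng = random.Random(seed)          # seeded generator, for reproducibility
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "intervention": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


# Illustrative IDs 1..60, matching the planned sample of 60 participants.
allocation = allocate_1_to_1(range(1, 61), seed=2023)
print(len(allocation["intervention"]), len(allocation["control"]))
```

Shuffling and splitting guarantees equal arm sizes, whereas drawing each assignment independently would only balance the arms on average.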
During the intervention period, all participants were asked to complete a daily sleep diary and to sync their Fitbits weekly for data collection. Participants received weekly reminders via email to upload sleep diaries and to sync their Fitbit data by accessing the Fitbit mobile app. After the intervention period, participants were asked to complete the post-intervention questionnaire. They were also asked to schedule and attend an off-boarding session for study debriefing and to return the Fitbit. Participants were reimbursed with vouchers for participant burden.
Safety net procedure
A safety net procedure was in place throughout this trial. Health coaches were trained to identify signs of mental distress, sleep disorders or at-risk behaviours (e.g., suicidal ideation, harm to self and others). While participants were not explicitly asked about their mental distress or sleep disorders, health coaches were instructed to ask their clients about their sleep health and well-being and to monitor any changes in sleep behaviours. If participants demonstrated any signs of mental distress or sleep disorders (e.g., irregular sleep patterns, difficulty falling asleep and severe mood swings), health coaches were instructed to inform the study team. They were told not to proceed with coaching and to consult the professional health coach on stand-by. If necessary, the participant would be referred to the university's counselling centre (in the case of severe mental distress) or to sleep physicians (in the case of a sleep disorder). If participants demonstrated at-risk behaviours, health coaches were instructed to follow a university-wide protocol for at-risk students.
Primary outcome
Sleep quality
The PSQI was administered at baseline and post-intervention. The self-report questionnaire assesses subjective quality of sleep through 19 questions with scores ranging from 0 to 21. Total scores between 0 and 4 represent good sleep quality, while scores 5 and above are the threshold indicating poor sleep quality. The PSQI assesses seven components, namely subjective sleep quality, SL, sleep duration, SE, sleep disturbances, use of sleep medication and daytime dysfunction. 49 A clinically significant change in PSQI is a change of three points or more. 50
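The thresholds described above can be expressed as a short sketch. The function names `classify_psqi` and `is_clinically_significant` are hypothetical; the cut-offs (0 to 4 good, 5 and above poor, a change of three points or more as clinically significant) are taken from the text.

```python
def classify_psqi(total_score):
    """Classify a PSQI global score (valid range 0 to 21) as 'good' or 'poor'."""
    if not 0 <= total_score <= 21:
        raise ValueError("PSQI global score must lie between 0 and 21")
    return "good" if total_score <= 4 else "poor"


def is_clinically_significant(baseline, post, threshold=3):
    """Return True when the absolute change in PSQI is at least `threshold` points."""
    return abs(post - baseline) >= threshold


print(classify_psqi(4), classify_psqi(5))   # good poor
print(is_clinically_significant(8, 5))      # True
```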
Secondary outcomes
Fitbit sleep measures
The Fitbit Inspire 2 was used to measure sleep metrics objectively. Sleep efficiency, total sleep time (TST), time in bed (TIB), SL, number of awakenings during the night (NWAK) and WASO were derived from the Fitbit data for the purposes of this study. These variables are commonly reported in the literature to assess sleep objectively among university students.51,52 By definition, SE is a percentage and could take values from 0 to 100%. Total sleep time, TIB, SL and WASO are measured in minutes and take non-negative values.
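As an illustration of how SE relates to the other metrics, the sketch below uses the conventional definition of sleep efficiency as the share of time in bed spent asleep. This is an assumption: the text does not state how the Fitbit data were converted to SE, and the function name `sleep_efficiency` and the example values are hypothetical.

```python
def sleep_efficiency(tst_minutes, tib_minutes):
    """Sleep efficiency (%) = 100 * total sleep time / time in bed."""
    if tib_minutes <= 0:
        raise ValueError("time in bed must be positive")
    if not 0 <= tst_minutes <= tib_minutes:
        raise ValueError("total sleep time must lie between 0 and time in bed")
    return 100.0 * tst_minutes / tib_minutes


# Illustrative values only: 363 min asleep out of 420 min in bed.
print(round(sleep_efficiency(363, 420), 1))  # 86.4
```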
Self-reported sleep
The Sleep Foundation Sleep Diary 53 was adopted as a daily self-monitoring tool to record sleep habits and sleep hygiene. A sleep diary is considered the ‘gold standard’ in terms of subjective sleep assessment 54 and in non-laboratory conditions.54,55 Data collected included subjective sleep quality, sleep disturbances, caffeine and medication consumption, daily exercise and bedtime routine. Additional data collected included bedtime, wake-up time, number and duration of daytime naps and sleep disruptions. For the purposes of this study, analyses on sleep diary data were only conducted on the following metrics: SE, TST, SL, NWAK and WASO, in alignment with the Fitbit data collected. TIB was not calculated for sleep diary data because the diary did not include a question such as ‘What time did you get out of bed for the day?’.
Feasibility
The feasibility of the study procedure and the intervention was measured based on existing feasibility indicators.56,57 The feasibility of the study procedure measured the following: (a) recruitment rate (as calculated by the percentage of people who agree to participate in the study among those who are approached and are eligible to participate); (b) retention rate (as calculated by the percentage of participants who completed the post-intervention questionnaire); and (c) adherence to sleep diary and Fitbit data collection. The feasibility of the intervention (defined as intervention adherence and engagement) was measured by the following: (a) participants’ weekly attendance to health coaching sessions; and (b) time spent per coaching session.
Statistical analyses
To assess the treatment effect for the primary sleep quality outcome (PSQI), linear regression models were fitted with the post-treatment measurement as the dependent variable and the pre-treatment measurement and treatment group as independent variables. The interaction between pre-treatment measurement and treatment group was not included, to allow for better power. As a sensitivity analysis, a proportional odds model for PSQI was also fitted.
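Written out, the primary analysis corresponds to a baseline-adjusted regression of the following form. The notation is ours and not taken from the source: $G_i$ is assumed to be an indicator equal to 1 for the intervention group and 0 for the control group, so that $\beta_2$ is the adjusted mean difference between groups.

```latex
\[
  \mathrm{PSQI}^{\mathrm{post}}_i
    = \beta_0 + \beta_1\,\mathrm{PSQI}^{\mathrm{pre}}_i + \beta_2\,G_i + \varepsilon_i,
  \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
\]
```

Omitting the product term $\mathrm{PSQI}^{\mathrm{pre}}_i \times G_i$ leaves one fewer parameter to estimate, which is the power consideration mentioned above.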
For the Fitbit and self-reported sleep outcomes, we fitted mixed regression models with baseline measurement, time (number of days from baseline), the interaction of treatment and baseline, and the interaction of treatment and time as independent variables, with all models having a random intercept to account for the repeated measurements within subjects. For these outcomes, interaction terms were included because the analyses were exploratory in nature and statistical power was therefore less of a concern. Fitbit and self-reported SE, TST and WASO, as well as Fitbit-measured TIB and self-reported SL, were fitted with linear mixed models. Sleep latency from Fitbit was fitted with a mixed Tweedie regression. The NWAK measures were fitted with a mixed Poisson model. The significance level for hypothesis testing was set at 5%, and all p-values are reported unadjusted. Treatment effects were tested by comparing the difference between the groups at Day 26 (D26) using least-squares means (with Kenward–Roger degrees of freedom for linear models).
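For the linear mixed models, the specification described above can be sketched as follows. The symbols are our own: $Y_{ij}$ is the outcome for subject $i$ on day $t_{ij}$, $B_i$ the baseline measurement, $G_i$ a group indicator (1 for intervention) and $u_i$ the random intercept. Only the terms listed in the text are shown; whether a separate main effect of treatment was also included is not stated, and if it was, its coefficient would be added to the contrast below.

```latex
\[
  Y_{ij} = \beta_0 + \beta_1 B_i + \beta_2 t_{ij}
         + \beta_3\,G_i B_i + \beta_4\,G_i t_{ij}
         + u_i + \varepsilon_{ij},
  \qquad u_i \sim \mathcal{N}(0, \tau^2),\quad
         \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
\[
  \Delta_{\mathrm{D26}} = \beta_3\,\bar{B} + 26\,\beta_4
\]
```

Here $\Delta_{\mathrm{D26}}$ is the between-group difference at Day 26 evaluated at the mean baseline value $\bar{B}$, taking $t = 26$ on the assumption that time is counted in days from baseline. For the Poisson and Tweedie models the same linear predictor would sit on a link scale (presumably a log link, which would be consistent with effects being reported as ratios), and the residual term is replaced by the corresponding distribution.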
In the model-building stage for the secondary outcomes, before we arrived at the aforementioned models, linear mixed regression was first evaluated for both Fitbit and self-reported TIB, TST, WASO, SL and SE measures, since these variables are non-negative and continuous, and Poisson regression for NWAK measures, since these outcomes are count variables. The models for TIB, TST and NWAK fit the data rather well, which to some degree could be expected from their Gaussian- or Poisson-like distributions. However, linear mixed regression did not yield a good fit for the WASO, SL and SE measures. We thus proceeded to evaluate whether (a) Tweedie regression would be a more appropriate choice for WASO and SL, since their support is [0, ∞), and (b) beta mixed regression with Smithson and Verkuilen's outcome correction 58 would be a more appropriate choice for SE, since its support is [0, 1]. In this evaluation, Tweedie regression fitted only the Fitbit-measured SL data well; we therefore retained linear mixed regression as the final choice for the SE measures, WASO measures and self-reported SL for simplicity. Model fits were assessed with residual QQ-plots and plots of residuals versus predicted values, where DHARMa residuals 59 were used for non-Gaussian models.
For daily diary data, to rectify diary entry errors for SL and WASO, the highest 1% of data points were removed for each of these variables. Analysis sets were obtained separately for each of the following categories of analysis: (a) PSQI; (b) Fitbit sleep outcomes; and (c) self-reported sleep outcomes, where only subjects with complete baseline observations across all measures of the respective category were retained.
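The trimming step can be sketched as below. The text does not specify how the top 1% was determined (for example, how ties or rounding were handled), so the rank-based rule, the function name `drop_top_percent` and the example data are assumptions for illustration.

```python
def drop_top_percent(values, percent=1):
    """Remove the largest `percent`% of values, rounding the count down.

    Rank-based rule (an assumption): with n values, the n * percent // 100
    largest entries are dropped and the original order is preserved.
    """
    n_drop = len(values) * percent // 100
    if n_drop == 0:
        return list(values)
    # Indices of the n_drop largest values.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    to_drop = set(order[:n_drop])
    return [v for i, v in enumerate(values) if i not in to_drop]


# Illustrative data: 200 sleep-latency entries of 1..200 minutes.
latencies = list(range(1, 201))
trimmed = drop_top_percent(latencies)
print(len(trimmed), max(trimmed))  # 198 198
```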
Results
Baseline characteristics
Table 1 provides a detailed overview of participant baseline characteristics. The majority of the participants were males (63%), of Chinese ethnicity (92%), and undergraduate students (98%). Prior to the intervention, sleep quality was poor among university students, with an average PSQI score of 5.68 (SD = 2.78). Notably, 40 participants (68%) reported sleeping less than 7 h a day, falling below the minimum recommendation for individuals in this age group. 60 On average, participants slept for 6.05 (SD = 1.83) hours a day.
Table 1. Demographic and baseline characteristics of participants.
PSQI: Pittsburgh Sleep Quality Index.
Participant flow
Recruitment of participants occurred between 31 January and 7 February 2023. A total of 89 participants were screened for eligibility, of which 12 were excluded as they did not meet the eligibility criteria (>10 on PHQ-9 or currently seeking mental health treatment) and 17 did not register for onboarding. Consequently, 60 participants consented to participate and entered the study procedure. Thirty participants were randomly allocated to the intervention group and 30 to the control group. However, one participant from the control group did not attend the onboarding session; hence 59 participants received their allocated intervention. All participants completed the baseline questionnaire, and 29 of the 30 participants (97%) in the intervention group and all 29 participants (100%) in the control group completed the post-intervention questionnaire. The full recruitment flow is shown in Figure 2.

Figure 2. Consolidated Standards of Reporting Trials (CONSORT) flow diagram of participant recruitment.
Intervention efficacy
Table 2 presents baseline and post-intervention values for PSQI, Fitbit and self-reported sleep measures, as well as analyses conducted on the aforementioned measures. For the primary outcome, PSQI, linear regression revealed no statistically significant difference between the intervention and control groups. The mean PSQI score was 5.32 (SD = 2.18) for the intervention group and 6.22 (SD = 2.65) for the control group, and the difference was not statistically significant (adjusted mean difference = −0.51, 95% CI: [−1.55, 0.77], p = 0.40). Similar results were observed for the proportional odds model for PSQI (odds ratio of intervention group versus control group = 0.71, 95% CI: [0.28, 1.81], p = 0.47).
Table 2. Pre- and post-treatment analyses of all sleep measures between groups.
PSQI: Pittsburgh Sleep Quality Index; SE: sleep efficiency; TST: total sleep time; TIB: time in bed; SL: sleep latency; NWAK: number of awakenings during the night; WASO: wake after sleep onset.
Estimate, calculated based on the fitted mixed regression models, refers to: (a) the expected ratio of the intervention group compared to the control group for objective SL and NWAK; (b) the expected mean difference in log scale for subjective NWAK; and (c) the expected mean difference for the remaining variables.
For TST and TIB measured using Fitbit, we observed significant differences between the intervention and control groups, where the adjusted mean difference assessed at the mean baseline value was 32.5 min (95% CI: [5.9, 59.1], p = 0.02) and 32.3 min (95% CI: [2.7, 61.9], p = 0.03), respectively. Additionally, on average, the longer a person spent in bed and asleep at baseline, the more pronounced the effect of our intervention on their TST (Supplemental Appendix 2) and TIB (Supplemental Appendix 3) at D26, compared with assignment to the control group. Our analyses suggest that the differences between the two groups for TST and TIB resulted from an increasing trend in the intervention group and a decreasing trend in the control group over time (estimated slope of the time trend: 0.39, 95% CI: [−0.55, 1.34] for TST in the intervention group; −0.53, 95% CI: [−1.43, 0.37] for TST in the control group; 0.36, 95% CI: [−0.72, 1.43] for TIB in the intervention group; and −0.342, 95% CI: [−1.37, 0.68] for TIB in the control group). For the remaining sleep variables (e.g., WASO, SE, SL), no significant differences were observed. Similarly, for sleep variables assessed by sleep diary, no significant differences were observed, although TST approached statistical significance (p = 0.06).
Data collection adherence
Overall, adherence to daily sleep diary completion was high throughout the study, whereas Fitbit wear was less consistent. Out of 59 participants, 50 (85%) completed the sleep diary every day during the study, with an average of 27.5 (SD = 2.43) entries completed per participant. Among the remaining nine participants who did not complete the sleep diary daily, six missed only one diary entry, two missed three entries and one had 16 entries unrecorded. This highlights the overall robustness of the data collected, as only 28 (1.69%) of the 1652 expected diary entries were missed. For Fitbit, each participant recorded an average of 24.0 (SD = 5.27) nights of sleep. However, only 11 participants (19%) wore the device to sleep every night during the study.
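The diary adherence figures can be cross-checked with simple arithmetic. The sketch below assumes a 28-day diary period per participant, which is consistent with the four-week study and the 1652 expected entries; the variable names are illustrative.

```python
participants = 59
diary_days = 28                      # assumed: four weeks of daily entries

expected_entries = participants * diary_days

# Missed entries as reported: six participants missed 1, two missed 3, one missed 16.
missed_entries = 6 * 1 + 2 * 3 + 1 * 16

missed_pct = 100 * missed_entries / expected_entries
mean_entries = (expected_entries - missed_entries) / participants

print(expected_entries, missed_entries)            # 1652 28
print(round(missed_pct, 2), round(mean_entries, 1))  # 1.69 27.5
```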
Intervention adherence
Adherence to the intervention was high among participants. The average number of coaching sessions attended was 3.83 (SD = 0.53), with 27 of the 30 participants (90%) having attended all four sessions with their assigned health coach. Of the remaining participants, one (3.33%) attended three sessions and two (6.67%) attended only two sessions. Reasons for not attending a coaching session included participants forgetting their scheduled session time and being unable to identify a suitable alternative appointment time within the week. The average duration of the coaching sessions across the four weeks was 37.0 (SD = 13.3) minutes, which was above the instructed 30 min.
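The reported attendance summary can be reproduced from the counts given above. The sketch uses Python's standard `statistics` module and assumes the SD is the sample standard deviation (n − 1 denominator), which the text does not state explicitly.

```python
from statistics import mean, stdev

# Sessions attended, from the reported counts: 27 attended four, 1 attended three, 2 attended two.
sessions_attended = [4] * 27 + [3] * 1 + [2] * 2

mean_sessions = mean(sessions_attended)
sd_sessions = stdev(sessions_attended)   # sample SD (assumed)

print(round(mean_sessions, 2), round(sd_sessions, 2))  # 3.83 0.53
```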
Attrition rate
The attrition rate in the study was low. Only one participant (3.45%) from the control condition did not receive the allocated intervention, because they did not attend the mandatory onboarding session. As such, 59 participants received their allocated intervention (30 in the intervention group and 29 in the control group).
Discussion
This pilot study assessed the feasibility and efficacy of a novel human-AI coaching model, which integrates human health coaching with a QA system, leveraging both psychological principles and evidence-based data, to improve sleep quality among university students. Our findings illustrate that health coaching, when combined with the QA system, did not achieve improvements in sleep quality as measured by the PSQI (adjusted mean difference = −0.51, 95% CI: [−1.55, 0.77], p = 0.40). However, improvements were observed in the Fitbit measures of TST and TIB, which showed better results in the intervention group than in the control group across the study duration (p < 0.05). Other Fitbit measures of sleep did not show statistically significant improvements. Self-reported sleep assessed by the sleep diary also showed no significant difference between the groups. However, it is worth noting that self-reported TST approached significance in favour of the intervention group (p = 0.06). Notably, adherence to the intervention and study protocol was high in both groups. Participants exhibited a strong commitment to the intervention, suggesting its feasibility. Although the sleep outcomes in PSQI did not align with expected changes, the changes observed in the Fitbit measures of TST and TIB, and the engagement observed between participants and health coaches, highlight the potential value of combining human support with AI-based information interventions.
Overall, the sleep characteristics of our sample, as captured by the PSQI, Fitbit and self-reported sleep measures, were similar to those reported for university students in previous studies. First, we observed an overall poor sleep quality among university students, indicated by an average baseline PSQI score of 5.68 (SD = 2.78). These findings align with previous literature consistently demonstrating that university students’ sleep quality tends to be suboptimal.2,61 Furthermore, the Fitbit and self-reported measures of sleep included in this study are consistent with existing sleep interventions among university students.52,61 The average TST reported in this study is also similar to that reported in other studies utilising Fitbit devices for university students.62,63 TIB, on the other hand, was slightly lower in our study compared to a similar study. 51 Nonetheless, our findings are inconsistent with our initial hypothesis and diverge from existing literature in this field, as we did not observe improvements in participant sleep quality after the assigned intervention.
Several factors may explain the lack of improvements in sleep outcomes observed in this study. Firstly, it is important to consider the duration of the health coaching intervention. Previous studies implementing sleep interventions for university students have used intervention periods ranging from 1 week to 10 weeks, 64 while health coaching interventions in the literature range from 3 weeks to 18 months 65 with varying significance. The intervention duration in this study was four weeks, which is within the range of durations previously shown to be effective, albeit on the shorter end of the spectrum. However, behavioural change is a complex process, and changes in sleep may not have immediate, detectable outcomes.66,67 Therefore, four weeks may have been insufficient to observe significant improvements in sleep quality, and a longer intervention period may be necessary.
Another crucial factor to consider is the expertise of the health coaches involved. While health coaching provided by peers has shown promise in improving lifestyle behaviours among university students,40,68 peer coaches may have less training and coaching experience than professional health coaches.40,68 The level of training and experience of the health coaches in this study may have affected their ability to address sleep-related concerns effectively. For these coaches, this was their first experience applying their coaching skills with a client after attending and passing the 8-week health coaching course, which consisted of online lectures, problem-based learning and role-play. It was also their first time using a QA system, so they additionally had to learn how to integrate the system into their health coaching practice. To uphold the quality of the health coaching delivered, we arranged weekly supervision sessions in which professional health coaches provided support and expertise.
Nevertheless, both QA systems69,70 and health coaching30,68 are commonly reported to be effective in improving health outcomes, including sleep. Question-answering systems have been shown to provide accessible treatment support and management for individuals,69,70 whereas health coaching provided by peers has demonstrated effectiveness in promoting behaviour change.30,68 Moreover, previous research has emphasised the benefits of incorporating a humanistic aspect into digital interventions for increased intervention adherence and effectiveness. 71 Consequently, we proposed a human-AI symbiosis model that merges the QA system, for quick retrieval of evidence-based information, with health coaching, to foster behavioural change that improves individuals’ sleep quality. Despite efforts to promote human-AI symbiosis within the sleep human-AI coaching model, we did not observe the anticipated improvements in sleep outcomes.
However, it is important to note that the limitations observed in our study may be attributed to the nature of the QA system utilised: an information retrieval-based system constrained by the text corpus it was trained on. Additionally, the field of NLP currently lacks models that are inherently interpretable and explainable.72,73 Introducing models with interpretability and transparency in AI decision-making processes holds promise for addressing the limitations observed in this study and for bridging the gap between questions extracted by our QA system and changes in sleep outcomes observed in students.
Limitations
This study has several limitations. As this was a pilot RCT, a power calculation was not conducted. Hence, the study may have been underpowered to detect significant effects, and caution should be exercised when interpreting these findings. Moreover, the study participants were predominantly undergraduates and of Chinese ethnicity. This limits the generalisability of our findings to individuals of other ethnicities and to more diverse university student populations. Further research should aim to recruit more diverse groups to enhance the validity of the findings. In the present study, Fitbit devices were employed to evaluate objective sleep measures; however, these measurements depend on Fitbit's proprietary machine-learned algorithms, which may change over time without user knowledge. Consequently, the sleep metrics obtained from Fitbit might deviate from the true values derived from polysomnography, the gold standard for sleep testing, affecting their validity. We opted not to use polysomnography, as our aim was to assess student sleep under real-life, free-living conditions. Lastly, while wearable devices like Fitbit offer valuable sleep metrics, their reliability should be approached with caution; many studies have attempted to validate Fitbit's sleep measures by comparing them with polysomnography.74–77 Nonetheless, for the Fitbit Inspire 2, the model used in this study, recent comparisons with polysomnography revealed no significant differences in SE, TIB and WASO, although the device significantly overestimated TST. 74
Conclusions
With the increasing global interest in medical QA and its potential applications for improving numerous health outcomes, this study explored the implementation of a human-AI symbiosis model to intervene on sleep quality. The study has shown preliminary feasibility for the model, specifically for sleep health coaching in university settings, as evidenced by a high adherence rate to the intervention. However, our proposed human-AI sleep coaching model did not demonstrate significant changes in PSQI or self-reported sleep. Nonetheless, TST and TIB as assessed by Fitbit demonstrated significant changes in favour of the intervention group. Consequently, future research should consider extending the intervention duration of the proposed model, as promising preliminary improvements were observed in Fitbit sleep measures within the current timeframe. Moreover, further research will focus on extending the effective integration of QA systems as an information and knowledge tool into health coaching practices for sleep and other conditions, while continuing to enhance the reliability and accuracy of the QA system. The potential for QA systems to equip health coaches when providing guidance to their clients is magnified by recent advancements in LLM QA systems such as ChatGPT and MedGPT. Increased access to quick and accurate information extends the potential for human-AI symbiosis and allows QA systems to support effective health coaching for behavioural change. This synergy of health coaching and QA systems continues to hold promise for transforming healthcare delivery among university students.
Supplemental Material
Supplemental material, sj-docx-1-dhj-10.1177_20552076241241244 for A pilot randomised controlled trial exploring the feasibility and efficacy of a human-AI sleep coaching model for improving sleep among university students by Jintana Liu, Sakura Ito, Tra My Ngo, Ashwini Lawate, Qi Chwen Ong, Tatiana Erlikh Fox, Si Yuan Chang, Duy Phung, Elizabeth Nair, Malar Palaiyan, Shafiq Joty, John Abisheganaden, Chuen Peng Lee, May Oo Lwin, Yin Leng Theng, Moon-Ho Ringo Ho, Michael Chia, Iva Bojic and Josip Car in DIGITAL HEALTH
Footnotes
Acknowledgements
The authors extend their sincere gratitude to both the study participants and the dedicated health coaches for their invaluable contributions and active engagement in this research.
Contributorship
Conceptualization: TEF, IB; Methodology: TEF, IB, JC; Software: QCO, PVD, SIC, IB; Formal analysis: TMN; Investigation: JL, AL; Resources: JL, SI; Data curation: JL, AL, TMN; Writing – original draft: JL, SI, TMN; Writing – review and editing: JL, SI, QCO, TEF, MP, EN, SJ, JA, CPL, MOL, YLT, MHRH, MC, IB, JC; Visualization: TMN; Supervision: IB, JC; Project administration: JL, SI, AL; Funding acquisition: JC.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
This study was approved by the Institutional Review Board of Nanyang Technological University (IRB-2021-739).
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the Accelerating Creativity & Excellence award (grant number 020373-00001).
Guarantor
IB.
Supplemental material
Supplemental material for this article is available online.