Abstract
Background
FysBot is a ChatGPT-based mobile app developed to promote physical activity among adults living with obesity. This pilot study aimed to evaluate the feasibility and usability of FysBot.
Methods
A 6-week single-arm pilot study was conducted in which patients from an obesity rehabilitation clinic in Norway used FysBot. This pilot study employed an explanatory sequential mixed-methods design combining questionnaires and post-intervention interviews. Participants completed questionnaires at baseline and weeks 2, 4, and 6, assessing leisure-time physical activity (Godin Leisure-Time Exercise Questionnaire (GODIN)), motivation (Behavioral Regulation in Exercise Questionnaire-2 and relative autonomy index (RAI)), Self-Efficacy for Exercise (SEE), and System Usability Scale (SUS). Semi-structured interviews were conducted to explore user experiences further. Quantitative data were analyzed descriptively, with multiple imputations for missing data, while qualitative data were analyzed thematically.
Results
Fifty-three participants were eligible, 36 completed baseline, and 17 completed the final follow-up. App engagement declined steadily, with most participants ceasing use after week 2. The mean SUS score was 51.3, indicating below-average usability. The median of self-reported leisure-time physical activity (GODIN: 34–40) and overall motivation (RAI: 8.3–9.8) showed small, non-significant increases, while identified regulation increased significantly (2.8–3.3; p = 0.04) and SEE decreased (58–49). Qualitative findings supported these results, showing that participants valued the chatbot's motivational potential but experienced technical problems and limited personalization.
Conclusions
This study offers insight into the potential of a ChatGPT-based physical activity app for adults living with obesity and highlights key areas for refinement. Future iterations should incorporate user-requested features through iterative co-design, with enhanced personalization and guidance to improve relevance and engagement.
Plain language title
Testing an AI Chatbot app that uses ChatGPT to help people be more active
Plain language summary
Being physically active can be difficult for adults, including people living with obesity, even though there are lots of studies that say it is highly beneficial. To help support being active, the research team developed FysBot, a mobile app that uses ChatGPT, an artificial intelligence (AI) chatbot, to encourage and guide users to be more active in daily life. This study tested whether the app was practical, acceptable, and motivating for adults living with obesity.
The research team, in collaboration with an obesity rehabilitation clinic in Norway, invited patients who were attending or had previously attended the clinic to use FysBot for 6 weeks. The participants completed questionnaires about their physical activity, motivation, confidence to exercise, and the app's usability. After 6 weeks, some participants took part in interviews to share their experiences in more detail. Combining the results from the questionnaires and interviews helped the researchers understand whether the FysBot app influenced their physical activity behavior and their personal experiences with the app.
Most participants found the chatbot's exercise suggestions and reminders helpful at first. However, there were technical issues, no opportunities to personalize the app, and difficulties fitting it into their daily routines, which reduced their motivation and engagement with FysBot after the first few weeks. They rated the app's usability below average.
The study shows that an AI chatbot app like FysBot can be a promising tool for supporting physical activity among people living with obesity, but further development is needed. Fixing the technical issues and making it possible to integrate into users’ everyday lives could make future versions more engaging. FysBot illustrates how AI can support healthy behavior change.
Keywords
Introduction
Despite the well-documented benefits of physical activity (PA), 1 approximately one-third of the world's adult population does not meet the World Health Organization's PA recommendations. 2 Physical inactivity is a global public health challenge, contributing substantially to the growing burden of non-communicable diseases such as cardiovascular disease, type 2 diabetes, and certain cancers.3,4 This pattern is also evident in high-income countries, where the prevalence of overweight and obesity remains high. 5
Individuals with obesity, including those living in high-income countries such as Norway, encounter persistent barriers to engaging in and maintaining regular PA. Commonly reported challenges include physical discomfort, lack of motivation, and lack of time. 6 These barriers are also relevant for patients attending specialized weight management or rehabilitation clinics, particularly during extended periods at home or after completing treatment.
Digital interventions that utilize conversational agents, such as chatbots, have shown promise in improving PA behavior.7–10 Furthermore, chatbots based on large language models (LLMs), exemplified by ChatGPT 11 and widely adopted since their release in late 2022, enable more adaptive and natural user interactions compared to earlier rule-based chatbots,12,13 and may therefore enhance user engagement and intervention efficacy.
In previous work, we explored the preferences of individuals living with obesity regarding a PA chatbot, 14 representing an initial co-design phase where users provided input on the chatbot's design and functionality. Building on these findings, we developed and tested a prototype of FysBot, a PA app with a ChatGPT-based chatbot, among adult volunteers recruited from our network.15,16 Feedback from this prototype testing informed the app's further development.
The objective of this pilot study was to evaluate the feasibility of the FysBot app among current and former patients of a rehabilitation clinic for individuals living with obesity by examining app engagement, self-reported PA behavior, motivation, and self-efficacy, as well as app usability.
Materials and methods
Study design
This study was a 6-week non-randomized, single-arm pilot feasibility trial using an explanatory sequential mixed-methods design to evaluate the prototype of the FysBot app. The study is reported in accordance with the CONSORT extension for pilot and feasibility trials. 17 Quantitative and qualitative data were collected in two connected phases. During the intervention, structured questionnaires assessed quantitative measures such as self-reported PA (Godin Leisure-Time Exercise Questionnaire (GODIN)), motivation (Behavioral Regulation in Exercise Questionnaire-2 (BREQ-2) sub-scales and the relative autonomy index (RAI)), self-efficacy for exercise (SEE), and usability (System Usability Scale (SUS)]). After the quantitative phase, a sub-sample of participants participated in semi-structured interviews exploring their experiences with the app and perceptions of its usefulness. The qualitative component was designed to elaborate and explain quantitative patterns, such as engagement levels, thereby providing contextual depth to the numerical findings.18,19 Integration of the quantitative and qualitative data occurred primarily during the interpretation stage, when qualitative data were compared with quantitative trends to identify similarities and differences. As a pilot feasibility trial, this study was not designed to assess effectiveness.
Participants and recruitment
Former and current patients of Evjeklinikken, a specialized rehabilitation clinic for obesity affiliated with the South-Eastern Norway Regional Health Authority but open to individuals across the country, were invited to participate in the pilot study. Invitations were distributed through the clinic's follow-up system (Flowzone), which delivered messages via email and SMS, as well as through posts on the clinic's Facebook page. Recruitment materials, including a poster and a brief text document, were developed in collaboration with healthcare professionals at the clinic. After review and final approval by the clinic, a clinic staff member distributed the materials to patients who had attended the clinic since 2022.
Recruitment was conducted over 5 weeks, from February to April 2025. Eligible participants were adults aged 18 years or older, residing in Norway, able to read and understand Norwegian, owning an Android smartphone, and interested in increasing their PA. Individuals with medical conditions that contraindicate PA were excluded. Interested participants assessed their eligibility by accessing an online survey in REDCap (Research Electronic Data Capture) 20 via a link or QR code. After reviewing the first five registrations, the eligibility criteria were revised to remove the final option in the health screening question, as it unnecessarily excluded two otherwise eligible participants. Following this adjustment, subsequent participants were screened using the revised version. Informed consent was also obtained through REDCap.
As this was a pilot feasibility study, no formal power calculation was conducted. Instead, the sample size was determined by the number of eligible patients available during the recruitment period. No incentives were given to participants for their participation in the study.
Intervention
All study participants who completed the baseline questionnaire received access to the FysBot app via an email containing instructions to install it from the Google Play Store, along with a user manual developed by the first author (DL) that described how to install and use its features.
FysBot was delivered as an Android app connected to a secure cloud-based backend infrastructure hosted in Northern Europe. User authentication, data storage, and chatbot processing were handled through a protected Microsoft Azure server hosted in Norway. User messages were forwarded from the backend to an Azure-hosted LLM (GPT-3.5) using pre-defined system instructions that restricted the chatbot to general PA support and prevented it from providing medical, diagnostic, or emergency advice. To protect privacy, only minimal and non-identifiable information was included in prompts (e.g. step totals or general location information), and no names, contact details, precise geographic coordinates, or clinical records were transmitted to the LLM. Message histories were stored securely on the Norwegian server for research purposes. The backend also supported additional app features, including goal setting, badges, step synchronization via Health Connect, and scheduling.
The initial prototype of FysBot was developed as part of a supervised master's thesis project; detailed technical documentation, including system architecture diagrams, is therefore reported in the associated thesis. 16 The version of the FysBot prototype used in this pilot study incorporated adjustments informed by feedback from earlier usability testing conducted with adults not living with obesity. 15
In this updated version, the chatbot was powered by GPT-4, and available in either English or Norwegian, allowing participants to interact with it in either language at their convenience. The app was designed to offer personalized PA recommendations, motivational messages, and goal-setting support based on feedback from the target population.14,21 The scheduling feature was refined to offer clearer guidance for planning daily activities, and the badge feature rewarded users with digital bronze, silver, or gold medals for completing 33%, 66%, and 100% of their set goals, respectively. The profile section provided numerical and graphical summaries of users’ step progress, and allowed them to adjust their daily step goal and choose a preferred time for progress updates (see Figure 1 and Table 1). These updates used artificial intelligence (AI) functionality to deliver dynamic, personalized messages. However, just before the study commenced, an issue with the daily update feature prevented users from receiving updates at their chosen time. As a workaround, a generic message was created instructing users to check their progress in the profile tab at the time they had set for updates. No technical or functional updates were implemented in the FysBot app during the study period.

FysBot app interface with chat.
Summary of tabs/features of the FysBot app.
AI: artificial intelligence.
Data collection
The 6-week pilot study was conducted between April and June 2025. Participants started the intervention at different times after completing the baseline questionnaire, so the timing of follow-up assessments varied between individuals. REDCap sent one automated reminder for each questionnaire. To enhance response rates, the first author (DL) also sent personalized reminders via SMS for the baseline and week 2 questionnaires and via email for the week 6 questionnaire. Data was collected through a combination of self-reported questionnaires and semi-structured interviews. Self-reported questionnaire data were collected and managed using REDCap electronic data capture tools hosted at the University Hospital of North Norway.20,22
Self-reported questionnaires
Participants completed a baseline questionnaire before starting the intervention, which gathered demographic information, PA habits and preferences, previous chatbot experience, and self-perceived technological competence. The questionnaires also included validated instruments:
GODIN: assesses the frequency of mild, moderate, and strenuous leisure-time PA in a typical week. Scores ≥24 are often used to classify participants as sufficiently active. 23
BREQ-2: evaluates exercise motivation based on self-determination theory. It consists of five sub-scales reflecting different forms of motivation toward exercise: amotivation (lack of intention to exercise), external regulation (engaging in exercise due to external demands or rewards), introjected regulation (driven by internal pressures such as guilt or obligation), identified regulation (recognizing exercise as personally important and valuable), and intrinsic regulation (exercising for enjoyment or inherent satisfaction). The RAI, calculated from the BREQ-2 sub-scales, reflects the degree of autonomous versus controlled motivation.24–26
SEE Scale: measures confidence in maintaining exercise behavior despite common barriers, and was initially developed for sedentary adults in a community-based exercise program. 27
SUS: assesses the perceived usability of systems and applications. 28
In this study, the Norwegian validated version of the BREQ-2 was used. 29 The SEE and SUS were based on available Norwegian translations from previous studies,30,31 while GODIN was forward translated from English to Norwegian.
Follow-up questionnaires were administered in weeks 2 and 4 to assess PA habits and FysBot use over the previous 7 days, along with repeated measures of the GODIN and BREQ-2. The final questionnaire at week 6 repeated the same measures as the earlier follow-ups, with the addition of the SEE and SUS.
Semi-structured interviews
At the end of the 6-week pilot study, all participants who had consented to an interview (n = 48) were invited via email. Participants could schedule an appointment with the first author (DL) via a Google Calendar link. Interviews were conducted via video call using Google Meet between May 28 and June 18, 2025. A semi-structured interview guide covering motivation, usability, and overall experiences with the FysBot app was used. Participants could opt for an interview via a regular phone call if they preferred. Three email reminders were sent to encourage participation in the interview. All interviews were audio-recorded and transcribed verbatim. Verbal consent was obtained at the start of each interview to reaffirm the written consent provided at enrollment.
Seven individuals from the pilot study participated in the interviews, with durations ranging from approximately 11 to 39 min. Five participants had completed all follow-up questionnaires in the pilot study, while the remaining two had completed weeks 2 and 6, and weeks 4 and 6, respectively. All but one interviewee reported using FysBot to some degree.
Data analysis
Quantitative and qualitative data were analyzed separately before integration.
Quantitative analysis
Data were exported from REDCap and analyzed using IBM SPSS Statistics (Version 29). Descriptive statistics summarized participant demographics (numbers and percentages), PA behavior (GODIN), motivational regulation (BREQ-2 sub-scales and RAI), and SEE. Results are presented as medians with interquartile ranges (IQRs) due to small sample size and non-normal distribution. RAI was computed from BREQ-2 sub-scales, following established scoring procedures. Higher (positive) RAI scores indicate self-determined (autonomous) motivation, while lower (negative) scores reflect controlled motivation or absence of motivation (amotivation). For respondents with complete data at all four timepoints (baseline, weeks 2, 4, and 6), repeated-measures analyses of GODIN, BREQ-2 sub-scales, and RAI were conducted using non-parametric tests (Friedman tests). A comparison between the SEE baseline and week 6 data was conducted using a Wilcoxon signed-rank test. Statistical significance was set at p < 0.05.
Missing data on the key outcome variables—GODIN, SEE, and BREQ-2—were handled using multiple imputation in SPSS. The automatic method was applied, which by default uses a fully conditional specification algorithm to impute missing values across variables. Five imputed datasets (m = 5) were generated, and results were pooled according to Rubin's rules, as implemented in SPSS. The imputation model included demographic, behavioral, and technology-related variables (e.g. age, gender, education, activity tracker use, chatbot use, and app use) as predictors. Only participants who completed the baseline questionnaire and at least one follow-up survey were included. Participants who responded only at baseline and had no follow-up data for the outcomes listed above were excluded from the imputation process. Analyses were repeated for non-parametric tests (Friedman test) and the Wilcoxon signed-rank test. Descriptives from multiple imputations are reported as pooled means. The multiple-imputation analyses were conducted for exploratory purposes only and are presented in Appendix B (Table B1).
App engagement was assessed through self-reported app use collected at each follow-up timepoint. Participants indicated whether and how often they had used FysBot features in the previous 7 days. SUS is reported as the mean (SD) with the observed range. SUS scores were further analyzed by user groups—non-user (reported no app use), occasional user (reported using the app 1–2 times, and regular user (reported using the app ≥3 times)—categorized based on self-reported app use within the preceding 7 days at each follow-up timepoint. The classification, therefore, reflects recent rather than cumulative use; participants classified as non-users or occasional users may have used the app more frequently than was captured within this 7-day period.
The intervention was considered potentially feasible if at least 70% of participants completed the week-6 assessment, if at least 50% engaged with any app feature in a given week, and if the mean SUS score reached the commonly used benchmark of 68. These criteria were informed by feasibility study guidelines and prior mHealth pilot studies.32–36
Qualitative analysis
Thematic analysis was conducted on interview transcripts to identify key themes related to user experience and practical implementation challenges. 37 The first author (DL) familiarized herself with the data, generated initial codes inductively, and grouped them into themes focusing on user perceptions of usefulness, usability, motivation, chatbot interaction, and data privacy. The preliminary themes were iteratively reviewed, refined, and clearly defined to ensure they captured key patterns across the dataset—first through discussions with MVT, then with PZ, PR, EÅ, and MVT. Finally, all co-authors reviewed and agreed on the final themes.
Integration of the quantitative and qualitative findings occurred during the interpretation phase, consistent with an explanatory sequential mixed-methods approach. Qualitative findings were used to elaborate on and explain quantitative patterns, providing contextual insights into participants’ lived experiences.18,19
Ethical considerations
The study was reviewed and declared exempt from formal evaluation by the Norwegian Regional Committees for Medical and Health Research Ethics (Ref. 351357) and approved by the Data Protection Officer at the University Hospital of North Norway (Ref. 2024/6553-6). Written informed consent was obtained from all participants prior to enrollment. Data were anonymized to protect confidentiality. All procedures were conducted in accordance with the ethical standards of the Declaration of Helsinki and relevant national regulations.
Results
Study participation
Of the approximately 400 patients invited, 138 registered for the study, of whom 60 were excluded for not meeting eligibility criteria—most commonly (n = 41) because they did not own an Android phone. Of the 78 eligible individuals remaining, 53 consented to participate in the study. Thirty-six participants completed the baseline questionnaire and were enrolled in the study. They all received access to the FysBot app, but after one withdrawal, 35 continued in the pilot study. At week 2, 20 participants completed the follow-up questionnaire, while three withdrew from the study. At week 4, 19 participants completed the follow-up questionnaire, and two more withdrew from the study. By week 6, 17 participants (47.2%) of those enrolled completed the final questionnaire. At each follow-up, non-respondents who had not formally withdrawn from the study were considered as study participants. Only 13 participants completed all four questionnaires, with 12 providing complete responses and one participant missing data on the BREQ-2. Full details of exclusions, withdrawals, and responses are provided in Figure 2.

Modified CONSORT flow diagram for study participation.
Participant characteristics
Demographics and other measures
Of the 36 participants who completed the baseline questionnaire, 25 (69.4%) were female. Most participants were aged 45–64 years (28/36, 77.8%) and resided in Agder (13/36, 36.1%) or Østfold (7/36, 19.4%) in Norway. The educational status of participants was distributed as follows: high school (11/36, 30.6%), higher education (1–4 years) (9/36, 25.0%), and higher education (>4 years) (7/36, 19.4%). Regarding employment status, 13 participants (36.1%) were employed full time, while 12 (33.3%) received disability benefits. Nine participants (25.0%) were on their fifth stay at Evjeklinikken, seven (19.4%) on their fourth, another seven (19.4%) had completed treatment, and four (11.1%) were on their second stay.
Participants reported the following frequencies of PA: daily (8/36, 22.2%), five times per week (7/36, 19.4%), four or two times per week (6/36 each, 16.7%), and three times per week (4/36, 11.1%). The most commonly reported PAs were walking (15/36, 41.7%) and gym training (14/36, 38.9%). Other activities included cycling (3/36), gardening and household chores (2/36), strength/fitness training with friends (1/36), and Zumba or dance sessions (1/36). Most participants (31/36, 86.1%) reported using an activity tracker. Among these, the most frequently used brands were Garmin (10/36, 27.8%), Fitbit (8/36, 22.2%), and Polar watch (6/36, 16.7%).
Seventeen participants (47.2%) described themselves as tech-curious (i.e. willing to try digital devices and services), 16 (44.4%) as tech-savvy, and 3 (8.3%) as tech-reluctant or averse (i.e. hesitant about using technology). At baseline, 20 of 36 participants (55.6%) had never used a chatbot. Among those with prior chatbot experience (16/36), the most common uses were customer service or support (10/16, 62.5%), banking or financial services (4/16, 25.0%), and general inquiries (2/16, 12.5%). See Appendix A (Table A1) for full details.
Self-reported PA and app use at weeks 2, 4, and 6
Table 2 summarizes PA and app use among the study participants across the three follow-up timepoints.
Summary of self-reported physical activity and app use at weeks 2, 4, and 6.
N = number of participants answering the questionnaire at the respective follow-up timepoints.
Walking was the most frequently reported PA, followed by gym training, consistent with baseline reports. A gradual increase in daily PA was observed, with 29.4% (5/17) of respondents in week 6 reporting daily activity, compared with 15.4% (4/26) in week 2. The most common activity duration was 60 min in weeks 2 and 6, with participants reporting 30 min in week 4. However, app engagement declined steadily over time. Half of the participants (13/26, 50.0%) reported no app use in week 2, increasing to 70.6% (12/17) by week 6. Among app users, the profile feature was consistently the most used, mainly at week 6 (4/5, 80.0%). Daily app use remained low throughout, with only one participant (1/17, 5.9%) reporting it in week 6.
Analysis of PA instruments
For participants with data at all timepoints (N = 13), self-reported PA, as measured by the GODIN score, showed a modest increase over time. Median scores were 34 (IQR: 20–38.5) at baseline, 35 (IQR: 22–49.5) at week 2, 40 (IQR: 24.0–51.0) at week 4, and 40 (IQR: 30.5–63.5) at week 6 (see Table 3 for details). A Friedman test showed no statistically significant differences over time (χ2(3) = 4.42, p = 0.22). Median RAI scores increased at weeks 4 and 6, reaching 9.7 (IQR: −1.7 to 13.6) and 9.8 (IQR: 1.6–15.1), respectively, compared to 8.3 (IQR: 3.6–11.6) at baseline. This change was also not statistically significant (χ2(3) = 2.89, p = 0.41).
Descriptive statistics for GODIN, BREQ-2, and SEE over time (median (IQR)) for participants with data at all timepoints.
N: number; IQR: interquartile range; N/A: not applicable; GODIN: Godin Leisure-Time Exercise Questionnaire; BREQ-2: Behavioral Regulation in Exercise Questionnaire-2; SEE: Self-Efficacy for Exercise; RAI: Relative Autonomy Index.
For the BREQ-2 sub-scales, most regulation types remained stable over time. Median amotivation scores remained at 0.0 across all timepoints (χ2(3) = 2.94, p = 0.40). External-, introjected-, and intrinsic regulation also showed no significant change (χ2(3) = 3.07, p = 0.38, χ2(3) = 1.69, p = 0.64, and χ2(3) = 2.70, p = 0.44, respectively). In contrast, identified regulation increased from a median of 2.8 (IQR: 2.1–3.0) at baseline to 3.3 (IQR: 2.19–3.69) at week 6, a statistically significant change (χ2(3) = 8.41, p = 0.04). The median of SEE scores, measured at baseline and week 6 only, declined from 58 (IQR: 42.5–62.5) to 49 (IQR: 45.0–60.5). A Wilcoxon signed-rank test found no statistically significant difference (Z = −0.28, p = 0.78).
System Usability Scale
Seventeen participants completed the SUS in week 6, resulting in an overall mean score of 51.3 (SD = 15.1; range 10–73). Further analysis by user group categories showed that regular users reported the highest usability (mean = 61.7, SD = 11.6), followed by non-users (mean = 47.5, SD = 9.2) and occasional users (mean = 43.5, SD = 19.3) (Figure 3). Non-users’ experience with the app may have resulted from interactions that occurred before completing the questionnaires, which asked only about app use within the previous 7 days.

System Usability Scale (SUS) score by app user group.
These quantitative findings indicated moderate usability and declining engagement over time. To better understand participants’ experiences and the factors influencing these patterns, post-intervention interviews were conducted.
Qualitative findings from the interviews
Overall, no participant expressed major privacy concerns; most trusted the app because it was introduced by a reputable clinic and academic institution. Analysis of the interview transcripts provided insight into participants’ experiences with FysBot and contextualized the quantitative findings. Two overarching themes were identified.
Theme 1: AI chatbot-supported motivation
This theme represents user feedback related to how the FysBot app contributed to participants’ motivation to be physically active. Participants’ accounts reveal both positive and negative experiences.
Several participants described how the chatbot's exercise recommendations could serve as a convenient source of ideas and motivation, offering concrete, varied options that can be adapted to their health needs. One participant said that receiving multiple suggestions for back exercises made it easier to find something suitable, describing the chatbot's exercise library as “full of possibilities": “I liked the FysBot a bit because it came up with concrete measures. I asked about back exercises… It didn't just give me one exercise, but I think I got four… which made it possible to choose something based on your health”—Participant 14.
Others, however, found that location-related queries—such as requests for nearby hiking routes—produced generic or inadequate responses, reducing their motivation to continue engaging with the chatbot.
Participants suggested several ways to make the integrated chatbot's recommendations more engaging and motivating. They wanted more variety in its tips, such as seasonal activities or fun suggestions. Others proposed links to short videos or external sources to make the exercise guidance clearer and less text-based.
Experiences with the chatbot feature also appeared to shape participants’ attitudes toward AI in general. One participant explained that using FysBot changed her initial skepticism toward AI, showing that it could be a helpful tool for health support. Others said they found the chatbot feature confusing. As one participant expressed: “No, we (referring to himself and the chatbot) never became friends, so to speak … it was a bit like ‘God dag mann økseskaft’ (a Norwegian expression used when a response is irrelevant to the question asked)”—Participant 5.
Some participants who were already physically active found the app to be of little or no value and redundant, as it offered features similar to those of other apps they already used. Others said they struggled to make FysBot part of their daily routines because it did not feel intuitive or seamlessly integrated into their existing habits.
In summary, participants said they felt motivated when the chatbot feature offered relevant, varied, and personalized exercise recommendations. In contrast, generic recommendations, along with perceptions of redundancy and limited value, were experienced as reducing engagement and motivation.
Theme 2: feature-specific usability and usefulness
This theme highlights participants’ experiences with FysBot's key features and how technical aspects influenced usability. Participants’ feedback revealed both appreciation for specific functions and frustration with design limitations that affected their overall experience.
A few participants found FysBot simple and easy to navigate, noting that its minimalist design was a positive feature. One participant stated, “It (the app) was simple. And that is important”—Participant 30. Several participants highlighted the Profile feature and its daily notification function, designed to help them keep track of their step goals and progress, as useful tools. According to one participant: “What I think has been good is, in a way, the reminder that you can set, that ‘Hey, now you have said you're going to walk so many steps’… Yes. I think that was nice”—Participant 11.
However, others encountered technical challenges (or bugs) such as text fields appearing behind the keyboard, app crashes, and navigation difficulties that interrupted their use. Some features, including Schedule and Badges, were described as underdeveloped or unclear. Planned activities often failed to display correctly, and the badge system was confusing. Others emphasized that the inability to register activities manually reduced the app's usefulness, since they wanted to log achievements and “have something to brag about.”
Participants also shared suggestions for improving features that could enhance future FysBot use. Some wanted to schedule weekly rather than daily activities, receive real-time reminders for planned sessions, and have access to more personalized recommendations based on their previous data. One participant suggested integrating professional support, such as a therapist, to address questions the chatbot could not answer: “… therapists are getting less time, but if I were to use an app like FysBot, I'd have liked it to be possible to contact a therapist or a treatment center and get answers from real people to things the chatbot can't answer”—Participant 29.
In summary, participants appreciated features such as Profile, which provided an overview and tracked their goal progress, but they experienced challenges due to technical issues and limited functionality. Suggestions for improvement included weekly scheduling, manual registration, real-time reminders, data-driven recommendations, and specialist feedback.
Discussion
Summary of findings
This pilot study evaluated the feasibility and usability of FysBot, a PA app featuring a ChatGPT-based chatbot for individuals living with obesity. To our knowledge, FysBot is the first LLM-based PA coaching system piloted within obesity rehabilitation services in Norway, providing regionally relevant feasibility insights not captured in previous international studies. FysBot differs from general-purpose chatbots in that the chatbot was integrated with an app developed specifically for adults living with obesity within a rehabilitation context and provides PA coaching, goal setting, and step-tracking within a secure environment. Unlike generic LLM interfaces, FysBot's chatbot operates within defined boundaries, with controlled data handling and safety guardrails that support its use in a health-related setting. In addition, FysBot was introduced to participants through collaboration with a specialist rehabilitation clinic. Prior research suggests that digital tools situated within clinical contexts or endorsed by health professionals are perceived as more trustworthy and are more likely to be adopted and used consistently.38,39
Of the 36 participants who completed the baseline assessment and received access to the FysBot app, 17 answered the questionnaire at week 6, with engagement declining over time. The profile tab and chatbot were the most frequently used features in the app. No significant changes were observed in self-reported PA, motivational regulation, or self-efficacy for the participants with complete questionnaires at all timepoints. However, identified regulation—a BREQ-2 sub-scale—increased significantly across the study period, and multiple imputation analyses indicated only a significant rise in GODIN scores.
Usability, measured with SUS, averaged 51.3, which is below the benchmark for good usability, although sustained users reported higher scores. The qualitative findings underscored the chatbot's motivational potential, particularly for exercise recommendations and simplicity, while also highlighting technical issues, limited integration into daily routines, and missing functionalities. Participants provided constructive suggestions for improvements, including manual activity registration and more advanced personalization.
When considered against our predefined feasibility criteria, the study did not meet the thresholds required for progression to a full-scale trial. Retention at week 6 (47.2%) fell well below the ≥70% benchmark commonly used in feasibility studies, and sustained engagement declined to approximately 30% of participants by the end of the study, below our target of ≥50% weekly engagement. Although the prototype showed early promise, these indicators suggest that further refinement is needed before evaluating effectiveness in a larger trial. As this was a feasibility study without a control group, it was not designed to evaluate intervention effects.
App engagement
FysBot was previously tested with a small group outside the target population. 15 The present feasibility study highlights the importance of conducting usability testing with the intended users and addressing identified issues before proceeding to a larger trial. Although the chatbot was developed using current technology, participants did not always perceive its recommendations as relevant. Specific communication strategies known to support behavior change were not systematically integrated into FysBot's dialogue design. In a study on AI chatbots for promoting PA, Wiratunga et al. 40 found that voice-based systems were better suited for encouraging exercise among older adults. With the rapid development of AI-driven conversational agents capable of natural verbal interaction, 41 providing trustworthy, engaging feedback—preferably through audio or video—could increase user engagement, particularly given the average age of our participants. This interpretation is supported by our qualitative findings, where participants suggested linking exercise recommendations to external video sources. Such multimodal interaction could enhance the motivational and persuasive capabilities of AI-driven PA interventions.42,43
App engagement declined steadily during the 6-week study. Participants who remained active primarily used the Profile tab for viewing step counts and goals, while chatbot interactions decreased over time. These patterns mirror previous findings showing that users tend to discontinue health apps when they do not integrate into daily routines or fail to provide added value beyond existing tools.44–46 Interview findings supported this interpretation, as several participants reported that FysBot “did not become part of their everyday life” or felt redundant. Participants appeared more motivated by features that provided immediate feedback and a sense of achievement, such as step tracking and goal completion. Previous research has shown that goal setting and self-monitoring enhance the effectiveness of PA interventions.42,47 Similarly, our previous qualitative study of user preferences for a PA chatbot found that participants preferred chatbots with integrated goal setting and feedback functionalities to increase motivation for PA. 14 These findings suggest that incorporating goal setting, self-monitoring, and gamification features may engage users more effectively than conversational elements alone, at least in the short term. Overall, the engagement patterns observed in this pilot study indicate that disengagement likely resulted from multiple interacting factors, including usability limitations, difficulty integrating the app into daily routines, perceived redundancy relative to existing tools, and limited added value for some participants, rather than usability issues alone.
Self-reported PA behavior, motivation, and SEE
Despite modest engagement, small positive trends were observed in self-reported PA and motivation. Daily PA increased during the study, from 15.4% at week 2 to 29.4% at week 6. Similarly, GODIN scores rose slightly from 34 at baseline to 40 at week 6, indicating that participants maintained or slightly increased their PA levels. Although not statistically significant, these changes may still be clinically meaningful and align with findings from previous AI chatbot research reporting increases in activity levels associated with chatbot use.9,10,48 Interestingly, in our study, participants reported increased activity over time despite disengagement with the app. This could be attributed to the app being recommended by a trusted rehabilitation clinic, which created a sense of support or accountability that influenced motivation. The observed increase could reflect a Hawthorne effect, in which individuals modify their behavior because they know they are being observed. 49 Many participants already used other apps or wearables to monitor activity, which could also explain the maintenance or slight increase in self-reported PA.
Regarding self-determined motivation for exercise, significant changes were observed in only one BREQ-2 sub-scale: identified regulation, which increased over time. This suggests that participants’ recognition of the personal value of PA increased, likely reflecting participants’ prior rehabilitation experience, where the importance of PA is emphasized. 14 The RAI also remained stable, with a slight increase from 8.3 to 9.8. Previous studies have shown that higher RAI scores predict greater energy expenditure and longer exercise duration.50–52 Even small increases in RAI may support long-term engagement by enhancing intrinsic motivation. This aligns with self-determination theory, which posits that individuals are more likely to sustain healthy behaviors when they act voluntarily.52,53
SEE decreased from 58 at baseline to 49 at week 6, shifting from moderately high to lower confidence levels. SEE is a key factor linked to PA participation and intention.54,55 The reduction observed here may indicate that participants became more realistic about their ability to exercise, or that early enthusiasm gave way to recognition of barriers such as time constraints, fatigue, poor weather, or health limitations. Technical issues with the app or declining interest (a potential “novelty effect”) may have also been contributing factors. 44 Difficulties meeting goals or step targets could have further reduced confidence. Future studies with larger samples and longer follow-up periods are needed to examine whether self-efficacy increases over time as participants adapt to challenges.
Usability of the FysBot app
The mean SUS score for FysBot fell below the conventional benchmark of 68, indicating below-average usability. 36 Participants described app crashes, text fields hidden by the keyboard, and non-functioning notifications. The malfunctioning reminder feature likely contributed to disengagement and poor user experience, as personalized prompts have been shown to enhance adherence in digital behavior change interventions.45,56 Participants also noted the absence of key features such as manual activity logging and confirmation of completed session—functionalities typically found in other fitness apps. While FysBot was designed as a hub to consolidate activities from compatible Android apps through Health Connect, its functionality is currently limited because some wearables require third-party apps to synchronize with Health Connect. Expanding the range of compatible wearables and apps in future updates may enhance FysBot's usefulness and user satisfaction.
Nevertheless, participants valued the app's simplicity and design, and expressed willingness to use it in the future, provided that the current usability issues are resolved. Overall, usability issues seemed to result more from technical problems than from the app's concept or design. Future versions of the app will require additional technical development to address evident bugs, such as disappearing text fields, and better user guidance will be essential before a much larger study on effectiveness.
Strengths and limitations
Testing FysBot among individuals living with obesity in collaboration with a specialist rehabilitation clinic enhanced the study's real-world relevance and clinical applicability. Integrating the quantitative and qualitative results in this mixed-methods design provided a more comprehensive understanding of FysBot's feasibility and user experience than either dataset alone; however, the small sample size, high non-response rate, and short duration limit generalizability. As a single-arm pilot study, changes in PA, motivation, and self-efficacy may not be attributed to FysBot. Observed differences over time may reflect natural week-to-week fluctuations, regression to the mean, or external influences such as weather, personal circumstances, or concurrent use of other health apps. The findings should therefore be interpreted as exploratory indicators rather than evidence of intervention effects.
Multiple imputation was applied to address missing data, which introduces some uncertainty as it relies on statistical assumptions; however, the imputed results were presented only for completeness and exploratory purposes and were not used to draw conclusions. The feasibility assessment relied primarily on self-reported PA, which is vulnerable to recall error and social-desirability bias. Objective activity data would have strengthened the evaluation, but technical constraints in retrieving steps through Health Connect limited the availability of system-generated data. Future iterations of FysBot should ensure stable integration with wearable and smartphone sensors to allow objective measurement of PA and reduce reliance on self-report.
A major limitation was that the app was available only on Android, which led to the exclusion of 41 interested individuals and limits the generalizability of the findings in Norway, where iPhone use is common. 57 The maturity of the prototype may also have influenced engagement over the 6-week period, as early-stage technical issues and limited feature stability may have reduced participants’ willingness or ability to use the app consistently. The absence of validated Norwegian PA instruments may also have influenced perceived usability and reported outcomes. Nonetheless, this study offers early insights into the feasibility of an AI chatbot-driven intervention for promoting PA among adults living with obesity.
Future direction
The current FysBot prototype does not incorporate real-time feedback and adaptive monitoring; future iterations could incorporate more advanced forms of personalization, multimodal interaction, and AI-agent capabilities such as adaptive goal adjustment and proactive feedback based on real-time activity data. These developments would align the system with emerging chatbot technologies that provide more context-aware and responsive support. Although the system was designed not to require identifiable or sensitive health data, the use of commercial LLM services entails inherent data-privacy considerations. A formal security assessment and General Data Protection Regulation compliance review will be essential for future iterations before adoption for broader use in obesity rehabilitation. Cross-platform development (e.g. Android and iPhone) will be essential before further feasibility testing to ensure broader inclusion and more representative recruitment. In addition, future studies can test the effectiveness of such applications through randomized controlled trials with larger samples, longer intervention periods, and questionnaires available in validated languages to assess both short-term outcomes and long-term maintenance of PA behavior.
Conclusions
Overall, this pilot study demonstrated the potential of a ChatGPT-based PA app for adults living with obesity while also identifying key areas for refinement. The complementary quantitative and qualitative findings underscore the need to address the identified barriers through continued co-design and iterative testing of FysBot with the intended users. Through this process, the FysBot app could evolve into a scalable, personalized digital companion to support PA among individuals living with obesity. To enhance usability and better integrate the app into users’ existing routines and habits, future iterations should incorporate the features users explicitly requested, along with more advanced personalization and guidance to make the app more relevant and engaging.
Supplemental Material
sj-docx-1-dhj-10.1177_20552076261417860 - Supplemental material for Feasibility and usability of a ChatGPT-based app to support physical activity: A pilot study
Supplemental material, sj-docx-1-dhj-10.1177_20552076261417860 for Feasibility and usability of a ChatGPT-based app to support physical activity: A pilot study by Dillys Larbi, Paolo Zanaboni, Eirik Årsand, Pietro Randine, Marianne Vibeke Trondsen, Kerstin Denecke, Rolf Wynn and Elia Gabarron in DIGITAL HEALTH
Supplemental Material
sj-doc-2-dhj-10.1177_20552076261417860 - Supplemental material for Feasibility and usability of a ChatGPT-based app to support physical activity: A pilot study
Supplemental material, sj-doc-2-dhj-10.1177_20552076261417860 for Feasibility and usability of a ChatGPT-based app to support physical activity: A pilot study by Dillys Larbi, Paolo Zanaboni, Eirik Årsand, Pietro Randine, Marianne Vibeke Trondsen, Kerstin Denecke, Rolf Wynn and Elia Gabarron in DIGITAL HEALTH
Footnotes
Abbreviations
Acknowledgments
We would like to thank Sondre Elvebakken Løvås and Dr André Henriksen for their assistance with the initial development and maintenance of the FysBot app, and Dr Maryam N. Tayefi for her assistance with the statistical analysis. We would also like to extend our sincere thanks to all participants for their time, engagement, and valuable feedback, and to the staff of Evjeklinikken for their assistance with recruitment and support.
ORCID iDs
Ethical considerations
The study was declared exempt by the Norwegian Regional Ethics Committee (Ref: 351357). The University Hospital of North Norway's Data Protection Officer approved the study (Ref: 2024/6553-6).
Consent to participate
Informed consent was obtained from all individuals involved in the study. Participants signed a consent form that informed them of the intention to publish the study's data. In addition, no identifying or personal details are presented in this study.
Author contributions
DL, PZ, MVT, EÅ, KD, RW, and EG conceptualized and planned the study. PR further developed the FysBot app with feedback from DL, PZ, and EÅ. DL developed the interview guide with feedback from MVT, EÅ, PZ, KD, RW, and EG. DL conducted the individual interviews. DL performed the data analysis with feedback from PZ, MVT, EÅ, PR, KD, RW, and EG. DL drafted the manuscript. DL, PZ, PR, MVT, EÅ, KD, RW, and EG contributed to the writing and review of the manuscript. All authors read and approved the final manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Norwegian Centre for E-health Research. Open access funding was provided by UiT The Arctic University of Norway.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The datasets generated during and/or analyzed during the current study are not publicly available due to ethical and privacy restrictions related to participant confidentiality, but are available from the corresponding author on reasonable request.
Guarantor
DL is the guarantor of this work and accepts full responsibility for the integrity of the data and the accuracy of the analysis and reporting.
Supplemental material
Supplemental material for this article is available online.
Appendix A
The demographics and characteristics of the participants in the pilot study (N = 36).
| Demographics | Details | N (%) |
|---|---|---|
| Gender | Male | 11 (30.6) |
| Female | 25 (69.4) | |
| Age group | 25–34 years | 2 (5.6) |
| 35–44 years | 3 (8.3) | |
| 45–54 years | 15 (41.7) | |
| 55–64 years | 13 (36.1) | |
| 65–74 years | 3 (8.3) | |
| Region | Agder | 13 (36.1) |
| Akershus | 2 (5.6) | |
| Buskerud | 2 (5.6) | |
| Innlandet | 3 (8.3) | |
| Rogaland | 1 (2.8) | |
| Telemark | 2 (5.6) | |
| Vestfold | 4 (11.1) | |
| Vestland | 2 (5.6) | |
| Østfold | 7 (19.4) | |
| Education | Elementary school | 4 (11.1) |
| High school | 11 (30.6) | |
| Technical/vocational | 5 (13.9) | |
| Higher education 1–4 years | 9 (25.0) | |
| Higher education, >4 years | 7 (19.4) | |
| Employment | Full-time | 13 (36.1) |
| Part-time | 4 (11.1) | |
| Unemployed | 2 (5.6) | |
| Disability benefits | 12 (33.3) | |
| Retired | 3 (8.3) | |
| Other (e.g. military service) | 2 (5.6) | |
| Clinic stay | First | 2 (5.6) |
| Second | 4 (11.1) | |
| Third | 3 (8.3) | |
| Fourth | 7 (19.4) | |
| Fifth | 9 (25.0) | |
| Sixth | 1 (2.8) | |
| ≥Seven | 3 (8.3) | |
| Completed | 7 (19.4) | |
| Activity per week | 0 days | 1 (2.8) |
| 1 day | 3 (8.3) | |
| 2 days | 6 (16.7) | |
| 3 days | 4 (11.1) | |
| 4 days | 6 (16.7) | |
| 5 days | 7 (19.4) | |
| 6 days | 1 (2.8) | |
| Every day | 8 (22.2) | |
| Activity type | Walking | 15 (41.7) |
| Gym training | 14 (38.9) | |
| Cycling | 3 (8.3) | |
| Gardening and chores | 2 (5.6) | |
| Strength/fitness training | 1 (2.8) | |
| Zumba/dance | 1 (2.8) | |
| Activity tracker use | No | 5 (13.9) |
| Yes | 31 (86.1) | |
| Technology literacy | Tech savvy | 17 (47.2) |
| Tech curious/willing to try | 16 (44.4) | |
| Tech-reluctant/averse | 3 (8.3) | |
| Chatbot use | No | 20 (55.6) |
| Yes | 16 (44.4) |
Appendix B
Changes in outcomes over time based on multiple imputation (m = 5; pooled N = 26).
| Outcome | Baseline | Week 2 | Week 4 | Week 6 | p-Value range across imputations |
|---|---|---|---|---|---|
| GODIN | 32.0 | 32.4 | 52.7 | 47.4 |
|
| BREQ-2 | |||||
| Amotivation | 0.37 | 0.65 | 0.60 | 0.60 | 0.10–0.52 |
| External regulation | 0.98 | 1.04 | 1.15 | 1.23 | 0.28–0.95 |
| Introjected regulation | 2.08 | 1.82 | 1.71 | 1.72 | 0.09–0.54 |
| Identified regulation | 2.61 | 2.62 | 2.84 | 2.94 | 0.05–0.35 |
| Intrinsic regulation | 2.53 | 2.40 | 2.63 | 2.55 | 0.71–0.83 |
| RAI | 7.63 | 6.91 | 7.89 | 7.57 | 0.65–0.92 |
| SEE | 50.2 | N/A | N/A | 47.7 | 0.44–0.77 |
N/A: not applicable; GODIN: Godin Leisure-Time Exercise Questionnaire; BREQ-2: Behavioral Regulation in Exercise Questionnaire-2; RAI: Relative Autonomy Index; SEE: Self-Efficacy for Exercise; PA: physical activity.
Bold value is the only p-value range that was significant.
Values are pooled means from multiple imputation.
p-Value ranges reflect results across five imputations; non-parametric test statistics are reported per imputation in SPSS.
Outcomes were re-analyzed using multiple imputations (m = 5; pooled N = 26). Friedman tests for GODIN indicated a significant increase in PA across all timepoints (all p < 0.05). Pooled means rose from approximately 32.0 at baseline to 52.7 at week 4 and 47.4 at week 6. No significant changes were observed for RAI or the BREQ-2 sub-scales. Wilcoxon tests for SEE showed a slight, non-significant decline from 50.2 at baseline to 47.7 at week 6.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
