Abstract
Background
An assessment of how users rate physical activity apps of varying behavior change technique content is necessary to understand if users recognize differences in an app’s ability to promote physical activity.
Objective
The purpose of this study was to compare user ratings of an app with a lower behavior change technique count to an app with a higher behavior change technique count.
Method
Participants were randomly assigned to interact with either the high behavior change technique app or the low behavior change technique app using an iPad. Participants then completed a Mobile App Rating questionnaire.
Results
The final sample included 83 participants with an average age of 22.66 years (SD = 2.13; range = 20–29). Independent t-tests revealed significant group differences for perceived impact, t(81) = 5.27, p < .001, g = 1.15, 95% confidence interval (0.69, 1.62); engagement, t(81) = 6.71, p < .001, g = 1.15, 95% confidence interval (1.02, 1.87); aesthetics, t(81) = 4.29, p < .001, g = 1.15, 95% confidence interval (0.50, 1.38); and subjective quality, t(81) = 6.46, p < .001, g = 1.15, 95% confidence interval (0.75, 1.42), with participants from the high behavior change technique group scoring these qualities more positively than participants from the low behavior change technique group.
Conclusion
App users rated a physical activity app with higher behavior change technique content more favorably on aesthetics, engagement, subjective quality, and perceived impact than users rated an app with lower behavior change technique content. Additional research is needed to understand how these perceptions influence users during the app selection process, as well as the efficacy of apps for promoting physical activity behavior change.
Introduction
Mobile health is a rapidly growing field, with mobile apps being examined for their ability to facilitate active lifestyles. In total, 77% of Americans own a smartphone and one in four of them use smartphones to look up health information, with about 19% using health-related mobile apps.1,2 Mobile apps have the unique potential to provide health-related programs, such as physical activity interventions, to a large number of people for a relatively low cost. Accordingly, in 2017 there were over 325,000 commercially available mobile health apps in all major app stores. 3 However, research on the content and quality of these apps identified significant limitations, such as lack of theoretically based content.4,5
One way to incorporate theoretical content within mobile apps is the use of behavior change techniques (BCTs). As described by Michie et al., BCTs are the smallest active components designed to elicit a behavior change. 6 They can be used alone, but are typically used as part of a larger intervention to target a particular behavior, and most apps for physical activity (PA) use a combination of BCTs to promote PA uptake among users. 6 However, despite their popularity, mobile apps have not implemented enough BCTs to promote a significant change in PA behavior. 5 Additionally, there is limited evidence to suggest that mobile app users make app selections based on BCT content or that they are aware of differences in behavior change impact between apps with varying BCT content.
Several authors examined commercially available PA apps for BCT quality and quantity. In an examination of top PA and diet apps, Direito et al. examined the taxonomy of BCTs used in interventions and found free apps had an average of 6.6 BCTs (range 3–14) and paid apps had an average of 9.7 (range 2–18). 7 A similar study examined 100 PA apps and found that of the possible 93 BCTs (using BCT Taxonomy V1), only 39 were observed in the apps. 8 Further, they observed an average of 6.6 BCTs (SD = 3.3, median = 6) and did not find a significant difference in BCT content between free and paid apps, t(98) = 1.43, p = 0.08, d = 0.29. 8 These findings demonstrated that BCTs were under-utilized in PA apps. However, it should be noted that in general, the presence of more BCTs does not necessarily indicate a higher-quality intervention, as BCTs vary in their ability to promote behavior change. The least prevalent type of BCT reported in the study by Yang et al. was action planning (15%) and coping planning was not observed. 8 This is concerning because both are proposed to mediate the relationship between intention and behavior. 9 Carraro and Gaudreau found action planning and coping planning were significant predictors of PA, with spontaneous action planning and coping planning displaying moderate-to-large effects on PA. 9 Furthermore, BCTs must be correctly applied to improve effectiveness. Each BCT has a specific set of conditions needed for the technique to aid in promoting behavior change. 10 This suggests that although increasing the presence of BCTs may help increase the effectiveness of PA mobile apps, the BCTs utilized and
In addition to understanding the BCTs present in mobile apps, it is necessary to recognize user preferences of app features. In an online cross-sectional study of young adults, researchers found “goal setting on outcome of behavior,” “self-monitoring of behavior,” and “self-monitoring on outcome of behavior” as the most preferred BCTs within a PA mobile app. 11 Interestingly, this study also examined associations among specific personality traits and preferred BCTs. Researchers found a positive relationship between “agreeableness” and “goal setting” (odds ratio (OR) 1.61, 95% confidence interval (CI) 1.06, 2.41), an inverse association between “neuroticism” and “feedback/self-monitoring” (OR 0.76, 95% CI 0.58, 1.00), and a positive relationship between “self-efficacy” and “feedback and self-monitoring” (OR 1.06, 95% CI 1.02, 1.11). 11 These findings suggested that although users preferred some BCTs to others, PA mobile apps were not “one size fits all.” Similarly, Middleweerd et al. performed a qualitative study with Dutch university students in which participants used a PA app for 3 weeks then attended a focus group. 12 These researchers found participants favored the coaching, tailored feedback, and competition features of the app. 13 Finally, in another cross-sectional study on user perceptions of behavior change mechanisms, most respondents reported using a PA mobile app positively affected PA perceptions, attitudes, and beliefs. 14
Based on the findings from previous studies, users can identify app features they find helpful. It is unclear, however, how these preferences impact their perceptions of the ability of the app to change behavior. Further, it has not been determined if it is merely the presence of a single preferred BCT that influences perception, or if users identify apps with a variety of BCTs as possibly having a greater impact on behavior. For example, if a higher BCT content app contains a single feature that a user prefers, will the user perceive another app of lower BCT content with that same feature to be equally impactful? An assessment of how users rate PA apps of varying BCT content is necessary to help understand if users recognize potential differences in the PA app’s ability to impact PA behavior. Stoyanov et al. 15 developed the Mobile App Rating Scale (MARS), a multidimensional scale for rating the quality of mobile apps, in response to this need. The MARS was originally designed for use by researchers and professionals, so the authors later developed an end-user version, the User Version of the MARS (uMARS). The uMARS scale consists of four objective quality subscales (engagement, functionality, aesthetics, and information quality), a subjective quality subscale, and a perceived impact subscale. Therefore, the purpose of this study was to examine user ratings of two PA mobile apps using an amended uMARS, specifically as it related to the engagement, functionality, aesthetics, subjective quality, and perceived impact of the apps. We compared user ratings of an app with a lower BCT count to an app with a higher BCT count, primarily to identify potential differences in ratings of the perceived impact of the apps. The findings from this study will provide a quantitative assessment of whether users rate the impact of apps with varying BCT content differently.
We hypothesized that participants in the higher BCT count app group would rate the app more positively on engagement, functionality, aesthetics, subjective quality, and perceived impact than those in the lower BCT count app group.
Methods
Participants
Eligible participants were Georgia State University students aged 18–29 years. We targeted this age group because a significantly greater percentage of smartphone users in this demographic use mobile apps.1,16,17
Measures

Mobile app selection process.
Procedures
The study procedures were approved by Georgia State University’s (GSU’s) Institutional Review Board. Participants were recruited using flyers posted on the GSU Atlanta campus, as well as with classroom announcements. Interested volunteers emailed the student primary investigator (PI) who scheduled a face-to-face visit.
Participants who attended the face-to-face meeting consented to participate in the study then completed the participant history and the SOC questionnaires. The participants were randomly assigned to interact with either the MMF app (high BCT group) or the NWP app (low BCT group) using an iPad. Randomization was achieved by providing participants with an envelope from a randomly sorted stack, labeled either 1 or 2. The numbers corresponded with the iPad they were to use because only one PA app was loaded on each iPad (one had NWP, the other had MMF). Inside the envelope was a piece of paper with the name of the mobile app so participants knew which app to open. A student assistant was responsible for assigning a number to each iPad, labeling and shuffling the envelopes, and providing them to the researchers. The researchers were unaware which app was on each iPad and were blinded to app assignment. Once a participant was provided with an envelope, the student entered a separate room to pick up the iPad corresponding with the envelope provided. Each participant was allowed 15 minutes to interact with the assigned mobile app in a secluded space. No identifying information was needed for the participant to interact with the app. Following the app use, participants were asked to complete the amended uMARS questionnaire. The entire visit took approximately 45 minutes to complete.
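The envelope-based allocation described above can be illustrated with a minimal sketch. It assumes the stack held equal numbers of envelopes labeled 1 and 2, which the procedure does not state; the function name, the seed, and the stack size of 84 are hypothetical and chosen only for illustration.

```python
import random


def make_envelope_stack(n_envelopes, labels=(1, 2), seed=None):
    """Return a randomly ordered stack of envelope labels.

    Assumption (not stated in the study): labels are alternated before
    shuffling so the two iPads receive near-equal numbers of participants.
    """
    rng = random.Random(seed)
    stack = [labels[i % len(labels)] for i in range(n_envelopes)]
    rng.shuffle(stack)  # random order, so the next label cannot be predicted
    return stack


# Hypothetical stack size and seed, for illustration only.
stack = make_envelope_stack(84, seed=2019)

# Each arriving participant takes the next envelope from the top of the stack.
next_label = stack[0]
print("iPad number for the first participant:", next_label)
```

Because the researchers only ever see the numbers 1 and 2, and not which app is installed on which iPad, a stack built this way keeps them blinded to app assignment.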
Statistical analyses
Tests of normality and outliers were performed before analyses. All variables were summarized using frequencies, means, and standard deviations. Independent t-tests and Chi-square tests were used to examine baseline differences on demographic variables (age, BMI, race, year in school) and SOC between the two app groups. Cronbach’s alpha was calculated for the multi-item MAR questionnaire subscales. A Cronbach’s alpha of α > .70 was considered acceptable and α > .80 was considered good. 20 Group differences on the amended uMARS subscales were assessed using an independent t-test. Bonferroni corrections were used, so statistical calculations for the amended uMARS subscales were considered significant at an alpha level of p < .01. All other calculations were considered significant at an alpha level of p < .05. Analyses were conducted using SPSS version 23.
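The analyses in this study were run in SPSS; the following is only a rough sketch of the quantities named above, written in plain Python. The function names are hypothetical, the t-test assumes equal variances (pooled), and Hedges' g is obtained from Cohen's d with the common small-sample approximation 1 − 3/(4df − 1), which may differ from the exact procedure used for the reported values.

```python
import math
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for one subscale.

    items: one list of scores per questionnaire item, with respondents
    in the same order in every list.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sum
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def pooled_t_and_hedges_g(group_a, group_b):
    """Independent-samples t (pooled variance), its df, and Hedges' g."""
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group_a)
                  + (n2 - 1) * variance(group_b)) / df
    diff = mean(group_a) - mean(group_b)
    t = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    cohen_d = diff / math.sqrt(pooled_var)
    hedges_g = cohen_d * (1 - 3 / (4 * df - 1))  # small-sample correction
    return t, df, hedges_g


def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return family_alpha / n_tests


# Five subscale comparisons at a family-wise alpha of .05.
print(bonferroni_alpha(0.05, 5))
```

Dividing a family-wise alpha of .05 across the five subscale comparisons gives a per-test threshold of .01, which is consistent with the threshold stated above.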
Results
The high BCT app (MMF) incorporated 11 BCTs and the low BCT app (NWP) incorporated two. Specifically, the high BCT app included the following: (a) provide instructions on how to perform behavior, (b) goal-setting behavior, (c) information about others’ approval, (d) prompt review of behavioral goals, (e) facilitate social comparison, (f) set graded tasks, (g) provide information on when and where to perform behavior, (h) prompt self-monitoring of behavior, (i) prompt self-monitoring of behavioral outcomes, (j) teach to use prompts/cues, and (k) provide reward contingent on successful behavior. The low BCT app included: (a) information about others’ approval and (b) prompt self-monitoring of behavior.
In total, 89 individuals responded to recruitment announcements. Data for five were excluded because they were not 18–29 years of age (n = 4) or they did not report the app they evaluated (n = 1). One participant was removed due to non-response on too many items. The final sample included 83 participants with an average age of 22.66 years (SD = 2.13; range = 20–29) who were mostly female (63.9%) and African-American (48.2%; see Table 1). There were no significant differences between the high and low BCT app groups on demographic variables or SOC. The amended uMARS subscales had acceptable-to-good internal consistency (see Table 2). The independent t-tests revealed significant group differences for engagement, t(81) = 6.71, p < .001, g = 1.15, 95% CI (1.02, 1.87), aesthetics t(81) = 4.29, p < .001, g = 1.15, 95% CI (0.50, 1.38), subjective quality t(81) = 6.46, p < .001, g = 1.15, 95% CI (0.75, 1.42), and perceived impact t(81) = 5.27, p < .001, g = 1.15, 95% CI (0.69, 1.62). Specifically, participants rated MMF (high BCT group) as having greater engagement, aesthetics, subjective quality, and perceived impact than NWP (low BCT group; see Table 2). No significant group difference was observed for functionality, t(81) = 2.09, p = .04, g = 1.15, 95% CI (0.02, 1.00).
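As a quick arithmetic check, the sample flow and the degrees of freedom reported for the independent t-tests follow directly from the counts given above (89 respondents, five exclusions, one removal), with df equal to the total sample size minus two for a two-group comparison:

```latex
\begin{align*}
N  &= 89 - 5 - 1 = 83, \\
df &= N - 2 = 81 .
\end{align*}
```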
Participant characteristics.
BMI: body mass index; SOC: stages of change.
Group differences for MARS subscales.
aSignificant group differences found at p ≤ .008.
MARS: Mobile App Rating Scale.
Discussion
Mobile apps are being researched as an option to deliver behavioral interventions; however, limited research has focused on user ratings of apps as they relate to app quality and how users perceive an app’s ability to impact behavior. Therefore, the purpose of this study was to examine user perceptions of two PA mobile apps using an amended uMARS. Participants in the higher BCT app group rated MMF significantly higher on four of the five amended uMARS subscales. This was the anticipated outcome because previous research indicated app users had preferences related to the BCTs utilized within an app.11,12 The findings from the present study suggested that app users might have also related the presence of more BCTs to app effectiveness due to the group difference found for perceived impact. Several studies cited low theoretical content as a potential limitation of apps targeting health behavior.21,22 However, real-world implications should be considered. Before an app could be examined for efficacy, app users would first have to identify an app they believe would help promote PA. Previous research did not provide clear evidence to suggest app users would identify an app with more theoretical content as being more effective. Our results supported the argument that the amount of theoretical content in an app might influence user ratings of the quality and perceived impact of the app.
Two previous cross-sectional studies examined PA app users’ perceptions of behavior change mechanisms and app effectiveness.14,23 The studies found PA app use led participants to be more motivated to change their behavior and that a greater percentage of users perceived PA apps effectively affected their behavior. Their results supported the concept that PA apps can be used to positively influence user behavior. Our findings extended their results by demonstrating users may understand that some apps can support their behavior change efforts better than others. Taken together, these findings are promising because they provide further guidance on how to best design apps that will lead to PA behavior change. App developers and researchers should consider using a BCT taxonomy to increase the BCT quantity and quality offered by PA apps. The results can also impact how apps are marketed to consumers. Although our study did not test how users make app selections, the results may support emphasizing the presence of BCTs in the app description to encourage app use. However, as previously mentioned, it is also important to remember that not all BCTs are equally effective for promoting behavior change. As such, additional research is needed to understand whether perceived app effectiveness is related to actual PA behavior promotion.
Regarding the other subscales, for engagement, aesthetics, and subjective quality, participants also rated MMF (high BCT group) significantly higher than NWP (low BCT group). However, no group differences were observed for functionality. According to Stoyanov et al., when scored separately, the objective quality items (engagement, aesthetics, functionality) can be used to evaluate strengths and weaknesses in these areas. 18 The lack of group differences for functionality was ideal, as it supported the expectation that the group differences observed were not simply due to differences in how well each app worked. Although efforts were made to select apps that were similar in visual appeal, the group differences for aesthetics may have introduced bias in the participants’ ratings of the other scales. According to the 2017 U.S. Mobile App Report, 21% of app users between the ages of 18–24 reported deleting an app because they did not like the logo. 16 Further, Singh suggested that color can influence perceptions of a product. 24 It is possible that an app rated as less attractive is perceived as lower quality and, thus, less effective. Regarding the engagement and subjective quality subscales, our results supported findings from the Hoj et al. study. They found that app users who reported more frequent app use perceived PA apps to have a greater impact on behavior. 14 In our study, the app that users found to be more engaging and of higher subjective quality was also rated to have a greater perceived impact on behavior. Neither study explored the specific mechanisms, but the relationship between engagement and perceived impact should be further explored to better influence app design.
Other study limitations should be considered when interpreting these results. The study population was limited to college students aged 18–29 years, which means the findings are not generalizable to other populations. These findings still provide valuable information as this demographic uses mobile apps at a higher rate.1,16 Another limitation is that most participants reported being in the action/maintenance SOC. This may have introduced bias due to potential similarities in BCT preferences among regularly active individuals. Also, the present study has limited external validity, as participants used the mobile apps in a laboratory setting for a limited period. This improves the internal validity of the study but limits our ability to generalize these findings to how participants would use the app in a real-world setting. Future research should attempt to replicate these findings in a more naturalistic setting. Further, 10 study participants reported use or knowledge of the app before study participation. We did not analyze this variable as a potential moderator because all 10 participants were in the high BCT group, but it should be noted that familiarity (or lack thereof) with the app could influence ratings. As it relates to the apps, the high BCT app was still relatively low in BCT quantity, considering there are 40 BCTs listed in the CALO-RE taxonomy. 19 Still, the high BCT app had more BCTs than the average of 6.6 observed in previous studies that quantified app BCT content.7,8 Lastly, this study did not consider the quality of the BCTs being used in each app, but rather focused on quantity of BCTs. This study focused on BCT quantity because previous literature cites low theoretical (BCT) content as a potential factor in why PA apps do not result in significant changes in PA behavior. 25 However, additional studies should be conducted to examine how BCT quality impacts user ratings of the quality and perceived impact of an app.
Conclusion
In summary, the results of this study indicated app users rated a PA app with higher BCT content more favorably on aesthetics, engagement, subjective quality, and perceived impact than users rated an app with lower BCT content. These findings supported previous literature that suggested theoretical content might influence app effectiveness. The results from this study should encourage app designers and researchers to consider the quantity and quality of BCTs included in PA mobile apps. Also, app developers might utilize this information to influence how to market PA apps to consumers. Future researchers should include app users from a wider range of age groups and all PA levels. In addition, researchers should experimentally test apps with higher BCT counts in comparison to apps with lower BCT counts on their ability to promote PA.
Footnotes
Acknowledgements
We would like to acknowledge Susan Macedo for her assistance with data collection on this project.
Contributorship
AD developed protocol, oversaw data collection, conducted the analyses, and drafted the manuscript. RE contributed to the protocol, assisted with data collection, provided guidance on analyses, and edited the manuscript. Both authors approved the final version of the manuscript.
Conflict of interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
The study procedures were approved by GSU’s Institutional Review Board.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Guarantor
AD.
Peer review
This manuscript was reviewed by reviewers who have chosen to remain anonymous.
