Abstract
Objectives
Despite the proliferation of mobile mental health apps, evidence of their efficacy for anxiety and depression remains inadequate, as most studies lack appropriate control groups. Because apps are designed to be scalable and reusable tools, their efficacy can also be assessed by comparing different implementations of the same app. This exploratory analysis estimates a preliminary effect size of an open-source smartphone mental health app, mindLAMP, on the reduction of anxiety and depression symptoms by comparing a control implementation of the app focused on self-assessment with an intervention implementation of the same app focused on CBT skills.
Methods
A total of 328 eligible participants completed the study under the control implementation and 156 completed the study under the intervention implementation of the mindLAMP app. Both use cases offered access to the same in-app self-assessments and therapeutic interventions. Multiple imputation was used to impute the missing Generalized Anxiety Disorder-7 and Patient Health Questionnaire-9 survey scores in the control implementation.
Results
Post hoc analysis revealed small effect sizes of Hedges'
Conclusions
mindLAMP shows promising results in improving anxiety and depression outcomes in participants. Though our results mirror the current literature in assessing mental health apps’ efficacy, they remain preliminary and will be used to inform a larger, well-powered study to further elucidate the efficacy of mindLAMP.
Introduction
Digital mental health tools present a scalable and affordable solution to address the current treatment gap in behavioral health.1,2 Though synchronous telehealth has been vital in delivering behavioral health services since the pandemic,3 asynchronous self-help-based interventions such as smartphone apps have garnered particular interest as they can expand access beyond the limited number of clinicians and augment ongoing care. With smartphone ownership at near-universal levels in the United States,4 patients report high interest in downloading a mental health app,5,6 and many have already used a mental health app to monitor their condition.5–8 Therefore, understanding the efficacy of apps to improve mental health outcomes is important.
Though many people are downloading apps, little is known about the efficacy of these apps to improve mental health conditions.9
Most apps available on public app marketplaces lack any scientific evidence.10,11 The research studies that have been conducted report promising, but at times, conflicting results. Several meta-analyses summarizing the efficacy of mental health apps demonstrate that these interventions have a small effect (
Another challenge in generating efficacy data is low patient engagement with mental health apps, which is most pervasive when assessing real-world usage.24–26 Studies of naturalistic use patterns show that few users still engage with a mental health app after five days.27 Although engagement tends to be higher in research settings, studies still struggle to maintain it. Sufficient engagement with a mental health app is a prerequisite for any analysis of, or claim about, efficacy.28 Consequently, improving app engagement has become a central focus of researchers.25 Strategies to boost patient engagement have included monetary compensation29 and improved app design,30 but these have yet to demonstrate a significant sustained impact. Human support, often in the form of coaching, has long been cited as one strategy to improve engagement rates31–34 and is now offered routinely as part of app interventions.
In this article, we explore how the scalable and reusable nature of apps can support complementary studies and offer novel methods to explore the role of the digital placebo, app interventions, and the impact of human support. For example, using the open-source mental health application mindLAMP, we compare data collected from two studies featuring two unique implementations of the same app: an unguided mood monitoring implementation versus a coached implementation. Through this comparison, we generate a potential effect size of mindLAMP with human support on the reduction of anxiety and depression symptoms, which will be used to help power a formal subsequent study.
Methods
Recruitment
Both the control and intervention studies used online posts to recruit participants with elevated levels of stress or anxiety.
The methods for the control implementation have been published.35 Inclusion criteria were English fluency, a mindLAMP-compatible smartphone, a college email address, a student ID card, and a score of 14 or higher on the Perceived Stress Scale36 upon completion of a screening survey. Six hundred ninety-five participants were recruited entirely through online posts and met all of these requirements. Eighty-three participants were excluded because they never downloaded the app. Participants who did not complete any of the weekly Patient Health Questionnaire-9 (PHQ-9) or Generalized Anxiety Disorder-7 (GAD-7) surveys were also excluded from the study.
For the six-week intervention implementation37 (Camacho et al., 2023 [Forthcoming]), individuals with anxiety and depression were recruited through Researchmatch.org from July 2021 to February 2022. To participate, individuals needed to be at least 18 years old, own a mindLAMP-compatible smartphone, and score a minimum of 5 on the GAD-7 scale.38
Protocol
During the intake for the control implementation, a trained research assistant introduced mindLAMP to the participant and answered any relevant questions about the 28-day study. Participants completed a survey consisting of the Perceived Stress Scale (PSS)39 questionnaire, demographic questions, and a question asking whether they had ever had COVID-19. The survey was completed and stored on REDCap. Throughout the study, mindLAMP sent push notifications daily for a brief survey and bi-weekly for a longer survey to be completed in the app. The daily survey consisted of 11 questions selected from the Patient Health Questionnaire-9,40 the Generalized Anxiety Disorder-7 scale, the Prodromal Questionnaire-16,41 and the PSS. The bi-weekly survey included all questions from the PHQ-9, GAD-7, PSS, UCLA Loneliness Scale,42 PQ-16, Pittsburgh Sleep Quality Index (PSQI),43 and D-WAI.44 The mindLAMP app also offered access to the interventions available in the intervention version (see below); however, no engagement support, such as scheduled notifications or human coaches, was provided in the control version. The lack of engagement support in the control implementation resulted in low app usage despite the therapeutic interventions being readily accessible in the app. After the study concluded, a research assistant emailed participants an exit survey and instructions to uninstall the app. Participants in the control implementation were compensated up to $50 based solely on completion of the bi-weekly surveys.
After starting the intervention implementation, participants met virtually twice (for up to 20 minutes per session) with a digital navigator, or coach, for app-related and engagement support but not for clinical advice. The digital navigators were trained through our standardized and published training to support app use in both care and research.34 After every meeting, participants completed the PHQ-9, GAD-7, PSQI, PSS, SIAS,45 UCLA Loneliness Scale, Flourishing Scale,46 SUS,47 and D-WAI, all of which were completed and stored on REDCap. Participants in the intervention implementation were given $75 after completing the study, with payment not tied to app use.
Both groups had access to the full mindLAMP application, which, in addition to the surveys described above, included mindfulness activities such as meditations and CBT-based interventions. Both studies were approved by the Institutional Review Board of Beth Israel Deaconess Medical Center, and written informed consent was obtained from all participants.
mindLAMP
Both studies utilized mindLAMP, an open-source smartphone app developed by our team. mindLAMP's customizable platform couples in-app interventions such as mindfulness activities and meditations with robust digital phenotyping capabilities. A more detailed description of the mindLAMP application's development is reported elsewhere 48 and screenshots of the app have been included in Figure 1.

Figure 1. Screenshots from mindLAMP.
Analysis
To extract survey data from mindLAMP, Cortex,49 an open-source data analysis pipeline for mindLAMP, was utilized. All analyses excluded PHQ-9 or GAD-7 results if the corresponding initial score was less than 5. IterativeImputer from the scikit-learn library was applied to impute the missing final PHQ-9 or GAD-7 scores, with initial score, age, gender, and race and ethnicity serving as the predictor variables; only the control implementation's data set required this imputation. Data from the intervention implementation were analyzed at both weeks 4 and 6 because that study lasted 6 weeks, while data from the control implementation were analyzed only at week 4 because that study lasted 4 weeks. Non-parametric statistical tests were used because the pre-post data of both groups were not all normally distributed, which we confirmed by visually inspecting histograms (Figure 2) and with the Shapiro-Wilk test. The Wilcoxon signed-rank test (scipy.stats.wilcoxon) was applied to detect statistically significant longitudinal improvement in scores within the intervention and control studies. The Wilcoxon rank-sum test (scipy.stats.ranksums) was used to detect any statistically significant difference in the percentage improvement of clinical scores between the two cohorts. A complete case analysis was also performed for comparison with the imputed results. The Pearson correlation (scipy.stats.pearsonr) was used to measure any linear relationship between demographic and baseline variables and outcomes. All analyses were completed in Python using Jupyter Notebook.
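As a minimal sketch (not the study's actual scripts), the imputation and non-parametric tests described above could be reproduced roughly as follows. Column names such as gad7_baseline and gad7_final are illustrative assumptions, and categorical demographic predictors are assumed to be numerically encoded beforehand:

```python
# Illustrative sketch of the analysis steps described above (not the study's code).
import pandas as pd
from scipy import stats
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer


def impute_final_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing final GAD-7 scores from baseline score and demographics.

    Assumes demographic columns are already numerically encoded; repeating this
    with different random_state values (sample_posterior=True) approximates
    multiple imputation.
    """
    cols = ["gad7_baseline", "age", "gender", "race_ethnicity", "gad7_final"]
    imputer = IterativeImputer(sample_posterior=True, random_state=0)
    imputed = imputer.fit_transform(df[cols])
    out = df.copy()
    out["gad7_final"] = imputed[:, -1]
    return out


def pre_post_tests(baseline: pd.Series, final: pd.Series) -> dict:
    """Shapiro-Wilk normality check, then Wilcoxon signed-rank test on change."""
    _, shapiro_p = stats.shapiro(final - baseline)
    _, wilcoxon_p = stats.wilcoxon(baseline, final)
    return {"shapiro_p": shapiro_p, "wilcoxon_p": wilcoxon_p}


def between_group_test(pct_change_control, pct_change_intervention):
    """Wilcoxon rank-sum test on percentage improvement between cohorts."""
    return stats.ranksums(pct_change_control, pct_change_intervention)


def baseline_outcome_correlation(baseline: pd.Series, pct_change: pd.Series):
    """Pearson correlation between baseline severity and percentage change."""
    return stats.pearsonr(baseline, pct_change)
```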

Figure 2. Histograms depicting baseline symptom severity and clinical outcomes.
Results
Demographics
Demographic characteristics of the control and intervention implementation participants included in the analysis are presented in Table 1.
Table 1. Participant characteristics in each study.
Kruskal-Wallis rank sum test.
Freeman-Halton test.
Outcomes
Participants in the intervention implementation improved from the beginning of the study to both weeks 4 and 6 for both GAD-7 and PHQ-9. The mean change for each is noted in Table 2. The mean change for all psychological assessments administered in the intervention implementation is provided in Supplemental Appendix Table 1.
Table 2. Mean percentage change in GAD-7 and PHQ-9 values in the intervention study.
GAD-7: Generalized Anxiety Disorder-7; PHQ-9: Patient Health Questionnaire-9.
*Indicates the percentage change is statistically significant at
While the control implementation participants had a significant reduction in PHQ-9 scores, there was no significant reduction in GAD-7 scores over the four weeks. After the missing PHQ-9 and GAD-7 outcomes were imputed, there was a statistically significant reduction in both PHQ-9 and GAD-7. The mean changes for both complete cases and all the cases including imputed values are described in Table 3. There was no difference in baseline symptom severity between those that completed the full four weeks and those that did not in the control implementation.
Table 3. Mean percentage change in GAD-7 and PHQ-9 scores in the control study.
GAD-7: Generalized Anxiety Disorder-7; PHQ-9: Patient Health Questionnaire-9.
*Indicates the percentage change is statistically significant at
**Indicates the percentage change is statistically significant at
In both the intervention and control implementations, baseline demographics did not correlate with outcomes for either GAD-7 or PHQ-9. However, in both groups, baseline symptom severity had a slight negative correlation with the percentage change in scores (intervention: −.24 for PHQ-9 and −.22 for GAD-7; control: −.37 for PHQ-9 and −.35 for GAD-7). Histograms depicting baseline symptom severity and outcomes are included in Figure 2.
Effect size calculation results
For the analysis including the imputed control implementation values, there was a significant difference in percentage improvement between the control group's and the intervention group's GAD-7 scores at week 4; this was not the case for PHQ-9. However, the intervention cohort's PHQ-9 percentage changes at week 6 were significantly different from those of the control cohort. For the complete case analysis, the only significant difference was between the intervention group's week 6 and the control group's week 4 GAD-7 scores. A more detailed breakdown of the corresponding
Table 4. Effect size results.
GAD-7: Generalized Anxiety Disorder-7; PHQ-9: Patient Health Questionnaire-9.
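For reference, the between-group effect sizes reported as Hedges' g in this study could be computed along the lines below. This is a generic sketch assuming the standard small-sample correction J = 1 − 3/(4(n_a + n_b) − 9); the paper does not specify the exact formulation used:

```python
# Generic Hedges' g between two independent groups (assumed formulation,
# not necessarily the exact computation used in the study).
import numpy as np


def hedges_g(a, b) -> float:
    """Standardized mean difference with small-sample correction:
    g = ((mean_a - mean_b) / s_pooled) * J, J = 1 - 3 / (4 * (n_a + n_b) - 9).
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    s_pooled = np.sqrt(((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1))
                       / (n_a + n_b - 2))
    d = (a.mean() - b.mean()) / s_pooled
    return d * (1 - 3 / (4 * (n_a + n_b) - 9))
```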
Discussion
This exploratory analysis illustrates that implementing mindLAMP as a coached intervention versus as an unguided mood monitoring tool has a small effect on the reduction of anxiety and depression symptoms. Engagement with the app was higher in the intervention study than in the control mood monitoring case, likely due to the support offered in the two digital navigator check-ins. Even though participants in the mood monitoring control condition had full access to the interventions, their utilization of those interventions was negligible, allowing us to use them as the control condition. Though preliminary, our reported effect size of
The potential effect size of mindLAMP aligns with reported effect sizes of mobile health apps for improving anxiety and depression12,15 and with the effect size of transdiagnostic treatments.50 This effect size is encouraging given the scalability of apps and the asynchronous delivery of the intervention. However, its magnitude is of course lower than that of traditional care,51 suggesting an adjunctive role for apps. While our study does not allow us to assess the impact of the two 20-minute digital navigator meetings, the use of coaches as a scalable means to drive engagement with apps has been widely cited and appears promising given the shortage of licensed mental health clinicians.52
Our results also suggest that mental health apps like mindLAMP have the potential to help individuals from various backgrounds. In our analysis, baseline demographic characteristics did not correlate with outcomes, indicating that a variety of patients can benefit from the use of mindLAMP, which aligns with a previous study conducted by our team.53 In addition, the negative correlation between initial PHQ-9 or GAD-7 severity and outcome suggests that mindLAMP can help individuals with more severe symptoms as much as, if not more than, patients with milder presentations. These results support the ability of the app to increase access to care for patients with the highest needs.
Our results also raise questions for future research. While the effect size of
A strength of this study is that it reflects a real-world use case of a mental health app with additional methodological rigor. Specifically, by using mindLAMP in both the control and intervention implementations, our study was able to control for access to the app, app reminders, app aesthetics, access to the interventions, and other confounders often not controlled for in research studies. Given that mindLAMP is free, open-source software that is easy to configure into different versions, other teams can use our results to assess their own effect sizes or build upon them to create more effective interventions. Furthermore, the human support provided in the intervention study conformed to peer-reviewed and publicly available training,34 which is rare, as little information is often reported about digital navigators' training, qualifications, and responsibilities.54,55 While our approach does not replace the need for rigorous randomized controlled trials, being able to assess the potential digital placebo effect before conducting a costly and time-consuming study offers numerous advantages for agile research. It also offers a useful baseline for assessing the preliminary impact of cultural adaptations and additional interventions that could be added to mindLAMP in the future.
Limitations
Our study is limited by several factors. First, to calculate the preliminary effect size of mindLAMP, we combined two studies that were conducted separately. Because of this, the demographics of the two cohorts were slightly different, as noted in Table 1. However, the current literature suggests that app usage does not differ between younger and middle-aged adults.32 Another limitation is that the protocol differed slightly between the control and intervention implementations, as the control implementation lasted 4 weeks while the intervention lasted 6 weeks; this was mitigated by analyzing and reporting data at both timepoints. Furthermore, the control implementation required a score of 14 or higher on the PSS, whereas the intervention implementation required a score of 5 or higher on the GAD-7. In future studies, the inclusion criteria will be the same for both the control and intervention groups. To partially address this limitation, significant PSS and GAD-7 correlations are provided in Supplemental Tables 2 and 3. Lastly, there was a high degree of missing data in the control condition, which is common in many digital health research studies, including those that have received FDA approval.25 To counteract this missingness, multiple imputation was utilized specifically for its ability to handle large fractions of missing information, and a complete case analysis was included for comparison. In light of these limitations, we consider this analysis exploratory, as reflected in the title of this article. The results from this analysis will be used to inform a randomized controlled trial featuring similar participant demographics between groups, adherence to the same protocol and psychological assessments, and digital navigator support to promote participant engagement.
Conclusion
In this exploratory analysis, we present a means to utilize the scalable, reusable, and customizable nature of apps to explore the potential effect size of mindLAMP in reducing anxiety and depression symptoms. These promising results will inform the design of a larger, well-powered study to be conducted in the future. Despite the nontraditional methods presented here, our team implemented some of the most cited methodological strengths in digital health (i.e. active digital control, human support, and replicable materials) in a manner that others can use to advance their own research today.
Footnotes
Acknowledgements
None to declare.
Contributorship
JT designed the study, wrote the protocol, and completed data collection. SC and JT analyzed the data. SC and NA drafted the first draft of the manuscript. All authors contributed to revising the manuscript and have approved the final draft.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
Both studies were approved by the Institutional Review Board of Beth Israel Deaconess Medical Center (protocols 2020P000310 and 2020P000589).
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Guarantor
JT
Supplemental material
Supplemental material for this article is available online.
References