Abstract
Background
Digital mental health interventions can be effective for treating mental health problems, but uptake by consumers and clinicians is not optimal. The lack of an accreditation pathway for digital mental health interventions is a barrier to their uptake among clinicians and consumers. However, there are a number of factors that may contribute to whether a digital intervention is suitable for recommendation to the public. The aim of this study was to identify the types of evidence that would support the accreditation of digital interventions.
Method
An expert workshop was convened, including researcher, clinician, consumer (people with lived experience of a mental health condition) and policymaker representatives.
Results
Existing methods for assessing the evidence for digital mental health interventions were discussed by the stakeholders present at the workshop. Empirical evidence from randomised controlled trials was identified as a key component for evaluating digital interventions. However, information on the safety of users, data security, user ratings, and fidelity to clinical guidelines, along with data from routine care including adherence, engagement and clinical outcomes, were also identified as important considerations when evaluating an intervention. There are considerable challenges in weighing the evidence for a digital mental health intervention.
Conclusions
Empirical evidence should be the cornerstone of any accreditation system to identify appropriate digital mental health interventions. However, robust accreditation systems should also account for program and user safety, user engagement and experience, and fidelity to clinical treatment guidelines.
There is extensive evidence that digital mental health interventions can be effective for treating mental health problems.1–5 However, uptake by community members and clinicians for some evidence-based services has been limited, which may result from a failure of implementation into practice,1,6 from limitations on the appropriateness of research evidence, 7 or from market saturation of untested interventions.8,9 Digital mental health interventions are defined here as online programs and apps that deliver structured therapy (based on existing evidence) to the user, in either a self-guided or clinician-supported format. This definition excludes therapy delivered live by clinicians using technology, informational or non-therapeutic programs, and tools used exclusively by clinicians. Clinician and consumer barriers to the use of these interventions include limited awareness of digital mental health interventions and their appropriateness for different mental health problems, preference for face-to-face care, lack of knowledge of the evidence base supporting their use, poorly integrated delivery pathways, concerns around the privacy of using these interventions, a perceived gap between users’ actual needs and the problems typically addressed by diagnostically driven online interventions, a lack of user input into the design and delivery of interventions, and a lack of formal accreditation processes that would feed into the identification and delivery of appropriate digital interventions to the public.1,6,10,11
A number of directories or portals of digital mental health interventions are now available, and may be employed to support the use of such interventions in clinical, community or individual settings. Some of these, such as Beacon 12 and Psyberguide 13 include an evaluation of the empirical evidence for each available intervention, which allows users to identify programs that have the strongest evidence base for efficacy. However, there are broader considerations that may be important in assisting potential users to determine the most appropriate intervention. 14 Accreditation systems for digital interventions need to account for multiple factors in identifying interventions that are appropriate for public use and providing recommendations to clinicians or consumers.
We report on the outcomes of an expert workshop that was convened to discuss the challenges in assessing evidence for digital mental health interventions and how these issues might feed into systems of accreditation for such interventions. Formal systems of accreditation for digital interventions are emerging, such as the UK Digital Assessment Questionnaire (DAQ) 15 and the US Food and Drug Administration’s procedures for approval of digital tools as medical devices. 16 However, the workshop was designed to identify the viewpoints of diverse stakeholders on how different types of evidence can be integrated into new accreditation systems, without specific reference to these emerging accreditation systems.
Method
Attendees of the workshop (n = 15) were researchers in digital mental health (n = 9, three of whom were also clinicians), clinicians (n = 5), a representative of people with lived experience of a mental health condition (n = 1), Government representatives from the Australian Department of Health (n = 2) and a note-taker. Although only one consumer representative attended, specific discussion points around consumer engagement were included in the agenda, with discussion in this area led by the consumer representative. The 5-h workshop was conducted as an unmoderated group discussion in February 2018 to inform the ongoing development of the Australian Government’s headtohealth portal (https://headtohealth.gov.au), a website that lists Australian information, resources and services to support mental health. The Beacon approach for assessing empirical evidence for web-based therapy programs, apps and Internet support groups was described. Beacon is a catalogue describing existing online health interventions with an evidence rating based on randomised controlled trials and other empirical designs. 12 Loosely following an agreed agenda, limitations to the Beacon methodology were discussed, along with broader challenges of relying primarily on evidence from randomised controlled trials (RCTs) for each digital program or resource to facilitate the selection of appropriate digital interventions. In addition to empirical evidence, other important indicators to guide selection of digital mental health interventions for clinicians and consumers were identified. Discussions were structured around identifying appropriate evidence for three types of interventions: online programs, apps, and Internet support groups (which are outside the scope of the current paper).
Results
Existing approaches to rating digital health interventions
Existing techniques for assessing digital mental health interventions were identified by workshop participants, as current approaches may provide more efficient pathways to developing a rigorous evaluation or accreditation framework. Some of these approaches focus only on empirical evidence, while others account for user experience, security of data systems and/or alignment with clinical treatment guidelines.
Relevant approaches to evaluating digital interventions identified in the workshop include:
Beacon (https://beacon.anu.edu.au/): Australian-based directory of internationally available digital health programs (web-based, mobile apps) with ratings based on peer-reviewed scientific evidence;
Psyberguide (https://psyberguide.org/): directory of mental health apps with ratings on scientific evidence, user experience, and clarity of the privacy policy, based in the United States;
Mobile Application Ratings Scale 17: a tool to rate apps focusing primarily on usability and engagement, but also including an expert rating of content. The scale has potential for adaptation to rate websites;
NICE Guidelines (https://www.nice.org.uk/guidance): a framework for identifying which therapeutic strategies are likely to be consistent with treatment guidelines.
Additional resources that include listings of digital interventions include:
headtohealth (https://headtohealth.gov.au): recently developed Australian Government portal for digital and other mental health services, which does not provide an evaluation;
NHS apps database (https://apps.beta.nhs.uk/): directory of apps, with inclusion based on multiple inputs including a technical assessment of data security, DAQ assessments and NICE guidelines, based in the United Kingdom;
National Registry of Evidence-based Programs and Practices (https://nrepp.samhsa.gov): listing of evidence-based programs, which includes some digital programs (listings are provider-driven and must be accredited), based in the United States.
The role and limitations of empirical evidence
The workshop discussion primarily centred on the roles of empirical evidence and other forms of evidence, and how different forms of evidence might be used to inform accreditation processes. There was consensus on the majority of discussion points, except where noted below. Disagreements in the workshop were not confined to specific stakeholder groups, but primarily occurred within the researcher (or researcher-clinician) group.
The attendees of the workshop noted the importance for clinicians and community users to have access to an updated database of available online interventions, with the scientific evidence for each intervention described and rated in an accessible way. Such databases provide clinicians and consumers with up-to-date information to guide their decisions about interventions that are most likely to be effective for them and suited to their needs. The Beacon directory assesses empirical evidence for digital mental health interventions largely on the basis of the number of RCTs with a positive outcome. Based on this system, more positive RCTs result in a higher evidence rating. However, RCTs for specific programs are not the only form of evidence, and there are a number of challenges and nuances that were discussed when assessing empirical evidence.
RCTs are highly variable and may not be appropriate in some conditions. 18 Discussion in the workshop noted that the quality of RCTs can vary, and depends on factors including sample size, randomisation method, blinding, and type of comparison/control condition. The quality of an RCT may affect the results obtained, raising questions as to how to account for low-quality studies when assessing the evidence base. RCTs can be conducted in a range of settings. For example, a prevention trial of a self-guided program in schools is likely to have different expectations and outcomes compared to a treatment trial of a clinician-guided program delivered in a clinical setting. The type of program (self-guided versus clinician-guided), the type of participants (e.g. healthy population versus severe clinical population, or specific subgroups defined by age, gender, ethnicity, etc.), the delivery setting (e.g. online, clinic, school, community) and the mental health target (e.g. depression, anxiety, substance use) are all likely to play a role in the outcomes of a trial. Moreover, as interventions are redesigned, turned into apps, become outdated or are used in ways that differ markedly from trial conditions, the extent to which existing evidence remains valid is unclear. It was also noted in the workshop that RCTs conducted by an organisation independent of the developers of the intervention may be viewed by some as more rigorous than RCTs conducted by the developers. Attendees of the workshop were of the opinion that, following evidence from RCTs, large-scale effectiveness trials using data obtained from routine care provide useful information to inform public policy but are rarely conducted. There was also a consensus that the evidence needed for apps was equivalent to that for online programs.
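The contrast between counting positive RCTs and accounting for trial quality can be made concrete with a small sketch. This is not the Beacon algorithm or any method proposed in the workshop: the `Trial` structure, the 0 to 1 `quality` score and both rating functions are hypothetical, and serve only to show how two programs can be ranked differently depending on whether quality is considered.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Trial:
    """One RCT of a program (hypothetical structure for illustration)."""
    positive: bool   # did the trial report a positive outcome?
    quality: float   # hypothetical quality score between 0 and 1


def count_based_rating(trials: List[Trial]) -> int:
    """Rate a program by the number of positive RCTs only."""
    return sum(1 for t in trials if t.positive)


def quality_weighted_rating(trials: List[Trial]) -> float:
    """Rate a program by summing the quality scores of its positive RCTs."""
    return sum(t.quality for t in trials if t.positive)


# Program A: three small, low-quality positive trials (invented values).
program_a = [Trial(True, 0.3), Trial(True, 0.2), Trial(True, 0.3)]
# Program B: one high-quality positive trial and one high-quality null trial.
program_b = [Trial(True, 0.9), Trial(False, 0.8)]

print(count_based_rating(program_a), count_based_rating(program_b))
print(quality_weighted_rating(program_a), quality_weighted_rating(program_b))
```

Under these invented values, Program A ranks higher on the count-based rating and Program B ranks higher on the quality-weighted rating. Neither function handles null trials, setting or population, which the discussion above identifies as further sources of variation.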
In addition to empirical evidence for effectiveness, adherence to and engagement with a digital intervention may be important considerations. Poor adherence may indicate an intervention is not engaging, and trials with high drop-out may have biases in the estimation of effects. Interventions that are shown to be effective in an RCT may still have suboptimal user engagement. 19 However, it remains imperative that the designers of interventions aim to optimise the user experience, as this is vital for effective, safe and engaging delivery. To this end, additional data such as adherence rates or consumer ratings may provide a clearer picture of how engaging a program is likely to be. It was noted that adherence is a complex outcome, and that early drop-out from an intervention may indicate a participant has had a positive response and hence, discontinued program use. Furthermore, trials with poor adherence or small sample sizes are less likely to show a positive effect, as the power to find a difference in effect is restricted. However, it was also noted that trials with poor adherence may reflect a poorly designed and ineffective intervention, emphasising the importance of monitoring outcomes from routine care.
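The point about restricted power can be illustrated with a rough calculation. The sketch below uses the standard normal approximation for a two-sided, two-arm comparison of means. The effect sizes, sample sizes and the `diluted_effect` helper are illustrative assumptions rather than values from the workshop; in particular, the helper assumes that non-adherent participants receive no benefit, which is a simplification.

```python
from statistics import NormalDist


def approx_power(effect_size: float, n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power to detect a standardised mean difference
    (Cohen's d) between two equal-sized arms, two-sided test,
    using the normal approximation."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * (n_per_arm / 2) ** 0.5
    # Probability of rejecting in either tail under the alternative.
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


def diluted_effect(effect_size: float, adherence_rate: float) -> float:
    """Hypothetical simplification: the intention-to-treat effect shrinks
    in proportion to the share of participants who adhere."""
    return effect_size * adherence_rate


# Illustrative scenarios (all numbers are assumptions).
well_powered = approx_power(0.5, n_per_arm=64)
small_sample = approx_power(0.5, n_per_arm=20)
poor_adherence = approx_power(diluted_effect(0.5, 0.5), n_per_arm=64)

print(f"d = 0.5, 64 per arm:            {well_powered:.2f}")
print(f"d = 0.5, 20 per arm:            {small_sample:.2f}")
print(f"50% adherence, 64 per arm:      {poor_adherence:.2f}")
```

Under these assumptions, halving adherence or reducing the sample to 20 per arm each reduces power substantially relative to the first scenario, so a null result from such a trial says little about whether the intervention works when used as intended.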
Safety was discussed extensively during the workshop, and is a critical element for the identification of digital interventions that are appropriate for public use. The concept of safety covers both user safety (i.e. assurances that a digital intervention will not cause harm or increase the likelihood of deterioration) and digital safety (i.e. privacy and data security). It was noted that few trials report rates of deterioration, although interventions that are shown to be effective typically have fewer users with deterioration than in control conditions. 20 Interventions that are delivered within a service setting may have more extensive clinical data on user safety but may not have a comparator (i.e. a control condition) to enable benchmarking of the expected rates of deterioration without active intervention. It may be possible to assess data privacy and security by seeking information from providers about their policies, platforms and standards for security, and maintaining user privacy. However, independently verifying these claims remains challenging.21,22
There are other types of data that may indicate that a digital intervention is likely to be efficacious or effective, beyond RCTs. Discussions focused on two forms of data. Firstly, clinical service data from routine care (or relatedly, other types of empirical studies such as open trials) can be used to support the effectiveness and safety of an intervention. Programs that are delivered to the community as a clinical service may have considerable and detailed data on how specific types of users respond to programs over time. 23 Such services can continuously provide data on usage and clinical outcomes, can monitor user safety and can be used to identify appropriate clinical dosage. Such data are essential for determining the actual clinical benefits and risks when deployed in routine care, information which is essential for funders and planners. Secondly, fidelity to clinical guidelines may provide further evidence that an intervention is likely to be effective. Ensuring that program content is consistent with clinical treatment guidelines and does not include unsupported treatment strategies is one way to provide some reassurance that an intervention is likely to be useful and safe for users. Indeed, a minority of attendees (reflecting some of the researchers and clinicians providing Internet-based services) viewed fidelity to existing evidence-based programs as sufficient for meeting minimum standards for accreditation, much as other new (non-digital) clinical services that conform to clinical guidelines are not expected to undergo RCTs. Fidelity to clinical guidelines or evidence-based practice could be assessed using an expert clinical review of the intervention, for example. 9 However, it was noted that a program may be entirely consistent with clinical guidelines but have poor outcomes, potentially due to low user engagement (e.g. if a program is too text-heavy).
Discussion
The outcomes of the workshop with expert stakeholders, including clinician and consumer representatives, indicated a number of challenges for the development of accreditation systems that provide information about the suitability of Internet interventions based on multiple sources of evidence. There was considerable agreement among the attendees on issues such as the need for high-quality evidence of effectiveness, digital safety and user safety, along with demonstration that an intervention is engaging for its intended audience. There was also agreement that fidelity to clinical guidelines and data from routine service delivery provide important indicators of effectiveness and safety, although there was no consensus on whether such data alone would be sufficient for an accreditation system. There was also no consensus on the role of RCTs. Trials were seen by all attendees as providing high-quality evidence but the challenges of conducting timely trials that reflect real-world use of Internet interventions may limit their applicability and feasibility, necessitating the use of other forms of evidence. A majority of attendees viewed programs that do not have RCT data as problematic, as evaluations without a control group may reflect non-specific effects of an intervention or a natural course of improving symptoms. Overall, it was not possible to form a consensus on the best balance between multiple forms of evidence – this balance would need to be considered within the context of how an accreditation system is designed and delivered.
Existing directory services such as Beacon that assess empirical evidence using objective metrics (e.g. number of positive RCTs) are advantageous due to their transparency, simplicity, objectivity and reliance on high-quality evidence. However, this approach may not consider aspects of trial quality and other forms of empirical evidence, which may disadvantage interventions with poor adherence or those evaluated using small samples, or ones with evidence using non-RCT methodology. Reliance on data from RCTs may also be insufficient to demonstrate their uptake, clinical benefits and risks when implemented at scale as a clinical service that is available to the public, an important consideration when developing accreditation systems. There are other important factors to account for in assessing whether an intervention is likely to be effective and safe. Some interventions may be effective only for a subgroup of the population or when used in particular ways. Consumer ratings and adherence rates may provide a guide to how engaging an intervention is likely to be, an important consideration in recommending interventions. Many of these factors are being incorporated into emerging accreditation systems such as the DAQ. However, further consideration of the roles of different forms of evidence in the development of recommendations to consumers and clinicians is warranted, taking into account the diverse viewpoints of developers and end users. In particular, questions remain around the feasibility, sustainability and impact of accreditation processes and their inclusion of consumer and clinician evaluations of interventions.
Developing a feasible, sustainable and impactful accreditation or certification process
In developing an accreditation system for digital interventions, there are a number of factors that are likely to be important to ensure that the system provides useful recommendations to clinicians and consumers. At a minimum, standards should account for some level of empirical evidence that an intervention is effective and safe for users, along with evidence for data security, including protections for the privacy of users and transparency around security policies. Standards for reporting program content 24 and e-health trials, 25 including comparisons of deterioration rates and reporting of adverse events, would assist in identifying programs that are likely to be safe for users.
In addition, use of clinical service data 26 and fidelity to evidence-based practice or clinical treatment guidelines 27 are indicators that should be routinely reported. Such data will indicate whether a program is likely to be appropriate for use in the community and is delivering what it purports to deliver (e.g. cognitive behaviour therapy for depression). Clinical service data can establish ongoing positive impact and low deterioration rates, which may be used as a requirement for ongoing accreditation of a publicly available service. Expert clinical judgement may provide evidence for fidelity to clinical guidelines or existing treatment protocols, although evaluation of fidelity requires an objective rating framework to be developed and evaluated 24 and does not guarantee effectiveness, safety or acceptability.
User ratings from clinicians and consumers within a curated repository of digital mental health interventions may also be a valuable metric to assist users to determine whether a digital intervention might be appropriate and engaging. There are also many challenges of implementing a user rating system, including subjectivity of ratings, scope to ‘game’ the system, need for moderation, and the resources required to set up and maintain a robust and user-friendly rating system. It should also be noted that user star ratings within general app stores have been shown to have limited correlation with measures of the clinical quality of apps. 28 Establishing independent panels of clinician and consumer users to provide a consistent rating process may overcome some of these challenges. An efficacious intervention may be ineffective if it is not designed around the needs of the user.
Indicators for the multiple attributes that may be used to identify an appropriate digital intervention are likely to be independent (e.g. user engagement may not always be consistent with effectiveness). Therefore, it may be challenging to develop overall benchmarks to identify whether a program should be recommended by an accreditation process. Reporting of individual indicators may be preferable and allow users to focus on the areas that are most important to them, with potential for an accrediting agency to establish minimum standards for each attribute. Accreditation procedures may require flexibility to assess the total body of evidence available for a digital intervention and present this evidence in a way that meets the needs of both clinicians and consumers. This information should include the contexts and specific outcomes where the digital intervention has documented impact.
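One way to picture the reporting of individual indicators against per-attribute minimum standards, rather than a single overall benchmark, is sketched below. The indicator names, thresholds and the `Standard` structure are all hypothetical; no accrediting agency's criteria are being reproduced here, and appropriate thresholds would need to be set by such an agency.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Standard:
    """A minimum standard for one indicator (hypothetical structure)."""
    threshold: float
    higher_is_better: bool = True

    def is_met(self, value: Optional[float]) -> Optional[bool]:
        if value is None:
            return None  # indicator not reported by the provider
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


# Hypothetical minimum standards; the numbers are placeholders only.
HYPOTHETICAL_STANDARDS: Dict[str, Standard] = {
    "positive_rcts": Standard(1),
    "deterioration_rate": Standard(0.10, higher_is_better=False),
    "completion_rate": Standard(0.30),
    "guideline_fidelity": Standard(0.80),
}


def report_indicators(indicators: Dict[str, float]) -> Dict[str, Optional[bool]]:
    """Return a per-indicator result (True, False, or None if unreported)
    instead of collapsing the evidence into one score."""
    return {
        name: standard.is_met(indicators.get(name))
        for name, standard in HYPOTHETICAL_STANDARDS.items()
    }


# Invented example program: effective and safe in trials, weak engagement,
# fidelity not yet assessed.
example = {"positive_rcts": 2, "deterioration_rate": 0.04, "completion_rate": 0.2}
for name, met in report_indicators(example).items():
    print(name, "not reported" if met is None else ("met" if met else "not met"))
```

Keeping the results separate means that a program meeting the effectiveness standard but not the engagement standard remains visible as such, and users can weigh the attributes that matter most to them.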
Currently, digital interventions operate along a continuum of regulatory requirements, from therapist-guided interventions that typically must comply with practitioner regulations, to self-guided apps that typically have no such requirements. Apps are often developed by laypeople with no oversight or accountability, or by software companies with limited expertise in mental health or research methodology. Information on the attributes important to accreditation may be collected in several ways. The onus could be placed on providers to demonstrate that they meet minimal criteria for efficacy, effectiveness when deployed within the targeted population or relevant service setting, safety and user engagement, or an independent panel could collect this information. There may be risks in a self-report model, such as a lack of independence and limited expertise or resourcing. Alternatively, external peer review models or a combination of methods could be used, although the volume of interventions available is likely to require limits on the scope of external reviews. The choice of an accreditation model may require choosing an appropriate business model to ensure that the process is dynamic, so that listings remain up to date, followed by building public awareness and trust in the system. Updating the evidence base as programs age will remain a challenge, although linking evidence updates to an accreditation process may ensure greater accountability. There also remains a need for sufficient expertise, training and standard processes to rigorously measure each component of an accreditation system.
Limitations
A limitation of this paper is that other types of digital support interventions, such as Internet support groups and chat-based therapy, may be more difficult to incorporate into an accreditation process, as it may be challenging to assess clinical outcomes in these settings and standardised delivery cannot be guaranteed. The marketing and promotion of directories or accreditation systems is an additional challenge, as existing evidence-based portals compete with app stores, search engines, and other established and well-funded sources of information that neglect quality indicators. The inclusion of a variety of stakeholders in the workshop was a strength, although it was not feasible to include a wide selection of consumers or carers, overseas experts, or other experts such as information technology experts, who may have divergent viewpoints. Further investigation of how consumers, carers and clinicians weigh different forms of evidence is warranted.
Conclusions
Empirical evidence for effectiveness should be the cornerstone of any accreditation or directory system that identifies appropriate digital mental health interventions. Although RCTs remain the strongest evidence that a program is efficacious, they may not provide evidence of effectiveness. Furthermore, there are limitations to the use of RCTs7,18 and limitations in the application of trial evidence to the delivery of interventions as a clinical service. Ideally, RCT evidence should also be supported by evidence for effectiveness from large-scale pragmatic effectiveness trials in the relevant population or clinical settings, which enable regular reporting of outcomes that are relevant to users, clinical services and funding organisations. Robust accreditation systems should also account for program and user safety, user engagement and experience, and fidelity to clinical treatment guidelines. The key outcomes and indicators that go into any evaluation of existing digital interventions should be transparent, systematic and objective.
Footnotes
Conflict of interest
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: PJB, ALC, BOD, MEL, DK, NT, SM, IH, MT, BFD and LT are involved in trials and/or delivery of Internet interventions but do not directly profit from these interventions. NT and BD are funded by the Australian Government to develop and provide a national online and telephone-delivered mental health service, and are funded by the Western Australian Primary Health Alliance to operate a statewide online and telephone-delivered mental health service.
Contributorship
All authors provided input into the workshop discussion, reviewed and critically edited the manuscript. PJB wrote the first draft of the manuscript with input from ALC and PG. All authors approved the final version of the manuscript.
Ethical approval
Ethical approval was not required for this study.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The workshop was supported with funding from the Australian Department of Health. PJB, ALC, IH, MT and BFD are supported by NHMRC fellowships 1083311, 1122544, 1136259, 1078407 and 1128770.
Guarantor
PJB
Peer review
This manuscript was reviewed by reviewers who have chosen to remain anonymous.
