Abstract
Purpose
The COVID-19 pandemic has intensified the demand for and use of healthcare resources, prompting the search for efficient solutions under budgetary constraints. In this context, the increasing use of artificial intelligence and telemedicine has emerged as a key strategy to optimize healthcare delivery and resources. Consequently, chatbots have emerged as innovative tools in various healthcare fields, such as mental health and patient monitoring, offering therapeutic conversations and early interventions. This systematic review aims to explore the current state of chatbots in the healthcare sector, evaluating their effectiveness, practical applications, and potential benefits.
Methods
This systematic review was conducted following PRISMA guidelines, using three databases (PubMed, Web of Science, and Scopus) to identify relevant studies on the use and cost of chatbots in health published over the past 5 years.
Results
Several articles were identified through the database search (
Conclusion
Furthermore, there are challenges regarding the implementation of chatbots, compatibility with other systems, and ethical considerations that may arise in different healthcare settings. Addressing these issues will be essential to maximize the benefits of chatbots, mitigate risks, and ensure equitable access to these health innovations.
Introduction
The COVID-19 pandemic has led to an increased use of healthcare resources. Budget constraints and higher demand for resources, together with social distancing and remote work, have resulted in greater use of artificial intelligence and telemedicine, aiming for efficiency and efficacy in the delivery of care and in health promotion.1–5 Efficiency is understood as better use of available resources and benchmarking, as established by public organizations with their well-known spending reviews, to focus resources on the highest-priority areas or activities.
In this way, despite the fact that many countries have a long tradition of evaluating public spending efficiency and have increasingly focused on this issue since the Great Recession fifteen years ago, it is generally observed that many have predominantly relied on linear spending cuts. This approach has persisted even after the lessons learned during the mentioned period, reflecting a global trend toward reducing expenditures without a thorough review of efficiency and effectiveness in resource allocation.
The use of chatbots in healthcare is also being driven by international collaborations and knowledge sharing. Organizations such as the World Health Organization (WHO) have recognized the potential of artificial intelligence to improve public health and have promoted the development of guidelines and standards for the implementation of these technologies. These initiatives aim to ensure that the benefits of healthcare chatbots are accessible globally, promoting equity and quality in healthcare. 6
In this context, related to the search for efficiency and the growing advancements in artificial intelligence, the idea of applying these technologies to medical uses has emerged. Consequently, the use of chatbots in the healthcare sector has grown significantly, becoming an innovative tool to improve patient care and optimize health processes. Aggarwal et al. 7 approximated the following definition for “chatbot” in healthcare: “
Chatbots offer various applications in the healthcare sector, from providing information on symptoms and treatments to scheduling appointments and sending medication reminders, but according to Afsahi et al. 10 their main focus so far has been mental health, screening, and public health. For example, some studies11–13 showed that chatbots can be effective in the field of mental health, providing clinical support and resources to individuals needing therapeutic conversations. This use can be especially beneficial in contexts where access to mental health professionals is limited. Other uses include encouraging changes in patients' behaviour towards healthier lifestyles 7 and supporting the management of chronic conditions.14,15
Additionally, chatbots have the potential to monitor patient health and provide early interventions. Irfan et al. 16 show that chatbots can monitor health parameters such as blood pressure and glucose levels, sending alerts to medical professionals when abnormal values are detected. This not only improves the quality of care but can also prevent serious complications by allowing for early intervention.
However, despite these benefits, a significant limitation in the literature is the lack of studies estimating the costs associated with the development and implementation of these systems. Evaluating the cost-effectiveness of chatbots is crucial to ensure that their use is not only clinically beneficial but also economically viable. Studies analysing the cost–benefit relationship will help justify the investment in these technologies and promote their adoption in healthcare systems with limited resources.
Chatbots also systematically collect data through specific queries and data retrieval, improving patient engagement and quality of life. Bergmo 17 emphasizes that comprehending the costs and benefits of eHealth interventions is crucial for several reasons: it helps to demonstrate cost-effectiveness, aids in decision-making, and is vital for creating business models and payment systems that can sustain widespread services. Integrating a cost-effectiveness analysis will allow for measuring not only the costs associated with the implementation and use of these platforms but also the economic benefits derived from improved health and reduced cardiovascular events.
Through this analysis, we aim to provide a comprehensive overview of how these interactive systems are being used to facilitate communication, diagnosis, patient follow-up, and medical information management, thereby contributing to more efficient and accessible care.
Methods
Search strategy and eligibility criteria
We carried out this literature review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al. 2009). This review is not registered, and a protocol has not been previously published.
We searched three databases (PubMed, Web of Science, and Scopus) for relevant articles published during the last 5 years, to identify the most current studies on the use and cost of chatbots in health. The search strategy is based on three main keywords, “Chatbots,” “Health,” and “Cost,” searched within the fields “title, abstract and keywords,” depending on the database consulted (Table 1). Although “Cost” was a keyword in our search, this review did not estimate the costs or savings of the chatbots.
Search strategy: PubMed, Web of Science, and Scopus.
Source: Authors’ elaboration.
Studies are included if they meet the following inclusion criteria: (a) full-text articles; (b) written in English or Spanish; (c) published in peer-reviewed journals; (d) published in the last 5 years; (e) focused on creating a chatbot for use in medicine. Studies are excluded from further review if they (a) do not report results; (b) are non-English or non-Spanish publications; (c) are review articles, conference presentations, abstracts, editorial letters, or comments; or (d) are deemed outside the scope of the present review, that is, they do not focus on applications in medicine.
Study selection and data extraction
Using the procedure and criteria mentioned above, 231 articles are identified by the search strategy in our databases: 58 in PubMed, 123 in Web of Science, and 50 in Scopus. After the exclusion of duplicates, 169 articles are screened. Titles and abstracts are reviewed, and 86 articles that do not fulfill the inclusion criteria are excluded. In total, 83 studies are included for full-text screening and reviewed, of which 52 are excluded. Figures (such as flowcharts) and tables were used to synthesize the data. Therefore, the final number of studies included in this review is 31. Figure 1 summarizes the search selection procedure, and Table 2 shows the results.
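The selection counts reported above can be checked with simple subtraction. The following is a minimal sketch, not the authors' tooling: the class and field names are hypothetical, and the number of duplicates removed is derived here from the reported totals rather than stated in the text. The final count it produces agrees with the 31 articles tallied by publication year later in this section.

```python
from dataclasses import dataclass


@dataclass
class SelectionFlow:
    """Record counts at each stage of the study selection (hypothetical helper)."""
    per_database: dict       # records retrieved from each database
    after_duplicates: int    # records left once duplicates are removed
    excluded_screening: int  # records excluded on title and abstract
    excluded_full_text: int  # records excluded after full-text review

    @property
    def identified(self) -> int:
        # Total records returned by all database searches
        return sum(self.per_database.values())

    @property
    def duplicates_removed(self) -> int:
        # Derived value: not stated explicitly in the text
        return self.identified - self.after_duplicates

    @property
    def full_text_assessed(self) -> int:
        return self.after_duplicates - self.excluded_screening

    @property
    def included(self) -> int:
        return self.full_text_assessed - self.excluded_full_text


# Counts as reported in the study selection paragraph
flow = SelectionFlow(
    per_database={"PubMed": 58, "Web of Science": 123, "Scopus": 50},
    after_duplicates=169,
    excluded_screening=86,
    excluded_full_text=52,
)

print("identified:", flow.identified)
print("duplicates removed (derived):", flow.duplicates_removed)
print("assessed in full text:", flow.full_text_assessed)
print("included:", flow.included)
```

Keeping each stage as a derived property makes it easy to spot a count that does not follow from the previous stage.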

Flow diagram for the search process for identifying and including references for the systematic review.
Summary of the case studies.
Source: Authors’ elaboration.
AIML: artificial intelligence markup language; CBT: cognitive behavioral therapy; NLP: natural language processing technology; HIV: human immunodeficiency virus.
After removing duplicates, two independent reviewers performed the first selection of the screening process. To assess the quality of the articles selected in this systematic review, the authors applied a structured methodology to ensure the rigor and reliability of the included studies. First, studies were assessed against predefined inclusion criteria, such as the relevance of the approach to the use of chatbots in healthcare (communication, diagnosis, patient monitoring, and medical information management), their methodological design, and the availability of complete data. Articles that did not meet these criteria were excluded. Second, studies that passed the initial review were assessed by at least two independent reviewers to reduce bias in the quality assessment. In case of disagreement, both reviewers discussed the article with a third reviewer and a joint decision was reached (include or exclude the article). For the studies selected for inclusion, one author designed a data extraction procedure that was approved by the other review authors. The extraction form included the following data: authors and year of publication, country, study objectives, population characteristics and sample size, type of nudge, duration of intervention, setting, and study outcome.
As shown in Figure 2, the first paper found on the use and cost of chatbots in healthcare was published in 2020, even though this literature review covers the last 5 years. A total of 31 articles are considered in this review. The number of publications increased steadily from 2020 to 2023: there were 2 articles in the first year, while the highest number was published in 2023 (14 articles). From 2020 to 2021, the number of articles analysing interactive systems to facilitate communication, diagnosis, patient follow-up, and medical information management increased by 250.00%, from 2 to 7. From 2021 to 2022, growth was smaller, at 14.29%, from 7 to 8 publications. From 2022 to 2023, the annual growth was 75.00%, from 8 to 14 articles.
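The annual growth figures follow from the standard percentage-change formula, (current − previous) / previous × 100. A minimal sketch using the yearly counts given in the text is shown below; the function and variable names are hypothetical.

```python
def annual_growth(counts):
    """Return the year-over-year percentage change for a {year: count} mapping."""
    years = sorted(counts)
    return {
        f"{prev}-{curr}": round((counts[curr] - counts[prev]) / counts[prev] * 100, 2)
        for prev, curr in zip(years, years[1:])
    }


# Number of included articles per publication year, as reported in the text
articles_per_year = {2020: 2, 2021: 7, 2022: 8, 2023: 14}

growth = annual_growth(articles_per_year)
for period, pct in growth.items():
    print(f"{period}: {pct:.2f}%")

# The yearly counts should add up to the total number of included articles
total_articles = sum(articles_per_year.values())
print("total:", total_articles)
```

The same function can be reused if the review is updated with later publication years.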

Evolution of published papers over time.
Figure 3 plots the distribution of the selected studies by country. Most studies were conducted in India (

Distribution of articles by country.
Regarding the applications of chatbots in health, the following is a synthesis of the chatbot interventions described in the reviewed articles. The articles are grouped by similar types of interventions, demonstrating the diverse applications of chatbots in healthcare, ranging from mental health support and medical information to appointment management and health education.
Several articles covered chatbots for mental health support and psychological well-being. Kaywan et al. 23 assessed the feasibility and effectiveness of an artificial intelligence chatbot for the early detection of depression and for assisting people with it, which showed some potential for automation and discreet communication. Potts et al. 24 evaluated a multilingual mental health and well-being chatbot called ChatPal, which aimed at improving the mental well-being of participants from rural areas. Meanwhile, Sabour et al. 27 analyzed the effectiveness of a conversational agent providing cognitive support (Emohaa) in reducing symptoms of mental distress, reporting significant improvements in symptoms of depression, negative affect, and insomnia. Suharwardy et al. 28 evaluated the acceptability and efficacy of a mental health chatbot intervention for mood management in postpartum women, finding high user satisfaction but limited impact on depressive symptoms due to baseline conditions. In addition, Upadhyaya and Kaur 29 presented a chatbot that employs Cognitive Behavioural Therapy to provide mental health support, help manage emotions, and cope with thoughts and experiences. Zhu et al. 11 investigated the determinants of user satisfaction, interaction, and clinical support with mental health chatbots designed for symptom collection, health status monitoring, medication reminders, and patient education on health conditions, and identified positive influences on user satisfaction and continuance intention. Hakani et al. 33 used a digital chatbot to detect signs of anxiety and suggest methods for controlling depression, but their results were limited by a lack of data for accurate predictions. Bendig et al. 39 used agent-based software to improve psychological well-being and found positive results. Gabrielli et al. 41 addressed stress and anxiety through a psychoeducational chatbot (Atena) and showed benefits for university students in stress management. Omarov et al. 42 provided personalized psychological support using a mobile psychologist chatbot based on artificial intelligence markup language and cognitive behavioral therapy, reporting significant advances in accessible psychological support. Finally, Daley et al. 45 evaluated the engagement and effectiveness of a mental health chatbot (Vitalk) that delivers mental health content in a conversational format, showing decreased anxiety, depression, and stress levels.
Other articles focused on medical and health information chatbots. Anjum et al. 18 provided accessible healthcare information using artificial intelligence and natural language processing technology through a medical chatbot offering advice on healthy lifestyles, which demonstrated effective results in reducing costs and making healthcare accessible. Bhuvanesh et al. 19 developed a chatbot for symptom assessment and disease classification, concluding that it presented high accuracy in disease diagnosis. Hasnain 22 used ChatGPT to disseminate information about monkeypox infection, highlighting potential disadvantages in prediction competency. Rekik et al. 26 built a medical chatbot in the Tunisian dialect for symptom-based health predictions and medical care, which demonstrated high efficiency and accuracy. Vera and Palaoag 30 provided information on medicinal plants and their applications using a chatbot for alternative healing methods, which made a significant contribution to healthcare with sustainable practices.
Several articles were related to disease diagnosis and communication management chatbots. Irfan and Zafar 16 diagnosed illnesses and provided basic disease information through a text-to-text conversational agent for health problem diagnosis, which was easy to use and offered personalized diagnoses. Prakasam et al. 25 analyzed a WhatsApp chatbot that facilitated managing, booking, canceling, and rescheduling doctor appointments and enhanced communication between patients and physicians. Mehta and Singh 34 used natural language processing technology, that is, a robot, for human–computer communication in medical contexts, showing its useful applications in offices and medical centers. Bhangdia et al. 31 used a chatbot to evaluate symptoms and classify possible diseases based on symptoms and emotions, provide appropriate responses, and suggest activities based on the user's mood. Pivithuru et al. 44 created a multifunctional, user-friendly technology for an Integrated Electronic Patient Health Record System, including, specifically, a chatbot for medical record management, and showed 90% accuracy in lung disease diagnosis. Srivastava et al. 46 built a diagnosis bot (Medibot) for engaging patients in medical queries and providing individualized diagnoses based on symptoms and patient profiles, presenting a recall of 65% and a precision of 71% in symptom identification.
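Precision and recall, as reported for Medibot, are often combined into a single F1 score (their harmonic mean) when comparing classifiers. The sketch below shows that calculation under the assumption that both figures refer to the same symptom identification task. The F1 value is derived here for illustration only; it is not reported by Srivastava et al., and the function name is hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the text for symptom identification
medibot_f1 = f1_score(precision=0.71, recall=0.65)
print(f"Derived F1 score: {medibot_f1:.3f}")
```

Because F1 penalizes an imbalance between the two measures, it lies between the reported recall and precision.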
In addition, some of the articles covered health education and awareness. Goldnadel et al. 21 provided education on alcohol-related topics and risk reduction strategies through a digital conversational agent accessible 24/7, a promising scenario for health education on alcohol harms. Peng et al. 38 assisted in Human Immunodeficiency Virus (HIV) testing and prevention among marginalized populations using an artificial intelligence chatbot as a culturally sensitive health tool, and the results offered valuable insights into key features for design and implementation.
Three articles also analyzed chatbots for promoting healthy lifestyle changes. Binh et al. 32 aimed to improve the lives of people suffering from obesity through an active lifestyle using a specific platform, showing positive usability and feasibility and suitability for collecting fitness data and logs. Similarly, Dhinagaran et al. 40 promoted healthy lifestyle changes using a conversational agent and made useful recommendations for improving future digital interventions. Larbi et al. 43 tried to increase physical activity using a Telegram-based chatbot, but their results on motivation to increase physical activity were inconclusive.
Four articles addressed COVID-19 management chatbots. Booth et al. 20 analyzed a chatbot's log data to characterize usage patterns and types of users at the beginning of the COVID-19 pandemic, providing insights that help further app development. Mellah et al. 35 suggested measures against COVID-19 using a bilingual chatbot (Arabic and French) with 90% accuracy in risk prediction. Natsheh and Jabed 36 provided 24/7 help to meet the increasing demand of COVID-19 patients with a chatbot capable of meeting future healthcare sector needs. Okonkwo et al. 37 assessed the vaccination status of students against COVID-19 using an interactive system, with high levels of effectiveness and usability.
Discussion
This article aimed to summarize the growing adoption and effectiveness of chatbots in healthcare systems. The reviewed studies indicate that chatbots are revolutionizing healthcare by providing accessible, efficient, and effective solutions across different domains, including mental health and patient management, disease diagnosis, and health information dissemination. This indicates a significant shift towards digital health solutions.
The integration of AI and NLP in chatbots enhances their functionality and accuracy. Studies like those by Bhuvanesh et al. 19 and Rekik et al. 26 demonstrate high diagnostic accuracy and improved access to healthcare, suggesting that AI-driven chatbots could become integral components of future healthcare systems. Moreover, chatbots have proven to be effective in engaging patients and providing health information in a user-friendly manner. The ability of chatbots to operate 24/7 and provide immediate responses makes healthcare more accessible, particularly for populations with limited access to traditional healthcare services, as Natsheh and Jabed 36 demonstrated.
Regarding costs, chatbots can significantly reduce direct costs because they can automate routine tasks such as appointment scheduling, management of medical records, and processing of administrative data, which reduces the workload of healthcare staff, and can provide preliminary health assessments. Anjum et al. 18 and Upadhyaya and Kaur 29 highlighted the cost-reducing potential of healthcare chatbots, which make systems more sustainable and efficient while improving the quality of healthcare delivery.
Studies such as those by Kaywan et al. 23 and Sabour et al. 27 provide clear evidence of the potential of AI-driven interventions in addressing mental health problems through accessible and discreet platforms. The evidence suggests that mental health chatbots can serve as valuable tools in extending mental health support, particularly in underserved areas. The success of multilingual and culturally sensitive chatbots, like ChatPal and those targeting rural populations, indicates that such technologies can bridge gaps in mental health care accessibility and equity. Moreover, the diverse applications of chatbots, from cognitive behavioural therapy (CBT) to psychoeducation, illustrate the flexibility and potential of these technologies in delivering personalized mental health care. This adaptability is crucial for developing tailored interventions that can cater to varied demographic and psychological needs.
The broader application of chatbots in health information dissemination, disease diagnosis, and lifestyle promotion demonstrates their potential to contribute significantly to overall health and well-being. Chatbots can offer personalized healthcare solutions by using patient data to provide tailored advice and interventions. This personalized approach is crucial for managing chronic diseases and improving patient outcomes, as demonstrated by Upadhyaya and Kaur 29 and Bhangdia et al. 31
Recent Large Language Models (LLMs) have significantly surpassed their predecessors in the ability to understand and generate natural language, enabling much more fluid and accurate interactions with patients.47,48 This improves the quality of communication and user satisfaction. Greater accuracy in language analysis and the generation of contextualized responses reduces errors and improves the reliability of diagnosis as well as treatment recommendations. 49 In addition, LLMs allow chatbots to better adapt to patients’ needs, providing more personalized responses based on deeper analysis of available data. This not only increases effectiveness in monitoring and managing medical information but also improves clinical outcomes.50,51 Moreover, while previous studies showed that chatbots could reduce costs by alleviating the burden on healthcare staff, the new LLMs offer even greater efficiencies, allowing for better scaling without compromising service quality.52,53 However, we recognize that the upfront costs of implementing LLMs can be higher due to their complexity and infrastructure requirements. The fast evolution of chatbots must also be considered, as it presents a challenge when interpreting conclusions from older studies and may hinder their replication in the near future. Early research in this field often relied on less sophisticated algorithms and limited computational resources, which constrained the functionality, accuracy, and scalability of chatbots. For example, earlier systems lacked the advanced natural language processing and adaptive learning capabilities that are now available in modern LLMs accessible to the general public, such as GPT-4 and Gemini. These newer models offer substantially improved contextual understanding and personalized interaction, enabling more effective applications in healthcare.
As a result, while older studies provide valuable foundational insights, their relevance may be limited in the context of today's technological advancements. The lack of sufficient data from real-world applications of current-generation chatbots also poses a barrier to fully understanding their potential. This discrepancy highlights the need for ongoing, continuously updated research to evaluate the impact of the newest chatbots and ensure that conclusions drawn from earlier studies are appropriately contextualized.
As evidenced by studies on COVID-19 management and health education, chatbots can play a critical role in public health responses and preventive care. The studies by Mellah et al. 35 and Okonkwo et al. 37 highlight how chatbots can be rapidly deployed to support public health efforts and how useful they are for disseminating information, assessing symptoms, and managing patient inquiries such as vaccination status. Chatbots can also play a key role in health education. Goldnadel et al. 21 and Peng et al. 38 showed promising results in this regard in educating the public on critical health issues, promoting preventive measures, and reducing the spread of misinformation.
Despite their benefits, the deployment of chatbots in healthcare comes with challenges such as ensuring data privacy, addressing ethical concerns, and managing user trust. For example, Suharwardy et al. 28 report limitations concerning the baseline conditions of users. Additionally, Hakani et al. 33 noted the importance of improving the accuracy of their depression predictive model given the lack of extensive data. These issues need to be carefully considered and addressed to fully realize the potential of chatbots in healthcare. Future research should explore the long-term impact of chatbot interventions on patient outcomes, the integration of chatbot technology with electronic health records, and the development of more sophisticated AI models to enhance chatbot capabilities. Additionally, cross-disciplinary collaboration might foster innovative solutions and address the current limitations.
Despite the strengths of this review, some limitations should be noted. First, the review includes only studies published in English and Spanish, which may introduce some language bias and limit the generalizability of the findings, although most of the newest advances in LLMs and chatbots are likely to be published in English. Secondly, although we include articles published within the last five years to ensure updated evidence, some of the first studies that could offer additional insights into the evolution of LLMs and chatbots may not have been included in our paper. Thirdly, the heterogeneity of the included studies in terms of methodologies, sample sizes, and outcomes makes it challenging to draw definitive conclusions and propose policy recommendations. Furthermore, many studies provided limited details on the cost-effectiveness and real-world implementation challenges of chatbot technologies. Lastly, as chatbot technology continues to evolve rapidly, some findings from included studies may already be outdated, making continuous updates necessary to capture the impact of emerging innovations. Future research should address these limitations by incorporating broader language inclusivity, longitudinal assessments, and real-world data to provide a more comprehensive evaluation of chatbot applications in healthcare.
Conclusions
In summary, chatbots represent a transformative technology in healthcare, offering numerous benefits from increased accessibility and personalized care to cost savings and enhanced patient engagement and empowerment. Despite these benefits, challenges remain in integrating these tools into healthcare systems and assessing their long-term impact. In recent years, the development of LLMs, such as GPT-4 or Gemini among others, has significantly improved the effectiveness and accuracy of chatbots, enabling more fluid and contextualized interactions with patients. Additionally, recent advancements and optimizations in technological infrastructure, along with increasing market competition, have led to a substantial reduction in the costs associated with using LLMs. This decrease in costs can enhance the cost-effectiveness of chatbots in healthcare settings, enabling their use not only in large hospital systems but also in smaller clinics and rural environments, where access to healthcare is limited. However, the high costs of implementation and operation of these technologies should not be overlooked, as they have been a barrier to their widespread adoption.
To fully leverage the potential of this technology, it will be crucial to continue fostering innovation, supporting robust research, and developing supportive policies that regulate and evaluate the use of chatbots to ensure they meet the required quality and safety standards. Additionally, regulatory frameworks must evolve to ensure that advancements in AI and LLMs are integrated ethically and efficiently into clinical practice, safeguarding patient privacy and data security.
Finally, policymakers should develop frameworks to regulate and evaluate the use of chatbots in healthcare, ensuring they meet quality and safety standards. Support for research, evaluation and development in this field can accelerate the adoption of chatbot technology leading to improved healthcare delivery and outcomes.
Footnotes
Acknowledgements
The authors acknowledge the Government of Cantabria for funding this research.
Contributorship
Conceptualization: DC, AD, and RM conceived the systematic review, defining the study's focus and objectives. Methodology: MB developed the search strategy, inclusion/exclusion criteria, and designed the methodology for study selection and data extraction. Data collection: DC, PL, and JL were responsible for data collection, conducting the search and selection of relevant studies. Data analysis: DC and MB. Writing—original draft preparation: The initial draft of the manuscript was written by MB, PL, and JL. Writing—review & editing: The review and editing of the manuscript were done by DC, AD, RM, and FP. Supervision: DC and MB supervised the entire process, ensuring the consistency and quality of the manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
Ethical approval to report this research was obtained from «COMITÉ DE ÉTICA DE LA INVESTIGACIÓN CON MEDICAMENTOS DE CANTABRIA» (11/2024–10/05/2024).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Government of Cantabria (Grant Number: SUBVTC-2023-0021). This study is also funded by the European Commission in the Horizon H2020 scheme, awarded to the TIMELY project (Grant agreement ID: 101017424).
