Abstract
Objective
The objective is to provide an overview of the application of large language models (LLMs) in healthcare by employing a bibliometric analysis methodology.
Method
We performed a comprehensive search for peer-reviewed English-language articles using PubMed and Web of Science. The selected articles were subsequently clustered and analyzed textually, with a focus on lexical co-occurrences, country-level and inter-author collaborations, and other relevant factors. This textual analysis produced high-level concept maps that illustrate specific terms and their interconnections.
Findings
Our final sample comprised 371 English-language journal articles. The study revealed a sharp rise in the number of publications related to the application of LLMs in healthcare. However, the development is geographically imbalanced, with a higher concentration of articles originating from developed countries like the United States, Italy, and Germany, which also exhibit strong inter-country collaboration. LLMs are applied across various specialties, with researchers investigating their use in medical education, diagnosis, treatment, administrative reporting, and enhancing doctor–patient communication. Nonetheless, significant concerns persist regarding the risks and ethical implications of LLMs, including the potential for gender and racial bias, as well as the lack of transparency in the training datasets, which can lead to inaccurate or misleading responses.
Conclusion
While the application of LLMs in healthcare is promising, the widespread adoption of LLMs in practice requires further improvements in their standardization and accuracy. It is critical to establish clear accountability guidelines, develop a robust regulatory framework, and ensure that training datasets are based on evidence-based sources to minimize risk and ensure ethical and reliable use.
Introduction
Large language models (LLMs) are artificial intelligence (AI) systems trained on extensive datasets, including articles, books, and other internet content. These models are artificial neural networks built on the transformer architecture introduced in 2017. 1 Since then, LLMs have evolved rapidly, with notable models such as the GPT family and Google's Gemini. Among them, ChatGPT supports conversational interaction, understanding and generating human-like text based on received inputs. 2 Studies have shown that it performs at or near the human level in a variety of cognitive tasks. 3 LLMs can respond to free-text queries without requiring specialized training for each query. This versatility and adaptability give LLMs great potential in various areas of healthcare, including clinical diagnosis, 4 medical education,5–7 and research.8,9
Despite the impressive technological advances and capabilities of LLMs, most existing research has primarily focused on their technical aspects. 10 These studies often concentrate on specific models, such as the GPT family or BERT, or target particular applications within healthcare. 11 However, there is a noticeable lack of systematic analyses that explore the broader academic landscape of LLMs research, including its knowledge structure and emerging trends. Few studies have mapped out the major research themes, collaborative networks, or the evolving directions in this field. This narrow focus limits our understanding of the interdisciplinary potential of LLMs and their wider implications for healthcare innovation.
Bibliometric analysis (BA) is a powerful tool that helps to identify relevant research papers in a particular area of interest.12,13 It provides a quantitative summary of author details, scholarly productivity in a given field, and journal quality. Through BA, the cumulative scientific knowledge and evolutionary nuances of a field can be summarized and mapped, enabling scholars to gain a detailed overview, identify research gaps, and acquire new research ideas. 14 In addition, by analyzing citation data, collaborative networks, and keyword distributions, BA can identify core researchers, major research institutions, and key research themes in a given field. Overall, BA provides researchers with a systematic and quantitative way to examine and understand the current status and trends of an academic field, thus promoting further depth and innovation in scientific research. 13
Therefore, this study aims to review the literature on the application of LLMs in healthcare using BA, to assess the current research status and development trends of LLMs in healthcare, and to provide valuable guidance and reference for future research directions and practice.
Methods
Search strategy
A comprehensive search was conducted using PubMed and Web of Science (WoS) for English-language peer-reviewed articles. The search strategy was adapted from relevant published systematic reviews. 15 The search strategy was as follows: (Large Language Model OR LLM OR ChatGPT OR GPT) AND (Health Care Surveys OR Delivery of Health Care OR Health Services OR Healthcare OR Medical Care), and all articles from 2017 to July 30, 2024, were included. We then traced the reference lists of the included literature to supplement our findings. In cases where the full text was unavailable, we contacted the corresponding authors via email to request the original documents.
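For reproducibility, a query of this (A OR B) AND (C OR D) shape can be assembled programmatically before submission to PubMed or WoS. The following Python sketch is illustrative only; the function name and term lists are ours, not part of the published search protocol:

```python
def build_query(model_terms, health_terms):
    """Join each term group with OR, then combine the two groups with AND."""
    def group(terms):
        return "(" + " OR ".join(terms) + ")"
    return group(model_terms) + " AND " + group(health_terms)

# Term groups corresponding to the search strategy reported above.
model_terms = ["Large Language Model", "LLM", "ChatGPT", "GPT"]
health_terms = ["Health Care Surveys", "Delivery of Health Care",
                "Health Services", "Healthcare", "Medical Care"]

query = build_query(model_terms, health_terms)
```

The resulting string can be pasted into the PubMed search box or, for scripted retrieval, passed as the `term` parameter of NCBI's E-utilities `esearch` endpoint.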
Inclusion and exclusion criteria
All journal articles on LLMs in the healthcare domain were eligible for inclusion. The articles analyzed were limited to those that (1) were written in English, (2) belonged to the field of healthcare, and (3) dealt with LLM techniques. Reviews, commentaries, conference reports, and similar document types were excluded.
Study selection and data extraction
Two researchers underwent focused training on the inclusion and exclusion criteria and the screening methods outlined for the study. They then independently conducted a pilot screening of the same set of 30 articles, classifying each as “included,” “excluded,” or “uncertain.” The researchers compared their screening results and discussed any discrepancies until a consensus was reached. The two then independently performed an initial screening of titles and abstracts, followed by full-text screening, to determine the final inclusion of articles.
Data analysis
We conducted a BA of the included articles, focusing on data such as publication date, authorship, country of origin, and journal. These results were analyzed using descriptive methods with the Bibliometrix R package (R version 4.4.1). Additionally, we employed the Leximancer text analysis software (version 4.5; Leximancer Pty Ltd, Brisbane, Australia) to evaluate the titles and abstracts, generating high-level concept maps based on specific terms and their interconnections. 16 Leximancer employs techniques such as Bayesian statistics to track word occurrences and relate them to the occurrence of other words. 17 As the data were sourced from the PubMed database, citation-related metrics were not included.
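The co-occurrence counting that underlies such concept maps can be illustrated in a few lines of Python. This is a toy sketch with invented titles, not Leximancer's actual algorithm: each pair of vocabulary terms is counted once per document in which both appear, and the resulting pair counts are the raw material for the semantic links drawn in a concept map.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents, vocabulary):
    """Count how often each pair of vocabulary terms appears in the same document."""
    pair_counts = Counter()
    for doc in documents:
        text = doc.lower()
        # Terms present in this document, sorted so each pair has a canonical order.
        present = sorted(t for t in vocabulary if t in text)
        for a, b in combinations(present, 2):
            pair_counts[(a, b)] += 1
    return pair_counts

# Invented example titles, for illustration only.
docs = [
    "ChatGPT performance on clinical questions in patient education",
    "Large language models for clinical decision support and diagnosis",
    "Evaluating ChatGPT for patient education in oncology",
]
vocab = ["chatgpt", "clinical", "patient", "diagnosis"]
counts = cooccurrence_counts(docs, vocab)
```

Frequently co-occurring pairs (here, “chatgpt” and “patient”) end up close together and linked on the map, while rarely co-occurring pairs drift apart.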
Results
Search results
Our search retrieved a total of 1118 articles: 1012 from PubMed and 106 from WoS. After removing 37 duplicate records, we performed an initial screening based on the inclusion criteria, excluding 626 articles. We then conducted a full-text review of the remaining 455 articles, from which 84 were excluded. Finally, 371 articles were included in the analysis. Figure 1 summarizes the search methodology and results.

Search method and results.
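The screening flow reduces to simple arithmetic, which can be double-checked in a few lines (counts taken directly from the text above; variable names are ours):

```python
retrieved = 1012 + 106          # PubMed + Web of Science records
after_dedup = retrieved - 37    # duplicate records removed
full_text = after_dedup - 626   # excluded at initial (title/abstract) screening
included = full_text - 84       # excluded at full-text review

print(retrieved, after_dedup, full_text, included)  # 1118 1081 455 371
```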
Temporal trends
Figure 2 shows the annual number of articles on LLMs in the healthcare domain and their trend over time. Very few relevant articles were published between 2017 and 2022; the number grew from 2022 to 2023, and growth from 2023 to 2024 was even more rapid.

Annual number of published papers at the intersection of healthcare and large language models.
Geographic trends and collaboration of LLMs
Table 1 shows that most of the top 10 countries in terms of the number of publications are developed countries. The United States (22.64%), China (8.89%), and Germany (5.12%) are the countries with the highest number of articles, and the authors with the highest number of articles are mostly from the United States and China. Six of the top ten institutions are located in the United States, two in China, and one each in Singapore and Saudi Arabia. The Mayo Clinic topped the list of institutions, contributing 47 articles. The remaining institutions were primarily universities, with Harvard Medical School (n = 43), the University of Pennsylvania (n = 40), Vanderbilt University Medical Center (n = 39), and the University of Southern California (n = 37) ranking second to fifth, respectively. In terms of international collaboration, North America, Europe, and Asia form the core regions of cooperation, with the United States maintaining the most frequent and robust partnerships, particularly with China, Canada, the United Kingdom, and Germany (Figure 3).

Most relevant affiliations and country collaboration map.
Distribution of bibliographic records: Top 10 countries (by quantity).
Topics and medical specialties
The distribution of bibliographic records across different sources and medical specialties is shown in Table 2. Overall, Cureus, an open-access, peer-reviewed medical and health sciences journal, published the highest number of articles (n = 31), with an impact factor of 1.0. This was followed by the European Archives of Otorhinolaryngology (n = 13, impact factor = 1.4) and the Journal of the American Medical Informatics Association (n = 10, impact factor = 4.7). NPJ Digital Medicine, despite publishing only five articles, had a high impact factor of 12.4. In terms of medical specialties, Otorhinolaryngology was the most represented, with 15 articles. Urology Surgery and Oncology each produced 14 articles, and Plastic Surgery followed closely with 13 articles.
Distribution of bibliographic records: Top 10 article sources and medical specialties (by quantity).
Table 3a shows that “consultation” (n = 70), “patient education” (n = 69), and “medical education” (n = 53) were the most common applications of LLMs. LLMs specialize in understanding and generating natural language, which enables them to excel in counseling and education. These areas focus on providing information to help patients understand health issues and make informed decisions, assisting healthcare professionals with clinical diagnosis and treatment, and generating educational materials for both patients and healthcare providers (Table 3). In addition, 41, 29, and 23 articles highlighted the role of LLMs in supporting healthcare providers with clinical decision-making, diagnosis, and treatment, respectively. A further 15 and 14 articles addressed the use of LLMs for triage and administrative tasks to alleviate provider workload. LLMs were also recognized as a communication tool between doctors and patients (n = 9). Specific examples of articles are provided in Supplemental Table S1.
Distribution of bibliographic records: Top 10 categories (by quantity) on the application of large language models (LLMs) in healthcare.
A total of 175 articles addressed specific disease types, which we categorized into 14 distinct categories. The largest numbers of articles focused on Surgical and Orthopedic Disorders (n = 31), Oncology and Cancer Diseases (n = 28), and Neurological and Mental Health Disorders (n = 19), followed by Chronic and Metabolic Diseases, Endocrine and Genitourinary Disorders, and ENT and Ophthalmology Disorders (Figure 4(b)). The most common applications of LLMs across the various disease types were patient education and counseling, with 42 and 41 articles, respectively. Neurological and Mental Health Disorders predominantly utilized LLMs for disease screening and early detection, Surgical and Orthopedic Disorders emphasized patient counseling, and Acute and Traumatic Disorders highlighted the need for LLMs in rapid clinical decision-making.

The distribution of large language models (LLMs) in healthcare and their application to different disease types.
Leximancer text analysis
In Leximancer text analysis, colored circles indicate the space of a topic, with the topic's label at the center. 16 The black words are concepts; the size of the gray dot associated with each word indicates its relative frequency of occurrence, and the lines between concepts are links indicating which concepts are semantically related. The relative positions of the colored circles reflect the degree of association between them.
In the resulting concept map, the most significant topics are “ChatGPT,” “intelligence,” and “performance,” followed by “questions.” The primary type of scholarly publication associated with this map is the “journal article.” The “intelligence” cluster, shown in yellow, encompasses concepts such as AI, LLMs, and research, indicating a focus on the foundational technology used in healthcare and clinical applications. This cluster suggests a burgeoning interest in understanding how AI and LLMs contribute to medical intelligence.
According to Figure 5, ChatGPT is one of the most frequently mentioned LLMs. Our review also shows that researchers mostly used pre-trained models, with only two articles training their own models. The “ChatGPT” cluster, highlighted in red, includes key concepts such as clinical, patient, quality, and information. This denotes the centrality of ChatGPT in practical healthcare and clinical settings, underscoring its potential to influence patient care quality and information dissemination. A closer inspection of this cluster reveals its substantial interaction with both the “intelligence” and “questions” clusters, emphasizing the importance of evaluating the effectiveness of responses generated by ChatGPT in a clinical context to ensure reliable patient care (Figure 5).

Thematic map of all titles and abstracts.
Discussion
The primary objective of this study was to provide a comprehensive overview of the application of LLMs in the healthcare domain through a BA approach. By examining 371 peer-reviewed English-language articles, we aimed to map the trajectory of LLMs research in healthcare, identify significant trends, and explore the various applications and implications of these technologies. Our findings highlight a rapid and accelerating growth in their application. However, there is a notable geographic imbalance, with most studies concentrated in developed countries and fewer in low- and middle-income regions. Furthermore, while LLMs are being applied across various disciplines in healthcare to help disease diagnosis and health education, concerns about their practical implementation persist. Addressing these existing challenges is critical for advancing their effective use in healthcare.
LLMs article growth was rapid, but regional development remained uneven
One of the most striking findings from our analysis was the exponential rise in publications related to LLMs in healthcare, particularly from 2022 to 2024. LLMs have gained widespread interest and application, experiencing rapid and accelerating growth globally. 18 This sharp increase can be attributed to several factors. Firstly, the introduction of the Transformer architecture in 2017 overcame key limitations of earlier sequence models by improving parallel computation and handling long-distance dependencies. 1 This breakthrough has significantly enhanced the capabilities of LLMs, making them more applicable and valuable in healthcare settings.19,20 Moreover, the growing interest in AI and its potential to revolutionize medical research and clinical practice has driven both academic and industry stakeholders to invest heavily in this area. ChatGPT has emerged as one of the most widely used LLMs, fueling a growing body of academic research exploring its applications in disease diagnosis, 21 medical education, and clinical decision support. 22 As a result, the volume of publications related to LLMs in healthcare is expected to continue its steady growth in the coming years.
However, the geographical distribution of studies revealed a clear imbalance. According to the WHO's Global Observatory on Health Research and Development, significant disparities in research capacity exist across regions. 23 This gap was also demonstrated in our study, which showed that publications related to LLMs in healthcare were dispersed across North America, Europe, Asia, and Oceania. Among the top 10 countries in terms of publication output, most were developed nations, including the United States, Germany, Italy, Australia, Canada, and Israel. These countries typically invest substantial resources in science, technology, and medical research, benefiting from advanced facilities and environments conducive to innovation. This disparity in health research investment may further widen the gap in research capacity, particularly in the development and application of cutting-edge technologies. 24 A report indicated that the weighted average gross domestic expenditure on research and development devoted to health (health GERD) as a percentage of GDP was significantly higher in high-income countries (0.27%) than in other income groups. Moreover, high-income countries exhibited the greatest increase in weighted average health GERD as a percentage of GDP (from 0.25% to 0.27%) compared to previous analyses.
Applications and challenges of LLMs in healthcare
LLMs have demonstrated a wide range of applications across various medical specialties, reflecting their versatility and potential to address diverse healthcare needs. Our analysis identified several key areas of use, organized by target users, including counseling, patient and medical education, diagnosis, treatment planning, administrative reporting, and enhancing physician–patient communication. Firstly, for healthcare providers, LLMs offer access to vast amounts of information that can assist with diagnosis and treatment, 22 which is especially crucial in emergency care, where extensive knowledge and rapid information processing are required. Secondly, they help transform complex medical jargon into language that both patients and physicians can easily understand, improving communication, strengthening the patient–physician relationship, and facilitating better therapeutic outcomes.25,26 Thirdly, in medical education, LLMs serve as tools for personalized teaching and learning, offering interactive learning modes that help clarify research questions and promote deeper understanding. For patients, LLMs generate patient education materials and simplify medical reports, improving readability and comprehension. 27 This is especially beneficial for patients with socially stigmatized diseases, for whom interactions with LLMs can be less time-consuming and stressful, often providing more empathetic responses. 28 Finally, for healthcare institutions, LLMs simplify management processes and assist in patient stratification, offering valuable support in managing healthcare operations more efficiently. Specialties such as Otorhinolaryngology, Urology, Surgery, and Oncology have emerged as prominent fields of application for LLMs. For instance, in Otorhinolaryngology, LLMs are being used to enhance the accuracy of diagnostic imaging and improve the management of complex cases.
In Oncology, LLMs are being leveraged to identify potential biomarkers, predict patient outcomes, and tailor personalized treatment plans.
However, it is important to note that most current applications of LLMs remain in the validation stage and have not yet been widely implemented in real-world healthcare settings. Several key concerns contribute to this cautious approach: (1) Inconsistent performance: Research has revealed variability in the accuracy and consistency of LLM outputs. For instance, Christoph's study compared ChatGPT's performance against certified otolaryngology (ORL) consultants in answering clinical case questions. The findings indicated that the ORL consultants significantly outperformed ChatGPT in terms of medical adequacy, conciseness, comprehensibility, and coherence. 29 Conversely, Ahmed's study on ChatGPT's accuracy and repeatability in answering common hypertension questions showed that ChatGPT achieved an overall accuracy of 92.5%, with 93% of its answers being consistent upon repetition. 30 These mixed results highlight the challenges of relying on LLMs in critical medical contexts. (2) Issues with information sources and hallucinations: LLMs are trained on extensive, diverse datasets, including content from the internet and other unverified sources. This lack of transparency in data sourcing can lead to the generation of “hallucinations,” where the model produces irrelevant or inaccurate information, which is particularly problematic in healthcare settings.31,32 (3) Bias in training data: The training datasets for LLMs predominantly originate from high-income, English-speaking regions, which introduces a significant bias. As a result, the outputs of LLMs may reflect and perpetuate these biases, especially concerning race, gender, and socioeconomic disparities.33,34 This raises ethical concerns and questions about the fairness and representativeness of LLM applications in global healthcare.
To overcome these challenges, a key focus should be improving the standardization and accuracy of LLMs, with a strong emphasis on transparency in their training data to ensure that it is sourced from diverse, authoritative, and evidence-based datasets. Additionally, establishing a comprehensive regulatory framework is essential to clearly define the responsibilities of all stakeholders, including users and developers. 35 Continuous improvement through fine-tuning and iterative updates will be crucial in enhancing the accuracy of LLMs and ensuring they can address the needs of various populations and regions. Moreover, developing specialized LLMs tailored to specific healthcare domains could become a pivotal trend, 36 enabling more precise and reliable applications in medical practice. AI companies such as OpenAI, Google, Anthropic, Mistral, and Cohere are actively developing smaller language models (SLMs) that cater to specific tasks and customer needs. These SLMs could offer a more targeted and practical solution, addressing the limitations of broader LLMs in healthcare. 37
Limitations
BA is a scientific method that researchers can use to gain insights into key areas of medical research and obtain an overview of the published literature landscape. 13 However, our study has some limitations. Firstly, our review was confined to journal articles indexed in PubMed and WoS and did not include citation data. Secondly, our information retrieval was limited to articles with search terms appearing in the title or abstract, and non-English articles or documents other than journal articles were not assessed. These factors may have led to the exclusion of relevant studies. Future research could benefit from a more comprehensive approach that includes a broader range of documents and languages. Additionally, future studies could focus on the individual clusters we examined, such as specific diseases or target users, to conduct more in-depth analyses. For instance, examining the consistency and accuracy of LLMs in responding to healthcare-related questions would be valuable. Such targeted analyses could help synthesize findings and establish best practices for the application of LLMs in healthcare.
Conclusion
In conclusion, the significant potential of LLMs in healthcare applications has garnered widespread attention, with researchers exploring their use across various clinical scenarios. However, LLMs still present risks and ethical challenges. It is imperative to continue exploring innovative applications of LLMs while remaining vigilant about the ethical implications and potential risks associated with their use. This balanced approach will be essential for harnessing the full potential of LLMs to revolutionize healthcare and improve patient outcomes.
Supplemental Material
sj-docx-1-dhj-10.1177_20552076251324444 - Supplemental material for Application of large language models in healthcare: A bibliometric analysis
Supplemental material, sj-docx-1-dhj-10.1177_20552076251324444 for Application of large language models in healthcare: A bibliometric analysis by Lanping Zhang, Qing Zhao, Dandan Zhang, Meijuan Song, Yu Zhang and Xiufen Wang in DIGITAL HEALTH
Footnotes
Acknowledgements
The authors would like to extend gratitude to the collaborators for their invaluable contributions and unwavering support throughout the course of this article.
Contributors
LPZ and QZ contributed equally to this paper. They collaborated on data analysis and manuscript drafting. DDZ, MJS, and YZ were responsible for the literature review. XFW conceived the study. All authors were responsible for the interpretation of the data, revised the manuscript, and gave final approval of the manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
This study does not involve the collection of any personal information or data. Therefore, there are no ethical issues or controversies in this study.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Shenzhen High-level Hospital, Shenzhen Third People’s Hospital In-hospital Spontaneous Project (grant number IIT-2024-083).
Supplemental material
Supplemental material for this article is available online.
References
