Abstract
Generative artificial intelligence (AI) comprises a class of AI models that generate synthetic outputs based on patterns learned from the dataset used to train the model. This means that they can create entirely new outputs that resemble real-world data despite not being explicitly instructed to do so during training. Advances in technological capabilities, computing power, and data availability have given rise to more advanced and versatile generative AI models, including diffusion models and large language models, that hold promise in healthcare. In musculoskeletal healthcare, generative AI applications may involve the enhancement of images, generation of audio and video, automation of clinical documentation and administrative tasks, use of surgical planning aids, augmentation of treatment decisions, and personalization of patient communication. Limitations of the use of generative AI in healthcare include hallucinations, model bias, ethical considerations during clinical use, knowledge gaps, and lack of transparency. This review introduces critical concepts of generative AI, presents clinical applications relevant to musculoskeletal healthcare that are in development, and highlights limitations preventing deployment in clinical settings.
Introduction
Artificial intelligence (AI) comprises computer algorithms that mimic human behavior requiring intelligence, such as advanced decision-making, language processing, and image identification [10,45]. The rapid progress in AI has been driven by technological advancements in computing power and model architectures, as well as by considerable financial investments from public and private stakeholders [5]. Many investments have focused on generative AI models that produce synthetic outputs representing patterns learned during training [36]. Generative AI, well known as the foundation of interactive chatbots such as ChatGPT [26], has been called transformative due to its versatility and broad capabilities in tasks relevant to multiple sectors such as finance and healthcare.
Musculoskeletal healthcare, including orthopedics and rheumatology, is poised to benefit from advances in generative AI, as treatment decisions rely on large amounts of unstructured data and visual information such as radiographs or advanced imaging modalities [27], and interventions may lead to rapid changes in patients’ health. Given that human lives are directly affected, innovations in AI should be embraced, yet meticulously challenged [33]. In a field where treatment is grounded in regulatory approvals, randomized trials, and vetted clinical guidelines, healthcare providers may find it difficult to adapt to this rapidly changing technology. Yet because generative AI has potential benefits for musculoskeletal healthcare, providers should become familiar with this technology and related digital healthcare solutions [16].
Musculoskeletal providers’ opinions are divided on the value and necessity of AI solutions [17]. However, with the rapid evolution of generative AI and the development of unprecedented use cases, its potential has become more evident [27]. Applied toward musculoskeletal healthcare tasks, generative AI may unlock unprecedented efficiencies in clinical settings and enhance patient care practices, including surgical interventions. This review presents the central concepts of generative AI, introduces current and developing applications relevant to musculoskeletal care, and discusses critical limitations surrounding the clinical use of generative AI.
What Is Generative AI?
In defining generative AI, it is useful to first differentiate it from discriminative (non-generative) AI. In machine learning (ML; subset of AI that enables computers to learn from data and improve performance for predicting an outcome) and deep learning (DL; specialized branch of ML that uses neural networks to model complex datasets), discriminative AI is defined as technology that predicts or classifies an outcome based on its training with data that have known or labeled outcomes [10,44]. For example, in an institutional registry of patients who underwent rotator cuff repair—which contains demographic and perioperative variables—rotator cuff re-tears may be routinely recorded. If the goal of an investigation was to apply ML algorithms to this dataset to predict the probability of re-tear based on a patient’s risk profile, this would be a discriminative task because the outcome is already known. Discriminative algorithms can further be described as supervised learning methods, as they are trained on data in which the outcome of interest is already labeled.
Unlike discriminative models, generative models learn the underlying statistical distribution of their training data and use it to produce novel, synthetic outputs that resemble that data, rather than simply predicting or classifying a known outcome.
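As a concrete contrast, the discriminative re-tear prediction task described above can be sketched as a simple logistic risk model. This is a minimal illustration; the coefficients and the `retear_probability` function are invented for demonstration and are not clinically derived.

```python
import math

# Toy discriminative model: map patient features to a re-tear
# probability via logistic regression. Coefficients are illustrative
# placeholders, not estimates from any real registry.
def retear_probability(age, tear_size_cm, smoker):
    logit = -4.0 + 0.03 * age + 0.8 * tear_size_cm + 0.6 * int(smoker)
    return 1.0 / (1.0 + math.exp(-logit))

p = retear_probability(age=65, tear_size_cm=3.0, smoker=False)
print(f"Predicted re-tear probability: {p:.2f}")
```

A generative model, by contrast, would not output such a probability; it would synthesize new records or images that resemble the registry data.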
Contemporary Generative AI Models
At the time of this review, large language models (LLMs) and diffusion models for computer vision tasks are the predominant models of generative AI being used [21,29]. This section defines them and introduces how they can be applied to clinical tasks in musculoskeletal health.
Diffusion Models
Diffusion models are a family of generative AI models also known as “reconstruction algorithms,” as they leverage the statistical probability distributions of images acquired in training to generate new visual data [22]. These models use neural networks called “U-nets” that specialize in capturing complex spatial details through encoding processes [57]. Diffusion models have become the predominant deep generative model for creating new visual data, producing higher-fidelity outputs than other families of generative models. They function through an iterative process involving 2 complementary arms: a fixed forward (noising) process and a learned backward (denoising) process. In the forward process, the model progressively adds “noise,” so that images become increasingly dissimilar from those used in training [57]. For example, a diffusion model created to generate synthetic shoulder radiographs would utilize millions of authentic shoulder radiographs during its training phase; in the forward process, these images are incrementally degraded until, eventually, they resemble static noise without any semblance of an image. In the backward process, images are “denoised” step-by-step until high-quality images resembling realistic shoulder radiographs are generated from pure randomness. Another layer of complexity is the distinction between conditioned and unconditioned diffusion models: in a conditioned diffusion model, inputs from a user help guide the model output, whereas unconditioned models generate images solely based on learned data patterns [57]. Diffusion models can also be applied to generating synthetic video sequences.
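The fixed forward (noising) arm described above can be illustrated with a toy numerical sketch, assuming a 1-dimensional signal in place of a radiograph and an invented noise schedule; the learned denoising U-net itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": a 1-D signal standing in for a shoulder radiograph.
image = np.sin(np.linspace(0, 2 * np.pi, 64))

# Forward (noising) process: a fixed schedule blends the image toward
# Gaussian noise over T steps. alphas_bar[t] is the fraction of the
# original signal variance remaining at step t.
T = 100
betas = np.linspace(1e-4, 0.05, T)   # noise schedule (assumed values)
alphas_bar = np.cumprod(1.0 - betas)

def noisy_sample(x0, t):
    """Sample the noised image x_t directly from x_0 (closed form)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x_end = noisy_sample(image, T - 1)   # nearly pure static noise

# The learned backward process (omitted here, as it requires a trained
# U-net) would start from noise like x_end and denoise it step-by-step
# into a new, realistic-looking image.
print(f"signal variance remaining at final step: {alphas_bar[-1]:.2f}")
```

The key property the sketch shows is that the forward process is deterministic in schedule and needs no learning; all of the model's capacity goes into reversing it.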
Large Language Models
Contemporary LLMs include those that specialize in generating text outputs that resemble human speech, such as Claude (Anthropic), Gemini (Google), Mistral (MistralAI), and Chat Generative Pre-trained Transformer (ChatGPT; OpenAI) [39]. LLMs are either proprietary (closed-source), with model weights and training data withheld by the developer, or open-source, with model weights publicly available for inspection, modification, and local deployment.
At the core of LLMs, transformer architectures are used through a mechanism called self-attention, in which the model weighs the relevance of every token (a word or word fragment) in an input sequence against every other token. This allows the model to capture long-range context and generate the most probable next token when producing an output.
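For illustration, the scaled dot-product self-attention at the heart of a transformer can be sketched in a few lines of NumPy; the weight matrices here are random stand-ins for learned parameters, and the dimensions are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a token sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # weights[i, j] = how much token i attends to token j;
    # each row is a probability distribution over the input tokens.
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 4, 8, 8
X = rng.standard_normal((n_tokens, d_model))   # embeddings for 4 tokens
Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))

out, weights = self_attention(X, Wq, Wk, Wv)
print(weights.shape)  # one attention row per input token
```

In a full transformer, many such attention heads are stacked in layers and their parameters are learned from trillions of tokens of text.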
Although often thought of as text-input, text-output models (such as a chatbot), many contemporary LLMs are now considered multimodal or foundation models [1,3]. In other words, their capabilities have expanded to handle inputs beyond text, including images, audio, and video. Outputs are likewise multimodal. Some multimodal models incorporate components of diffusion models to enhance the generation of visual outputs, while text-image models may use transformer architectures to allow for conditioned diffusion by letting users encode text prompts [4,56]. Training such models is resource-intensive and requires vast amounts of data, but the reward is efficient and versatile model performance with potential for deployment in clinical settings for new healthcare use cases.
Modifying Generative Models May Unlock New Healthcare Use Cases
To address concerns about suboptimal performance and training data bias in LLMs, several modifications have been investigated including (1) fine-tuning, (2) prompt engineering, (3) retrieval augmented generation (RAG), and (4) multi-agent frameworks [27]. A potential problem with incorporating proprietary LLMs into healthcare is that the output is generated from the unregulated domain of the Internet and thus may be outdated or incorrect. Such modifications can increase confidence in the validity of responses and transparency regarding the retrieved data, allowing for novel use cases of LLMs.
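As a minimal sketch of the RAG idea, a system retrieves the most relevant vetted document and prepends it to the model's prompt so that the answer is grounded in trusted text. The knowledge-base snippets and keyword-overlap scoring below are illustrative placeholders, not a production retriever.

```python
# Toy retrieval augmented generation (RAG) pipeline.
def score(query, doc):
    """Crude relevance score: number of shared lowercase words."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

# Stand-in for a curated, institution-vetted knowledge base.
knowledge_base = [
    "Rotator cuff repair rehabilitation typically begins with passive motion.",
    "Total hip arthroplasty implant templating uses preoperative radiographs.",
]

def build_prompt(query):
    """Retrieve the best-matching document and ground the prompt in it."""
    best = max(knowledge_base, key=lambda doc: score(query, doc))
    return (
        "Answer using ONLY the context below. If the context is "
        "insufficient, say so.\n"
        f"Context: {best}\n"
        f"Question: {query}"
    )

prompt = build_prompt("When does rehabilitation start after rotator cuff repair?")
print(prompt)
```

Real systems replace the word-overlap score with embedding similarity over a vector database, but the structure—retrieve, then generate from the retrieved context—is the same.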
Multi-agent frameworks, also known as agentic augmentation, can be thought of as a simultaneous collaboration among multiple LLMs, each with a specialized function [8,9]. By implementing a modular approach to tasks, multi-agent frameworks can simplify seemingly complex and demanding workflows, allowing for more efficient function. In these frameworks, each LLM has a responsibility based on its strengths. For example, a framework may include an LLM that can break down a large amount of complex input data into smaller samples, recruit multiple LLMs to interpret sections of an input, evaluate a proposed response for appropriateness, and even incorporate human input into the process. Some multi-agent frameworks allow access to Internet resources in real-time to coordinate various processing steps within the workflow and incorporate contemporary information or data points. A surgical multi-agent framework could in theory include LLMs dedicated to tasks including assisting with anesthesia and monitoring patient vital signs (with alerts for deviations from normal limits), monitoring real-time implant and tray availability, and tracking surgical procedure duration and turnover times simultaneously.
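The modular division of labor described above can be sketched with stub "agents": each is a function with one narrow responsibility, and an orchestrator routes work between them. In practice each function would wrap a call to an LLM; the agent names and logic here are hypothetical stand-ins.

```python
def splitter_agent(document):
    """Break a long input into smaller chunks for parallel handling."""
    return [document[i:i + 40] for i in range(0, len(document), 40)]

def summarizer_agent(chunk):
    """Stand-in for an LLM that summarizes one chunk of input."""
    return chunk.split(".")[0]  # keep only the first sentence fragment

def reviewer_agent(draft):
    """Stand-in for an LLM that evaluates a proposed response."""
    return len(draft.strip()) > 0  # trivially "approve" non-empty drafts

def orchestrator(document):
    """Coordinate the agents; escalate to a human if review fails."""
    chunks = splitter_agent(document)
    summaries = [summarizer_agent(c) for c in chunks]
    draft = " ".join(summaries)
    if not reviewer_agent(draft):
        raise ValueError("Reviewer rejected the draft; escalate to a human.")
    return draft

note = "Patient reports shoulder pain. Exam shows limited abduction. Imaging pending."
print(orchestrator(note))
```

The value of the pattern is that each agent can be tested, swapped, or supervised independently, which is harder with a single monolithic model call.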
Applications of Generative AI in Musculoskeletal Health
Most generative AI models have not been fully vetted through the regulatory frameworks required for responsible and safe deployment in a clinical setting due to challenges surrounding the complexity of AI systems, the rapidly evolving and dynamic nature of technology and products, medicolegal considerations of patient privacy and data security, and challenges in harmonization across international borders [35]. However, several proprietary models have been explored at smaller scales for feasibility and efficacy after obtaining preliminary consent and device clearances. The primary use cases applicable to musculoskeletal clinical contexts include automating clinical intake and scheduling, generating documentation, processing billing and prior authorizations, and monitoring patients [54]. Others include communications, diagnostic and prognostic treatment insights, and surgical training and planning.
Administrative Tasks and Medical Document Generation
Medical document generation, clinical intake, and scheduling are notoriously time- and work-intensive processes [19]. A broad outline of a typical musculoskeletal clinical workflow makes clear the areas in which generative AI can be applied: patient scheduling, completion of clinical intake prior to or upon arrival, acquisition of potentially relevant imaging of an affected joint or extremity, clinical evaluation by a specialist integrating patient history and imaging, clinical documentation with a proposed treatment plan, and post-encounter communication. An indicated treatment often requires prior authorization from the payer, introducing several “pain points” that impose a cognitive or physical burden: extensive effort is required for clinical chart review, analysis of patient imaging, documenting a medical note for the chart and billing purposes, and responding to messages or calls from patients [19,52]. These critically important yet time-consuming processes place an even greater demand on providers already experiencing time constraints due to high patient volumes and workload demands.
Several healthcare technology start-ups have been developing and deploying generative AI solutions for such challenges. For example, Veradigm and Thoughtful AI are both leveraging generative models to improve the patient experience, optimize resource utilization, and decrease administrative burden and associated costs through automating patient scheduling. Another advantage of this automated process is its 24-hour availability compared with the defined work hours of human staff.
The automation of clinical intake may also enhance the patient experience through decreasing pre-visit wait times while assisting providers by generating medical documents, templates, or treatment plans based on a patient’s medical intake. AllaiHealth, Inc., a healthcare technology start-up, provides an AI-driven intake platform that performs smart screening to triage patients to the correct providers, eliminates the need for pre-charting, automatically generates history-of-present-illness summaries, presents differential diagnoses, and provides patient education and potential treatment plans based on possible diagnoses and evidence-based guidelines.
One can imagine a foundation model pipeline in which information from clinical intake and a physical exam are integrated with imaging from a picture archiving and communication system to provide real-time risk prediction and treatment prognosis. ML has been used for real-time probability generation and risk stratification in musculoskeletal research for several years [18], but the rapid evolution of capabilities that are now inherent in foundation models may further improve the opportunity to personalize treatment. Currently, ambient scribes can be implemented for medical document generation alone. Integration of ambient scribing systems that utilize AI in real time to record and document office-based conversations as well as translate physical exam findings and pertinent information into existing risk probability models may further expand their utility in patient forecasting [7,53]. Regardless, AI solutions that perform documentation may not only standardize quality and minimize errors but also reduce the cognitive burden on providers.
Patient Communication
Responding to patient inquiries often overflows outside of work hours into a clinician’s “pajama time.” Several investigations have sought to use generative AI to address this burden on providers. Two studies published in
Payer Solutions and Prior Authorization
Prior authorization for treatment consumes significant time and may require frustrating peer-to-peer phone calls. This process has been consistently recognized as a burden for physicians, requiring considerable time for costly and inefficient treatment discussions [48]. Almost one-third of physicians reported prior authorization leading to serious adverse events for patients, while more than 75% of physicians reported that patients may abandon treatment due to prior authorization decisions and wait time [40]. Several healthcare start-ups have attempted to address this ongoing challenge. Cohere and Availity utilize AI to facilitate this process by matching payer utilization management criteria with prerequisites obtained from medical documentation. The use of AI can make the prior authorization process more efficient, transparent, and accessible.
Medical Imaging and Surgical Planning
Computer vision tasks leveraging generative AI also have relevant applications in musculoskeletal healthcare. Diffusion models can enable image-to-image transformation—for example, converting a T1 MRI sequence into a fluid-sensitive MRI sequence—as well as image conversion by producing 3-dimensional imaging (ie, computed tomography) from 2-dimensional imaging (ie, radiographs) [13]. This may decrease the cost and hazardous exposure associated with some advanced imaging. Researchers have explored utilizing diffusion models to generate synthetic pelvis and hip radiographs to create larger and more robust de-identified image research repositories that avoid issues of patient privacy [24]. Others have targeted surgical planning. For example, Rouzrokh et al [47] created a novel DL inpainting algorithm called THA-NET to demonstrate how a total hip arthroplasty (THA) implant would appear postoperatively using only a preoperative pelvis radiograph as an input. This application also allowed users to change the implant type and examine clinically important metrics, with each image retaining implant-specific features important for planning. Such use cases may enhance preoperative planning, allow for more patient-specific planning, and eventually integrate with intraoperative robotics and advanced surgical platforms.
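The inpainting concept behind such tools can be illustrated with a toy sketch in which only a masked region of an image is replaced while the rest is preserved exactly. A real system would generate the fill with a trained diffusion or DL model; here the "fill" is simply a mean value, and the array dimensions are arbitrary.

```python
import numpy as np

# Stand-in "radiograph": an 8x8 gradient image.
image = np.arange(64, dtype=float).reshape(8, 8)

# Mask marking the region to replace (e.g., the hip joint on a pelvis
# radiograph where an implant rendering would be generated).
mask = np.zeros_like(image, dtype=bool)
mask[2:5, 2:5] = True

# Inpainting: fill only the masked pixels; a generative model would
# synthesize realistic content here instead of a constant.
inpainted = image.copy()
inpainted[mask] = image[~mask].mean()

# Pixels outside the mask are preserved exactly -- the defining
# property of inpainting versus whole-image generation.
print(np.array_equal(inpainted[~mask], image[~mask]))
```

Keeping the unmasked anatomy untouched is what makes inpainting attractive for planning: the generated implant must remain consistent with the patient's real bone landmarks.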
Limitations of Generative AI Models
Major challenges concerning the use of generative AI in healthcare include the propensity for LLMs to hallucinate, knowledge cutoffs, propagation of bias, lack of transparency in decision-making, and ethical considerations [15]. These limitations must be considered when performing research and development in the realm of generative AI. More importantly for practitioners, these challenges must be addressed to ensure responsible clinical use of generative AI and to establish trust in augmenting clinical practice with AI-based solutions.
Model Output Hallucinations and Response Inaccuracies
Hallucinations are a well-described limitation of generative AI models, particularly LLMs [25,28,50]. The primary cause of a hallucination is a mismatch between the model’s programming (which requires it to respond to an input) and the depth of subject knowledge the model possesses. Unfortunately, hallucinations can be difficult to detect for users who are not experts in a topic. Such users may presume the output is credible—not surprising, perhaps, given that LLMs make use of a highly confident and proficient tone that mimics human language. An example follows:
Although the output may include a list of references that appear authentic in formatting and content, upon exploration the user would find that the studies do not exist. Without a human confirming its credibility, this hallucination could be further propagated and may mislead healthcare providers who interact with patients. Previously discussed modifications of LLMs such as fine-tuning and RAG may help overcome this limitation and mitigate the incidence of hallucinations.
Knowledge Cutoffs
While each successive LLM generally reflects training on a larger or more contemporary body of information, these models have limited ability to incorporate real-time updates as guidelines change or new information accumulates [6]. As a result, each LLM has a knowledge cutoff: the date through which its training data were gathered. Information created after the knowledge cutoff will not be included. Thus, depending on the version, clinical guidelines and medical knowledge may be outdated or information may be incomplete. This poses the risk of harm and medicolegal liability in the treatment of patients. Furthermore, when providing outputs, LLMs often fail to disclose uncertainty or gaps in information; this may further propagate incorrect knowledge. Static training data also contribute to hallucinations; for example, a query might be made on a musculoskeletal topic that is more recent or nuanced than the data available when the model was trained. This is akin to a user performing a query on robotic THA with an LLM developed prior to this technology, and the LLM subsequently providing a seemingly plausible output on this topic without any true knowledge.
Propagation of Bias
As is well known, disparities exist in medical data, and if these data are used to train LLMs, they may give rise to biases that affect patient health [32]. Omiye et al [43] demonstrated that outputs of all 4 of the leading LLMs showed examples of perpetuating race-based medicine. As generative AI produces novel output by mimicking training data, disparities and biases that are present in an original dataset would theoretically be propagated in model outputs. For example, bias in the form of restricting training data to a specific geographic region or patient demographic would make the model less generalizable. This is especially concerning for large generalized LLMs that are trained on vast amounts of unregulated Internet resources. In addition, human biases may influence methods of model design, evaluation, and performance. In the realm of computer vision, the propagation of bias is also a concern in diffusion models, as synthetic images may differ in meaningful ways if influenced by inherent biases in training data [23]. Generative models must therefore be rigorously evaluated prior to deployment in clinical settings, and those responsible for model development must proactively train models with inclusive and diverse datasets.
Transparency and Ethical Considerations
The inherent complexity of generative model architectures, neural networks, and the processes through which a model makes predictions based on its training data remains an area of poor transparency; this lack of transparency is often referred to as a “black box” [14]. It creates uncertainty and doubt for providers. Trustworthiness and accountability are essential components of healthcare, yet providers using AI solutions must take a leap of faith that the outputs are grounded in evidence-based reasoning. Therefore, developing methods for increasing model transparency and gaining insight into model decision-making is an important and ongoing area of research.
Important biomedical and ethical considerations are related to lack of transparency. Without clarity on how an AI model makes decisions, using the model in clinical settings will continue to present an ethical dilemma. For example, consider a provider who utilizes an AI-assisted tool in patient care without first validating its accuracy or fully understanding the decision-making process of the model. If the outcome is suboptimal, there is a question of where medicolegal fault is assigned. Is the responsibility of this poor outcome attributed solely to the provider opting to use an AI-assisted tool, the healthcare system that employs the provider and subscribes to the tool, or is it shared between both? Does the company that provides the AI tool bear any accountability, especially if the model was trained using datasets that may possess bias or disparities? This complex medicolegal circumstance requires a comprehensive and thoughtful regulatory framework and may deter providers and healthcare systems from adopting such technology [20,38]. The rapidly evolving nature of both AI technology and the tools already created introduces further challenges for regulatory bodies to monitor the safety and efficacy of generative AI solutions [12]. Significant effort and collaboration between AI companies, AI users such as providers and healthcare systems, and political entities and regulatory bodies will be necessary to ensure the responsible use of AI solutions in healthcare; all parties will need to adapt to a new environment in which the use of AI solutions is routine [20].
Conclusion
Generative AI encompasses ML and DL models developed on unlabeled training sets that create novel outputs based on acquired knowledge. Contemporary generative AI predominantly involves the use of LLMs and diffusion models. The breadth of relevant healthcare applications and use cases for generative AI solutions continues to expand as considerable investments in this technology are made by financial institutions. Indeed, generative AI has given rise to the vision of a digital ecosystem involving all aspects of an episode of care, from scheduling a patient visit and clinical decision-making to postoperative patient communication and surveillance. The ultimate integration of generative AI into healthcare is complex and will require caution due to the current lack of standardized regulations and a comprehensive bioethical framework that considers the risks of bias and patient harm.
Supplemental Material
Supplemental material, sj-pdf-1-hss-10.1177_15563316251335334 for Generative Artificial Intelligence and Musculoskeletal Health Care by Kyle N. Kunze in HSS Journal®
Footnotes
CME Credit
Declaration of Conflicting Interests
The author declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article. Kyle N. Kunze, MD, reports a relationship with AllaiHealth, Inc.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Human/Animal Rights
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration.
Informed Consent
Informed consent was not required for this review article.
Required Author Forms
Disclosure forms provided by the author are available with the online version of this article as supplemental material.
References
