Abstract
The advent of generative artificial intelligence (GenAI) has introduced both remarkable opportunities and significant challenges in the field of pharmacovigilance (PV). This perspective review reflects on emerging trends, practical use cases, and conceptual frameworks shaping the integration of GenAI in high-risk domains such as drug and vaccine safety monitoring. We draw on current experiments and early real-world applications to examine the potential benefits and inherent risks, and we propose a framework for integrating GenAI into PV systems, emphasizing the necessity of rigorous testing, human oversight, and ethical considerations. Our goal is to support PV professionals and stakeholders in navigating this rapidly evolving landscape by identifying promising strategies and implementation pathways.
Plain language summary
Perspective Review: Orchestrating generative AI in pharmacovigilance—predicting and preempting the unpredictable
This perspective review explores how generative artificial intelligence (GenAI) may change the way we monitor the safety of medicines and vaccines—a process known as pharmacovigilance (PV). The paper outlines both the opportunities and the challenges of using GenAI in this high-risk healthcare application.
• PV is the process of identifying, assessing, and preventing side effects from medicines once they are in use by patients.
• GenAI tools can help PV by analyzing large amounts of safety data, summarizing reports, and generating draft content that supports faster decision-making or enhances process efficiencies.
• These tools also come with challenges, such as generating inaccurate information (“hallucinations”), missing key details, or producing results that are hard to explain or verify.
• To safely use GenAI in PV, organizations must design experiments that test how GenAI performs, establish clear safeguards, and appropriately implement and monitor elements as part of an overall risk-based PV system.
• With careful planning, GenAI could improve how quickly safety concerns are detected and addressed, ultimately helping protect patients and making drug monitoring systems more efficient.
Introduction
Recent advances in artificial intelligence (AI) technologies are revolutionizing processes and tasks across all disciplines, including medicine and healthcare, 1 particularly in pharmacovigilance (PV). 2 As the field continues to explore these innovations, there is a growing need for reflective perspectives that integrate technical, ethical, and practical viewpoints. The ability to effectively collect, manage, and analyze data, and to act appropriately on outputs, is at the heart of PV. These activities aim both to better understand the known or hypothesized effects of medicines and vaccines and to identify the entirely unexpected. The promise of AI across all aspects of the PV lifecycle is therefore enormous. While these technological strides present us with undeniable opportunities, they also confront us with challenges that are not fully understood. AI, much like any intricate algorithmic system, exhibits imperfect performance. The advent of generative AI (GenAI)—which produces novel outputs based on a set of inputs—introduces an additional layer of complexity to the risk–benefit assessments that inform AI deployment decision-making. (Risk–benefit here means weighing the positive effects of a medicine against its potential risks, including any issues related to the medicine’s quality, safety, or effectiveness that might affect patients’ health or public health. 3)
When evaluating the intended purposes of a given AI application, risk needs to be considered. Several bodies have proposed risk categorizations. The National Institute of Standards and Technology (NIST) defines three risk categories—low, moderate, and high—that are applicable to GenAI,4,5 and the strategies needed in a high-risk application may differ significantly from those in low-risk applications. 6 Both NIST 5 and the European Union (EU) AI Act 7 define tiered categories of risk ranging from very low to very high, or from acceptable to unacceptable risk. Unacceptable risk within the EU AI Act refers to systems involving factors unsuitable for any business purpose, such as those that may result in manipulative or exploitative practices. High risk, however, takes into consideration the impact on individuals’ health and safety, in addition to organizational risk, and many areas of PV inherently fall into this high-risk category. Importantly, definitional differences among these concepts exist and warrant harmonization.
Implementation, deployment, and ultimately routine usage of AI systems necessitate continuous assessment and management of their performance relative to the specific use cases to which they are applied. The propensity for risks associated with AI—such as errors of all types, including hallucinations, biases, and omissions ranging from subtle to overt—is a concern that impacts its trusted use in many situations. Moreover, there are practical challenges in optimally operationalizing AI as part of complex processes, including human-in-the-loop interactions within human–AI interfaces.
AI use cases must be intimately linked to their intended purposes, and the impact of imperfect performance can vary, potentially leading to undesirable consequences. Thus, when making trade-off decisions about implementing AI-supported processes, we must have the ability to preemptively develop plans to detect, assess, understand, and mitigate these issues to the extent possible, weighing them against the potential benefits offered by AI. The level of testing, scrutiny, and mitigation strategies required, as well as the acceptability of using a given AI solution, are intrinsically linked to the perceived level of risk associated with its application within its intended purpose.
In this perspective, we aim to contextualize the rapidly evolving use of GenAI in PV by examining both emerging real-world use cases and experimental deployments, while also drawing parallels with other uses of AI or machine learning (ML) in PV. We reflect on both the promise and the pitfalls of these technologies and the broader implications for safety-critical domains.
That said, while the new challenges presented by these AI technologies are undeniable, it would be a mistake to assume that routine AI cannot play a major role in PV. Indeed, the history of testing AI and ML applications in PV spans many decades. 8 Work has ranged from routine execution of disproportionality analyses using association rule analysis with Bayesian shrinkage (with the extent of the shrinkage learnt from pre-existing data) for quantitative signal detection, 9 to duplicate report identification, 10 screening of free text for adverse event (AE) information or other PV-relevant data, 11 anomaly identification in classifications, 12 and case report seriousness classification. 13
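For illustration, the following minimal sketch shows the kind of shrinkage-based disproportionality measure referenced above: an information component (IC) computed as a shrunk observed-to-expected ratio on a log2 scale, as commonly described for spontaneous-report data. The counts are purely illustrative.

```python
import math

def information_component(n_joint, n_drug, n_event, n_total, shrinkage=0.5):
    """Shrunk observed-to-expected ratio on a log2 scale.

    n_joint: reports mentioning both the drug and the event.
    n_drug, n_event: marginal report counts; n_total: all reports.
    The additive `shrinkage` term pulls estimates for sparse cells
    toward zero, damping spurious signals from small counts.
    """
    expected = n_drug * n_event / n_total
    return math.log2((n_joint + shrinkage) / (expected + shrinkage))

# Illustrative counts: 20 joint reports, 1,000 drug reports,
# 5,000 event reports in a database of 1,000,000 reports.
print(round(information_component(20, 1_000, 5_000, 1_000_000), 2))  # ~1.9
```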
More recently, the U.S. Food and Drug Administration (FDA) developed and evaluated SPINEL (Supporting Pharmacovigilance by Leveraging Artificial Intelligence Methods to Analyze Electronic Health Records Data), an AI-enabled software prototype that extracts opioid-related adverse drug events (ADEs) from electronic health records (EHRs) using keywords and trigger phrase analysis. 14 SPINEL demonstrated high accuracy in detecting known opioid-related ADEs and received positive usability feedback from FDA participants, showcasing the potential for domain-specific AI systems in routine safety surveillance tasks.
Similarly, the Uppsala Monitoring Centre deployed vigiRank as a routine signal detection method. 15 VigiRank used ML in the form of shrinkage logistic regression to identify variables predictive of case series of known emerging safety signals. Having estimated the weighted contributions of the different variables to signal prediction, the resulting weighted score is now used prospectively to prioritize future case series for clinical review based on completeness and clinical relevance. 16
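vigiRank itself is not reproduced here; the sketch below merely illustrates the general technique of shrinkage (L2-regularized) logistic regression over case-series features, using synthetic data. The feature names are hypothetical stand-ins for the completeness and clinical-relevance variables described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic case-series features (hypothetical stand-ins,
# e.g., report completeness, recency, geographic spread).
X = rng.random((500, 3))
y = (X @ np.array([2.0, 1.0, 0.5]) + rng.normal(0, 1, 500) > 2.0).astype(int)

# The L2 penalty is the "shrinkage": coefficients are pulled toward
# zero, stabilizing weights learned from noisy historical signal data.
model = LogisticRegression(penalty="l2", C=1.0).fit(X, y)

# The weighted score can then be applied prospectively to rank
# new case series for prioritized clinical review.
print(model.predict_proba(X[:5])[:, 1].round(3))
```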
Performance testing of AI or ML has always been challenging in PV, particularly given the need to define and develop gold standards for testing—and their potential suboptimal use 17 —and the noisy, sparse, non-random training data, 8 compounded by changes in data access and evidence generation approaches within PV. 18 Nevertheless, older AI/ML technologies tended to have certain attributes that facilitated their evaluation and use in PV, allowing for extensive testing and application.19–21
Historically, the use of ML exhibited certain attributes: some degree of determinism and understanding/explainability of how an ML algorithm arrived at a specific solution, clarity on training (and test) data used to run the model, and fixed (often binary or discrete) outputs. 8 However, in the era of GenAI, some or all of these attributes have changed, making evaluation more difficult. Figure 1 illustrates this contrast. Traditional AI systems operate in more predictable, controlled environments, whereas GenAI introduces broader creativity and adaptability—along with variability and uncertainty. This conceptual shift has significant implications for safety-critical domains like PV, where oversight and reproducibility have long been considered essential.

Figure 1. Conceptual comparison between traditional AI/ML and GenAI. Traditional systems are designed for predictability, with known training datasets and explainable outputs. GenAI, while more flexible and creative, produces less predictable results and requires new forms of oversight to ensure safe use in regulated domains like PV.
Several critical questions arise: How much testing is sufficient? How many repetitions of the same question are needed? How should inputs be varied? What constitutes conceptual similarity in an output? Could the widespread use of GenAI across multiple problem tasks lead to new or greater-magnitude problems in PV? How should issues like patient privacy violations arising from GenAI use be addressed? How can a lack of clarity about which content or output is AI-generated be tackled, and how can its origin be tracked? Is missing information as problematic as misleading information in generated output? How should ongoing performance monitoring be conducted?
Given the rapid progress of GenAI, the numerous unanswered questions, and its impact on routine PV activities, it is crucial to emphasize the significance of designing and conducting proper experimentation that enables learning from these experiences. The ease of use and the alluring, rapid outputs that are generated by GenAI could lead to overlooking the further need for experimentation. 22 As we reflect on the possible uses of this technology, we must consider various scenarios in which GenAI could impact PV, weighing their potential outcomes and implications.
For example, the lack of explainability may be particularly concerning and challenging for AI in PV in general. Explainability is traditionally considered essential to ensure compliance with regulatory standards, to build trust among stakeholders, and to enable the integration of AI in PV. The integration of explainable AI (XAI) techniques offers a promising solution to enhance transparency and trust in AI-driven decision-making processes. XAI can provide insights into how AI models derive their outputs, thereby making the decision-making process more interpretable for stakeholders. 23 Furthermore, multi-agent-based frameworks (i.e., systems involving many AI applications developed and trained for specific PV roles and associated tasks) offer the potential for GenAI to “check itself” for validity and reduce the occurrence of all types of errors. 24
Explainability of GenAI outputs can potentially be enhanced, to an extent, by combining AI with rules-based systems, thereby providing a clear, logical framework that complements the unpredictable capabilities of AI and making the decision-making process more transparent and understandable.6,23 However, when AI and/or rules-based systems are used to explain GenAI, the unavoidable dimensionality reduction means one must guard against obscuring the true reasons for an output or lending confidence to an erroneous output, which risks wrong actions downstream in a process.
A core aspect of PV is the ability to identify rare potential “black swan events,” 25 that is, safety signals that may be rare and unexpected but have the potential to impact the benefit–risk profile of a medicine in routine use, either in general or for specific populations. Routine PV needs to adopt a lifecycle risk-based approach focusing on quality management systems (QMSs), capable of identifying and effectively analyzing not only commonly received PV data but also rare and idiosyncratic ADEs. 26 It must provide confidence to users and other internal and external stakeholders regarding the effectiveness of the system in protecting patient safety. 6
Experimentation with GenAI technologies, when conducted properly, can provide performance metrics and evidence for decision-making on whether and how to implement these tools into existing processes. However, deploying GenAI into production within the context of PV systems poses notable challenges, including validation and minimizing the risks of problematic AI outputs when assessed in a risk-based manner. Due to the critical nature of certain aspects of PV, especially those directly impacting patient safety, there is a very low tolerance for errors. That said, other processes within PV, such as literature search and synthesis, may have greater tolerance for certain types of irregularities.
To date, there is limited evidence of complex GenAI applications being put into production for routine use in PV. Furthermore, the use of AI/ML has often taken the form of one-off research on specific use cases, without accounting for the dynamic and interactive nature of the PV ecosystem, where various AI agents would work effectively with each other and with humans as part of well-defined processes.
The aim of this perspective review is to explore how GenAI may be orchestrated within PV to anticipate and manage its inherent unpredictability. We describe the challenges and opportunities for widespread use of GenAI across PV, as illustrated by several experiments from across a range of PV applications. We show how GenAI could inform a future research agenda for enabling widespread routine uses in PV and other areas with potentially high-risk applications.
The phrase “predicting and preempting the unpredictable” is used here to express the need to enable learning from common or well-known aspects of PV for faster and more effective handling of safety data. This phrase also refers to the aspirational goal of GenAI in surfacing early or subtle outliers, where the novelty of an issue might make learning from prior data more challenging. We use it to refer to the model’s potential to identify these subtle but high-impact patterns that might otherwise be overlooked, not to imply deterministic foresight. This framing aligns with the emerging work on uncertainty-aware reasoning and probabilistic signal detection.
Although this perspective focuses on potential applications of GenAI in PV, we recognize that the real-world environment in which such systems would operate is highly complex. PV systems must navigate regulatory constraints, jurisdictional differences in safety reporting, and the need for compliance with international frameworks such as ICH E2E and EMA Good Pharmacovigilance Practices (GVP).3,27 AE reports, even for the same medicine–outcome pair, remain enormously heterogeneous in context: they often come from very different healthcare settings, may be submitted in multiple languages, and can contain noisy, unstructured narratives, complicating automated interpretation. 28 Clinical trial data add further heterogeneity due to variations in study design, data formatting, and population characteristics. When considering the integration of GenAI into PV practice, these challenges underscore the need for careful system design, multilingual model capabilities, and rigorous validation.
Insights from GenAI experiments in PV
Experiments involving the deployment of GenAI in PV have highlighted both potential benefits and significant challenges. One experiment assessed the capability of Chatbot Generative Pre-trained Transformer (ChatGPT)-3.5 to extract signs and symptoms from medical literature and convert them into Medical Dictionary for Regulatory Activities (MedDRA) preferred term codes, comparing these outputs against a human-coded gold standard. The results indicated that ChatGPT-3.5 outperformed other algorithms with a 78% predictive accuracy and a kappa value of 1 across 10 iterations, suggesting its promise in PV applications. 29 However, defining acceptable thresholds for performance remains critical, as these thresholds must consider the purpose of the application, the severity of potential errors, and the predictability of failures, all of which directly impact mitigation strategies.
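The exact prompts used in the cited experiment are not reproduced here; the sketch below shows one plausible way to frame such an extraction task, using the OpenAI chat API as an example. The model name and prompt wording are assumptions, and any model-proposed terms would still need validation against the licensed MedDRA terminology, since models can invent plausible-looking codes.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Extract every sign and symptom from the case text below and map each "
    "to its closest MedDRA Preferred Term. Return one 'verbatim -> PT' pair "
    "per line. If no suitable PT exists, write 'UNMAPPED'.\n\nCase text:\n{text}"
)

def extract_meddra_pts(case_text: str, model: str = "gpt-3.5-turbo") -> str:
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # favor reproducibility for coding tasks
        messages=[{"role": "user", "content": PROMPT.format(text=case_text)}],
    )
    # Outputs must be verified against the MedDRA dictionary downstream.
    return response.choices[0].message.content

print(extract_meddra_pts("Patient reported severe headache and blurry vision."))
```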
The generalizability of the performance needs to be considered at the time of study, but also over time. For example, would the model’s performance remain stable or degrade with continuous use? Effective performance monitoring strategies must be devised to identify both abrupt and gradual declines. Moreover, it is essential to determine mitigation measures to be taken in case of performance deterioration, and to establish frameworks for optimal human-in-the-loop interactions. This requires exploring whether model failures are predictable and if they consistently occur in specific contexts that could be flagged for human review. The seriousness and systematic nature of these failures would influence their acceptability and the feasibility of deploying algorithmic or human-driven corrective actions. 30
Research underscores the importance of adequate prompt engineering for large language model (LLM) inputs to maximize the quality of LLM outputs, although prompt engineering alone may not fully eliminate all errors, particularly in real-world use, where suboptimal prompting will occur with some frequency. 26 The potential for biased, incorrect, or non-actionable outputs presents an additional challenge. AI-generated outputs that seem technically correct may lack the contextual depth needed to support decision-making, thereby limiting their practical utility. To address this, an experiment evaluated the utility of a GenAI-based chatbot for extracting actionable insights from complex user guides. 30 To ensure a systematic and robust evaluation, the experiment employed multiple testers to assess inter-rater variability, given the inherent subjectivity of assessing qualitative outputs. The dataset included various question types, ranging from straightforward retrieval to more complex, nuanced queries, and incorporated an audit trail for transparency and reanalysis. The sample size was chosen pragmatically to balance the need for a comprehensive evaluation against the inherent uncertainty of initial performance. Given the non-deterministic behavior of LLMs, questions were repeated verbatim and with altered wording to assess response variability. Assessors were provided with clear guidance to minimize subjectivity, though some level of subjective assessment remained unavoidable. The experiment showed that 73% of the answers generated by the LLM when prompted twice with the same question were consistent; when responses did not match, the variations were limited in both accuracy and completeness. Succinct prompts and questions yielded better LLM responses. 30
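A minimal sketch of this repetition-based consistency testing follows, assuming a caller-supplied `ask_llm` wrapper around whichever model endpoint is under test. Lexical overlap is used here as a crude, automatable proxy for the conceptual-similarity judgments that human assessors made in the cited experiment.

```python
from difflib import SequenceMatcher
from typing import Callable

def consistency_check(ask_llm: Callable[[str], str], question: str,
                      paraphrase: str, threshold: float = 0.8) -> dict:
    """Ask the same question twice verbatim and once paraphrased, then
    report rough lexical agreement between the answers."""
    a1, a2, a3 = ask_llm(question), ask_llm(question), ask_llm(paraphrase)
    verbatim = SequenceMatcher(None, a1, a2).ratio()
    reworded = SequenceMatcher(None, a1, a3).ratio()
    return {
        "verbatim_agreement": verbatim,
        "paraphrase_agreement": reworded,
        # Low agreement flags the question for human conceptual review.
        "flag_for_review": min(verbatim, reworded) < threshold,
    }
```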
A significant challenge in the methodical testing of LLMs lies in bridging the gap between controlled assessments and real-world deployment. In practice, the value of outputs is often evaluated qualitatively based on whether the response enables human reviewers to take the correct actions effectively and reliably. This inherent subjectivity highlights the need for ongoing monitoring and assessment of LLM performance during routine use, as methodologically derived performance metrics may not fully predict operational outcomes.
In another experiment, OpenAI’s GPT-4 model, utilized within a retrieval-augmented generation (RAG) framework and enriched with a business context document, was tested to generate structured query language (SQL) code from natural language queries (NLQs) for complex relational PV databases. 31 This approach significantly improved NLQ-to-SQL accuracy, from 8.3% with the database schema alone to 78.3% with the business context document. The tool’s performance was evaluated against objective criteria: a pass for code that ran correctly without modifications, a partial fail for code requiring minor adjustments, and a fail for code that did not run at all. Initial findings were promising: exposing the LLM to context documents enriched with expert knowledge significantly enhanced its performance in generating accurate SQL code for complex relational PV databases. Unlike the schema alone, these context documents gave the LLM a deeper understanding of the intricate structures and relationships within the enterprise database, allowing it to capture subtleties that would otherwise be overlooked. This sharing of domain-specific knowledge enabled the LLM to better comprehend nuances, resulting in superior performance on text-to-SQL conversion. While this approach showed clear benefits, further investigation is needed to determine the generalizability of context-enriched documents for other programming tasks or databases and to develop standardized guidelines for creating such documents to optimize LLM performance across different applications.
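The following sketch outlines the general pattern under stated assumptions: a prompt assembled from the schema plus an optional business-context document, and a crude executability check loosely mirroring the pass/fail criteria above (the cited study also graded "partial fail" via manual fixes, which is not automated here).

```python
import sqlite3

def build_prompt(nlq: str, schema: str, business_context: str = "") -> str:
    """Assemble an NLQ-to-SQL prompt; the business-context document is
    what lifted accuracy in the cited experiment (8.3% -> 78.3%)."""
    parts = ["You write SQL for a pharmacovigilance database.",
             f"Schema:\n{schema}"]
    if business_context:
        parts.append("Business context (table semantics, joins, conventions):\n"
                     + business_context)
    parts.append(f"Question: {nlq}\nReturn only the SQL.")
    return "\n\n".join(parts)

def grade_sql(sql: str, db_path: str) -> str:
    """Does the generated SQL execute at all against a test database?"""
    try:
        sqlite3.connect(db_path).execute(sql)
        return "pass"
    except sqlite3.Error:
        return "fail"
```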
Lastly, experiments using LLMs fine-tuned on AE data to integrate structured and unstructured multilingual intake data into coherent English narratives demonstrated the model’s potential to hallucinate information, a particular concern in high-risk applications. This observation led to the implementation of “hard guardrails” to prevent critical errors, akin to medical “never events” that must be avoided due to their potential for significant harm. 32 These guardrails included rule-based checks that ensured model outputs adhered to strict standards, effectively preventing hallucinations of key PV terms. This proactive approach highlights the importance of integrating robust safeguards and comprehensive human oversight—referred to as “soft guardrails”—to provide effective error management and enable human reviewers to address uncertainties efficiently. The experiment also demonstrated the importance of developing mechanisms for token-level uncertainty quantification, where the model indicates where it is uncertain, with outputs prepared (e.g., with visualizations) to enable humans to identify areas that may require scrutiny and review.
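One plausible shape for such a hard guardrail is a rule-based post-check that blocks any generated narrative containing PV-critical terms absent from the structured source record. The field names, vocabulary, and substring matching below are simplifying assumptions; a deployed system would use a MedDRA/WHODrug-aware matcher.

```python
def hard_guardrail(narrative: str, source_record: dict) -> list[str]:
    """Reject hallucinated PV-critical terms: every critical term in the
    generated narrative must be traceable to the structured intake
    record (hypothetical field names)."""
    allowed = {t.lower() for t in source_record["drugs"] + source_record["reactions"]}
    # Illustrative "never event" vocabulary; substring checks stand in
    # for a dictionary-aware matcher here.
    critical_vocabulary = ["anaphylaxis", "hepatotoxicity", "stevens-johnson"]
    violations = [
        term for term in critical_vocabulary
        if term in narrative.lower() and term not in allowed
    ]
    return violations  # non-empty => block output, route to human review

record = {"drugs": ["drug x"], "reactions": ["rash"]}
print(hard_guardrail("Patient developed anaphylaxis after Drug X.", record))
# -> ['anaphylaxis']: the term is not supported by the source record.
```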
The body of evidence indicates that GenAI integration into PV systems requires an ecosystem of both algorithmic and human-driven safeguards to ensure reliability and maintain trust. Future work must continue to explore strategies for minimizing all types of errors, optimizing human–computer interactions, and establishing comprehensive testing frameworks tailored to the high-risk nature of PV.
Beyond experimental use cases, the biopharmaceutical industry now routinely employs GenAI for foundational tasks, leveraging sandboxed general LLM tools to support text preparation, summarization, and document interrogation.33,34 As discussed above, regulatory agencies have also begun implementing GenAI in operational PV contexts, such as through the FDA’s new Elsa system.35,36 These examples illustrate how GenAI is already being integrated into PV processes and underscore the growing importance of context-specific AI tools in global drug safety efforts.
Despite this growing adoption, there remains a paucity of peer-reviewed publications describing routine, end-to-end workflows that incorporate general GenAI capabilities or report resulting performance metrics. To date, the routine use of GenAI tools tends to be task-specific—such as text preparation, summarization, and interrogation of complex documents—often stratified according to domains of business-relevant content. While pipelines for GenAI use have been discussed, detailed descriptions largely reside in the gray literature.
In parallel, AI systems based on rule-based and statistical natural language processing (NLP) pipelines are already in routine use within PV workflows, particularly for literature screening. One of the earliest implementations was presented by Glaser et al., who described an NLP-driven system designed to prioritize biomedical publications relevant to drug safety. 37 This system, developed and evaluated using real-world literature sources, exemplifies how non-GenAI approaches remain critical in current workflows. Compared to emerging GenAI-enabled pipelines, these systems demonstrate the maturity and continued relevance of rule-based methods. Together, these complementary approaches illustrate the breadth of AI deployment in PV—from established, rules-based screening tools to next-generation GenAI models aimed at narrative generation and hypothesis exploration.
Considerations and practical approaches for GenAI in PV
Integrating GenAI into PV marks a shift from traditional ML techniques that rely on structured data and algorithmic transparency. While traditional methods require predefined models and extensive feature engineering to ensure reliability, GenAI offers enhanced flexibility for handling unstructured data, generating human-like text, and uncovering complex patterns. However, these capabilities bring new challenges—such as interpretability, predictability, and risk management—which must be carefully addressed for safe, effective deployment in PV systems.
As the number and diversity of GenAI models grow, PV professionals must stay informed about the capabilities and limitations of different tools. These tools differ in their training data, architecture, and intended use cases, and several are increasingly being explored in health-related domains. Table 1 summarizes widely used GenAI models and their relevance to PV, including newly emerging tools such as DeepSeek-VL from China. 38
Table 1. Well-known GenAI tools and their relevance to pharmacovigilance.
AE, adverse event; GenAI, generative artificial intelligence; NLP, natural language processing; PV, pharmacovigilance; QA, question answering; USMLE, United States Medical Licensing Examination.
A recent systematic review by Warner et al. 39 highlights that most real-world research efforts in AI for PV are now focused on signal detection, with models such as random forest and gradient boosting machines consistently outperforming traditional disproportionality methods. Their analysis confirms that supervised ML approaches not only improve performance but also help identify previously unknown safety signals when applied with methodological transparency and appropriate gold standard controls. Complementing this, Imran et al. 40 describe the successful deployment of an XGBoost model within a pharmaceutical company for signal validation, using SHAP (SHapley Additive exPlanations)-based explanations to improve expert acceptance and trust. This case illustrates how interpretable ML can be applied in routine PV settings to support human decisions while maintaining regulatory expectations. Although these examples do not employ GenAI, they underscore the broader potential of AI in PV and reinforce the importance of focusing GenAI development on high-impact tasks such as signal detection—provided interpretability and responsible validation remain central in real-world applications.
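The interpretable-ML pattern described by Imran et al. can be sketched as follows, using synthetic data; this is not the company's actual model, and the feature set is hypothetical.

```python
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(1)
X = rng.random((200, 4))  # hypothetical signal-validation features
y = (X[:, 0] + 0.5 * X[:, 1] > 0.9).astype(int)

model = xgb.XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)

# SHAP values attribute each prediction to individual features,
# giving expert reviewers a per-case rationale to accept or challenge.
explainer = shap.TreeExplainer(model)
print(explainer.shap_values(X[:1]))
```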
Understanding GenAI capabilities and limitations
GenAI’s functionalities, including natural language understanding, text generation, summarization, classification, data extraction, and question answering, hold significant promise for PV tasks. However, its limitations—finite context windows, lack of built-in tool use, and the risk of various types of errors—necessitate rigorous validation and monitoring frameworks to ensure reliable outputs. These safeguards are essential to maintain trust and ensure that GenAI complements, rather than compromises, decision-making in PV.
Risk-based planning and experimentation
A risk-based methodology guides the strategic deployment of GenAI, balancing testing intensity with patient safety. Experimentation or proof of concepts can help establish GenAI’s suitability for specific PV tasks, requiring careful design to generate meaningful performance metrics. Effective data selection should account for edge cases and rare events, while evaluation protocols must ensure that outputs meet real-world utility standards. Human evaluation should be well-planned to maximize expertise and manage resources effectively. Sequential experimentation on high-impact areas can reveal operational and scientific efficiencies while fostering gradual integration. Post-deployment monitoring is also essential to detect any deviations in performance with adequate preplanned interventions and predefined escalation mechanisms to address deficiencies using a risk-based approach.
Human–AI interaction and organizational readiness
Successful GenAI integration requires both technological and human factors, supported by structured strategies to accelerate skill acquisition and user confidence. Training programs should simulate real-world scenarios, allowing users to practice and understand limitations like suboptimal prompting. Interactive systems that provide feedback on input quality can enhance engagement and efficiency, while task-specialized AI agents and “chain-of-thought” interactions further improve human–AI collaboration. Ongoing user feedback mechanisms should be embedded in the system to support continuous learning and improvement.
To help illustrate how GenAI might function in future PV workflows, consider a hypothetical, but plausible, use case. A GenAI system tuned for PV monitors social media platforms such as Reddit and identifies a cluster of posts describing an unusual experience (e.g., blurred vision or panic attacks) following initiation of a newly released medication. Using contextual analysis, the model links the symptoms to the product name and generates a narrative summary for safety reviewers. To add context, the system queries an EHR data source via an application programming interface (API), retrieving de-identified records to estimate how often the drug has been prescribed in the past 6 months, stratified by age or comorbidity. This helps estimate background exposure and guides whether the signal may warrant further evaluation. For more illustrative examples from non-GenAI applications (i.e., how ML could be used), and how it might provide erroneous outputs, see Figure 1 in a study by Kjoersvik and Bate. 25
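The hypothetical workflow above can be summarized as orchestration code. Every function name, threshold, and data source here is a hypothetical placeholder: the two `llm_*` stubs stand in for model calls, and `exposure_lookup` stands in for an authenticated EHR API client.

```python
def llm_links_symptom_to_product(post: str, product: str) -> bool:
    """Stand-in for an LLM contextual-analysis call (hypothetical)."""
    return product.lower() in post.lower()

def llm_summarize(posts: list) -> str:
    """Stand-in for an LLM narrative-summarization call (hypothetical)."""
    return f"{len(posts)} posts describe a possible reaction cluster."

def monitor_hypothetical_signal(posts, product, exposure_lookup, min_cluster=5):
    """Detect a post cluster, draft a reviewer summary, and pull
    de-identified exposure counts for context."""
    mentions = [p for p in posts if llm_links_symptom_to_product(p, product)]
    if len(mentions) < min_cluster:
        return None  # below the (arbitrary) clustering threshold
    return {
        "summary": llm_summarize(mentions),
        "exposure": exposure_lookup(product),  # e.g., de-identified EHR API
        "n_posts": len(mentions),
    }
```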
While this example is intended to be illustrative, we emphasize that such systems would face serious challenges: informal language in social media, inconsistent event timing, data linkage difficulties, and the risk of both false positives and missed patterns. Furthermore, integrating data from external sources like EHRs raises concerns around access permissions, interoperability, and data quality. Any real-world deployment would require strong traceability, validation pipelines, and rigorous human oversight to ensure that the signal is both methodologically and ethically actionable. However, it is important to note that even full human-in-the-loop oversight will sometimes not be sufficient by itself to mitigate and minimize all risks associated with erroneous AI outputs.
Ethical, legal, and data integrity considerations
Protecting patient data, respecting privacy laws, and addressing copyright concerns are non-negotiable requirements for the ethical deployment of GenAI in PV. Establishing clear, enforceable guidelines ensures that AI operations remain transparent, ethical, and compliant—fostering trust and supporting regulatory expectations. Ongoing performance monitoring is equally critical, with systems in place to systematically track metrics reflecting both operational benefits and potential risks, such as hallucinations, biases, or omissions. Legal review processes should be integrated into AI deployment pipelines to anticipate jurisdictional and international data use limitations.
The rapid growth in AI adoption within PV and other biomedical domains has raised complex ethical challenges. While in-depth discussions are provided elsewhere,41,42 ethical considerations must remain central as AI capabilities and societal expectations evolve. Multiple ethical dimensions must be addressed in the application of AI, some of which are broadly relevant across domains, while others are specific to PV.43,44 For example, the ethical use of EHR data and AI has been explored by the Primary Care Informatics Working Group of the International Medical Informatics Association. Additional challenges more specific to AI in PV are also emerging. 45
While many ethical principles are applicable across all uses of AI, certain issues are especially important in high-risk domains like medical safety and PV. Examples of GenAI-related ethical considerations include balancing the need to maximize data access for insight generation with strong data privacy protections, ensuring explainability in critical use cases where it may be computationally expensive to achieve, and determining the appropriate threshold and timeliness for routine deployment of AI systems once sufficient evidence of performance, generalizability, and actionability has been demonstrated.
Legal frameworks and interpretations continue to evolve, especially in light of the unprecedented scale of content generated using GenAI. This shift places increasing emphasis on unresolved legal challenges such as copyright protection and intellectual property rights. 46
Given the high-risk nature of PV, at least in part, AI systems must align not only with technical standards but also with established regulatory frameworks. These include international guidelines such as the ICH E2E Pharmacovigilance Planning Framework, 27 the EMA’s GVP, 3 and emerging FDA guidance on the use of AI/ML in Software as a Medical Device (SaMD). 47 Although these documents do not yet explicitly address GenAI, they offer foundational expectations regarding validation, oversight, and performance monitoring that remain highly relevant. Future implementations of GenAI in PV will need to consider alignment with these and other evolving standards.
Enhancing transparency, interpretability, and safety
Transparency and interpretability are essential for trust and accountability in AI-powered PV, providing information on why a decision was made. Visualizations of GenAI outputs enable stakeholders to understand AI-generated insights and the model’s uncertainty, reinforcing confidence and facilitating regulatory monitoring. In addition, integrating clear, concise summaries of decision rationales and offering interactive question and answer features where users can query the AI about specific decisions can further enhance transparency. Particularly in high-stakes decisions, automated logging of explanation steps may help support both auditability and user trust.
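One concrete way to surface a model's uncertainty, where an API exposes per-token log probabilities, is to flag low-confidence tokens for reviewer attention. The token/log-probability pairs below are synthetic, and the threshold is an illustrative assumption.

```python
import math

def flag_uncertain_tokens(tokens, logprobs, threshold=0.7):
    """Mark tokens whose probability falls below `threshold` so that a
    reviewer's attention is drawn to them (e.g., via highlighting)."""
    return [(t, math.exp(lp), math.exp(lp) < threshold)
            for t, lp in zip(tokens, logprobs)]

# Synthetic example: the dose token comes back low-confidence.
tokens = ["Patient", "received", "150", "mg"]
logprobs = [-0.01, -0.02, -1.2, -0.05]
for token, prob, uncertain in flag_uncertain_tokens(tokens, logprobs):
    marker = " <-- review" if uncertain else ""
    print(f"{token}: p={prob:.2f}{marker}")
```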
Multimodal and synthetic data strategies
Expanding GenAI’s application to multimodal data enhances PV by synthesizing insights from varied sources, offering a more holistic view for better decision-making. 48 Generating synthetic data might address data sparsity issues, creating training datasets that prepare medical professionals for rare cases, thereby strengthening safety measures and PV readiness. 49 In addition, GenAI’s ability to analyze and aid the integration and interpretation of diverse data streams, such as real-world data (RWD) and social media, 50 introduces richer perspectives on patient safety, expanding PV’s data landscape, and enhancing the multidimensional approach to patient protection. 51 Care will need to be taken to ensure that such enrichment does not inadvertently introduce bias, leading to poorer outcomes.
Operational integration pathways
Integrating GenAI into existing PV systems requires the coordinated alignment of technical infrastructure and regulatory considerations. While this perspective does not describe a complete working system, we outline potential integration pathways informed by emerging trends in healthcare AI. A data-driven, AI-enabled framework is central to this vision. 52 For example, GenAI models can be deployed via secure APIs to access metadata and narrative fields from data sources such as the publicly available Freedom of Information version of FAERS (FDA Adverse Event Reporting System), supporting tasks such as summarization and triage. In terminology-driven systems like MedDRA, GenAI could assist with automated coding, ontology mapping, or cross-lingual harmonization. Recent PV implementations demonstrate the feasibility of combining ML outputs with explainability layers to support expert review.39,40,53
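For example, the publicly documented openFDA endpoint for the Freedom of Information version of FAERS can be queried as below; the drug name is illustrative, and a production deployment would use authenticated, validated pipelines rather than ad hoc scripts.

```python
import requests

# openFDA exposes the Freedom of Information version of FAERS.
URL = "https://api.fda.gov/drug/event.json"
params = {
    "search": 'patient.drug.medicinalproduct:"ASPIRIN"',  # illustrative product
    "limit": 3,
}

response = requests.get(URL, params=params, timeout=30)
response.raise_for_status()

for report in response.json()["results"]:
    reactions = [r["reactionmeddrapt"] for r in report["patient"]["reaction"]]
    # Fields like these are natural inputs for GenAI summarization
    # or triage, as discussed above.
    print(reactions)
```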
Prompting strategies can be enhanced through RAG pipelines, which extract relevant context from safety databases, PV process documentation, or regulatory guidance prior to model inference. 30 Recent work demonstrates how multi-agent orchestration frameworks—such as MALADE—combine GenAI reasoning with external tools and structured query agents for AE extraction. 54 In other cases, a hybrid architecture routes GenAI-generated hypotheses through symbolic rules for validation—an approach known as symbolic-neural orchestration. These methods enable integration into biomedical workflows using prompt tuning, ontology-aware embeddings, and supervised task alignment. 28
GenAI is also beginning to appear in end-user workflows for literature screening and triage. Studies have shown that LLMs can significantly streamline PV literature review processes while maintaining reviewer-level sensitivity. 53 Additional use cases under active exploration include real-time summarization and signal extraction in platforms such as Sentinel, where LLMs are being considered to augment both structured query systems and free-text analysis pipelines. 55 These developments support embedding GenAI capabilities into PV dashboards to assist with case prioritization, narrative generation, and hypothesis formulation in human-in-the-loop environments.
Hybrid reasoning systems, which combine the flexibility of neural language models with the rigor of rule-based logic, offer a promising approach for GenAI integration in PV. These systems can enhance interpretability by allowing symbolic checks to validate or constrain model-generated outputs—particularly important in contexts demanding traceability and regulatory compliance. 28
As GenAI systems evolve toward multimodal capabilities, careful consideration of system design will be required to manage oversight across diverse data streams—such as inputs from AI-driven clinical support systems—and to assess the implications of combining these inputs within PV processes and generating RWD outputs. 56
Ensuring robust validation and safeguards
Due to the high-risk nature of PV, comprehensive validation processes are crucial. Early involvement of computer system validation experts helps define a pragmatic validation framework aligned with GenAI’s specific requirements, covering data scope, quality control, and regulatory expectations. Validation strategies must be adaptable to both static models and those subject to frequent updates or retraining. User-friendly validation tools can assist developers, facilitating rapid, compliant deployment of AI tools within PV. 19
Governance considerations
The integration of GenAI into PV must be guided by strong ethical and regulatory governance frameworks. Key considerations include ensuring human accountability, managing bias, supporting explainability, and maintaining patient trust. While this perspective does not report on deployed systems, it outlines principles that should guide implementation. These include human-in-the-loop oversight, interpretable models (e.g., using SHAP explanations), and continuous performance monitoring across diverse populations and use cases.
Recent literature has emphasized the importance of placing ethics at the forefront of GenAI deployment in PV and this should be foundational to a governance approach. Jain et al. highlight patient-centered safeguards and call for a structured approach to bias mitigation, transparency, and organizational readiness. 45 Glaser and Littlebury provide practical governance recommendations, including the role of cross-functional oversight and performance traceability in AI-enabled safety workflows. 26
Several regulatory frameworks remain relevant today or provide foundational guidance for AI oversight in high-risk domains. These include the ICH E2E Pharmacovigilance Planning Guideline, 27 the EMA’s GVP, 3 the FDA’s AI/ML SaMD Action Plan, 47 and the recently adopted EU Artificial Intelligence Act. 57 While not all of these directly address GenAI at present, they establish key principles—such as transparency, robustness, and risk classification—that can inform GenAI system design and evaluation in PV.
In addition, the draft Council for International Organizations of Medical Sciences Working Group XIV report provides emerging international guidance which includes AI governance in PV, emphasizing the importance of accountability, lifecycle oversight, and human intervention in safety-critical tasks. 58
Although a full treatment of these ethical and regulatory topics is beyond the scope of this perspective, we acknowledge their importance and refer readers to these evolving resources to inform future implementation efforts.
Future considerations and potential challenges for GenAI in PV
While implementing a PV ecosystem that leverages GenAI technologies offers theoretical promise for improving economic and operational efficiencies and supporting deeper scientific insights, it also represents a significant shift from current PV practices. For GenAI to be routinely and safely incorporated into PV, an AI-PV-QMS agnostic to any particular GenAI technology will be essential. Such a system must support the rapid evolution of GenAI tools and capabilities while providing a stable foundation for quality assurance and regulatory compliance.
Challenges
Integrating GenAI into PV poses several complex challenges that demand ongoing research and innovation. Key challenges in PV, as well as broader considerations for GenAI, are presented in Tables 2 and 3. These challenges underscore the nuanced requirements of deploying GenAI in high-risk, highly regulated applications like PV. Many notable problems specific to PV are listed in Table 4.
Table 2. General challenges for PV method testing important for GenAI development.
GenAI, generative artificial intelligence; PV, pharmacovigilance.
Table 3. Generic challenges for GenAI and lessons for PV.
AI, artificial intelligence; GenAI, generative artificial intelligence; LLMs, large language models; ML, machine learning; PV, pharmacovigilance.
Table 4. GenAI problems, challenges, and possible mitigations of specific importance for PV.
ABPI, Association of the British Pharmaceutical Industry; GenAI, generative artificial intelligence; ICSR, individual case safety report; LLM, large language model; PV, pharmacovigilance.
Regulatory constraints pose a significant hurdle to the integration of GenAI in PV. The highly regulated nature of PV demands strict adherence to validation and compliance standards, which can introduce redundancies that complicate seamless integration. As regulatory frameworks evolve to accommodate AI, it is essential to assess whether GenAI can provide superior solutions within these constraints or if certain rigid requirements may hinder its effective application. Given the speed of change in GenAI technologies, this will necessarily need to be iterative.
Preparing organizations is critical for the adoption of GenAI in PV. This involves upskilling the workforce, building trust, establishing clear roles between technical and functional teams, and fostering a shared understanding of GenAI’s potential. Initial training on simpler systems and earlier versions of GenAI can help users recognize common pitfalls and prepare for more advanced applications. As GenAI technology becomes more complex, the potential for subtle, hard-to-detect errors increases, making ongoing training essential for identifying limitations and missteps specific to GenAI. Continual monitoring is therefore essential, with purposeful simulation and incorporation of errors into outputs to test and reinforce the robustness and efficiency of human-in-the-loop reviews to detect errors and minimize the risk of complacency.
Understanding optimal human–computer interaction, including fault tolerance and mitigating human error, is crucial for high-risk applications of GenAI. Lessons learned from human factors and behavioral sciences should be incorporated to enhance the collaboration between humans and GenAI systems.
The learning curve for effectively utilizing GenAI in PV requires comprehensive change management strategies to accelerate skill development. Recognizing that users vary in their familiarity with technology and their trust in new systems is essential. Initially, end-users must gain a basic understanding of GenAI’s capabilities and limitations. As GenAI adoption widens, PV professionals need to be proficient in identifying scenarios where LLMs may produce suboptimal or incorrect outputs. These current limitations include handling complex queries, data tables, knowledge cutoffs, context window restrictions, and issues with ordering and clustering information.
To support effective upskilling, PV professionals should experiment with LLMs at an early stage to learn from the more apparent errors in less sophisticated versions. Developers should integrate behavioral considerations and implement safeguards and assistive features to minimize risks. This includes using behavioral forecasting and design thinking to predict risks and identify potential misuses, ensuring that LLM outputs are user-friendly and safe. Practical measures such as output confidence scores, templated responses, consistent tone, and moderation of outputs are vital, particularly for regulated documents. By prioritizing safety in the design of GenAI systems and educating users on common errors, risks can be mitigated, enabling GenAI tools to be fully harnessed.
Opportunities
Despite these challenges, GenAI offers substantial opportunities for enhancing PV. By introducing efficiencies across the PV lifecycle, GenAI can automate many routine tasks and unlock deeper insights from diverse data sources, including RWD and social media. 51 These capabilities may lead to more comprehensive safety monitoring and a multidimensional approach to patient protection.
Furthermore, transparent and interpretable AI outputs are critical for building trust among stakeholders. Enhanced visualization techniques and clear reporting can make AI-driven insights more accessible, fostering confidence and enabling effective regulatory monitoring. As LLMs are integrated into PV, creating a checklist-based framework for evaluating GenAI systems can aid in systematically transitioning experimental GenAI applications into production.
Finally, a proactive approach should ensure that workforce upskilling happens early, equipping users with the skills needed to engage effectively with GenAI. This preparation can help mitigate risks associated with GenAI’s early stages and support a smoother transition to more sophisticated, capable systems over time.
Final considerations
To maximize GenAI’s contributions, it is essential to evaluate its impact holistically across PV. Gold standard testing should compare GenAI outputs with existing human review processes, ideally through randomized assessments that isolate AI-generated insights from traditional outputs. By scrutinizing each AI-driven decision for its broader implications, PV can more effectively determine when and where human oversight is required, ensuring that AI integration enhances rather than detracts from patient safety.
While promising, the future of GenAI in PV hinges on careful orchestration. By balancing regulatory demands, robust validation, and innovative training, GenAI can be a powerful tool in advancing PV and protecting patient health.
Conclusion
This perspective review outlines the evolving landscape of GenAI in PV, emphasizing the dual promise and complexity these technologies bring to high-stakes regulatory environments. Integrating GenAI into PV will enable a new era of enhanced patient safety and optimized PV processes. GenAI’s capabilities to automate data-intensive tasks, analyze unstructured data, and generate insights at scale can significantly augment the efficiency with which PV scientists detect and evaluate safety signals. By leveraging these technologies, PV systems can expand beyond traditional structured data sources to incorporate RWD, social media streams, and other emerging datasets—much like widening the lens through which we monitor patient safety.
However, the path to fully integrating GenAI into PV is not without obstacles. GenAI models often function as “black boxes,” raising concerns about interpretability—an essential factor in a domain where understanding the rationale behind a decision is critical. Issues related to data privacy and regulatory compliance add further complexity, resembling constraints in an optimization problem that must be carefully balanced. These challenges are compounded by the novelty of GenAI in regulated domains, which lack precedents for evaluation or deployment at scale. Establishing a robust AI-PV-QMS, implementing structured change management, and providing ongoing workforce training and monitoring are crucial to ensure that GenAI serves as an enhancement rather than a disruption to existing PV practices. Achieving regulatory alignment will require collaborative efforts to balance the innovative thrust of AI with the stringent validation and monitoring standards mandated by regulatory bodies.
As a forward-looking reflection, continuous research and development are vital to refine GenAI applications in PV and ensure their responsible use. Establishing gold standards, conducting rigorous validation studies, and developing frameworks that support both human oversight and AI transparency are key steps in transitioning GenAI from experimental phases to routine practice. As the PV ecosystem evolves to integrate GenAI, maintaining a shared vision between technology developers and PV professionals becomes paramount. Furthermore, it remains an open question to what extent GenAI can truly support enhanced interactions between health authorities and regulated pharmaceutical companies, or whether regulatory frameworks, which tend to evolve slowly and do not yet fully address the potential of GenAI, will impede the positive impact that GenAI can afford. Ultimately, success will depend not only on technical innovation but also on cultivating trust, governance, and interdisciplinary collaboration. As best practice and guidance develop, GenAI can potentially act as a powerful tool—a computational catalyst—in advancing a safer and more efficient patient-focused PV landscape.
