Abstract
The advent of generative artificial intelligence (GenAI) has introduced both remarkable opportunities and significant challenges in the field of pharmacovigilance (PV). This perspective review reflects on emerging trends, practical use cases, and conceptual frameworks shaping the integration of GenAI in high-risk domains such as drug and vaccine safety monitoring. We draw on current experiments and early real-world applications to examine the potential benefits and inherent risks, and we propose a framework for integrating GenAI into PV systems, emphasizing the necessity of rigorous testing, human oversight, and ethical considerations. Our goal is to support PV professionals and stakeholders in navigating this rapidly evolving landscape by identifying promising strategies and implementation pathways.
Plain language summary
Perspective Review: Orchestrating generative AI in pharmacovigilance—predicting and preempting the unpredictable
This perspective review explores how generative artificial intelligence (GenAI) may change the way we monitor the safety of medicines and vaccines—a process known as pharmacovigilance (PV). The paper outlines both the opportunities and the challenges of using GenAI in this high-risk healthcare application.
• PV is the process of identifying, assessing, and preventing side effects from medicines once they are in use by patients.
• GenAI tools can help PV by analyzing large amounts of safety data, summarizing reports, and generating draft content that supports faster decision-making or enhances process efficiencies.
• These tools also come with challenges, such as generating inaccurate information (“hallucinations”), missing key details, or producing results that are hard to explain or verify.
• To safely use GenAI in PV, organizations must design experiments that test how GenAI performs, establish clear safeguards, and appropriately implement and monitor elements as part of an overall risk-based PV system.
• With careful planning, GenAI could improve how quickly safety concerns are detected and addressed, ultimately helping protect patients and making drug monitoring systems more efficient.
Introduction
Recent advances in artificial intelligence (AI) technologies are revolutionizing processes and tasks across all disciplines, including medicine and healthcare, 1 particularly in pharmacovigilance (PV). 2 As the field continues to explore these innovations, there is a growing need for reflective perspectives that integrate technical, ethical, and practical viewpoints. The ability to effectively collect, manage, and analyze data, and to act appropriately on outputs, is at the heart of PV. These activities aim both to better understand the known or hypothesized effects of medicines and vaccines and to identify the entirely unexpected. The promise of AI across all aspects of the PV lifecycle is therefore enormous. While these technological strides present us with undeniable opportunities, they also confront us with challenges that are not fully understood. AI, much like any intricate algorithmic system, exhibits imperfect performance. The advent of generative AI (GenAI)—which produces novel outputs based on a set of inputs—introduces an additional layer of complexity to the risk–benefit assessments that inform AI deployment decision-making. (Risk–benefit here means weighing the positive effects of a medicine against its potential risks, including any issues related to the medicine’s quality, safety, or effectiveness that might affect patients’ health or public health. 3)
When evaluating the intended purposes of a given AI application, risk needs to be considered. Several bodies have proposed risk categorizations. The National Institute of Standards and Technology (NIST) defines three risk categories—low, moderate, and high—that are applicable to GenAI,4,5 and the strategies needed in a high-risk application may differ significantly from those in low-risk applications. 6 Both NIST 5 and the European Union (EU) AI Act 7 define tiered categories of risk ranging from very low to very high, or from acceptable to unacceptable risk. Unacceptable risk within the EU AI Act refers to systems involving factors unsuitable for any business purpose, such as those that may result in manipulative or exploitative practices. High risk, however, takes into consideration the impact on individuals’ health and safety, in addition to organizational risk, and many areas of PV inherently fall into this high-risk category. Importantly, definitional differences among these concepts exist and warrant harmonization.
Implementation, deployment, and ultimately routine usage of AI systems necessitate continuous assessment and management of their performance relative to the specific use cases to which they are applied. The propensity for risks associated with AI—such as errors of all types, including hallucinations, biases, and omissions ranging from subtle to overt—is a concern that impacts its trusted use in many situations. Moreover, there are practical challenges in optimally operationalizing AI as part of complex processes, including human-in-the-loop interactions within human–AI interfaces.
AI use cases must be intimately linked to their intended purposes, and the impact of imperfect performance can vary, potentially leading to undesirable consequences. Thus, when making trade-off decisions about implementing AI-supported processes, we must have the ability to preemptively develop plans to detect, assess, understand, and mitigate these issues to the extent possible, weighing them against the potential benefits offered by AI. The level of testing, scrutiny, and mitigation strategies required, as well as the acceptability of using a given AI solution, are intrinsically linked to the perceived level of risk associated with its application within its intended purpose.
In this perspective, we aim to contextualize the rapidly evolving use of GenAI in PV by examining both emerging real-world use cases and experimental deployments, while also drawing parallels with other uses of AI or machine learning (ML) in PV. We reflect on both the promise and the pitfalls of these technologies and the broader implications for safety-critical domains.
That said, while the new challenges presented by these AI technologies are undeniable, it would be a mistake to assume that routine AI cannot play a major role in PV. Indeed, the history of testing AI and ML applications in PV spans many decades. 8 Work has ranged from routine execution of disproportionality analyses using association rule analysis with Bayesian shrinkage (with the extent of the shrinkage learnt from pre-existing data) for quantitative signal detection, 9 to duplicate report identification, 10 screening of free text for adverse event (AE) information or other PV-relevant data, 11 anomaly identification in classifications, 12 and case report seriousness classification. 13
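For illustration, the following minimal sketch shows the kind of shrinkage-based disproportionality measure referenced above: an information component (IC) computed as a shrunk observed-to-expected ratio on a log2 scale, as commonly described for spontaneous-report data. The counts are purely illustrative.

```python
import math

def information_component(n_joint, n_drug, n_event, n_total, shrinkage=0.5):
    """Shrunk observed-to-expected ratio on a log2 scale.

    n_joint: reports mentioning both the drug and the event.
    n_drug, n_event: marginal report counts; n_total: all reports.
    The additive `shrinkage` term pulls estimates for sparse cells
    toward zero, damping spurious signals from small counts.
    """
    expected = n_drug * n_event / n_total
    return math.log2((n_joint + shrinkage) / (expected + shrinkage))

# Illustrative counts: 20 joint reports, 1,000 drug reports,
# 5,000 event reports in a database of 1,000,000 reports.
print(round(information_component(20, 1_000, 5_000, 1_000_000), 2))  # ~1.9
```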
More recently, the U.S. Food and Drug Administration (FDA) developed and evaluated SPINEL (Supporting Pharmacovigilance by Leveraging Artificial Intelligence Methods to Analyze Electronic Health Records Data), an AI-enabled software prototype that extracts opioid-related adverse drug events (ADEs) from electronic health records (EHRs) using keywords and trigger phrase analysis. 14 SPINEL demonstrated high accuracy in detecting known opioid-related ADEs and received positive usability feedback from FDA participants, showcasing the potential for domain-specific AI systems in routine safety surveillance tasks.
Similarly, the Uppsala Monitoring Centre deployed vigiRank as a routine signal detection method. 15 VigiRank used ML in the form of shrinkage logistic regression to identify variables predictive of case series of known emerging safety signals. Having estimated the weighted contributions of the different variables to signal prediction, the resulting weighted score is now used prospectively to prioritize future case series for clinical review based on completeness and clinical relevance. 16
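vigiRank itself is not reproduced here; the sketch below merely illustrates the general technique of shrinkage (L2-regularized) logistic regression over case-series features, using synthetic data. The feature names are hypothetical stand-ins for the completeness and clinical-relevance variables described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic case-series features (hypothetical stand-ins,
# e.g., report completeness, recency, geographic spread).
X = rng.random((500, 3))
y = (X @ np.array([2.0, 1.0, 0.5]) + rng.normal(0, 1, 500) > 2.0).astype(int)

# The L2 penalty is the "shrinkage": coefficients are pulled toward
# zero, stabilizing weights learned from noisy historical signal data.
model = LogisticRegression(penalty="l2", C=1.0).fit(X, y)

# The weighted score can then be applied prospectively to rank
# new case series for prioritized clinical review.
print(model.predict_proba(X[:5])[:, 1].round(3))
```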
Performance testing of AI or ML has always been challenging in PV, particularly given the need to define and develop gold standards for testing—and their potential suboptimal use 17 —and the noisy, sparse, non-random training data, 8 compounded by changes in data access and evidence generation approaches within PV. 18 Nevertheless, older AI/ML technologies tended to have certain attributes that facilitated their evaluation and use in PV, allowing for extensive testing and application.19–21
Historically, the use of ML exhibited certain attributes: some degree of determinism and understanding/explainability of how an ML algorithm arrived at a specific solution, clarity on training (and test) data used to run the model, and fixed (often binary or discrete) outputs. 8 However, in the era of GenAI, some or all of these attributes have changed, making evaluation more difficult. Figure 1 illustrates this contrast. Traditional AI systems operate in more predictable, controlled environments, whereas GenAI introduces broader creativity and adaptability—along with variability and uncertainty. This conceptual shift has significant implications for safety-critical domains like PV, where oversight and reproducibility have long been considered essential.

Figure 1. Conceptual comparison between traditional AI/ML and GenAI. Traditional systems are designed for predictability, with known training datasets and explainable outputs. GenAI, while more flexible and creative, produces less predictable results and requires new forms of oversight to ensure safe use in regulated domains like PV.
Several critical questions arise: How much testing is sufficient? How many repetitions of the same question are needed? How should inputs be varied? What constitutes conceptual similarity in an output? Could the widespread use of GenAI across multiple problem tasks lead to new or greater-magnitude problems in PV? How should issues like patient privacy violations arising from GenAI use be addressed? How can a lack of clarity about which content or output is AI-generated be tackled, and how can its origin be tracked? Is missing information as problematic as misleading information in generated output? How should ongoing performance monitoring be conducted?
Given the rapid progress of GenAI, the numerous unanswered questions, and its impact on routine PV activities, it is crucial to emphasize the significance of designing and conducting proper experimentation that enables learning from these experiences. The ease of use and the alluring, rapid outputs that are generated by GenAI could lead to overlooking the further need for experimentation. 22 As we reflect on the possible uses of this technology, we must consider various scenarios in which GenAI could impact PV, weighing their potential outcomes and implications.
For example, the lack of explainability may be particularly concerning and challenging for AI in PV in general. Explainability is traditionally considered essential to ensure compliance with regulatory standards, to build trust among stakeholders, and to enable the integration of AI in PV. The integration of explainable AI (XAI) techniques offers a promising solution to enhance transparency and trust in AI-driven decision-making processes. XAI can provide insights into how AI models derive their outputs, thereby making the decision-making process more interpretable for stakeholders. 23 Furthermore, multi-agent-based frameworks (i.e., systems involving many AI applications developed and trained for specific PV roles and associated tasks) offer the potential for GenAI to “check itself” for validity and reduce the occurrence of all types of errors. 24
Explainability of GenAI outputs can potentially be enhanced, to an extent, by combining AI with rules-based systems, thereby providing a clear, logical framework that complements the unpredictable capabilities of AI and making the decision-making process more transparent and understandable.6,23 However, when AI and/or rules-based systems are used to explain GenAI, the unavoidable dimensionality reduction means one must guard against obscuring the true reasons for an output or lending confidence to an erroneous output, which risks wrong actions downstream in a process.
A core aspect of PV is the ability to identify rare potential “black swan events,” 25 that is, safety signals that may be rare and unexpected but have the potential to impact the benefit–risk profile of a medicine in routine use, either in general or for specific populations. Routine PV needs to adopt a lifecycle risk-based approach focusing on quality management systems (QMSs), capable of identifying and effectively analyzing not only commonly received PV data but also rare and idiosyncratic ADEs. 26 It must provide confidence to users and other internal and external stakeholders regarding the effectiveness of the system in protecting patient safety. 6
Experimentation with GenAI technologies, when conducted properly, can provide performance metrics and evidence for decision-making on whether and how to implement these tools into existing processes. However, deploying GenAI into production within the context of PV systems poses notable challenges, including validation and minimizing the risks of problematic AI outputs when assessed in a risk-based manner. Due to the critical nature of certain aspects of PV, especially those directly impacting patient safety, there is a very low tolerance for errors. That said, other processes within PV, such as literature search and synthesis, may have greater tolerance for certain types of irregularities.
To date, there is limited evidence of complex GenAI applications being put into production for routine use in PV. Furthermore, the use of AI/ML has often taken the form of one-off research on specific use cases, without accounting for the dynamic and interactive nature of the PV ecosystem, where various AI agents would work effectively with each other and with humans as part of well-defined processes.
The aim of this perspective review is to explore how GenAI may be orchestrated within PV to anticipate and manage its inherent unpredictability. We describe the challenges and opportunities for widespread use of GenAI across PV, as illustrated by several experiments from across a range of PV applications. We show how GenAI could inform a future research agenda for enabling widespread routine uses in PV and other areas with potentially high-risk applications.
The phrase “predicting and preempting the unpredictable” is used here to express the need to enable learning from common or well-known aspects of PV for faster and more effective handling of safety data. This phrase also refers to the aspirational goal of GenAI in surfacing early or subtle outliers, where the novelty of an issue might make learning from prior data more challenging. We use it to refer to the model’s potential to identify these subtle but high-impact patterns that might otherwise be overlooked, not to imply deterministic foresight. This framing aligns with the emerging work on uncertainty-aware reasoning and probabilistic signal detection.
Although this perspective focuses on potential applications of GenAI in PV, we recognize that the real-world environment in which such systems would operate is highly complex. PV systems must navigate regulatory constraints, jurisdictional differences in safety reporting, and the need for compliance with international frameworks such as ICH E2E and EMA Good Pharmacovigilance Practices (GVP).3,27 AE reports, even for the same medicine–outcome pair, remain enormously heterogeneous in context: they often come from very different healthcare settings, may be submitted in multiple languages, and can contain noisy, unstructured narratives, complicating automated interpretation. 28 Clinical trial data add further heterogeneity due to variations in study design, data formatting, and population characteristics. When considering the integration of GenAI into PV practice, these challenges underscore the need for careful system design, multilingual model capabilities, and rigorous validation.
Insights from GenAI experiments in PV
Experiments involving the deployment of GenAI in PV have highlighted both potential benefits and significant challenges. One experiment assessed the capability of Chatbot Generative Pre-trained Transformer (ChatGPT)-3.5 to extract signs and symptoms from medical literature and convert them into Medical Dictionary for Regulatory Activities (MedDRA) preferred term codes, comparing these outputs against a human-coded gold standard. The results indicated that ChatGPT-3.5 outperformed other algorithms with a 78% predictive accuracy and a kappa value of 1 across 10 iterations, suggesting its promise in PV applications. 29 However, defining acceptable thresholds for performance remains critical, as these thresholds must consider the purpose of the application, the severity of potential errors, and the predictability of failures, all of which directly impact mitigation strategies.
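The exact prompts used in the cited experiment are not reproduced here; the sketch below shows one plausible way to frame such an extraction task, using the OpenAI chat API as an example. The model name and prompt wording are assumptions, and any model-proposed terms would still need validation against the licensed MedDRA terminology, since models can invent plausible-looking codes.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Extract every sign and symptom from the case text below and map each "
    "to its closest MedDRA Preferred Term. Return one 'verbatim -> PT' pair "
    "per line. If no suitable PT exists, write 'UNMAPPED'.\n\nCase text:\n{text}"
)

def extract_meddra_pts(case_text: str, model: str = "gpt-3.5-turbo") -> str:
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # favor reproducibility for coding tasks
        messages=[{"role": "user", "content": PROMPT.format(text=case_text)}],
    )
    # Outputs must be verified against the MedDRA dictionary downstream.
    return response.choices[0].message.content

print(extract_meddra_pts("Patient reported severe headache and blurry vision."))
```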
The generalizability of the performance needs to be considered at the time of study, but also over time. For example, would the model’s performance remain stable or degrade with continuous use? Effective performance monitoring strategies must be devised to identify both abrupt and gradual declines. Moreover, it is essential to determine mitigation measures to be taken in case of performance deterioration, and to establish frameworks for optimal human-in-the-loop interactions. This requires exploring whether model failures are predictable and if they consistently occur in specific contexts that could be flagged for human review. The seriousness and systematic nature of these failures would influence their acceptability and the feasibility of deploying algorithmic or human-driven corrective actions. 30
Research underscores the importance of adequate prompt engineering for large language model (LLM) inputs to maximize the quality of LLM outputs, although prompt engineering alone may not fully eliminate all errors, particularly in real-world use, where suboptimal prompting will occur with some frequency. 26 The potential for biased, incorrect, or non-actionable outputs presents an additional challenge. AI-generated outputs that seem technically correct may lack the contextual depth needed to support decision-making, thereby limiting their practical utility. To address this, an experiment evaluated the utility of a GenAI-based chatbot for extracting actionable insights from complex user guides. 30 To ensure a systematic and robust evaluation, the experiment employed multiple testers to assess inter-rater variability, given the inherent subjectivity of assessing qualitative outputs. The dataset included various question types, ranging from straightforward retrieval to more complex, nuanced queries, and incorporated an audit trail for transparency and reanalysis. The sample size was chosen pragmatically to balance the need for a comprehensive evaluation against the inherent uncertainty of initial performance. Given the non-deterministic behavior of LLMs, questions were repeated verbatim and with altered wording to assess response variability. Assessors were provided with clear guidance to minimize subjectivity, though some level of subjective assessment remained unavoidable. The experiment showed that 73% of the answers generated by the LLM when prompted twice with the same question were consistent; when responses did not match, the variations were limited in both accuracy and completeness. Succinct prompts and questions yielded better LLM responses. 30
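A minimal sketch of this repetition-based consistency testing follows, assuming a caller-supplied `ask_llm` wrapper around whichever model endpoint is under test. Lexical overlap is used here as a crude, automatable proxy for the conceptual-similarity judgments that human assessors made in the cited experiment.

```python
from difflib import SequenceMatcher
from typing import Callable

def consistency_check(ask_llm: Callable[[str], str], question: str,
                      paraphrase: str, threshold: float = 0.8) -> dict:
    """Ask the same question twice verbatim and once paraphrased, then
    report rough lexical agreement between the answers."""
    a1, a2, a3 = ask_llm(question), ask_llm(question), ask_llm(paraphrase)
    verbatim = SequenceMatcher(None, a1, a2).ratio()
    reworded = SequenceMatcher(None, a1, a3).ratio()
    return {
        "verbatim_agreement": verbatim,
        "paraphrase_agreement": reworded,
        # Low agreement flags the question for human conceptual review.
        "flag_for_review": min(verbatim, reworded) < threshold,
    }
```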
A significant challenge in the methodical testing of LLMs lies in bridging the gap between controlled assessments and real-world deployment. In practice, the value of outputs is often evaluated qualitatively based on whether the response enables human reviewers to take the correct actions effectively and reliably. This inherent subjectivity highlights the need for ongoing monitoring and assessment of LLM performance during routine use, as methodologically derived performance metrics may not fully predict operational outcomes.
In another experiment, OpenAI’s GPT-4 model, utilized within a retrieval-augmented generation (RAG) framework and enriched with a business context document, was tested to generate structured query language (SQL) code from natural language queries (NLQs) for complex relational PV databases. 31 This approach significantly improved NLQ-to-SQL accuracy, from 8.3% with the database schema alone to 78.3% with the business context document. The tool’s performance was evaluated against objective criteria: a pass for code that ran correctly without modifications, a partial fail for code requiring minor adjustments, and a fail for code that did not run at all. Initial findings were promising: exposing the LLM to context documents enriched with expert knowledge significantly enhanced its performance in generating accurate SQL code for complex relational PV databases. Unlike the schema alone, these context documents gave the LLM a deeper understanding of the intricate structures and relationships within the enterprise database, allowing it to capture subtleties that would otherwise be overlooked. This sharing of domain-specific knowledge enabled the LLM to better comprehend nuances, resulting in superior performance on text-to-SQL conversion. While this approach showed clear benefits, further investigation is needed to determine the generalizability of context-enriched documents for other programming tasks or databases and to develop standardized guidelines for creating such documents to optimize LLM performance across different applications.
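The following sketch outlines the general pattern under stated assumptions: a prompt assembled from the schema plus an optional business-context document, and a crude executability check loosely mirroring the pass/fail criteria above (the cited study also graded "partial fail" via manual fixes, which is not automated here).

```python
import sqlite3

def build_prompt(nlq: str, schema: str, business_context: str = "") -> str:
    """Assemble an NLQ-to-SQL prompt; the business-context document is
    what lifted accuracy in the cited experiment (8.3% -> 78.3%)."""
    parts = ["You write SQL for a pharmacovigilance database.",
             f"Schema:\n{schema}"]
    if business_context:
        parts.append("Business context (table semantics, joins, conventions):\n"
                     + business_context)
    parts.append(f"Question: {nlq}\nReturn only the SQL.")
    return "\n\n".join(parts)

def grade_sql(sql: str, db_path: str) -> str:
    """Does the generated SQL execute at all against a test database?"""
    try:
        sqlite3.connect(db_path).execute(sql)
        return "pass"
    except sqlite3.Error:
        return "fail"
```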
Lastly, experiments using LLMs fine-tuned on AE data to integrate structured and unstructured multilingual intake data into coherent English narratives demonstrated the model’s potential to hallucinate information, a particular concern in high-risk applications. This observation led to the implementation of “hard guardrails” to prevent critical errors, akin to medical “never events” that must be avoided due to their potential for significant harm. 32 These guardrails included rule-based checks that ensured model outputs adhered to strict standards, effectively preventing hallucinations of key PV terms. This proactive approach highlights the importance of integrating robust safeguards and comprehensive human oversight—referred to as “soft guardrails”—to provide effective error management and enable human reviewers to address uncertainties efficiently. The experiment also demonstrated the importance of developing mechanisms for token-level uncertainty quantification, where the model indicates where it is uncertain, with outputs prepared (e.g., with visualizations) to enable humans to identify areas that may require scrutiny and review.
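One plausible shape for such a hard guardrail is a rule-based post-check that blocks any generated narrative containing PV-critical terms absent from the structured source record. The field names, vocabulary, and substring matching below are simplifying assumptions; a deployed system would use a MedDRA/WHODrug-aware matcher.

```python
def hard_guardrail(narrative: str, source_record: dict) -> list[str]:
    """Reject hallucinated PV-critical terms: every critical term in the
    generated narrative must be traceable to the structured intake
    record (hypothetical field names)."""
    allowed = {t.lower() for t in source_record["drugs"] + source_record["reactions"]}
    # Illustrative "never event" vocabulary; substring checks stand in
    # for a dictionary-aware matcher here.
    critical_vocabulary = ["anaphylaxis", "hepatotoxicity", "stevens-johnson"]
    violations = [
        term for term in critical_vocabulary
        if term in narrative.lower() and term not in allowed
    ]
    return violations  # non-empty => block output, route to human review

record = {"drugs": ["drug x"], "reactions": ["rash"]}
print(hard_guardrail("Patient developed anaphylaxis after Drug X.", record))
# -> ['anaphylaxis']: the term is not supported by the source record.
```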
The body of evidence indicates that GenAI integration into PV systems requires an ecosystem of both algorithmic and human-driven safeguards to ensure reliability and maintain trust. Future work must continue to explore strategies for minimizing all types of errors, optimizing human–computer interactions, and establishing comprehensive testing frameworks tailored to the high-risk nature of PV.
Beyond experimental use cases, the biopharmaceutical industry now routinely employs GenAI for foundational tasks, leveraging sandboxed general LLM tools to support text preparation, summarization, and document interrogation.33,34 As discussed above, regulatory agencies have also begun implementing GenAI in operational PV contexts, such as through the FDA’s new Elsa system.35,36 These examples illustrate how GenAI is already being integrated into PV processes and underscore the growing importance of context-specific AI tools in global drug safety efforts.
Despite this growing adoption, there remains a paucity of peer-reviewed publications describing routine, end-to-end workflows that incorporate general GenAI capabilities or report resulting performance metrics. To date, the routine use of GenAI tools tends to be task-specific—such as text preparation, summarization, and interrogation of complex documents—often stratified according to domains of business-relevant content. While pipelines for GenAI use have been discussed, detailed descriptions largely reside in the gray literature.
In parallel, AI systems based on rule-based and statistical natural language processing (NLP) pipelines are already in routine use within PV workflows, particularly for literature screening. One of the earliest implementations was presented by Glaser et al., who described an NLP-driven system designed to prioritize biomedical publications relevant to drug safety. 37 This system, developed and evaluated using real-world literature sources, exemplifies how non-GenAI approaches remain critical in current workflows. Compared to emerging GenAI-enabled pipelines, these systems demonstrate the maturity and continued relevance of rule-based methods. Together, these complementary approaches illustrate the breadth of AI deployment in PV—from established, rules-based screening tools to next-generation GenAI models aimed at narrative generation and hypothesis exploration.
Considerations and practical approaches for GenAI in PV
Integrating GenAI into PV marks a shift from traditional ML techniques that rely on structured data and algorithmic transparency. While traditional methods require predefined models and extensive feature engineering to ensure reliability, GenAI offers enhanced flexibility for handling unstructured data, generating human-like text, and uncovering complex patterns. However, these capabilities bring new challenges—such as interpretability, predictability, and risk management—which must be carefully addressed for safe, effective deployment in PV systems.
As the number and diversity of GenAI models grow, PV professionals must stay informed about the capabilities and limitations of different tools. These tools differ in their training data, architecture, and intended use cases, and several are increasingly being explored in health-related domains. Table 1 summarizes widely used GenAI models and their relevance to PV, including newly emerging tools such as DeepSeek-VL from China. 38
Table 1. Well-known GenAI tools and their relevance to pharmacovigilance.
AE, adverse event; GenAI, generative artificial intelligence; NLP, natural language processing; PV, pharmacovigilance; QA, question answering; USMLE, United States Medical Licensing Examination.
A recent systematic review by Warner et al. 39 highlights that most real-world research efforts in AI for PV are now focused on signal detection, with models such as random forest and gradient boosting machines consistently outperforming traditional disproportionality methods. Their analysis confirms that supervised ML approaches not only improve performance but also help identify previously unknown safety signals when applied with methodological transparency and appropriate gold standard controls. Complementing this, Imran et al. 40 describe the successful deployment of an XGBoost model within a pharmaceutical company for signal validation, using SHAP (SHapley Additive exPlanations)-based explanations to improve expert acceptance and trust. This case illustrates how interpretable ML can be applied in routine PV settings to support human decisions while maintaining regulatory expectations. Although these examples do not employ GenAI, they underscore the broader potential of AI in PV and reinforce the importance of focusing GenAI development on high-impact tasks such as signal detection—provided interpretability and responsible validation remain central in real-world applications.
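The interpretable-ML pattern described by Imran et al. can be sketched as follows, using synthetic data; this is not the company's actual model, and the feature set is hypothetical.

```python
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(1)
X = rng.random((200, 4))  # hypothetical signal-validation features
y = (X[:, 0] + 0.5 * X[:, 1] > 0.9).astype(int)

model = xgb.XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)

# SHAP values attribute each prediction to individual features,
# giving expert reviewers a per-case rationale to accept or challenge.
explainer = shap.TreeExplainer(model)
print(explainer.shap_values(X[:1]))
```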
Understanding GenAI capabilities and limitations
GenAI’s functionalities, including natural language understanding, text generation, summarization, classification, data extraction, and question answering, hold significant promise for PV tasks. However, its limitations—finite context windows, lack of built-in tool use, and the risk of various types of errors—necessitate rigorous validation and monitoring frameworks to ensure reliable outputs. These safeguards are essential to maintain trust and ensure that GenAI complements, rather than compromises, decision-making in PV.
Risk-based planning and experimentation
A risk-based methodology guides the strategic deployment of GenAI, balancing testing intensity with patient safety. Experimentation or proof of concepts can help establish GenAI’s suitability for specific PV tasks, requiring careful design to generate meaningful performance metrics. Effective data selection should account for edge cases and rare events, while evaluation protocols must ensure that outputs meet real-world utility standards. Human evaluation should be well-planned to maximize expertise and manage resources effectively. Sequential experimentation on high-impact areas can reveal operational and scientific efficiencies while fostering gradual integration. Post-deployment monitoring is also essential to detect any deviations in performance with adequate preplanned interventions and predefined escalation mechanisms to address deficiencies using a risk-based approach.
Human–AI interaction and organizational readiness
Successful GenAI integration requires both technological and human factors, supported by structured strategies to accelerate skill acquisition and user confidence. Training programs should simulate real-world scenarios, allowing users to practice and understand limitations like suboptimal prompting. Interactive systems that provide feedback on input quality can enhance engagement and efficiency, while task-specialized AI agents and “chain-of-thought” interactions further improve human–AI collaboration. Ongoing user feedback mechanisms should be embedded in the system to support continuous learning and improvement.
To help illustrate how GenAI might function in future PV workflows, consider a hypothetical, but plausible, use case. A GenAI system tuned for PV monitors social media platforms such as Reddit and identifies a cluster of posts describing an unusual experience (e.g., blurred vision or panic attacks) following initiation of a newly released medication. Using contextual analysis, the model links the symptoms to the product name and generates a narrative summary for safety reviewers. To add context, the system queries an EHR data source via an application programming interface (API), retrieving de-identified records to estimate how often the drug has been prescribed in the past 6 months, stratified by age or comorbidity. This helps estimate background exposure and guides whether the signal may warrant further evaluation. For more illustrative examples from non-GenAI applications (i.e., how ML could be used), and how it might provide erroneous outputs, see Figure 1 in a study by Kjoersvik and Bate. 25
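The hypothetical workflow above can be summarized as orchestration code. Every function name, threshold, and data source here is a hypothetical placeholder: the two `llm_*` stubs stand in for model calls, and `exposure_lookup` stands in for an authenticated EHR API client.

```python
def llm_links_symptom_to_product(post: str, product: str) -> bool:
    """Stand-in for an LLM contextual-analysis call (hypothetical)."""
    return product.lower() in post.lower()

def llm_summarize(posts: list) -> str:
    """Stand-in for an LLM narrative-summarization call (hypothetical)."""
    return f"{len(posts)} posts describe a possible reaction cluster."

def monitor_hypothetical_signal(posts, product, exposure_lookup, min_cluster=5):
    """Detect a post cluster, draft a reviewer summary, and pull
    de-identified exposure counts for context."""
    mentions = [p for p in posts if llm_links_symptom_to_product(p, product)]
    if len(mentions) < min_cluster:
        return None  # below the (arbitrary) clustering threshold
    return {
        "summary": llm_summarize(mentions),
        "exposure": exposure_lookup(product),  # e.g., de-identified EHR API
        "n_posts": len(mentions),
    }
```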
While this example is intended to be illustrative, we emphasize that such systems would face serious challenges: informal language in social media, inconsistent event timing, data linkage difficulties, and the risk of both false positives and missed patterns. Furthermore, integrating data from external sources like EHRs raises concerns around access permissions, interoperability, and data quality. Any real-world deployment would require strong traceability, validation pipelines, and rigorous human oversight to ensure that the signal is both methodologically and ethically actionable. However, it is important to note that even full human-in-the-loop oversight will sometimes not be sufficient by itself to mitigate and minimize all risks associated with erroneous AI outputs.
Ethical, legal, and data integrity considerations
Protecting patient data, respecting privacy laws, and addressing copyright concerns are non-negotiable requirements for the ethical deployment of GenAI in PV. Establishing clear, enforceable guidelines ensures that AI operations remain transparent, ethical, and compliant—fostering trust and supporting regulatory expectations. Ongoing performance monitoring is equally critical, with systems in place to systematically track metrics reflecting both operational benefits and potential risks, such as hallucinations, biases, or omissions. Legal review processes should be integrated into AI deployment pipelines to anticipate jurisdictional and international data use limitations.
The rapid growth in AI adoption within PV and other biomedical domains has raised complex ethical challenges. While in-depth discussions are provided elsewhere,41,42 ethical considerations must remain central as AI capabilities and societal expectations evolve. Multiple ethical dimensions must be addressed in the application of AI, some of which are broadly relevant across domains, while others are specific to PV.43,44 For example, the ethical use of EHR data and AI has been explored by the Primary Care Informatics Working Group of the International Medical Informatics Association. Additional challenges more specific to AI in PV are also emerging. 45
While many ethical principles are applicable across all uses of AI, certain issues are especially important in high-risk domains like medical safety and PV. Examples of GenAI-related ethical considerations include balancing the need to maximize data access for insight generation with strong data privacy protections, ensuring explainability in critical use cases where it may be computationally expensive to achieve, and determining the appropriate threshold and timeliness for routine deployment of AI systems once sufficient evidence of performance, generalizability, and actionability has been demonstrated.
Legal frameworks and interpretations continue to evolve, especially in light of the unprecedented scale of content generated using GenAI. This shift places increasing emphasis on unresolved legal challenges such as copyright protection and intellectual property rights. 46
Given the high-risk nature of PV, at least in part, AI systems must align not only with technical standards but also with established regulatory frameworks. These include international guidelines such as the ICH E2E Pharmacovigilance Planning Framework, 27 the EMA’s GVP, 3 and emerging FDA guidance on the use of AI/ML in Software as a Medical Device (SaMD). 47 Although these documents do not yet explicitly address GenAI, they offer foundational expectations regarding validation, oversight, and performance monitoring that remain highly relevant. Future implementations of GenAI in PV will need to consider alignment with these and other evolving standards.
Enhancing transparency, interpretability, and safety
Transparency and interpretability are essential for trust and accountability in AI-powered PV, providing information on why a decision was made. Visualizations of GenAI outputs enable stakeholders to understand AI-generated insights and the model’s uncertainty, reinforcing confidence and facilitating regulatory monitoring. In addition, integrating clear, concise summaries of decision rationales and offering interactive question and answer features where users can query the AI about specific decisions can further enhance transparency. Particularly in high-stakes decisions, automated logging of explanation steps may help support both auditability and user trust.
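One concrete way to surface a model's uncertainty, where an API exposes per-token log probabilities, is to flag low-confidence tokens for reviewer attention. The token/log-probability pairs below are synthetic, and the threshold is an illustrative assumption.

```python
import math

def flag_uncertain_tokens(tokens, logprobs, threshold=0.7):
    """Mark tokens whose probability falls below `threshold` so that a
    reviewer's attention is drawn to them (e.g., via highlighting)."""
    return [(t, math.exp(lp), math.exp(lp) < threshold)
            for t, lp in zip(tokens, logprobs)]

# Synthetic example: the dose token comes back low-confidence.
tokens = ["Patient", "received", "150", "mg"]
logprobs = [-0.01, -0.02, -1.2, -0.05]
for token, prob, uncertain in flag_uncertain_tokens(tokens, logprobs):
    marker = " <-- review" if uncertain else ""
    print(f"{token}: p={prob:.2f}{marker}")
```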
Multimodal and synthetic data strategies
Expanding GenAI’s application to multimodal data enhances PV by synthesizing insights from varied sources, offering a more holistic view for better decision-making. 48 Generating synthetic data might address data sparsity issues, creating training datasets that prepare medical professionals for rare cases, thereby strengthening safety measures and PV readiness. 49 In addition, GenAI’s ability to analyze and aid the integration and interpretation of diverse data streams, such as real-world data (RWD) and social media, 50 introduces richer perspectives on patient safety, expanding PV’s data landscape, and enhancing the multidimensional approach to patient protection. 51 Care will need to be taken to ensure that such enrichment does not inadvertently introduce bias, leading to poorer outcomes.
Operational integration pathways
Integrating GenAI into existing PV systems requires the coordinated alignment of technical infrastructure and regulatory considerations. While this perspective does not describe a complete working system, we outline potential integration pathways informed by emerging trends in healthcare AI. A data-driven, AI-enabled framework is central to this vision. 52 For example, GenAI models can be deployed via secure APIs to access metadata and narrative fields from data sources such as the publicly available Freedom of Information version of FAERS (FDA Adverse Event Reporting System), supporting tasks such as summarization and triage. In terminology-driven systems like MedDRA, GenAI could assist with automated coding, ontology mapping, or cross-lingual harmonization. Recent PV implementations demonstrate the feasibility of combining ML outputs with explainability layers to support expert review.39,40,53
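For example, the publicly documented openFDA endpoint for the Freedom of Information version of FAERS can be queried as below; the drug name is illustrative, and a production deployment would use authenticated, validated pipelines rather than ad hoc scripts.

```python
import requests

# openFDA exposes the Freedom of Information version of FAERS.
URL = "https://api.fda.gov/drug/event.json"
params = {
    "search": 'patient.drug.medicinalproduct:"ASPIRIN"',  # illustrative product
    "limit": 3,
}

response = requests.get(URL, params=params, timeout=30)
response.raise_for_status()

for report in response.json()["results"]:
    reactions = [r["reactionmeddrapt"] for r in report["patient"]["reaction"]]
    # Fields like these are natural inputs for GenAI summarization
    # or triage, as discussed above.
    print(reactions)
```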
Prompting strategies can be enhanced through RAG pipelines, which extract relevant context from safety databases, PV process documentation, or regulatory guidance prior to model inference. 30 Recent work demonstrates how multi-agent orchestration frameworks—such as MALADE—combine GenAI reasoning with external tools and structured query agents for AE extraction. 54 In other cases, a hybrid architecture routes GenAI-generated hypotheses through symbolic rules for validation—an approach known as symbolic-neural orchestration. These methods enable integration into biomedical workflows using prompt tuning, ontology-aware embeddings, and supervised task alignment. 28
GenAI is also beginning to appear in end-user workflows for literature screening and triage. Studies have shown that LLMs can significantly streamline PV literature review processes while maintaining reviewer-level sensitivity. 53 Additional use cases under active exploration include real-time summarization and signal extraction in platforms such as Sentinel, where LLMs are being considered to augment both structured query systems and free-text analysis pipelines. 55 These developments support embedding GenAI capabilities into PV dashboards to assist with case prioritization, narrative generation, and hypothesis formulation in human-in-the-loop environments.
Hybrid reasoning systems, which combine the flexibility of neural language models with the rigor of rule-based logic, offer a promising approach for GenAI integration in PV. These systems can enhance interpretability by allowing symbolic checks to validate or constrain model-generated outputs—particularly important in contexts demanding traceability and regulatory compliance. 28
As GenAI systems evolve toward multimodal capabilities, careful consideration of system design will be required to manage oversight across diverse data streams—such as inputs from AI-driven clinical support systems—and to assess the implications of combining these inputs within PV processes and generating RWD outputs. 56
Ensuring robust validation and safeguards
Due to the high-risk nature of PV, comprehensive validation processes are crucial. Early involvement of computer system validation experts helps define a pragmatic validation framework aligned with GenAI’s specific requirements, covering data scope, quality control, and regulatory expectations. Validation strategies must be adaptable to both static models and those subject to frequent updates or retraining. User-friendly validation tools can assist developers, facilitating rapid, compliant deployment of AI tools within PV. 19
Governance considerations
The integration of GenAI into PV must be guided by strong ethical and regulatory governance frameworks. Key considerations include ensuring human accountability, managing bias, supporting explainability, and maintaining patient trust. While this perspective does not report on deployed systems, it outlines principles that should guide implementation. These include human-in-the-loop oversight, interpretable models (e.g., using SHAP explanations), and continuous performance monitoring across diverse populations and use cases.
Recent literature has emphasized the importance of placing ethics at the forefront of GenAI deployment in PV and this should be foundational to a governance approach. Jain et al. highlight patient-centered safeguards and call for a structured approach to bias mitigation, transparency, and organizational readiness. 45 Glaser and Littlebury provide practical governance recommendations, including the role of cross-functional oversight and performance traceability in AI-enabled safety workflows. 26
Several regulatory frameworks remain relevant today or provide foundational guidance for AI oversight in high-risk domains. These include the ICH E2E Pharmacovigilance Planning Guideline, 27 the EMA’s GVP, 3 the FDA’s AI/ML SaMD Action Plan, 47 and the recently adopted EU Artificial Intelligence Act. 57 While not all of these directly address GenAI at present, they establish key principles—such as transparency, robustness, and risk classification—that can inform GenAI system design and evaluation in PV.
In addition, the draft Council for International Organizations of Medical Sciences Working Group XIV report provides emerging international guidance which includes AI governance in PV, emphasizing the importance of accountability, lifecycle oversight, and human intervention in safety-critical tasks. 58
Although a full treatment of these ethical and regulatory topics is beyond the scope of this perspective, we acknowledge their importance and refer readers to these evolving resources to inform future implementation efforts.
Future considerations and potential challenges for GenAI in PV
While implementing a PV ecosystem that leverages GenAI technologies offers theoretical promise for improving economic and operational efficiencies and supporting deeper scientific insights, it also represents a significant shift from current PV practices. For GenAI to be routinely and safely incorporated into PV, an AI-PV-QMS agnostic to any particular GenAI technology will be essential. Such a system must support the rapid evolution of GenAI tools and capabilities while providing a stable foundation for quality assurance and regulatory compliance.
Challenges
Integrating GenAI into PV poses several complex challenges that demand ongoing research and innovation. Key challenges in PV, as well as broader considerations for GenAI, are presented in Tables 2 and 3. These challenges underscore the nuanced requirements of deploying GenAI in high-risk, highly regulated applications like PV. Many notable problems specific to PV are listed in Table 4.
Table 2. General challenges for PV method testing important for GenAI development.
GenAI, generative artificial intelligence; PV, pharmacovigilance.
Table 3. Generic challenges for GenAI and lessons for PV.
AI, artificial intelligence; GenAI, generative artificial intelligence; LLMs, large language models; ML, machine learning; PV, pharmacovigilance.
Table 4. GenAI problems, challenges, and possible mitigations of specific importance for PV.
ABPI, Association of the British Pharmaceutical Industry; GenAI, generative artificial intelligence; ICSR, individual case safety report; LLM, large language model; PV, pharmacovigilance.
Regulatory constraints pose a significant hurdle to the integration of GenAI in PV. The highly regulated nature of PV demands strict adherence to validation and compliance standards, which can introduce redundancies that complicate seamless integration. As regulatory frameworks evolve to accommodate AI, it is essential to assess whether GenAI can provide superior solutions within these constraints or if certain rigid requirements may hinder its effective application. Given the speed of change in GenAI technologies, this will necessarily need to be iterative.
Preparing organizations is critical for the adoption of GenAI in PV. This involves upskilling the workforce, building trust, establishing clear roles between technical and functional teams, and fostering a shared understanding of GenAI’s potential. Initial training on simpler systems and earlier versions of GenAI can help users recognize common pitfalls and prepare for more advanced applications. As GenAI technology becomes more complex, the potential for subtle, hard-to-detect errors increases, making ongoing training essential for identifying limitations and missteps specific to GenAI. Continual monitoring is therefore essential, with purposeful simulation and incorporation of errors into outputs to test and reinforce the robustness and efficiency of human-in-the-loop reviews to detect errors and minimize the risk of complacency.
Understanding optimal human–computer interaction, including fault tolerance and mitigating human error, is crucial for high-risk applications of GenAI. Lessons learned from human factors and behavioral sciences should be incorporated to enhance the collaboration between humans and GenAI systems.
The learning curve for effectively utilizing GenAI in PV requires comprehensive change management strategies to accelerate skill development. Recognizing that users vary in their familiarity with technology and their trust in new systems is essential. Initially, end-users must gain a basic understanding of GenAI’s capabilities and limitations. As GenAI adoption widens, PV professionals need to be proficient in identifying scenarios where LLMs may produce suboptimal or incorrect outputs. These current limitations include handling complex queries, data tables, knowledge cutoffs, context window restrictions, and issues with ordering and clustering information.
To support effective upskilling, PV professionals should experiment with LLMs at an early stage to learn from the more apparent errors in less sophisticated versions. Developers should integrate behavioral considerations and implement safeguards and assistive features to minimize risks. This includes using behavioral forecasting and design thinking to predict risks and identify potential misuses, ensuring that LLM outputs are user-friendly and safe. Practical measures such as output confidence scores, templated responses, consistent tone, and moderation of outputs are vital, particularly for regulated documents. By prioritizing safety in the design of GenAI systems and educating users on common errors, risks can be mitigated, enabling GenAI tools to be fully harnessed.
Opportunities
Despite these challenges, GenAI offers substantial opportunities for enhancing PV. By introducing efficiencies across the PV lifecycle, GenAI can automate many routine tasks and unlock deeper insights from diverse data sources, including RWD and social media. 51 These capabilities may lead to more comprehensive safety monitoring and a multidimensional approach to patient protection.
Furthermore, transparent and interpretable AI outputs are critical for building trust among stakeholders. Enhanced visualization techniques and clear reporting can make AI-driven insights more accessible, fostering confidence and enabling effective regulatory monitoring. As LLMs are integrated into PV, creating a checklist-based framework for evaluating GenAI systems can aid in systematically transitioning experimental GenAI applications into production.
Finally, a proactive approach should ensure that workforce upskilling happens early, equipping users with the skills needed to engage effectively with GenAI. This preparation can help mitigate risks associated with GenAI’s early stages and support a smoother transition to more sophisticated, capable systems over time.
Final considerations
To maximize GenAI’s contributions, it is essential to evaluate its impact holistically across PV. Gold standard testing should compare GenAI outputs with existing human review processes, ideally through randomized assessments that isolate AI-generated insights from traditional outputs. By scrutinizing each AI-driven decision for its broader implications, PV can more effectively determine when and where human oversight is required, ensuring that AI integration enhances rather than detracts from patient safety.
While promising, the future of GenAI in PV hinges on careful orchestration. By balancing regulatory demands, robust validation, and innovative training, GenAI can be a powerful tool in advancing PV and protecting patient health.
Conclusion
This perspective review outlines the evolving landscape of GenAI in PV, emphasizing the dual promise and complexity these technologies bring to high-stakes regulatory environments. Integrating GenAI into PV will enable a new era of enhanced patient safety and optimized PV processes. GenAI’s capabilities to automate data-intensive tasks, analyze unstructured data, and generate insights at scale can significantly augment the efficiency with which PV scientists detect and evaluate safety signals. By leveraging these technologies, PV systems can expand beyond traditional structured data sources to incorporate RWD, social media streams, and other emerging datasets—much like widening the lens through which we monitor patient safety.
However, the path to fully integrating GenAI into PV is not without obstacles. GenAI models often function as “black boxes,” raising concerns about interpretability—an essential factor in a domain where understanding the rationale behind a decision is critical. Issues related to data privacy and regulatory compliance add further complexity, resembling constraints in an optimization problem that must be carefully balanced. These challenges are compounded by the novelty of GenAI in regulated domains, which lack precedents for evaluation or deployment at scale. Establishing a robust AI-PV-QMS, implementing structured change management, and providing ongoing workforce training and monitoring are crucial to ensure that GenAI serves as an enhancement rather than a disruption to existing PV practices. Achieving regulatory alignment will require collaborative efforts to balance the innovative thrust of AI with the stringent validation and monitoring standards mandated by regulatory bodies.
As a forward-looking reflection, continuous research and development are vital to refine GenAI applications in PV and ensure their responsible use. Establishing gold standards, conducting rigorous validation studies, and developing frameworks that support both human oversight and AI transparency are key steps in transitioning GenAI from experimental phases to routine practice. As the PV ecosystem evolves to integrate GenAI, maintaining a shared vision between technology developers and PV professionals becomes paramount. Furthermore, it remains an open question to what extent GenAI can truly support enhanced interactions between health authorities and regulated pharmaceutical companies, or whether regulatory frameworks, which tend to evolve slowly and do not yet fully address the potential of GenAI, will impede the positive impact that GenAI can afford. Ultimately, success will depend not only on technical innovation but also on cultivating trust, governance, and interdisciplinary collaboration. As best practice and guidance develop, GenAI can potentially act as a powerful tool—a computational catalyst—in advancing a safer and more efficient patient-focused PV landscape.
