Abstract

Advancements in digital pathology and artificial intelligence (AI) have enormous transformative potential for nonclinical toxicologic pathology and are already changing the ways in which pathologists work. However, due to the rapid evolution of digital pathology and AI, the toxicologic pathology community would benefit from an update on these advancements, which can be used to aid drug development. Here we identify key articles published on the use of digital pathology and AI in the field and provide current regulatory statuses and guidelines. For digital pathology, we outline the requirements for equipment, validation processes, workflows, and archiving. Challenges to achieve system interoperability and to establish harmonization through Digital Imaging and Communications in Medicine compatibility are also discussed. For AI, we highlight considerations for model development, including the determination of ground truth, problems that may arise due to bias, and how the accuracy and precision of AI algorithms can be assessed. Finally, we discuss the challenges and potential for AI-assisted toxicologic pathology, picturing a future where technology and scientific expertise work hand-in-hand to improve the quality and efficiency of nonclinical drug safety evaluation. This publication is a deliverable of the European Innovative Medicines Initiative 2 Joint Undertaking, “Bigpicture.”

Keywords

digital pathology, computational pathology, whole slide imaging, toxicologic pathology, artificial intelligence, regulatory acceptance, GLP

Introduction

The evolution from traditional light microscopy into the age of digital pathology is incrementally transforming the field of nonclinical toxicologic pathology. Despite the benefits of digital pathology, integrating this technology into the highly regulated world of drug development has presented the pharmaceutical industry with substantial challenges. The toxicologic pathology community and regulatory bodies have started to address these challenges, but a lack of specific regulatory guidance for the use of digital pathology in the Good Laboratory Practice (GLP) environment is still perceived as a gap in regulatory frameworks by many potential users, creating uncertainty for pathologists and scientists in recent years. Nevertheless, the continued advancement and adoption of digital pathology hold promise for enhancing the efficiency, accuracy, and collaborative potential of toxicologic assessments in drug development.

Several articles have examined the key aspects of digital pathology in the setting of nonclinical toxicologic pathology, from quality aspects and workflow integration to regulatory issues (Table 1). In addition, to address the need for guidance and structure, two workshops were commissioned by the European Society of Toxicologic Pathology (ESTP). In the seventh ESTP International Expert Workshop, an expert panel aimed to define a recommended set of minimal requirements for regulatory acceptance, with the scope of primary slide evaluation and peer review of whole slide images (WSIs).⁵⁷ Agreed-upon recommendations included the concept of WSIs as faithful replicas of original glass slides, measures to maintain image/data integrity, optimizing user training and acceptance, and the suggestion of fit-for-purpose workflow validation.⁵⁷

Table 1.

Key articles on the use of digital pathology in the regulated environment of nonclinical toxicologic pathology.

Authorship	Title	Citation	Overview
Cann J, et al.	Validation of Digital Pathology Systems in the Regulated Nonclinical Environment¹²	Digital Pathology Association, 2011. Available at: https://digitalpathologyassociation.org/_data/cms_files/files/DPA_White_Paper_Final_-_2011-11-17.pdf	• A high-level validation approach for a DPS, including technical aspects and WSS processes
Long RE, et al.	Scientific and Regulatory Policy Committee (SRPC) Paper: Validation of Digital Pathology Systems in the Regulated Nonclinical Environment³⁶	Toxicol Pathol. 2013;41(1):115–124	• Technical requirements for DPS validation in a regulated nonclinical environment, covering planning, DPS qualification and commissioning, system operation, and system decommissioning/retirement
Bradley AE, et al.	Toxicologic Pathology Forum*: Opinion on Considerations for the Use of Whole Slide Images in GLP Pathology Peer Review¹⁰	Toxicol Pathol. 2019;47(2):100–107	• WSI basic methodology, advantages and disadvantages vs light microscopy, WSS validation status, and regulatory acceptance concerns
Gauthier BE, et al.	Toxicologic Pathology Forum*: Opinion on Integrating Innovative Digital Pathology Tools in the Regulatory Framework²⁷	Toxicol Pathol. 2019;47(4):436–443	• A summary of the 2018 Annual Congress of the French Society of Toxicologic Pathology on questions and challenges for integrating digital pathology within a GLP framework
Schumacher VL, et al.	The Application, Challenges, and Advancement Toward Regulatory Acceptance of Digital Toxicologic Pathology: Results of the 7th ESTP International Expert Workshop (September 20-21, 2019)⁵⁷	Toxicol Pathol. 2021;49(4):720–737	• Minimal requirements for acquiring regulatory acceptance of digital toxicologic pathology in research and development, covering primary slide evaluation and peer review on WSIs
Bradley AE, et al.	Proof of Concept: The Use of Whole-Slide Images (WSI) for Peer Review of Tissues on Routine Regulatory Toxicology Studies⁹	Toxicol Pathol. 2021;49(4):750–754	• A study showing how accessing and assessing tissues for remote peer review is feasible and diagnostically equivalent to conventional light microscopy
Jacobsen M, et al.	Utilizing Whole Slide Images for the Primary Evaluation and Peer Review of a GLP-Compliant Rodent Toxicology Study³³	Toxicol Pathol. 2021;49(6):1164–1173	• An approach for delivering GLP validation of WSIs and the associated workflow for a GLP-compliant toxicity study
Forest T, et al.	Scientific and Regulatory Policy Committee Points to Consider: Primary Digital Histopathology Evaluation and Peer Review for Good Laboratory Practice (GLP) Nonclinical Toxicology Studies²⁶	Toxicol Pathol. 2022;50(4):531–543	• Digital pathology aspects relevant to nonclinical histopathology evaluation and points to consider
Boisclair J, et al.	IT/QA and Regulatory Aspects of Digital Pathology: Results of the 8th ESTP International Workshop⁸	Toxicol Pathol. 2022;50(6):793–807	• GLP qualification and validation of digital histopathology systems
Lempp C, et al.	A Minimal Approach to Demonstrate Concordance of Digital and Conventional Microscopy in Toxicologic Pathology³⁵	Toxicol Pathol. 2024;52(5):251–257	• Nonclinical safety study material allows statistical comparison of concordance rates for glass and digital slide evaluation, demonstrating the utility of digital slides for nonclinical safety evaluation

Abbreviations: DPS, digital pathology system; ESTP, European Society of Toxicologic Pathology; GLP, Good Laboratory Practice; IT, information technology; QA, quality assurance; WSI, whole slide imaging; WSS, whole slide scanner.

The eighth ESTP International Expert Workshop discussed in greater detail how to fulfill the regulatory requirements for the qualification and validation of digital histopathology evaluations in the GLP environment, which ensures that histopathology results submitted to regulatory authorities are of sufficient quality and rigor, and are verifiable.^8,48,50 It was determined that the general validation, qualification, and implementation pathways for digital pathology should follow general principles applied to other GLP processes, and that close collaboration between pathologists, information technology (IT) experts, quality assurance (QA) experts, and regulatory officials is needed to ensure the success of digital pathology in GLP-compliant pre-clinical studies.⁸

In computational pathology, while existing literature and regulatory guidelines predominantly address clinical applications, many of these principles are also applicable to toxicologic pathology. Based on the overall potential of artificial intelligence (AI), recognized for medicine as a whole and for pathology in particular, the implementation of AI solutions into digital toxicologic pathology needs to be considered, despite the challenges that will have to be overcome when incorporating AI into existing digital pathology workflows, especially under GLP regulations.

Because digital and computational pathology are evolving rapidly and much has been published on their advancement in recent years, one of the aims of the European Innovative Medicines Initiative (IMI) project, Bigpicture, is to provide an update on the current state of both fields in nonclinical toxicologic assessments for drug development. Ongoing efforts to adopt and integrate these novel approaches into existing workflows are discussed herein, with a focus on regulatory requirements and recommendations, as well as challenges and evolving standards and techniques. The future perspectives of these innovative technologies and their impact on the profession are also discussed.

Digital Pathology in Toxicologic Pathology

Digital pathology, defined as an image-based environment that enables the acquisition, management, and evaluation of pathology information generated from digitized glass slides (WSIs),²⁷ has become increasingly established in nonclinical toxicology in recent years,²⁵ at least under non-GLP conditions. Digital pathology should provide pathologists with an experience equivalent to that of traditional light microscopy; in addition, it offers several advantages over light microscopy.^10,27,30,40 First, it can improve efficiency, increase data security, and reduce the need to travel or to ship glass slides between contract research organizations (CROs) and sponsor facilities, while enabling remote working and virtual real-time consultations.⁵⁷ Second, access to WSIs enables the adoption of analysis techniques, such as the use of image analysis, AI, and content-based image retrieval (CBIR) systems.⁵⁷ Third, a repository of WSIs can be a valuable resource to efficiently consult for background findings, unique study designs, tissues, findings, or species.⁵⁷ Therefore, efforts are being made to validate workflows for use under GLP conditions.^33,35

In the clinical environment, several digital pathology systems (DPSs) have been cleared as medical devices through the United States Food and Drug Administration (FDA) 510(k) process (as for the Philips IntelliSite Pathology Solution and the Aperio AT2 DX System), whereas for toxicologic pathology, an institutional fit-for-purpose validation approach is more appropriate.²⁶ “Fit for purpose,” meaning suitable for its intended use, was originally a concept for the validation of analytical methods. In the context of digital histopathological evaluation, it signifies that the processes defined in standard operating procedures (SOPs) and protocols, performed by trained personnel using the qualified DPS, allow the pathologist to reliably perform their histopathological assessment.⁸ Working groups recommend that GLP DPS validation packages for toxicologic pathology demonstrate substantial equivalency or non-inferiority to traditional light microscopy in terms of sensitivity and specificity (and, ultimately, in the evaluation results obtained), and that the slides selected should support the intended use.^35,57 This can be achieved by testing with representative tissue materials from nonclinical toxicology systems during installation.⁵⁷

The approaches undertaken to deliver a GLP validation of WSIs and the associated workflow for the digital primary evaluation and peer review of GLP-compliant toxicity studies were previously described.^33,35 The authors demonstrated that WSIs can effectively replace traditional glass slides for histopathological assessments and provide a robust digital alternative for nonclinical toxicologic pathology.^33,35 By sharing their approaches, the authors encouraged others to adopt WSI use in a GLP-compliant manner.^33,35

Status of Regulatory Guidelines

In June 2020, the Organization for Economic Co-operation and Development (OECD) issued and published the regulatory position on the use of digital pathology in a GLP environment through a Frequently Asked Questions (FAQ) document, which includes information on the use of digital pathology in regulated nonclinical toxicology studies.⁴⁶ The document emphasizes the importance of establishing the WSI as a “faithful replica of the original histology slide,” ensuring equipment and software are fit for purpose, maintaining data integrity, and warranting that the study can be reconstructed, if required.²⁶ In May 2023, the U.S. FDA also released a guidance document specifically addressing the use of WSIs in nonclinical toxicology studies.⁶⁷ Their guidance, presented in a short Question and Answer document, provides some basic interpretations and covers technical aspects of WSI systems, the importance of proper documentation and data integrity, and the requirements for archiving WSIs under various work scenarios in GLP studies.⁶⁷ Together, these recommendations aim to ensure that nonclinical digital pathology can be implemented in a way that maintains the high standards of scientific quality, accuracy, and data integrity that are required under GLP. The main considerations from these two guidance documents are listed in Table 2. In addition to this guidance from regulators, further recommendations derived from scientific publications are discussed below.

Table 2.

Main considerations from OECD and FDA guidance documents regarding WSI use in GLP nonclinical toxicity studies.^46,67

Regulatory Considerations
• A WSI is not considered to be an exact copy of the glass slide, but rather a faithful replica of the original glass slides, and should include all elements (image acquisition, image processing, image file storage, and display of images) needed for histopathological examination or pathology peer review; the digitization process should ensure traceability and integrity from the tissue sample to the digitized slides
• If WSIs are assessed in lieu of the original glass slides or for retrospective peer review and result in the generation of raw data, the WSI files should be retained as study records and archived appropriately; consideration should be given to ensure that archived digital images remain viewable once software/hardware updates/versions are implemented or decommissioned
• Any technical image processing modifications should be documented and retained, without obscuring elements captured from the glass slides (such as labels or artifacts); non-permanent adjustments made by the pathologist using the image viewing software during WSI evaluation (e.g., brightness, contrast, annotations) do not need to be documented or retained
• Written procedures for WSI processes should be in place for pathologists and personnel (e.g., for training, slide scanning, software management, file access, and exchange control)
• If WSIs are used to generate raw data, the WSI DPS (software and hardware) should be validated in a manner specific to the intended use, including all computerized systems implemented to manage equipment and store, or send/receive digitized slides
• If WSIs are used to generate raw data, including when transmitted to external users, they should be protected to prevent loss or alteration of data, maintain chain of custody, control access, and secure data systems and data transmission (requirements under 21 CFR part 11 to maintain WSI image file integrity)
• The study plan, signed pathology report, and pathology peer review statement should state whether glass slides or WSIs were used for histopathological evaluation

Abbreviations: CFR, Code of Federal Regulations; DPS, digital pathology system; FDA, U.S. Food and Drug Administration; GLP, Good Laboratory Practice; OECD, Organization for Economic Co-operation and Development; WSI, whole slide imaging.

GLP Status of WSI

WSIs and their metadata, although digitally replicated from a physical specimen, do not meet existing definitions of specimens or raw data.²⁶ The current GLP status of WSI for nonclinical toxicologic pathology is that use of WSIs in lieu of glass slides is acceptable, providing it can be proven that WSIs are a faithful representation of the original glass slides, including all elements needed for histopathological examination or pathology peer review.^46,67 It is important to note that a “faithful replica,” a term mentioned in the OECD FAQ document, is not defined with specific GLP requirements. A WSI, considered a faithful replica, is created from a specimen (i.e., a histology slide) and contains all necessary information (i.e., the labeling, markings, artifacts, and defects), whereas a true copy is a version of a record verified to contain the same information as the original record.²⁶

The current regulatory guidance from the U.S. FDA foresees that WSIs and associated metadata need to be archived if they contribute to the creation of raw data or study reconstruction in GLP studies.¹³ This is the case for digital primary study evaluation and for retrospective peer review. Also, under certain circumstances of contemporaneous peer review, WSI retention would be required if WSIs are the basis for resolution of differences of opinion between pathologists, if there is an issue resolution process documented in the peer review memo, or if there is a peer review in which histopathology findings are only evident in the WSIs.²⁶ Although not explicitly required by the OECD, currently, in the scenario of primary slide evaluation on glass slides and contemporaneous digital peer review, archiving of WSIs can only be avoided if any discrepancy between pathologists is resolved using original tissue sections.²⁶ This is not a practical solution. In the future, with growing confidence in the equivalency of glass and digital slides, and the ongoing dialog with regulatory agencies, this requirement could be revisited, provided that certain conditions are met in terms of “faithful replication” or “concordance.” Where there are disagreements between pathologists, the toxicologic pathology community considers that it can be left to the scientific judgment of the two pathologists involved to base their agreement on the WSIs without the necessity for archiving. This flexibility would open the possibility for many institutions to use digital peer review without the need to establish a GLP-compliant digital archive solution. The concept of WSIs being equivalent to glass slides is discussed further in Box 1.

Box 1.

Concept of equivalency/concordance studies.

The concept of WSIs being equivalent to traditional glass slides for interpretation by the pathologist is based on the principle that systems must be fit for purpose; since traditional light microscopy is considered fit for purpose, the evaluation of WSIs should be equivalent for the intended use.²⁵ In the clinical environment, comparison studies were performed at large scale, comparing pathologists’ diagnoses between glass slides and WSIs, and both underpinned and preceded the qualification and subsequent 510(k) clearance of WSIs for medical use.²⁵ For nonclinical use, the scope and design of such studies may be adapted on an institutional level.²⁵ Ultimately, undertaking such an effort involves carefully defining the extent of concordance needed to qualify DPS components to match the intended use.²⁵ It has been proposed to differentiate between “equivalency testing” (an unblinded confirmation of a finding/lesion with the digital method) and “concordance testing” (a blinded comparison of two methods with statistical evaluation).¹³ Such validation of diagnostic concordance between light microscopy and digital pathology can be designed using the following three-tiered approach:³⁶

1) Confirm that various tissues/organs and cellular elements can be identified in digital images. The number of tissues/organs should be large enough to cover relevant organ systems and/or species

2) Confirm that a spectrum of basic pathological processes, such as inflammation and necrosis, can be identified in the digital images

3) Confirm that histological changes observed in representative toxicology studies via light microscopy can be identified via the evaluation of digital images

A minimal approach to demonstrate concordance of digital and light microscopy in a toxicologic pathology setting was described by Lempp et al. in 2024.³⁵ Using a benchmark study design, the study evaluated and compared the diagnostic intra-observer concordance rate between both viewing modalities and demonstrated that digital slide evaluation fits the purpose of nonclinical safety evaluation, adding to other reports of successful workflow validation and supporting digital pathology in the regulatory field.³⁵ This study may serve as a proof of concept to demonstrate that digital examination of WSIs leads to a similar degree of intra-observer agreement as the examination of glass slides. However, there is still some uncertainty about whether it is necessary for each institution to perform its own concordance studies, tailored to company-specific workflow and use cases, or whether the recent successful concordance study can be relied upon for all future toxicity studies.³⁵
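
To make the idea of "concordance testing" (a blinded comparison of two methods with statistical evaluation) more concrete, the following Python sketch computes the raw agreement rate and Cohen's kappa for paired reads of the same slides on glass and as WSIs. It is an illustration only: the function names, the example diagnoses, and the choice of Cohen's kappa as the statistic are assumptions and are not taken from the cited concordance study or from any guidance document.

```python
from collections import Counter


def percent_agreement(glass, digital):
    """Fraction of slides with identical diagnoses in both modalities."""
    if len(glass) != len(digital) or not glass:
        raise ValueError("need two non-empty, equally long lists of reads")
    matches = sum(g == d for g, d in zip(glass, digital))
    return matches / len(glass)


def cohens_kappa(glass, digital):
    """Chance-corrected agreement between two sets of categorical reads."""
    n = len(glass)
    p_observed = percent_agreement(glass, digital)
    glass_counts = Counter(glass)
    digital_counts = Counter(digital)
    # Agreement expected if the two reads were statistically independent
    p_expected = sum(
        (glass_counts[c] / n) * (digital_counts[c] / n)
        for c in set(glass) | set(digital)
    )
    if p_expected == 1.0:
        return 1.0  # both reads use one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical paired reads of ten slides by the same pathologist
glass_reads = ["normal", "necrosis", "normal", "inflammation", "normal",
               "necrosis", "normal", "normal", "inflammation", "normal"]
digital_reads = ["normal", "necrosis", "normal", "inflammation", "normal",
                 "normal", "normal", "normal", "inflammation", "normal"]

agreement = percent_agreement(glass_reads, digital_reads)
kappa = cohens_kappa(glass_reads, digital_reads)
print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")
```

In practice, the acceptable level of concordance and the statistic used would be defined in the institution's validation plan according to the intended use.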

Requirements for IT Equipment

A core requirement for the successful implementation of digital pathology is the availability of high-resolution slide scanners and monitors, as well as computers that fulfill specific hardware requirements.⁸ Scanners must be capable of capturing images at multiple magnifications with high fidelity, preserving the quality of the original glass slides.¹⁰ Calibrated, high-resolution, large-screen monitors should ensure there is consistent color representation to allow for accurate image interpretation;^26,57 however, pathologists should be aware that ambient lighting and reflections can also affect display performance.¹⁰ The colors displayed on any viewing device should be vivid, with good white balance capabilities, and the display should be free from glare.¹⁰

Color calibration procedures for both scanners and monitors must be described in relevant SOPs.⁵⁷ A process to ensure consistent color reproduction across WSI scanners and enhance color homogeneity in WSIs has recently been described.¹⁵ The process includes two modules: (1) assessing scanner color reproducibility and (2) applying color correction to minimize deviation/variation.¹⁵ Color variability can arise from inconsistencies in slide preparation, scanner hardware, and display devices.¹⁵ Techniques reported to address this issue included stain normalization, internal color calibration, and external monitor calibration.¹⁵ While some pathologists consider color standardization less critical for digital primary reads and peer review due to the adaptability of toxicologic pathologists to different stain profiles, it is more essential for image analysis purposes.⁸ Therefore, routine scanner calibration and maintaining minimum standards for monitor quality and calibration are recommended.⁸
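
The first module of such a process (assessing scanner color reproducibility) can be sketched as a comparison of measured color patches against reference values. The Python example below is a minimal sketch under stated assumptions: it uses the CIE76 Delta E (Euclidean distance in CIELAB space) for simplicity, and the patch names, CIELAB values, and tolerance are hypothetical; the cited process may rely on a different metric and different targets.

```python
import math


def delta_e_cie76(lab_a, lab_b):
    """Euclidean distance between two CIELAB colors (CIE76 Delta E)."""
    return math.dist(lab_a, lab_b)


def assess_scanner(reference, measured, tolerance):
    """Return per-patch Delta E and the patches exceeding the tolerance."""
    deviations = {
        name: delta_e_cie76(reference[name], measured[name])
        for name in reference
    }
    out_of_tolerance = sorted(
        name for name, de in deviations.items() if de > tolerance
    )
    return deviations, out_of_tolerance


# Hypothetical calibration-slide patches as (L*, a*, b*) values
reference_patches = {
    "hematoxylin_like": (35.0, 20.0, -40.0),
    "eosin_like": (70.0, 45.0, -5.0),
    "white": (100.0, 0.0, 0.0),
}
measured_patches = {
    "hematoxylin_like": (36.0, 22.0, -38.0),
    "eosin_like": (70.0, 49.0, -2.0),
    "white": (99.0, 0.0, 0.0),
}

# The tolerance of 4.0 is an arbitrary value chosen for the example
deviations, flagged = assess_scanner(
    reference_patches, measured_patches, tolerance=4.0
)
print(deviations, flagged)
```

Patches flagged in this way would be candidates for the second module, color correction.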

As digital pathology can generate a large amount of data, associated IT infrastructure must be able to support high-speed data transfer and ensure storage systems can accommodate large collections of image files.⁵⁷ GLP-compliant archiving is further discussed below. Cloud-based storage solutions are increasingly being used to support multi-site collaborations; however, these systems must comply with GLP data integrity requirements, including those for traceability and secure data access, and their use as a GLP archive is still a matter of debate.^27,57 Protecting the integrity and confidentiality of nonclinical study data is also critical, and IT infrastructure supporting digital pathology must comply with cybersecurity standards to prevent unauthorized access, data breaches, or loss of data.⁸

Validation Requirements for Equipment and Workflow

The use of WSI in GLP toxicity studies requires validation of the entire digital pathology workflow, from glass slide generation/labeling/scanning to viewing at the pathologist’s workstation through archiving. DPSs involve multiple hardware and software interfaces, and so there is potential for unnoticed critical failures that may produce a nonobvious loss of image quality.⁵⁷ To address this, all equipment used in this workflow must be qualified and staff must be adequately trained.²⁷ Risk-based validation efforts should consider the entire “pixel pathway” from image acquisition to display as a single system, acknowledging that, with time, some components, such as monitors, may need to be swapped/upgraded without a requirement for revalidation if performance remains fit for purpose.^8,57 Components of this pathway may vary according to hardware/software elements, the number of sites and their geographic locations, as well as the business need and intended use cases.⁸ If the entire pathway occurs at a single institution (either single- or multi-site), the qualification and validation are the sole responsibility of the institution.⁸ In contrast, if the pathway results from a collaboration between institutions (e.g., between a CRO and a sponsor), communication must be optimal as each partner is accountable for their respective activities, and this accountability must be clearly outlined in their respective validation documentation, written procedures, and in the study documentation.^8,33 In addition, in multisite scenarios across international borders, one must be aware of possibly notable differences between national GLP regulations if pathologists are generating data in remote locations.³²

The validation of GLP-compliant DPSs should ensure scanned images accurately represent the original glass slides, with no omission of tissue areas and no loss of quality or data integrity throughout the workflow.^3,33,57 DPSs are subject to the same requirements as any other GLP-computerized system, meaning they must meet ALCOA standards of being Attributable, Legible, Contemporaneous, Original, and Accurate.⁵⁶ As such, security measures ensuring image and metadata integrity must be validated. Validation processes should demonstrate that there is limited and authorized access to systems, WSIs, and metadata, and that roles have been established to define the level of access pathologists and other study personnel have to study material.³⁶ Electronic records should be retained securely and the ability to expediently retrieve the records must be in place.³⁶ In addition, raw images must remain unmodifiable.³⁶ If any alterations are required, these should not change the raw image but should instead be layered over it; these changes should be tracked via audit trails employing user-independent computer-generated time stamps.³⁶ In contrast, simple annotations like pins may be considered as pathologist notes and do not need to be validated or retained. Another consideration to address during the planning of validation is the security of the WSIs and metadata when transmitting to external users.³⁶
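
The principle that alterations are layered over an unmodified raw image and tracked with computer-generated time stamps can be illustrated with a small data model. The Python sketch below is hypothetical throughout: the class names, fields, and example values are assumptions made for illustration and do not describe any particular DPS or image management system.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    timestamp: str  # generated by the system, never supplied by the user
    user: str
    action: str


@dataclass
class WsiRecord:
    raw_pixels: bytes
    overlays: list = field(default_factory=list)
    audit_trail: list = field(default_factory=list)

    def __post_init__(self):
        # Fingerprint of the raw image, recorded once at creation
        self.raw_sha256 = hashlib.sha256(self.raw_pixels).hexdigest()

    def _log(self, user, action):
        now = datetime.now(timezone.utc).isoformat()
        self.audit_trail.append(AuditEntry(now, user, action))

    def add_overlay(self, user, description):
        """Store a change as a separate layer; the raw pixels are untouched."""
        self.overlays.append(description)
        self._log(user, f"overlay added: {description}")

    def raw_is_intact(self):
        """Check that the raw image still matches its original fingerprint."""
        return hashlib.sha256(self.raw_pixels).hexdigest() == self.raw_sha256


# Hypothetical usage
record = WsiRecord(raw_pixels=b"hypothetical raw image bytes")
record.add_overlay("pathologist_a", "region of interest outlined")
print(record.raw_is_intact(), len(record.audit_trail))
```

A validated system would additionally need access control and tamper-resistant storage of the audit trail itself, which this sketch does not attempt to model.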

An example outline for the validation of a scanner and image acquisition software for nonclinical toxicologic pathology has been previously published.⁵⁷ The plan states that test cases should be selected from relevant study types, and user acceptance criteria should be established to ensure digital images faithfully replicate the original glass slides, findings are visible, and annotations/labels are correct.⁵⁷ The validation plan should also state the purpose of the validation and the cases in which the scanner will be used, ensure that the user acceptance criteria are fulfilled, and ensure that responsibilities for troubleshooting are assigned.⁵⁷

Following a successful validation, the use of WSIs on GLP toxicity studies should be detailed in study plans and SOPs, and must cover quality control (QC), metadata, image inclusion/exclusion criteria, and chain of custody, referencing any relevant SOPs for the image management system (IMS) and WSI archiving.²⁶

GLP-Compliant Archiving of WSI and Metadata

To enable digital slide evaluation of GLP studies, dedicated storage must be provided, which requires the use of qualified servers.⁸ The associated database must also be subject to minimum requirements (including, but not limited to, defined roles, user account management, and audit trails). In addition, the current regulatory view on use cases necessitating the archiving of WSIs and metadata is that archiving is required for a primary read (generating raw data) or for a retrospective peer review, just as glass slides and corresponding blocks should be archived appropriately.^27,49 In situations where WSIs should be archived, the retention period should be consistent with applicable data retention policies and national regulations.²⁶

GLP-compliant archiving of WSIs presents unique challenges, particularly regarding long-term data preservation. WSI files must be stored securely, with appropriate audit trails, and GLP compliance requires that archived files should remain accessible and unaltered as they move through the system, between institutions, and over time.⁵⁷ One of the key considerations for archiving is how to manage technology obsolescence and ensure that archived WSI file formats can still be accessed decades into the future.^9,33 This is a particular concern for the archiving of scanner-proprietary file formats. In addition, if “Software as a Service” is used in the digital pathology workflow, it must be ensured that WSIs are still readable in their native format, even if the service ends (exit strategy). The aspect of system interoperability is further discussed in Box 2.
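
One common way to check that archived files remain unaltered over time or after transfer between institutions is to record a cryptographic checksum for each file and re-verify it later. The Python sketch below illustrates this with SHA-256 checksums. It is a minimal sketch, not a GLP-compliant archive: the function names, the file name `slide_001.wsi`, and the file content are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so that very large WSI files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir):
    """Map each file name in the archive directory to its checksum."""
    return {
        p.name: sha256_of_file(p)
        for p in sorted(Path(archive_dir).iterdir())
        if p.is_file()
    }


def verify_manifest(archive_dir, manifest):
    """Return names of files that are missing or whose content changed."""
    problems = []
    for name, expected in manifest.items():
        path = Path(archive_dir) / name
        if not path.is_file() or sha256_of_file(path) != expected:
            problems.append(name)
    return problems


# Hypothetical demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    slide = Path(tmp) / "slide_001.wsi"
    slide.write_bytes(b"hypothetical image data")
    manifest = build_manifest(tmp)
    unchanged = verify_manifest(tmp, manifest)  # nothing altered yet
    slide.write_bytes(b"altered image data")
    changed = verify_manifest(tmp, manifest)  # alteration is detected

print(unchanged, changed)
```

A checksum only detects alteration; it does not address format obsolescence, so the question of whether the file can still be opened decades later remains separate.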

Box 2.

System interoperability.

The lack of interoperability between different DPSs is a major hurdle for the field of toxicologic pathology. Until recently, high-throughput digital slide scanners all had their own proprietary file format, leading to the development of specific tools and libraries to read each of those formats.¹⁶ Although these file formats can have specific individual advantages, efforts are ongoing to establish harmonization through Digital Imaging and Communications in Medicine (DICOM) compatibility. Use of a single standard image format will help preserve the utility of WSIs and help comply with the ALCOA standard.^56,57 Therefore, developing a standardized file format is a priority for ensuring seamless data exchange.²⁶ The DICOM standard defines formats for image pixel data and metadata encoding, as well as protocols for exchange and management.⁴⁵ DICOM is a dedicated biomedical standard that is ubiquitous in clinical radiology and is increasingly being applied across medical imaging specialties that have made the digital transition within the healthcare setting.¹⁷ It enables ‘FAIR’ data capture during image acquisition and standardization of file formats, ensuring images/data are Findable, Accessible, Interoperable, and Reusable.^16,28,72 The main benefits of DICOM over other formats include DICOM’s comprehensive biomedical-specific information model, offering precise metadata definitions for identifying and describing subject and acquisition information specific to applications, a feature universal across all specialties and modalities.¹⁶ There is also no other standard that includes protocols for the automated exchange and management of objects.¹⁶ Additionally, DICOM is a fully open standard that is free to access, read, and implement without any licensing fees, and is also available in a machine-readable format, making toolkit and library updates simpler.¹⁶ There are also high-quality open-source reference implementations available for DICOM, along with commercially supported toolkits for most common platforms and programming languages.¹⁶ Another benefit is that backward compatibility is a core principle of DICOM, which ensures changes add new features without invalidating already-installed base systems, software, and archived images.¹⁶ Industry pressure is now influencing scanner manufacturers to develop systems that scan directly to the DICOM format. The Digital Pathology Association and the medical community widely recognize and promote the use of DICOM as a standard for digital pathology image management and interoperability, facilitating data sharing and integration across different systems and vendors. DICOM has now been implemented by hospitals, diagnostic centers, and pharmaceutical companies (personal author experience). Additionally, the DICOM format has been adopted by the European IMI Bigpicture project for all images uploaded to the WSI repository.

Both on-site and cloud-based solutions can be employed for WSI archiving, each with unique advantages and challenges. The choice between data storage on premises or in the cloud for archiving should be based on suitable risk assessments, and the following regulatory guidelines should be considered: OECD documents #17⁵⁰ (including Supplement 1)⁴⁷ and #22,⁵¹ and the U.S. FDA guidance for industry on the use of WSI in nonclinical toxicology studies.⁶⁷ On-site storage solutions are more controlled but require significant investment in infrastructure and maintenance.⁵⁷ Systems that use cloud-based solutions offer scalability and remote access, including read-only access. However, they may face challenges if stored in unknown or geographically dispersed locations, as regulatory inspections often require on-site visits under general GLP requirements so that protection measures can be inspected.^8,57 To avoid such problems, some countries only allow archiving of GLP data in GLP-certified facilities, which is often not the case for data centers. For cloud-based solutions in GLP environments, irrespective of whether they are internally managed or outsourced to an external cloud service provider, it is critical that appropriate knowledge, awareness, oversight, and control of the system remain with the test facility. Transparency and awareness of the responsibilities of all involved parties are crucial, as outlined in the OECD document #17 Supplement 1.⁴⁷ For the implementation of a cloud-based solution, the following key elements are instrumental for its GLP compliance: a detailed risk assessment, thorough cloud service provider assessment(s), clearly defined service level agreements, and the validation of the computerized systems hosted in cloud-based services.⁴⁷ Therefore, cloud-based solutions should be carefully considered and based on an open dialogue with national GLP authorities.⁸

Any IT infrastructure supporting long-term WSI storage should guarantee data is maintained over time and allow data access/retrieval in the event of regulatory inspection.⁸ The physical address of data centers hosting WSIs and their metadata may be required by some inspecting regulatory agencies, and secured access and GLP compliance, including a disaster recovery plan, should also be specified in the service-level agreement when WSI storage/archival is handled by a third party.⁸

An increasing number of software vendors are turning to cloud-based storage solutions, suggesting regulators and QA should consider virtual inspections to verify data center compliance.⁵⁷ Without this, the progress of digital pathology could be hindered, necessitating the invention and implementation of alternative technical solutions.⁵⁷ Archiving systems must, therefore, balance accessibility with the need to preserve data in an unaltered state over time, and this challenge is exacerbated by the growing size of WSI files.²⁶
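One way to illustrate the requirement that archived WSIs remain unaltered over time is a checksum comparison. The following is a minimal sketch only: the use of SHA-256, the function names, and the file name are assumptions made for illustration and are not prescribed by the cited guidance, and a real GLP archive would additionally need validated systems, audit trails, and access controls.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file.

    The file is read in chunks so that multi-gigabyte WSI files
    do not have to fit in memory at once.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unaltered(path: Path, recorded_digest: str) -> bool:
    """Compare the current digest with the digest recorded at archiving time."""
    return sha256_of_file(path) == recorded_digest


def demo_integrity_check() -> tuple:
    """Archive a stand-in file, verify it, then simulate an alteration."""
    with tempfile.TemporaryDirectory() as tmp:
        wsi = Path(tmp) / "slide_0001.dcm"  # hypothetical file name
        wsi.write_bytes(b"stand-in for WSI pixel data")
        recorded = sha256_of_file(wsi)  # would be stored with the archive metadata
        ok_before = is_unaltered(wsi, recorded)
        wsi.write_bytes(b"stand-in for WSI pixel data (modified)")
        ok_after = is_unaltered(wsi, recorded)
    return ok_before, ok_after
```

In this sketch, the digest recorded at archiving time would be re-checked after each transfer between systems or institutions, and a mismatch would indicate that the file is no longer identical to the archived original.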

Although debate is ongoing as to which file format should be archived (the original file from the scanner or the copy assessed by the pathologist), a risk-based decision is warranted.^50,51 Other outstanding questions include:⁵ Will cloud-based archiving be accepted universally by GLP authorities? What requirements must be fulfilled for archives to serve simultaneously as a read-only database? Will increasing confidence in the digital pathology workflow lead to more flexibility in the GLP retention period of WSIs if certain requirements are met in terms of “faithful replication” or “concordance,” since the original glass slides will always allow reconstruction of the study?

Lessons Learned and Outlook

The success of implementing a GLP-compliant, end-to-end digital pathology workflow hinges heavily on the collaborative efforts between the main stakeholders. These stakeholders include:

End users, including pathologists, laboratory technicians, and archivists.

IT colleagues, who outline and ensure adherence to a validated software development life cycle.

QA colleagues, who provide compliance oversight according to available regulatory guidance and assist in drafting the necessary documents that demonstrate all components were validated according to their intended use.

The rapid pace of technological advancement in digital pathology poses a substantial challenge for regulatory compliance and validation efforts. Pathologists and IT teams must continually re-validate through change control processes to account for new software updates, scanner models, and image management platforms.³³ Once digital pathology is better established, the integration of AI tools, which are discussed below, may further augment pathologists’ capabilities.^8,27,33

AI in Toxicologic Pathology

In September 2021, the first U.S. FDA market authorization for an AI system in pathology was granted to Paige Prostate Detect, a software algorithm intended to assist pathologists in the detection of foci suspicious for cancer in WSIs from prostate needle biopsies.¹¹ The device was classified as a Class II medical device under the generic name “software algorithm device to assist users in digital pathology,” and the U.S. FDA established special controls for it, including extensive requirements regarding design verification and validation, across three key areas:⁶⁶

A detailed description of the device software, including its algorithm and its development, together with a description of any datasets used to train, tune, or test the software algorithm.

Analytical studies to demonstrate acceptable analytical device performance, including detailed documentation.

Validation studies with clinical specimens and detailed documentation.

This first market authorization of an AI tool in diagnostic pathology can be seen as a door-opener for strategies to obtain regulatory acceptance of AI tools for routine clinical work.⁵⁴

Regulatory Status and Guidance

Current regulatory status of AI in nonclinical toxicologic pathology

Despite the U.S. FDA’s authorization of the Paige Prostate Detect system, the limited regulatory guidance for a validation concept of AI is still the greatest hurdle for the successful development and implementation of AI in the lifecycle of medicines, especially in the field of toxicologic pathology under GLP.⁵³ Machine learning (ML) techniques integrated into image analysis software, such as random forest and neural networks, enable the segmentation, classification, and quantification of histological structures. However, these approaches often rely heavily on manually engineered features and exhibit low reusability. Subject matter experts typically tailor the software for specific studies, analyze the data, and frequently need to redesign the system for subsequent studies. Examples of qualification methods of AI-enhanced image analysis software applied to toxicologic pathology have been previously described.⁷³ To the authors’ knowledge, more advanced AI tools that can eliminate the necessity for time-consuming manual feature engineering have not yet been validated or widely deployed in the field of toxicologic pathology. While AI tools capable of WSI QC, slide triaging, and/or quantifying specific common lesions in rat organs are emerging, they still require validation. Nevertheless, since the end of the last decade, a number of publications from clinical and toxicologic pathologists, as well as opinion pieces and white papers resulting from initiatives of professional societies, have started to outline basic principles for AI tool generation and application. Publications covering key aspects regarding the use of AI in toxicologic pathology are listed in Table 3.

Table 3.

Key articles on the use of AI in nonclinical toxicologic pathology.

Authorship	Title	Citation	Overview
Abels E, et al.	Computational Pathology Definitions, Best Practices, and Recommendations for Regulatory Guidance: A White Paper from the Digital Pathology Association¹	J Pathol. 2019;249(3):286–294	• A review of computational pathology with a focus on its application to histological images
Turner OC, et al.	Society of Toxicologic Pathology Digital Pathology and Image Analysis Special Interest Group Article: Opinion on the Application of Artificial Intelligence and Machine Learning to Digital Toxicologic Pathology⁶³	Toxicol Pathol. 2020;48(2):277–294	• A review of AI and ML applications in digital toxicologic pathology
Aeffner F, et al.	Special Issue on Digital Pathology, Tissue Image Analysis, Artificial Intelligence, and Machine Learning: Approximation of the Effect of Novel Technologies on Toxicologic Pathology²	Toxicol Pathol. 2021;49(4):705–708	• Highlights a range of key topics, including important points to consider
Turner OC, et al.	Mini Review: The Last Mile-Opportunities and Challenges for Machine Learning in Digital Toxicologic Pathology⁶⁴	Toxicol Pathol. 2021;49(4):714–719	• Summarizes opportunities and challenges for applying ML algorithms in toxicologic pathology
Zuraw A, et al.	Developing a Qualification and Verification Strategy for Digital Tissue Image Analysis in Toxicological Pathology⁷³	Toxicol Pathol. 2021;49(4):773–783	• Common digital tissue image analysis endpoints and sources of analysis errors, and recommended approaches for ensuring quality and correctness of results
Mehrvar S, et al.	Deep Learning Approaches and Applications in Toxicologic Histopathology: Current Status and Future Perspectives⁴¹	J Pathol Inform. 2021;12:42	• Reviews major DL concepts, emerging applications, and examples of DL in toxicologic pathology image analysis
Palazzi X, et al.	Results of the European Society of Toxicologic Pathology Survey on the Use of Artificial Intelligence in Toxicologic Pathology⁵³	Toxicol Pathol. 2023;51(4):216–224	• Survey results from 2021 demonstrate variability of AI awareness among toxicologic pathologists

Abbreviations: AI, artificial intelligence; DL, deep learning; ML, machine learning.

In November 2019, Abels and other experts of the Digital Pathology Association published a white paper on best practices and recommendations for computational pathology, with a focus on its application to WSI.¹ As a general principle for training any AI algorithm, it is recommended to split annotated data into “training” and “validation” datasets and to compare results with a test/‘ground truth’ dataset, established by subject matter experts (Figure 1). Likewise, in the field of toxicologic pathology, the importance of strictly separating training, validation, and test datasets has been emphasized.² In this regard, the “gold-standard paradox”⁴ arises from the dilemma that the gold standard is histopathological results from a human pathologist, whereas the algorithm data may, in fact, be more reproducible than human assessment; however, this is still a matter of debate.¹

Figure 1.

Data split during AI model development and final assessment.
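The separation of training, validation, and test datasets described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a method taken from the cited publications: the 60/20/20 fractions, the function name, and the choice to split by animal (so that all patches from one animal fall into a single partition) are assumptions of this example.

```python
import random


def split_by_group(group_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign whole groups (e.g., animals or slides) to the three partitions.

    Splitting by group rather than by individual patch keeps all patches
    from one animal together, so no animal contributes to more than one
    partition (an assumption of this sketch).
    """
    unique = sorted(set(group_ids))
    random.Random(seed).shuffle(unique)  # reproducible shuffle
    n_train = round(fractions[0] * len(unique))
    n_val = round(fractions[1] * len(unique))
    return {
        "training": set(unique[:n_train]),
        "validation": set(unique[n_train:n_train + n_val]),
        "test": set(unique[n_train + n_val:]),  # remainder goes to test
    }


# Hypothetical example: 10 animals with 5 annotated patches each.
patch_animals = [f"animal_{i:02d}" for i in range(10) for _ in range(5)]
partitions = split_by_group(patch_animals)
```

With ten animals, this yields six, two, and two animals for training, validation, and testing, respectively. The test partition would be set aside until the final assessment, while the validation partition informs model selection and tuning.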

Earlier in 2019, the Special Interest Group on Digital Pathology and Image Analysis of the Society of Toxicologic Pathology published an opinion piece introducing AI and ML to the toxicologic pathology community.⁶³ Besides defining relevant terminology, describing data generation and interpretation, and giving examples of use cases, they also suggested some basic parameters to be covered in AI model validation (i.e., AI model accuracy and precision, detection and quantitation limits, linearity, range and robustness, plausibility, relevance to target/mechanism in question, and the link to outcome of a disease or toxicity).⁶³ The concept of eXplainable AI (XAI),^6,64 addressing the requirements of AI transparency, builds on this opinion piece. XAI aims to foster greater trust in AI, ensuring algorithms are rigorously validated and their outputs are scientifically sound, to ultimately increase the adoption and integration of AI. The U.S. FDA and European Medicines Agency (EMA) recognize this need and are working to establish guidelines that promote the responsible use of AI and ensure regulatory standards are adhered to.

U.S. FDA discussion paper and request for feedback on the use of AI/ML

In May 2023, the U.S. FDA released a discussion paper and request for feedback: “Using Artificial Intelligence & Machine Learning in the Development of Drug & Biological Products” to facilitate a discussion with stakeholders.⁶⁸ Resulting from a collaboration among the U.S. FDA’s Center for Drug Evaluation and Research, the Center for Biologics Evaluation and Research, and the Center for Devices and Radiological Health, including its Digital Health Center of Excellence, the paper addresses several key topics in the context of AI and ML. These include current and potential uses of AI/ML, considerations for the responsible use of AI/ML, and engagement with stakeholders, including pharmaceutical companies, academia, patients and patient groups, and regulatory and other authorities. Following an overview of the current and potential future uses for AI/ML in therapeutic development, the paper then discusses the possible concerns and risks associated with these innovations and ways to address them. To this end, the U.S. FDA solicited feedback in areas such as validation and verification of AI/ML; model development, performance, and monitoring; quality and reliability of data; and governance and transparency of AI/ML model development.

EMA reflection paper on the use of AI

In July 2023, the EMA published a draft reflection paper on the use of AI to support the safe and effective development, regulation, and use of human and veterinary medicines, recognizing the rapid development of AI applications in the medical field and the need to prepare for the associated regulatory challenges. The draft was open for public consultation until the end of December 2023, and the final version of this reflection paper was issued in September 2024.²² Here, it was stated that

The level of scrutiny [depends] on the level of risk and regulatory impact posed by the system. A key principle is that it is the responsibility of the [clinical trial sponsor, marketing authorization applicant/holder or manufacturer] to ensure that all algorithms, models, datasets, and data processing pipelines used are fit for purpose and are in line with [legal, ethical, technical, scientific, and regulatory standards as described in European Union (EU) legislation, Good Practice (GxP) standards, and current EMA guidelines]. Of note, these requirements may in some respects be stricter than what is considered standard practice in the field of data science.²²

The EMA reflection paper also provides recommendations for technical aspects that should be covered in the development of AI/ML models. A combined summary of U.S. FDA and EMA considerations for the development and use of AI/ML models is outlined in Table 4.

Table 4.

Technical and regulatory considerations for AI/ML model development.^22,68

Technical aspects	Considerations
Data acquisition and augmentation	• Ensuring quality, reliability, and fit-for-use data (i.e., relevant for the specific intended use and population) is critical• Potential data-related issues to consider include bias, integrity (completeness, consistency, and accuracy of data), privacy and security, provenance (i.e., a record trail for the origin of a piece of data), relevance, replicability, reproducibility, and representativeness• Augmentation techniques may be applied to the training dataset; if limitations remain, these should be clearly presented
Training, validation, and test datasets	• There should be sufficiently large and balanced, independent training, validation, and test datasets• Validation datasets may be used to inform the selection of model architecture and hyperparameter tuning• Test datasets should be used to evaluate the final performance of the model
Model development	• SOPs should promote a development practice that favors model generalizability and robustness• Model risk should be determined by model influence (i.e., the weight of the model in the totality of evidence for a specific decision) and decision consequence (the potential consequences of a wrong decision)• There should be consistent, reliable, relevant, and traceable documentation and development logs to allow secondary assessment of development practices
Performance assessment	• Predefined thresholds for performance metrics should be used that are related to the context of use• Performance requirements should depend on the level of patient risk• Details on the performance of the AI/ML model should be included when employing independent, external testing data to support verification and validation (“external validity”)
Interpretability and explainability	• Use of transparent models is preferred; however, “black box” models may be acceptable if transparent models show unsatisfactory performance or robustness• Pre-defined monitoring and system risk management plans should be available for mitigating non-transparent issues
Model deployment	• Re-evaluation of system performance is important in case of nontrivial change in software and/or hardware stack that is supporting the model• Monitoring of model performance should allow early detection of drift/degradation against clearly defined thresholds for acceptable model performance• A system risk management plan should be in place
Regulatory aspects	Considerations
Governance, integrity, and data protection	• Human-led AI/ML governance and clear accountability is needed during planning, development, use, modification, and discontinuation (as applicable) of AI/ML in the drug development process• SOPs implementing GxP principles on data and algorithm governance should be used• A risk management plan that considers the context of use should be applied• Transparency and documentation can provide critical, contextual insight on the initial planning, development, function, and any modifications of the AI/ML• Integrity-preserving measures to anonymize data may be needed
Ethical and trustworthy AI	• Basic ethical principles should be applied, as defined in the Ethics Guidelines for Trustworthy AI and presented in the ALTAI from the independent High-Level Expert Group on AI that was established by the European Commission²¹

Abbreviations: AI, artificial intelligence; ALTAI, assessment list for trustworthy artificial intelligence for self-assessment; GxP, good practice; ML, machine learning; SOPs, standard operating procedures.
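The recommendation in Table 4 to use predefined thresholds for performance metrics can be illustrated with a short sketch. The metric selection, the counts, and the threshold values below are hypothetical and chosen only for illustration; in practice, the thresholds would be derived from the context of use and fixed before the test dataset is evaluated.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute basic metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e., the dataset contains
    both positive and negative cases and the model predicts both classes.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
    }


def metrics_below_threshold(metrics, thresholds):
    """Return the names of metrics that fall below their predefined minimum."""
    return [name for name, minimum in thresholds.items() if metrics[name] < minimum]


# Hypothetical acceptance criteria, defined before looking at the test data.
predefined_thresholds = {"sensitivity": 0.95, "specificity": 0.90}

# Hypothetical counts from an independent test dataset.
test_metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
failed = metrics_below_threshold(test_metrics, predefined_thresholds)
```

Here the sensitivity (90/95, approximately 0.947) falls just below the hypothetical minimum of 0.95, so the model would not meet its acceptance criteria. The same check, applied periodically to data collected after deployment, is one simple way to detect degradation against clearly defined thresholds.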

EU AI Act

Following the U.S. FDA’s discussion paper and request for feedback on the use of AI/ML,⁶⁸ and the EMA’s reflection paper on the use of AI,²² the EU became the first major regulatory body to have issued a legal framework for the use of AI. On August 1, 2024, the EU AI Act entered into force, a comprehensive legal framework with the intention to harmonize rules on the use of AI throughout the EU.²³ While fostering innovation, its main goal is to ensure that fundamental human rights are respected via safe AI systems.²³ Regulation under this Act follows a risk-based approach that considers the potential impact of a given AI system on an individual’s health, safety, values, and fundamental rights. Only AI systems used for military or national security purposes, or for pure scientific research and development, are exempt from these regulations.²³ It is also important to note that, as with the EU’s General Data Protection Regulation, the EU AI Act can apply extraterritorially to providers from outside the EU if they have users within the EU.

Nonexempt AI applications are classified by their risk of causing harm, and applications with unacceptable risks are banned (e.g., those providing “social scoring,” which may lead to discriminatory outcomes).²³ High-risk applications (e.g., medical software) must comply with security, transparency, and quality obligations, and must undergo conformity assessments.²³ Limited-risk applications, such as online chatbots, only have transparency obligations, and minimal-risk applications, such as spam filters, are not regulated by the EU AI Act.²³

Ethics guidelines for trustworthy AI

Ethical guidelines for trustworthy AI have been developed by the High-Level Expert Group on Artificial Intelligence (AI HLEG), an independent expert group established by the European Commission in June 2018, and were made public on April 8, 2019.²¹ Although largely centered on the human aspects of AI, requirements for trustworthy AI in toxicologic pathology can be drawn from chapter II (Seven key requirements to achieve trustworthy AI) and chapter III (AI assessment list to verify these key requirements) of the document;⁴⁶ these are listed in Table 5.

Table 5.

Seven key requirements for trustworthy AI and selected suggestions for AI in regulated toxicologic pathology taken from AI HLEG.²¹

Requirement 1: Human agency and oversight
• Safeguards to prevent AI system overconfidence/overreliance• Description of the level of human control/involvement• Mechanisms and measures to ensure human control/oversight• Measures to enable audit and remedy governing AI autonomy issues
Requirement 2: Technical robustness and safety
• Assessment of potential forms of attacks• Measures/systems against potential attacks• Fallback plan in case of attacks or other unexpected situations• Assessment of potential safety risks, including accidental/malicious misuse and a risk management plan• Assessment of any likely impact of AI system failure and a fallback plan• Define level and definition of AI system accuracy and use case• Documentation for monitoring and assuring accuracy• Assessment of harm in case of inaccuracy• Identify any specific contexts/particular conditions needed to ensure reproducibility
Requirement 3: Privacy and data governance
• Establish oversight mechanisms for data collection, storage, processing, and use• Include processes to ensure data quality and integrity• Verify that datasets have not been compromised/hacked
Requirement 4: Transparency
• Document methods for designing and developing the algorithmic system (e.g., method of algorithm training, including data gathered and selected)• Document methods and data used to test and validate the algorithmic system• Ensure decisions and outcomes made by the AI system can be understood• Consider AI system design interpretability from the start• Consider choosing the simplest and most interpretable model possible for the application in question• Consider communicating characteristics, limitations, and potential shortcomings of the AI system to deployers
Requirement 5: Diversity, nondiscrimination, and fairness
• Assess possible limitations resulting from the composition of used datasets• Consider measures to improve your understanding of the data, model, and system performance• Test and monitor for potential biases during the development, deployment, and use phase of the system• Consider mechanisms to flag issues during AI system use that may be related to bias/poor performance
Requirement 6: Societal and environmental wellbeing
• Assess the impact of the AI system on the environment• Assess the impact of the AI system on human work and skills• Consider measures to counteract de-skilling risks
Requirement 7: Accountability
• Establish mechanisms to facilitate auditability (traceability and logging of the AI system’s processes and outcomes)• Facilitate independent audits• Enable third parties to report potential vulnerabilities or risks• Establish mechanisms to allow for redress in case of any harm or adverse impact

Abbreviations: AI, artificial intelligence; AI HLEG, high-level expert group on artificial intelligence.

AI maturity model for GxP application

Once it has been established that AI models are trustworthy, their maturity concerning GxP standards should be established, aligning with the aims of the International Society for Pharmaceutical Engineering (ISPE).²⁰ The ISPE mission is to connect pharmaceutical knowledge to deliver manufacturing and supply chain innovation, operational excellence, and regulatory insights to enhance industry efforts to develop, manufacture, and reliably deliver quality medicines to patients. In April 2022, the ISPE issued an AI maturity model as a foundation for AI validation in a GxP environment, whereby AI system maturity is understood as “the extent to which an AI system can take control and evolve based on its own mechanisms, subject to the constraints imposed on the system in the form of user or regulatory requirements.”²⁰

In this concept, AI model maturity is determined by control design on one hand and autonomy on the other (Tables 6 and 7). Control design refers to the system’s ability to take over controls that safeguard product quality and patient safety, while autonomy describes the extent to which the system can automatically perform updates and improvements.²⁰ Depending on AI system maturity, six high-level AI validation levels are suggested to achieve regulatory compliance (Figure 2). However, it is understood that detailed QA requirements should be defined on a case-by-case basis.

Table 6.

Control design stages for the ISPE AI maturity model.²⁰

Stage 1	Stage 2	Stage 3	Stage 4	Stage 5
The system is used in parallel to the normal GxP processes	The system is executing a GxP process automatically but must be actively approved by the operator	The system is executing the process automatically but can be revised by the operator	The system is running automatically and controls itself	The system is running automatically and corrects itself

Table reproduced from Erdmann et al,²⁰ with permission from the ISPE.

Abbreviations: AI, artificial intelligence; GxP, Good Practice; ISPE, International Society for Pharmaceutical Engineering.

Table 7.

Autonomy stages for the ISPE AI maturity model.²⁰

Stage 0	Stage 1	Stage 2	Stage 3	Stage 4	Stage 5
Fixed algorithms are used (no machine learning)	The system is used in a locked state. Updates are performed by manual retraining with new training data sets	Updates are performed after indication by the system with a manual retraining	Updates are performed by automated retraining with a manual verification step	The system is fully automated and learns independently with a quantifiable optimization goal	The system is fully automated and self-determines its task competency and strategy

Table reproduced from Erdmann et al,²⁰ with permission from the ISPE.

Abbreviations: AI, artificial intelligence; ISPE, International Society for Pharmaceutical Engineering.

Figure 2.

ISPE AI maturity model.²⁰

The six ISPE validation levels from the model are described in Table 8.²⁰ Briefly, systems in validation level I have no influence on product quality and patient safety; therefore, validation is not mandatory. However, safeguards are recommended to ensure that the operator is handling results based on critical thinking and does not use these results to justify decisions. Systems in validation level II are AI applications that are not based on ML and, therefore, do not require training. Instead, validation can be performed with a conventional computerized system validation approach. Validation level III systems are AI applications that are based on training with data for generating their outputs and that operate in a locked state until a retraining process is performed. Additional validation aspects regarding data integrity and model quality should also be performed. Validation level IV systems include AI applications with greater autonomy (i.e., with partially automated update processes). For these, validation should focus on controlling model quality during operation. Systems in validation level V are AI applications with a high level of autonomy and self-control, meaning that stronger controls should be in place and should be reflected in the validation strategy. The systems in validation level VI are self-learning AI systems and currently, a validation concept for this system level does not exist but is under development.

Table 8.

ISPE AI maturity validation levels.²⁰

Level	Description	Minimum validation activities and requirements
I	Parallel (AI) CS	No validation required
II	Classical non-AI CS	Validation of computerized system required, but no dedicated focus on AI
III	Piecewise^a locked state AI CS	In addition to the above requirements:• Documented justification on why a model was selected• Training data verification• Model QA after training• Input data monitoring in operation• Retraining procedures defined
IV	Self-triggered learning AI CS with human operation and update control at all times	In addition to the above requirements:• Monitoring of model quality in operation• Controlling quality KPIs and notification process• Validation of human factors (depending on control design) with regards to overrides, qualifications, and AI system acceptance
V	Self-triggered learning AI CS with update control; overall or sampled operation control only	In addition to the above requirements:• Periodic re-testing with a defined test dataset• Assurance of self-control• Control of AI system outcomes by samples for a defined, risk-oriented fraction, and adequate stratification of input/output instances
VI	AI CS with autonomous learning	Validation concept currently under development

Abbreviations: AI, artificial intelligence; CS, computerized system; ISPE, International Society for Pharmaceutical Engineering; KPIs, key performance indicators; QA, quality assurance.

^a Piecewise means that the system may be manually updated to another version but provides one exact output to an instance of input data within such a version.

Table reproduced from Erdmann et al²⁰ with permission from the ISPE.

It is important to note that a dynamic development path for AI applications is suggested in this ISPE model for AI maturity.²⁰ For instance, initially a model may be developed with less autonomy and more control, resulting in lower validation requirements. During the system’s lifecycle, based on continuous evaluation of model success, there may be changes in use cases and newly identified risks, resulting in the adaptation of model design, either to allow more human control or to expand the system’s autonomy. While these systems and processes discussed above were not designed with toxicologic pathology specifically in mind, their principles are, nevertheless, applicable to the field of toxicologic pathology.

Use of AI to Support Regulatory Decision-Making in Drug Development

In January 2025, the U.S. FDA issued a draft guidance on the role of AI in the regulatory decision-making process for determining a drug’s safety, efficacy, and quality.⁶⁵ The draft guidance provides a risk-based credibility assessment approach for determining the reliability of an AI model for its specific context of use. It outlines a framework that sponsors may use to assess the risk and demonstrate the credibility of an AI model deployed in a decision-making environment during drug development.⁶⁵

Considerations for AI model development

Although the literature and regulatory guidelines published so far are generally meant for clinical applications, most of the principles developed in these articles may be applied to toxicologic pathology. AI algorithms for toxicologic pathology applications usually involve some type of pattern recognition task, such as detecting specific features, structures, or lesions in a histological image. Pattern recognition typically occurs at the patch level, object level, or pixel level. In some instances, the task could involve label prediction at the WSI level, for example, to determine whether an image is normal or abnormal.

How to determine ground truth

A fundamental requirement for developing successful AI algorithms is the availability of high-quality ground truth datasets. Creating high-quality ground truth annotations requires access to WSIs from multiple sources, such as different institutions, scanners, and staining protocols. Assembling such a dataset is not straightforward. However, the IMI Bigpicture project, a European public-private initiative, aims to address this issue by creating a repository of more than 3 million WSIs with associated metadata from multiple clinical and nonclinical sources.⁴⁴ Of these, it is hoped that more than 2 million WSIs will be sourced from animal models and nonclinical toxicity studies from different organizations using a diversity of slide scanners and staining protocols.

Human-generated ground truth annotations are invariably susceptible not only to intraobserver and interobserver variability but also to inconsistencies in image magnification, annotation tools (e.g., polygon vs free-hand drawing tools), and protocols (e.g., overlapping vs nonoverlapping annotations). To address these challenges, recent efforts have proposed a consensus-driven approach, particularly the use of reference datasets. These reference datasets comprise small, curated sets of ground truth annotations, accompanied by detailed guidelines for annotation practices. Such datasets can reduce variability and enhance the robustness of ground truth data.^42,70

The curated reference datasets serve two purposes: they enable annotators to benchmark their annotations against a standard, improving consistency, and the detailed guidelines help minimize inaccuracies and variability during the annotation process. Nevertheless, this consensus approach still relies on human annotations, which are often a bottleneck due to the limited availability of subject matter experts (i.e., pathologists). The recruitment of multiple pathologists for ground truth generation is typically restricted to large consortia and well-funded programs, making it impractical for many applications. To address this issue, crowdsourcing-based approaches have gained traction. These methods leverage nonexperts to generate ground truth data, often by embedding the annotation process in a virtual or gaming environment that incentivizes active participation. Typically, the same dataset is annotated by multiple participants and interrater agreement is used as a measure of annotation quality. However, in highly specialized fields such as histopathology, subject matter expertise is crucial for generating reliable ground truth due to the inherent contextual complexity of histological data. Currently, there are numerous efforts to improve the quality of ground truth labels for these specialized applications.
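
As an illustration of how interrater agreement might be quantified when the same items are labeled by more than one annotator, the following minimal Python sketch computes Cohen's kappa for two annotators and a simple majority vote across several annotators. The function names and the example labels are hypothetical and are not taken from any of the cited studies; other agreement statistics (e.g., for more than two raters) may be more appropriate in practice.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement derived from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def majority_vote(labels_for_item):
    """Most frequent label among annotators; None if the top two tie."""
    ranked = Counter(labels_for_item).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # no consensus; such items could be sent to an expert
    return ranked[0][0]


# Hypothetical patch-level labels from two annotators.
annotator_1 = ["tumor", "tumor", "stroma", "stroma", "immune", "tumor"]
annotator_2 = ["tumor", "stroma", "stroma", "stroma", "immune", "tumor"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this sketch, items without a majority label are returned as `None` so that they can be routed to a subject matter expert rather than resolved automatically; this is one possible design choice, not a recommendation from the source.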

Lopez-Perez et al. trained an AI algorithm to detect tumor, stroma, and immune infiltrates in hematoxylin and eosin (H&E)-stained tumor biopsy images using two different training datasets.³⁷ One of the training datasets, which the authors considered the gold standard, was generated by board-certified pathologists, while the other was generated by medical students. Importantly, the authors used statistical ML methods to enrich the training dataset generated by the medical students, which resulted in AI algorithm performance comparable to that achieved when the gold standard was used as training data.

Quality requirements for ground truth datasets

The quality and the size of ground truth datasets depend on the type of labels being predicted and the complexity of the AI algorithm. A good ground truth dataset must exhibit accuracy, class balance, completeness, and diversity. Accuracy is essential, as errors in ground truth can lead to poor model performance. Class balance ensures that the dataset provides appropriate proportions of labels, minimizing any imbalances that may affect training outcomes. The specific approach to achieving class balance depends on the application. For image-level classification tasks (e.g., normal vs abnormal slides), balancing the number of WSIs per class is crucial. For region-level detection (e.g., glomeruli in the kidney), the proportion of annotated pixel areas matters. For instance-level tasks (e.g., cell phenotyping), a balanced count of labeled instances for each category (e.g., tumor cells, lymphocytes, and endothelial cells) is essential.

Ground truth data should be representative of a wide range of real-world scenarios that the AI algorithm could encounter, which will enable it to perform well with new, unseen data. The ground truth should also cover a wide range of data scenarios that account for pre-analytic and analytic variables, such as variations in tissue source (e.g., different strains/species), staining (e.g., different H&E staining kits), sample format (e.g., tissue microarrays and longitudinal vs transverse sections), and slide digitization (e.g., slide scanner and magnification).

The roles of training and test datasets in algorithm development and validation introduce subtle but important differences in their quality requirements. While error-free labels are generally preferred for training, techniques exist to handle label noise during this phase, making minor inaccuracies manageable.³⁴ However, for regulatory approval and performance metrics calculations, the test data must be error-free. Similarly, while real-world datasets often exhibit intrinsic imbalances due to the natural prevalence of certain labels, care should be taken during test dataset preparation to avoid over- or undersampling specific classes, which could bias performance metrics.

How to detect and manage bias

Bias in ground truth data is a major concern for AI algorithms, particularly in pathology applications. Bias can arise from visual and cognitive traps that affect pathologists’ annotations.⁴ Visual traps occur when perception deviates from objective reality, such as optical illusions impacting the size, intensity, or color of features. Cognitive traps include diagnostic drift, where diagnoses or scores systematically shift across a study. Confirmation bias can also be an issue, where knowledge of treatment groups influences diagnosis.

Bias can also stem from nonhuman factors that inadvertently influence AI algorithms. For example, an AI algorithm trained to classify tumor types using H&E images from The Cancer Genome Atlas learned to identify tumors based on their hospital of origin rather than morphologic tissue features.¹⁸ This underscores the importance of appropriate data preparation, such as removing metadata from WSIs and sourcing data from diverse origins, to prevent algorithms from learning non-histologic, source-specific patterns.

Bias reduction strategies include minimizing class imbalance, using data augmentation, using human-interpretable features, and model cross-validation. Cross-validation evaluates model performance by training on different subsets of the ground truth data and testing on the remaining data. This approach helps assess generalization to unseen data and identify potential overfitting, provided biases in the ground truth are controlled. In toxicologic pathology, cross-validation poses additional challenges, as patches extracted from WSIs may lead to contamination if similar lesions from the same experiment appear in both training and testing sets. With such spillover, the validation results do not adequately reflect how well the algorithm performs on unseen data. Thus, mixing of image patches from the same animal between the training and test datasets should be controlled for. In addition, rare classes may lack sufficient ground truth data, leaving them potentially unrepresented in either the training or test datasets during cross-validation.
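
One way to keep patches from the same animal on one side of each split is to assign whole animals, rather than individual patches, to cross-validation folds. The sketch below is a minimal, standard-library Python illustration of this idea; the function name `animal_level_folds` and the animal identifiers are hypothetical. The same grouping key could equally be a study or experiment identifier if leakage at that level is a concern.

```python
from collections import defaultdict


def animal_level_folds(patch_animal_ids, n_folds):
    """Yield (train_idx, test_idx) pairs with no animal shared between them.

    patch_animal_ids[i] is the animal that patch i was extracted from.
    """
    patches_by_animal = defaultdict(list)
    for idx, animal in enumerate(patch_animal_ids):
        patches_by_animal[animal].append(idx)
    if n_folds < 2 or n_folds > len(patches_by_animal):
        raise ValueError("n_folds must be between 2 and the number of animals")

    # Greedy balancing: place the animals with the most patches first,
    # always into the fold that currently holds the fewest patches.
    folds = [[] for _ in range(n_folds)]
    ordered = sorted(patches_by_animal,
                     key=lambda a: (-len(patches_by_animal[a]), str(a)))
    for animal in ordered:
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(patches_by_animal[animal])

    for f in range(n_folds):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(n_folds) if g != f
                           for i in folds[g])
        yield train_idx, test_idx


# Hypothetical example: seven patches extracted from four animals.
ids = ["A1", "A1", "A2", "A3", "A3", "A3", "A4"]
splits = list(animal_level_folds(ids, n_folds=2))
```

This sketch balances folds by patch count only. It does not stratify by class, so rare lesion classes may still be missing from some folds, as noted above.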

How to use data augmentation

Training deep neural networks on histopathology image datasets poses unique challenges due to high data dimensionality, limited labeled samples, and variability in staining protocols and imaging equipment. Data augmentation is a key strategy to enhance dataset diversity, mitigating overfitting and improving model generalization to unseen data. Augmentation methods for histopathology must address domain-specific needs, such as handling color variations from staining and preserving the spatial structure of biological features. Current research spans stain-specific color transformations, geometric augmentations, generative synthetic data, and fully automatic augmentation pipelines, but no consensus yet exists on the optimal data augmentation strategies for toxicologic pathology.

Color augmentations tailored to H&E staining are well-studied for addressing variability in slide preparation and imaging.^52,60,62 Techniques like H&E color transformations introduce realistic stain variations, improving robustness in tasks like mitosis detection and tumor classification.⁶¹ Automated frameworks, such as bilevel optimization and RandAugment, further refine augmentation by eliminating manual selection and outperform traditional methods.²⁴
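
To make the idea of color augmentation concrete, the following Python sketch applies a small random hue shift and saturation scaling to an RGB patch using only the standard library. This is a generic HSV jitter, chosen here for simplicity; it is not the stain-specific H&E transformation described in the cited work, which operates on separated stain channels. The function name, the patch representation (nested lists of RGB tuples), and the jitter ranges are assumptions made for illustration.

```python
import colorsys
import random


def jitter_patch_hsv(patch, rng, max_hue_shift=0.02, max_sat_scale=0.15):
    """Return a copy of an RGB patch with one random hue/saturation change.

    patch: list of rows, each row a list of (r, g, b) tuples in 0-255.
    rng:   a random.Random instance, so results are reproducible.
    """
    # Draw one perturbation per patch so that all pixels shift together,
    # mimicking a slide-level staining difference rather than pixel noise.
    hue_shift = rng.uniform(-max_hue_shift, max_hue_shift)
    sat_scale = 1.0 + rng.uniform(-max_sat_scale, max_sat_scale)

    augmented = []
    for row in patch:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            h = (h + hue_shift) % 1.0               # hue is cyclic
            s = min(1.0, max(0.0, s * sat_scale))   # clamp saturation
            nr, ng, nb = colorsys.hsv_to_rgb(h, s, v)
            new_row.append((round(nr * 255), round(ng * 255), round(nb * 255)))
        augmented.append(new_row)
    return augmented


# Hypothetical 2 x 2 patch with pink and purple tones.
patch = [[(230, 150, 190), (120, 60, 140)],
         [(245, 220, 235), (90, 40, 120)]]
augmented = jitter_patch_hsv(patch, random.Random(42))
```

Brightness (the V channel) is left unchanged in this sketch. Whether such perturbations stay within the range of realistic stain variation would need to be checked against actual slides.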

Generative models, especially generative adversarial networks (GANs), offer another approach by creating synthetic histopathology images to address data scarcity and class imbalance.^19,29,39 Methods like Auxiliary Classifier GAN and diffusion models have shown promise in augmenting underrepresented classes and boosting model performance. However, concerns remain regarding the realism and biological fidelity of synthetic images. Geometric and patch-wise augmentations, including patch stitching and region-level transformations, have also been explored.^59,71 These methods enable biologically meaningful augmentations while maintaining spatial consistency, making them particularly useful for tasks like multiple instance learning, where structural information at the subregion level is critical.

Assessing accuracy and precision of AI algorithms

Performance metrics for AI algorithms in pathology can be broadly categorized by task: classification or segmentation. Segmentation algorithms commonly use metrics like the Dice similarity coefficient (F1 score) or Intersection over Union, which measure the overlap between the true and predicted areas of a class on a scale from 0 to 1. A value of 0 indicates no overlap (poor performance), while a value of 1 indicates perfect overlap (ideal performance). Classification algorithms often rely on the area under the receiver operating characteristic (ROC) curve, which also ranges between 0 and 1. The ROC curve plots the true positive rate against the false positive rate across varying thresholds; for multiclass problems, a separate ROC curve is generated for each class by treating that class as positive and all others as negative. However, these metrics have limitations, particularly due to challenges like class imbalance and WSI-level aggregation. In addition, their interpretability in downstream decision-making can be problematic. For example, while an F1 score near 1 is desirable, practical considerations (data quality and diversity, task complexity, and algorithm selection) often result in less-than-ideal scores (e.g., 0.8). While higher scores generally indicate better performance, they should not be the sole acceptance criterion for the algorithm. Calculating confidence intervals for metrics can also provide a clearer picture of their reliability by showing the range of plausible values.
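
The metrics named above can be sketched in a few lines of Python. The example below computes the Dice coefficient and Intersection over Union for binary masks, the area under the ROC curve as a pairwise ranking probability, and a percentile bootstrap confidence interval for the mean of per-slide scores. The function names, the flattened-mask representation, and the example values are assumptions made for illustration; the percentile bootstrap is one of several possible ways to obtain a confidence interval.

```python
import random


def dice_and_iou(true_mask, pred_mask):
    """Dice coefficient and IoU for two flat binary masks of equal length."""
    tp = sum(1 for t, p in zip(true_mask, pred_mask) if t and p)
    true_area = sum(1 for t in true_mask if t)
    pred_area = sum(1 for p in pred_mask if p)
    union = true_area + pred_area - tp
    if union == 0:
        return 1.0, 1.0  # convention used here: two empty masks agree
    return 2 * tp / (true_area + pred_area), tp / union


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-slide values."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n
                   for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical values for illustration only.
dice, iou = dice_and_iou([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
per_slide_dice = [0.82, 0.79, 0.91, 0.66, 0.88, 0.73, 0.85, 0.80]
ci_low, ci_high = bootstrap_ci(per_slide_dice)
```

Resampling is done at the slide level here on the assumption that slides are the independent units. If several slides come from the same animal, resampling by animal would likely be more appropriate.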

Using independent datasets with orthogonal endpoints, such as pathologist scores, can further validate algorithm performance on real-world data. For example, if an AI algorithm detects a specific lesion, comparing its output to pathologists’ scores on a study with control and treated groups provides a more realistic assessment of its utility. More broadly, it is important to evaluate whether the AI algorithm delivers tangible benefits, such as improved speed, accuracy, or scalability in processing images and generating meaningful information. Finally, interpreting and reporting these metrics requires careful consideration of the specific domain and task, and incorporating expert feedback (“pathologist in the loop”) is critical for evaluating the algorithm’s practical relevance.

Explainability of AI algorithms

Beyond AI performance metrics lies explainability. Toxicologic pathology AI models should be interpretable so their decision-making processes can be understood by pathologists and other stakeholders. XAI addresses the “black box” nature of AI by offering insights into how conclusions are reached, fostering trust and credibility. Regulatory agencies such as the U.S. FDA and the EMA emphasize the importance of human interpretable features (HIFs) for AI algorithms, although guidelines specific to toxicologic pathology remain limited.^22,68

Heat maps and saliency maps are commonly used HIFs to visualize how AI models make decisions.^43,55,58 Saliency maps highlight the contribution of individual pixels to a classification decision. For example, incorrect classification can arise if an algorithm prioritizes background pixels over those of interest. In contrast, heat maps show how broader regions in an image can influence decisions and provide a complementary, higher-level view. While saliency maps represent bottom-up perceptual features, heat maps represent top-down semantic or contextual information.

Despite their widespread use, these qualitative tools have limitations. Their interpretation is often subjective and requires expertise from pathologists. Moreover, heat and saliency maps are model-specific, meaning the same task performed by different AI architectures could produce inconsistent visualizations. This variability can reduce their utility across models. Recent advancements aim to address these issues by introducing quantitative endpoints for HIFs. For example, Boggust et al⁷ have proposed a saliency score, a feature-wise importance score that describes the feature’s influence on the model’s output for a given label.

Life cycle management of AI algorithms

The successful development of an AI algorithm is typically marked by its routine use for the intended application. In the GLP space, the sustained use of validated AI algorithms imposes the need for life cycle management to maintain their credibility. Life cycle management refers to the monitoring and management of changes to AI algorithm inputs or outputs, whether accidental or deliberate, to ensure that the algorithm remains fit for use. Draft guidance from the U.S. FDA recommends the implementation of a life cycle maintenance plan for AI algorithms that support or generate data for regulatory decision-making.⁶⁵ Specifically, the guidance recommends that the life cycle plan include checks for data drift and periodic monitoring of model performance to ensure that the algorithm remains fit for its intended use. A risk-based approach for life cycle management is recommended to assess the impact of changes to the input data for the AI algorithm. In toxicologic pathology, changes in the input data could occur when the histology images are generated at a different source (e.g., a new CRO), or if there are systematic changes to the staining protocol that could affect the color profile of the stained slide. Depending on the outcome of the assessment, some retraining and/or retesting may be required.
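
A data drift check of the kind mentioned above could, for example, compare simple per-slide color statistics from a new batch of images against the statistics of the data used for validation. The Python sketch below flags a feature when the batch mean deviates from the reference mean by more than a chosen number of standard errors. The feature names, the example values, and the threshold are hypothetical; the draft guidance does not prescribe this or any other specific test.

```python
from statistics import mean, stdev


def drift_report(reference, new_batch, z_threshold=3.0):
    """Flag features whose batch mean departs from the reference data.

    reference, new_batch: dicts mapping a feature name to a list of
    per-slide values (at least two reference values per feature).
    """
    report = {}
    for feature, ref_values in reference.items():
        batch_values = new_batch[feature]
        # Standard error of the batch mean, assuming the batch was drawn
        # from the same distribution as the reference slides.
        std_error = stdev(ref_values) / len(batch_values) ** 0.5
        shift = mean(batch_values) - mean(ref_values)
        if std_error > 0:
            z = shift / std_error
        else:
            z = 0.0 if shift == 0 else float("inf")
        report[feature] = {"z": z, "drift": abs(z) > z_threshold}
    return report


# Hypothetical mean red-channel intensity per slide (0-255 scale).
reference = {"mean_red": [180, 182, 179, 181, 183, 180]}
same_lab = {"mean_red": [181, 180, 182]}
new_cro = {"mean_red": [150, 152, 149]}

stable = drift_report(reference, same_lab)
shifted = drift_report(reference, new_cro)
```

A flagged feature would not by itself show that model performance has degraded. In a risk-based approach it would rather trigger the assessment, and possibly the retesting or retraining, described above.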

Challenges and Outlook for AI

Overcoming the specific challenges to the use of AI in toxicologic pathology

The comprehensive pathology assessments routinely conducted by pathologists in nonclinical toxicology studies involve complex tasks that are challenging to replicate computationally. In addition, the field of toxicologic pathology poses significant challenges for AI due to its extensive diversity, which includes numerous species, up to 60 organs, each with substructures, and a wide range of potential artifactual, background, procedural, or test article-related findings per organ or even substructure. In established AI applications for pathology, models are typically trained and tested on the same set of predefined classes, allowing them to quantify expected treatment-related findings for a specific organ and species. However, this “closed-set” approach has limitations in the diverse and unpredictable scenarios that are encountered in toxicologic pathology. To address the variability and ensure real-world applicability, “open-set” recognition techniques should be pursued. These methods make AI models more robust by allowing them to identify and classify novel or unexpected features not included in their training, making them better suited for routine assessments in toxicologic pathology.

Histology-based foundation models, and generative and multimodal AI applied to pathology are, at the time of this publication, promising emerging approaches that, if deployed alone or in combination in routine pathology workflows, could significantly impact the field of toxicologic pathology. The integration of AI tools into the toxicologic pathology workflow offers numerous benefits for the future. These include streamlining histological processing, reducing animal usage, enhancing the efficiency, consistency, and accuracy of pathology assessments, minimizing bias and variability between observers, speeding up reporting, and uncovering new insights in pathological evaluations.

Foundation models

To address the unique toxicologic pathology challenges, AI models must go beyond clinical foundation models, which are primarily focused on oncology and cancer-specific tasks.^38,69 Toxicologic pathology demands a broader and, therefore, a more versatile framework capable of capturing the full diversity of tissue structures and histopathological patterns across multiple organ systems and species. This inclusivity is essential to manage the wide variability of lesions encountered in toxicologic pathology. In this regard, HistoNet, developed by Novartis, represents a pioneering effort.³¹ It is the first published foundation model specifically trained on toxicologic pathology data and was created even before the term “foundation model” was formally established.³¹ HistoNet demonstrated the potential of such models to generalize across tasks and datasets, enabling applications like artifact detection, lesion grading, and CBIR. Looking ahead, the IMI Bigpicture project, with its ambitious collection of over 2 million nonclinical slides, provides a critical resource for advancing toxicologic pathology AI. By leveraging this dataset, future models will be able to achieve a new level of diversity and specificity, supporting developments tailored to toxicologic pathology’s unique challenges.

Generative and multimodal AI

Generative AI refers to AI systems trained on vast datasets that can create new content, such as text, images, or videos, based on the patterns and structures learned from existing data. Large language models can provide draft lesion descriptions using standardized terminology or generate draft pathology reports by synthesizing data from observation tables. They can also compare tabulated data with draft reports to flag discrepancies and suggest refinements. For image models, generative AI will enhance data diversity by generating synthetic examples of rare lesions or patterns for model training. Multimodal AI combines data from various sources or modalities, such as histopathological images, clinical pathology, and molecular profiles. This approach leverages the strengths of different data types and provides a more comprehensive, holistic view of an animal’s condition. These approaches will help AI algorithms to model the adversity of lesions, identify the no-observed-adverse-effect level, and streamline the integration of relevant pathology descriptions into toxicology reports.

Applying AI tools in routine toxicologic pathology

The implementation of AI-assisted artifact detection, such as identifying technical or digital artifacts on WSIs, could automate decisions regarding the need for recutting, resampling, or rescanning, and function as a QC step in the workflow. Integrating such a model during the histology processing phase can save time and resources, while segmenting artifacts for downstream AI models. Similarly, AI-driven outlier or lesion detection algorithms could identify tissue areas from specific organs that deviate from the norm, such as when compared to concurrent or virtual controls. Eventually, WSIs from a study could be uploaded to an AI-powered platform, which would analyze the images, determine potential target organs at a group level, flag outliers at the slide level, and segment the relevant regions. By identifying possible target organs early in the process, AI tools could trigger the histology processing of intermediate and recovery groups before pathologist evaluation, thus saving time. The pathologists would then review the regions flagged by the AI platform, allowing them to focus their expertise on the most critical areas, improving overall efficiency and accuracy. In addition, AI models could propose preliminary diagnoses and grading for the flagged lesions, which the pathologist would then confirm, thereby reducing interobserver and intraobserver variability. Ultimately, further productivity gains could be achieved with generative AI models that automate report generation based upon the outputs of pathologist-confirmed lesion detection models, streamlining the entire workflow.

CBIR systems retrieve visually similar images from large databases based on the content of a query image.¹⁴ This approach is beneficial for assessing background changes because it provides access to similar cases and their corresponding diagnoses in an extensive, centralized virtual control database. Pathologists could use CBIR systems to search for similar cases by uploading a query image. The AI would then retrieve and display images with similar features, along with their diagnostic information and animal metadata, helping the pathologist assess the spontaneous nature of a finding.
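
The retrieval step of a CBIR system can be illustrated with a minimal Python sketch that ranks database entries by the cosine similarity between their feature vectors and that of the query image. This is a simplified illustration, not the search method of the cited work. It assumes that each image has already been converted into a numeric embedding by some feature extractor (not shown), and the database fields, diagnoses, and vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_similar(query_embedding, database, top_k=3):
    """Return the top_k (similarity, entry) pairs, most similar first."""
    scored = [(cosine_similarity(query_embedding, entry["embedding"]), entry)
              for entry in database]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical control database; real embeddings would have many more
# dimensions and would come from a trained feature extractor.
database = [
    {"slide": "ctrl-001", "diagnosis": "basophilic tubules, minimal",
     "embedding": [0.9, 0.1, 0.0]},
    {"slide": "ctrl-002", "diagnosis": "no abnormality detected",
     "embedding": [0.0, 1.0, 0.0]},
    {"slide": "ctrl-003", "diagnosis": "mononuclear cell infiltrate",
     "embedding": [0.1, 0.2, 0.9]},
]
hits = retrieve_similar([1.0, 0.0, 0.1], database, top_k=2)
```

A linear scan as shown here compares the query against every entry. For databases of the size discussed in this article, an approximate nearest-neighbor index would likely be needed.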

The concept of synthetic controls relies on the creation of artificial control organs or animals using generative AI models that have been trained on extensive control datasets. These datasets can be either monomodal (e.g., WSIs only) or multimodal (e.g., including both WSIs and some or all of the in vivo and ex vivo parameters measured in animals from toxicological studies). These synthetic controls could augment or ultimately replace existing control datasets by providing standards for comparison in toxicologic studies, thereby reducing animal use.

Conclusion

The future of toxicologic pathology is closely tied to the integration of digital pathology, and the full transition to AI-assisted digital pathology represents a game-changing opportunity for the field. Pathologists must now actively shape how AI is adopted and applied; otherwise, the influence of other stakeholders could diminish the pathologist’s central role in toxicity assessments. This moment calls for growth and adaptation: pathologists need to upgrade their skills to work effectively with digital pathology and AI, not only to use them but to guide their development and applications in ways that best serve the field. Training programs are critical to ensure that pathologists are equipped to understand and collaborate with these tools, enhancing their expertise while maintaining their vital role in decision-making. At the same time, it is important for pathologists to engage with health authorities and regulatory bodies to communicate the shared benefits of digital pathology and AI adoption: streamlined workflows, improved consistency and accuracy, and, ultimately, reduced use of animals in preclinical testing. Despite the expected challenges of implementing such novel approaches into the highly regulated world of drug development, AI-assisted digital pathology will not only enhance efficiency but will also strengthen the overall quality and impact of pathologists’ contributions. By embracing these advancements and leading their implementation, pathologists can ensure that AI becomes a tool to enhance their expertise, not replace it. In essence, this is a pivotal era for toxicologic pathology, which requires an industry-wide commitment to growth and innovation, thus paving the way for more accurate data interpretation, more efficient workflows, and ultimately, the advancement of scientific discovery in nonclinical toxicology.

Footnotes

Acknowledgements

This publication is a deliverable of the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No. 945358 (Bigpicture, https://www.bigpicture.eu/). This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation program and EFPIA. The authors would like to thank Thomas Lucotte, Holger Höfling, and Manuel Hermann for their critical and impactful review comments on this manuscript. They also thank Ben McDermott of the Bioscript Group (Macclesfield, UK) for medical writing support in accordance with Good Publication Practice guidelines, which was funded by Boehringer Ingelheim Pharma GmbH & Co. KG.

Authors’ Contributions

GPE conceived and drafted the initial manuscript. CH, JB, and SR provided substantial writing, reviewing, and editing of the initial draft. All authors provided scientific expertise associated with the topic, critically revised the manuscript and made additions and edits to reach the final submitted and approved version.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Publication costs associated with this article were funded by Boehringer Ingelheim Pharma GmbH & Co. KG.

ORCID iDs

Gabriele Pohlmeyer-Esch

Charles Halsey

Julie Boisclair

Sripad Ram

Sarah Kirschner-Kitz

Brian Knight

Pierre Moulin

Anna-Lena Frisk

References

Abels

Pantanowitz

Aeffner

, et al. Computational pathology definitions, best practices, and recommendations for regulatory guidance: a white paper from the Digital Pathology Association. J Pathol. 2019;249(3):286-294. doi:10.1002/path.5331.

Aeffner

Sing

Turner

. Special issue on digital pathology, tissue image analysis, artificial intelligence, and machine learning: approximation of the effect of novel technologies on toxicologic pathology. Toxicol Pathol. 2021;49(4):705-708. doi:10.1177/0192623321993756.

Aeffner

Wilson

Bolon

, et al. Commentary: roles for pathologists in a high-throughput image analysis team. Toxicol Pathol. 2016;44(6):825-834. doi:10.1177/0192623316653492.

Aeffner

Wilson

Martin

, et al. The gold standard paradox in digital image analysis: manual versus automated scoring as ground truth. Arch Pathol Lab Med. 2017;141(9):1267-1275. doi:10.5858/arpa.2016-0386-RA.

Bach

Barale -Thomas

Knight

, et al. Recommendations for GLP-conform archiving of whole slide images. Paper Presented at the 20th European Congress of Toxicologic Pathology; September 26–29, 2023; Basel, Switzerland.

Barredo Arrieta

Díaz-Rodríguez

Del Ser

, et al. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion. 2020;58:82-115. doi:10.1016/j.inffus.2019.12.012.

Boggust

Suresh

Strobelt

Guttag

Satyanarayan

. Saliency cards: a framework to characterize and compare saliency methods. Facct ‘23: Proc 2023 ACM Conference on Fairness, Accountability, and Transparency; June 12–15, 2023; Chicago, IL. doi:10.1145/3593013.3593997.

Boisclair

Bawa

Barale -Thomas

, et al. IT/QA and regulatory aspects of digital pathology: results of the 8th ESTP International workshop. Toxicol Pathol. 2022;50(6):793-807. doi:10.1177/01926233221113275.

Bradley

Cary

Isobe

Naylor

Drew

. Proof of concept: the use of whole-slide images (WSI) for peer review of tissues on routine regulatory toxicology studies. Toxicol Pathol. 2021;49(4):750-754. doi:10.1177/0192623320983252.

10.

Bradley

Jacobsen

. Toxicologic Pathology Forum*: opinion on considerations for the use of whole slide images in GLP pathology peer review. Toxicol Pathol. 2019;47(2):100-107. doi:10.1177/0192623318818790.

11.

CADTH Horizon Scans. The Paige Prostate Suite: Assistive Artificial Intelligence for Prostate Cancer Diagnosis: Emerging Health Technologies, 2024. Accessed March 19, 2025. https://pubmed.ncbi.nlm.nih.gov/39466926/.

12.

Cann

Chlipala

Ellin

, et al. Validation of Digital Pathology Systems in the Regulated Nonclinical Environment. Digital Pathology Association, 2011. Accessed March 20, 2025. https://digitalpathologyassociation.org/_data/cms_files/files/DPA_White_Paper_Final_-_2011-11-17.pdf.

13.

Carter

Bach

Boisclair

, et al. Recommendations for the utility of equivalency and concordance studies in digital pathology. Paper Presented at the 20th European Congress of Toxicologic Pathology; September 26–29, 2023; Basel, Switzerland.

14.

Chen

Williamson

DFK

, et al. Fast and scalable search of whole-slide images via self-supervised deep learning. Nat Biomed Eng. 2022;6(12):1420-1434. doi:10.1038/s41551-022-00929-8.

15.

Chu

Eastham

, et al. Assessment of color reproducibility and mitigation of color variation in whole slide image scanners for toxicologic pathology. Toxicol Pathol. 2023;51(6):313-328. doi:10.1177/01926233231224468.

16. Clunie. DICOM format and protocol standardization-a core requirement for digital pathology success. Toxicol Pathol. 2021;49(4):738-749. doi:10.1177/0192623320965893.

17. Clunie, Dennison, Cram, et al. Technical challenges of enterprise imaging: HIMSS-SIIM collaborative white paper. J Digit Imaging. 2016;29(5):583-614. doi:10.1007/s10278-016-9899-4.

18. Dehkharghanian, Bidgoli, Riasatian, et al. Biased data, biased AI: deep networks predict the acquisition site of TCGA images. Diagn Pathol. 2023;18(1):67. doi:10.1186/s13000-023-01355-3.

19. Dhivya, Mohanavalli, Karthika, Shivani, Mageswari. GAN based data augmentation for enhanced tumor classification. 2020 4th International Conference on Computer, Communication and Signal Processing (ICCCSP); September 28–29, 2020. doi:10.1109/ICCCSP49186.2020.9315189.

20. Erdmann, Blumenthal, Baumann, Kaufmann. AI maturity model for GxP application: a foundation for AI validation. Pharmaceutical Engineering, 2022. Accessed March 19, 2025. https://ispe.org/pharmaceutical-engineering/march-april-2022/ai-maturity-model-gxp-application-foundation-ai.

21. European Commission. High-level expert group on artificial intelligence. Ethics Guidelines for Trustworthy AI, 2019. Accessed March 20, 2025. https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai.

22. European Medicines Agency. Reflection paper on the use of artificial intelligence (AI) in the medicinal product lifecycle, 2023. Accessed March 20, 2025. https://www.ema.europa.eu/en/documents/scientific-guideline/draft-reflection-paper-use-artificial-intelligence-ai-medicinal-product-lifecycle_en.pdf.

23. European Union. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and Directives 2014/90/EU, (EU) 2016/797 and (EU) 2020/1828 (Artificial Intelligence Act). Accessed March 19, 2025. https://eur-lex.europa.eu/eli/reg/2024/1689/oj.

24. Faryna, van der Laak, Litjens. Tailoring automated data augmentation to H&E-stained histopathology. Proc Fourth Conf Med Imag Deep Learn. 2021;143:168-178.

25. Forest, Aeffner, Bangari, et al. Scientific and regulatory policy committee brief communication: 2019 survey on use of digital histopathology systems in nonclinical toxicology studies. Toxicol Pathol. 2022;50(3):397-401. doi:10.1177/01926233221084621.

26. Forest, Aeffner, Bangari, et al. Scientific and regulatory policy committee points to consider: primary digital histopathology evaluation and peer review for good laboratory practice (GLP) nonclinical toxicology studies. Toxicol Pathol. 2022;50(4):531-543. doi:10.1177/01926233221099273.

27. Gauthier, Gervais, Hamm, et al. Toxicologic pathology forum*: opinion on integrating innovative digital pathology tools in the regulatory framework. Toxicol Pathol. 2019;47(4):436-443. doi:10.1177/0192623319827485.

28. Funk, Albrecht, et al. Toxicologic pathology forum*: a roadmap for building state-of-the-art digital image data resources for toxicologic pathology in the pharmaceutical industry. Toxicol Pathol. 2022;50(8):942-949. doi:10.1177/01926233221132747.

29. Gheshlaghi, Nok Enoch Kan. Breast cancer histopathological image classification with adversarial image synthesis. Annu Int Conf IEEE Eng Med Biol Soc. 2021;2021:3387-3390. doi:10.1109/embc46164.2021.9630678.

30. Hanna, Reuter, Hameed, et al. Whole slide imaging equivalency and efficiency study: experience at a large academic center. Mod Pathol. 2019;32(7):916-928. doi:10.1038/s41379-019-0205-0.

31. Hoefling, Sing, Hossain, et al. HistoNet: a deep learning-based model of normal histology. Toxicol Pathol. 2021;49(4):784-797. doi:10.1177/0192623321993425.

32. Hukkanen, Trapani, Derringer, et al. Toxicologic pathology forum: opinion on performing good laboratory practice histopathology evaluation for nonclinical toxicity studies in a remote location. Toxicol Pathol. 2023;51(3):148-152. doi:10.1177/01926233231168133.

33. Jacobsen, Lewis, Baily, et al. Utilizing whole slide images for the primary evaluation and peer review of a GLP-compliant rodent toxicology study. Toxicol Pathol. 2021;49(6):1164-1173. doi:10.1177/01926233211017031.

34. Karimi, Dou, Warfield, Gholipour. Deep learning with noisy labels: exploring techniques and remedies in medical image analysis. Med Image Anal. 2020;65:101759. doi:10.1016/j.media.2020.101759.

35. Lempp, Arms, Bertram, et al. A minimal approach to demonstrate concordance of digital and conventional microscopy in toxicologic pathology. Toxicol Pathol. 2024;52(5):251-257. doi:10.1177/01926233241255125.

36. Long, Smith, Machotka, et al. Scientific and Regulatory Policy Committee (SRPC) paper: validation of digital pathology systems in the regulated nonclinical environment. Toxicol Pathol. 2013;41(1):115-124. doi:10.1177/0192623312451162.

37. López-Pérez, Amgad, Morales-Álvarez, et al. Learning from crowds in digital pathology using scalable variational Gaussian processes. Sci Rep. 2021;11(1):11612. doi:10.1038/s41598-021-90821-3.

38. Chen, Williamson DFK, et al. A visual-language foundation model for computational pathology. Nat Med. 2024;30(3):863-874. doi:10.1038/s41591-024-02856-4.

39. Feng, Zheng, Wang. Generated model based data augmentations for classification of HER2 immunohistochemical pathological images in breast cancer. Proc 2021 6th International Conference on Mathematics and Artificial Intelligence; March 19–21, 2021; Chengdu, China. doi:10.1145/3460569.3460585.

40. Malarkey, Willson, et al. Utilizing whole slide images for pathology peer review and working groups. Toxicol Pathol. 2015;43(8):1149-1157. doi:10.1177/0192623315605933.

41. Mehrvar, Himmel, Babburi, et al. Deep learning approaches and applications in toxicologic histopathology: current status and future perspectives. J Pathol Inform. 2021;12:42. doi:10.4103/jpi.jpi_36_21.

42. Montezuma, Oliveira, Neto, et al. Annotating for artificial intelligence applications in digital pathology: a practical guide for pathologists and researchers. Mod Pathol. 2023;36(4):100086. doi:10.1016/j.modpat.2022.100086.

43. Morch NJS, Kjems, Hansen, et al. Visualization of neural networks using saliency maps. Proceedings of ICNN’95; December 1, 1995; WA, Australia.

44. Moulin, Grünberg, Barale-Thomas, van der Laak. IMI-Bigpicture: a central repository for digital pathology. Toxicol Pathol. 2021;49(4):711-713. doi:10.1177/0192623321989644.

45. NEMA. DICOM Standard. Accessed March 20, 2024. http://www.dicomstandard.org/.

46. Organization for Economic Co-operation and Development. OECD good laboratory practice: frequently asked questions (FAQ), 2020. Accessed March 20, 2025. https://web-archive.oecd.org/temp/2024-03-21/311228-glp-frequently-asked-questions.htm#Histopathology.

47. Organization for Economic Co-operation and Development. OECD series on principles of good laboratory practice and compliance monitoring. Advisory Document on GLP and Cloud Computing. Supplement 1 to Document Number 17 on Application of GLP Principles to Computerised Systems, 2023. Accessed March 20, 2025. https://one.oecd.org/document/ENV/CBC/MONO(2023)27/en/pdf.

48. Organization for Economic Co-operation and Development. Series on principles of good laboratory practice and compliance monitoring, number 1: OECD principles of good laboratory practice, 1998. Accessed March 20, 2025. https://www.oecd-ilibrary.org/environment/oecd-series-on-principles-of-good-laboratory-practice-and-compliance-monitoring_2077785x.

49. Organization for Economic Co-operation and Development. Series on principles of good laboratory practice and compliance monitoring, number 16: guidance on the GLP requirements for peer review of histopathology, 2015. Accessed March 20, 2025. https://www.oecd-ilibrary.org/environment/oecd-series-on-principles-of-good-laboratory-practice-and-compliance-monitoring_2077785x.

50. Organization for Economic Co-operation and Development. Series on principles of good laboratory practice and compliance monitoring, number 17: application of GLP principles to computerised systems, 2022. Accessed March 20, 2025. https://one.oecd.org/document/env/jm/mono(2016)13/en/pdf.

51. Organization for Economic Co-operation and Development. Series on principles of good laboratory practice and compliance monitoring, number 22: GLP data integrity, 2021. Accessed March 20, 2025. https://one.oecd.org/document/env/cbc/mono(2021)26/en/pdf.

52. Otálora, Atzori, Andrearczyk, Khan, Müller. Staining invariant features for improving generalization of deep convolutional neural networks in computational pathology. Front Bioeng Biotechnol. 2019;7:198. doi:10.3389/fbioe.2019.00198.

53. Palazzi, Barale-Thomas, Bawa, et al. Results of the European society of toxicologic pathology survey on the use of artificial intelligence in toxicologic pathology. Toxicol Pathol. 2023;51(4):216-224. doi:10.1177/01926233231182115.

54. Raciti, Sue, Retamero, et al. Clinical validation of artificial intelligence–augmented pathology diagnosis demonstrates significant gains in diagnostic accuracy in prostate cancer detection. Arch Pathol Lab Med. 2023;147(10):1178-1185. doi:10.5858/arpa.2022-0066-OA.

55. Ras, Xie, van Gerven, Doran. Explainable deep learning: a field guide for the uninitiated. J Artif Int Res. 2022;73:68. doi:10.1613/jair.1.13200.

56. Rattan. Data integrity: history, issues, and remediation of issues. PDA J Pharm Sci Technol. 2018;72(2):105. doi:10.5731/pdajpst.2017.007765.

57. Schumacher, Aeffner, Barale-Thomas, et al. The application, challenges, and advancement toward regulatory acceptance of digital toxicologic pathology: results of the 7th ESTP International Expert Workshop (September 20–21, 2019). Toxicol Pathol. 2021;49(4):720-737. doi:10.1177/0192623320975841.

58. Singh, Sengupta, Lakshminarayanan. Explainable deep learning models in medical image analysis. J Imaging. 2020;6(6):52. doi:10.3390/jimaging6060052.

59. Tavolara, Khalid Khan Niazi, Gurcan. Simple patch-wise transformations serve as a mechanism for slide-level augmentation for multiple instance learning applications. Proc SPIE. 2023;12471:124711E. doi:10.1117/12.2653878.

60. Tellez, Balkenhol MCA, Karssemeijer, et al. H and E stain augmentation improves generalization of convolutional networks for histopathological mitosis detection. Proc SPIE 10581, Medical Imaging 2018: Digital Pathology; March 6, 2018; Houston, TX.

61. Tellez, Balkenhol, Otte-Holler, et al. Whole-slide mitosis detection in H&E breast histology using PHH3 as a reference to train distilled stain-invariant convolutional networks. IEEE Trans Med Imaging. 2018;37(9):2126-2136. doi:10.1109/tmi.2018.2820199.

62. Tellez, Litjens, Bándi, et al. Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology. Med Image Anal. 2019;58:101544. doi:10.1016/j.media.2019.101544.

63. Turner, Aeffner, Bangari, et al. Society of toxicologic pathology digital pathology and image analysis special interest group article: opinion on the application of artificial intelligence and machine learning to digital toxicologic pathology. Toxicol Pathol. 2020;48(2):277-294. doi:10.1177/0192623319881401.

64. Turner, Knight, Zuraw, Litjens, Rudmann. Mini review: the last mile-opportunities and challenges for machine learning in digital toxicologic pathology. Toxicol Pathol. 2021;49(4):714-719. doi:10.1177/0192623321990375.

65. United States Food and Drug Administration. Considerations for the use of artificial intelligence to support regulatory decision-making for drug and biological products. Draft Guidance for Industry and Other Interested Parties, 2024. Accessed January 29, 2025. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/considerations-use-artificial-intelligence-support-regulatory-decision-making-drug-and-biological.

66. United States Food and Drug Administration. Software algorithm device to assist users in digital pathology, 2021. Accessed March 19, 2025. https://www.accessdata.fda.gov/cdrh_docs/pdf20/DEN200080.pdf.

67. United States Food and Drug Administration. Use of whole slide imaging in nonclinical toxicology studies: questions and answers. Guidance for Industry, 2023. Accessed March 19, 2025. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/use-whole-slide-imaging-nonclinical-toxicology-studies-questions-and-answers.

68. United States Food and Drug Administration. Using artificial intelligence and machine learning in the development of drug and biological products. Discussion Paper and Request for Feedback, 2023. Accessed March 19, 2025. https://www.fda.gov/media/167973/download.

69. Vorontsov, Bozkurt, Casson, et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nat Med. 2024;30(10):2924-2935. doi:10.1038/s41591-024-03141-0.

70. Wahab, Miligy, Dodd, et al. Semantic annotation for computational pathology: multidisciplinary experience and best practice recommendations. J Pathol Clin Res. 2022;8(2):116-128. doi:10.1002/cjp2.256.

71. Wang, Kim C-S, Kwak. Patch stitching data augmentation for cancer classification in pathology images. Proc SPIE. 2024;12933:129331B. doi:10.1117/12.3006330.

72. Wilkinson, Dumontier, Aalbersberg, et al. The FAIR guiding principles for scientific data management and stewardship. Sci Data. 2016;3:160018. doi:10.1038/sdata.2016.18.

73. Zuraw, Staup, Klopfleisch, et al. Developing a qualification and verification strategy for digital tissue image analysis in toxicological pathology. Toxicol Pathol. 2021;49(4):773-783. doi:10.1177/0192623320980310.

Digital Pathology and Artificial Intelligence Applied to Nonclinical Toxicology Pathology: The Current State, Challenges, and Future Directions
