Abstract
For decades, it has been postulated that digital pathology is the future. By now it is safe to say that we are living that future. Digital pathology has expanded into all aspects of pathology, including human diagnostic pathology, veterinary diagnostics, research, drug development, regulatory toxicologic pathology primary reads, and peer review. Digital tissue image analysis has enabled users to extract quantitative and complex data from digitized whole-slide images. The following editorial provides an overview of the content of this special issue of Toxicologic Pathology to highlight the range of key topics that are included in this compilation. In addition, the editors provide a commentary on important current aspects to consider in this space, such as accessibility of publication content to the machine learning-novice pathologist, the importance of adequate test set selection, and allowing for data reproducibility.
For decades, it has been postulated that digital pathology is the future. By now it is safe to say that we are living that future. Digital pathology has expanded into all aspects of pathology, including human diagnostic pathology, veterinary diagnostics, research, drug development, regulatory toxicologic pathology primary reads, and peer review. Digital tissue image analysis has enabled users to extract quantitative and complex data from digitized whole-slide images. With recent advancements in artificial intelligence (AI)-based tools in the field of computer vision, great improvements in both quantitative image analysis and morphometric assessments as decision-making aids are on the near horizon. It is important that toxicologic pathologists gain a common basic understanding of digital pathology, image analysis tools, and machine learning (ML) approaches. This skill is necessary not only to leverage these applications effectively but, more importantly, to shape their development and advancement in a meaningful way. The American College of Veterinary Pathologists (ACVP) highlighted this need for education in the digital pathology space by incorporating it into its current strategic plan. As a direct consequence, the ACVP’s Pathology Informatics Education Committee was formed. 1 This parallels the pathology informatics efforts in the human pathology field. Another overarching effort highlighted in this special issue is the Innovative Medicines Initiative (IMI) project related to digital pathology. 2 This multiyear European public-private consortium effort is the first of its kind. It will involve digitizing, annotating, and linking millions of nonclinical and clinical whole-slide images with metadata and other diagnostic and research end points, ensuring availability of data and algorithms to all legally and ethically entitled stakeholders.
Given the breakthrough-catalyzing effects of similar efforts in general-purpose computer vision, the output of the IMI can be expected to stimulate a novel generation of digital pathology projects.
One challenge for pathologists who want to familiarize themselves with digital pathology, image analysis, and ML is the specialized terminology and its inconsistent use. “WSI” for example: Does it stand for Whole Slide Image, Whole Slide Images, or Whole Slide Imaging? A small inconsistency perhaps, but it can lead to frustrations and possibly errors if not clarified. To aid in establishing a shared vocabulary that gains consistent use, the European Society of Toxicologic Pathology (ESTP) workshop paper contains a table with definitions of commonly used digital pathology terms. 3
The seemingly endless stream of new companies and software offerings joining the market poses a true triaging challenge when assessing the capabilities and interoperability of systems. While initially every brand of whole-slide scanner produced files in its own proprietary format, efforts are now underway to standardize file formats and metadata integration, similar to those pioneered by digital radiology. The Digital Imaging and Communications in Medicine (DICOM) standard and its application to toxicologic pathology are introduced to readers by Dr Clunie. 4
Digital primary reads of toxicologic pathology studies and digital peer review are an active area of discussion that currently engages professional societies, key opinion leaders, and representatives of regulatory agencies alike. The ESTP workshop paper summarizes how far we have come toward regulatory acceptance of digital toxicologic pathology and highlights key challenges. 3 Bradley et al explore digital primary reads and peer review using whole-slide images in a limited scope, proof-of-concept study. 5
In 2019, the Special Interest Group on Digital Pathology and Image Analysis of the Society of Toxicologic Pathology published an opinion piece introducing AI and ML to the toxicologic pathology community. 6 The minireview in this special issue provides an update on this subject, opines on the ongoing challenges and opportunities of ML in toxicologic pathology, and aims at encouraging our peers to learn more about these novel technologies. 7
Transforming any pathology lab workflow is no small undertaking in general, and adopting digital toxicologic pathology requires additional considerations. Workflows should be designed to minimize the impact of preanalytical variables on digital pathology and downstream image analysis. 8 The pathologist is uniquely qualified to play an important role in quality control along key steps of the workflow: not only early on, when it comes to tissue, slide, and staining quality, but also in project planning, monitoring ongoing work, and finalizing image analysis projects. While some quality control concepts apply equally to traditional image analysis and to work using AI-based tools, other aspects are unique to each approach. 9
With the advancement of AI, computer vision, and ML, technological building blocks have become available that enable the use of AI-based approaches as a more global tool, both across an entire study and even across an entire digital database. Hoefling et al describe the development of a deep learning-based model trained on normal histology slides from toxicologic pathology studies. 10 The application of this model to then distinguish normal from abnormal tissue is demonstrated by Freyre et al. 11 We believe that toxicologic pathology will benefit from such foundational models, which can be adapted for specific purposes (for example, turned into general abnormality detectors), rather than from an exploding number of unrelated task-specific models. Kuklyte et al demonstrate the need to consider, and the value of, multimagnification convolutional neural networks for the determination and quantitation of lesions in nonclinical pathology studies. 12
Aside from these examples of applying AI-based morphometric assessments to entire studies, this special issue incorporates specific image analysis use cases relevant to toxicologic pathology, many of which utilized AI-based tools. These include proprietary in-house solutions, such as AI models built to count ovarian follicles, 13 to quantify changes in retinal layer morphology, 14 and to detect endothelial tip cells in the oxygen-induced retinopathy model, 15 as well as the use of commercially available applications for spermatogenic staging, 16 analysis of rodent cardiomyocytes, 17 scoring support in the dextran sulfate sodium-induced colitis mouse model, 18 enumeration of cynomolgus bone marrow histology, 19 quantitative evaluation of hepatocellular hypertrophy in rats, 20 quantitation of cell proliferation via common immunohistochemical biomarkers, 21 and verification of changes observed in the Tg-rasH2 mouse used in carcinogenicity studies. 22 A fluorescence-based image analysis use case (commercial software) is provided by Wilson et al. 23 As novel applications at the periphery of the bread-and-butter imaging work of a toxicologic pathologist continuously emerge, Rousselle et al introduce a digital 3D topographic microscopy technique, scanning optical microscopy, to evaluate re-endothelialization of the vascular lumen after endovascular procedures. 24
Because of a substantial knowledge gap between those who use AI-based tools in pathology and those who do not, it is becoming increasingly challenging to write and publish papers on the subject that contain the technical details and terminology needed for reproducible scientific data generation while remaining accessible to all Toxicologic Pathology readers. In addition, because this technology is still rather new, publishing standards vary. Balance can be achieved by ensuring that the body of an article can be fully understood by the target audience of the journal, while the information in the Materials and Methods section (plus Supplementary Material when applicable) should allow computational pathologists and scientists to reproduce the work as closely as possible. With AI functionality increasingly integrated into the graphical user interfaces of commercial histopathology tools, more common ML tasks can be performed by a pathologist without specialized coding skills. However, the apparent ease of performing such tasks often does not extend to reproducibility or to systematic evaluation of predictive performance. These AI reproducibility issues in the digital pathology space are discussed in detail by Bizzego et al. 25 The serious “reproducibility crisis” of ML in general has been featured prominently, 26 and in response the foremost ML conference, Neural Information Processing Systems (NeurIPS), now requires researchers to fill out a reproducibility checklist. In addition, NeurIPS has launched a “reproducibility challenge” with the sole purpose of reproducing published results. 27 With regard to AI in toxicologic pathology publications, we recommend a pragmatic balance: not fulfilling strict reproducibility criteria should not by itself be grounds for manuscript rejection, as in some cases this is not (yet) feasible.
In particular, it is often impossible to share proprietary data used in publications. Nevertheless, we recommend several key points that should be discussed in articles when they are not fulfilled. Most important is a rigorous approach toward separating training data (“training set”) from the data used to measure predictive performance (“test set”). It is not enough to state that a model performed well on the histology slides on which it was trained; only a separate test set can ensure a meaningful assessment. Moreover, there are subtle pitfalls even in assembling the test set, which can compromise the assumption of its independence from the training set. For example, different regions from a single slide should not be allocated to the training and test sets, respectively. Similar limitations apply to different slides originating from a single animal. Ideally, a truly independent test set should consist of one or more completely separate studies, possibly from different staining laboratories. Such a rigorous approach will inevitably lead to lower metrics of predictive performance, and this, besides avoiding publication bias, is one of the reasons why we recommend at this time not to make high values of predictive performance metrics a strict criterion for acceptance or rejection. However, not making quantitative assessments of predictive performance at all and instead relying solely on anecdotal “visual assessments” will not suffice to draw general conclusions about how well a model is suited for a given task. Among other pitfalls is the fact that the test set should only be used once. It is a serious mistake, for example, to tune the parameters of a classifier, repeatedly assess performance on the test set, and report the outcome of the most successful experiment. For such purposes, a separate “validation” set should be used, and only once a final model has been chosen should the test set be used to generate a definitive estimate of predictive performance.
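The grouping pitfalls described above can be made concrete in a short sketch. The sketch below splits slide records into training and test sets at the level of the grouping unit (here, the animal) so that no animal contributes slides to both sets; the function name, record layout, and animal identifiers are hypothetical illustrations, not part of any published method.

```python
import random

def group_split(slides, group_key, test_fraction=0.2, seed=0):
    """Split slide records into train/test sets so that no group
    (e.g., all slides from one animal) spans both sets."""
    groups = sorted({group_key(s) for s in slides})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(test_fraction * len(groups)))
    test_groups = set(groups[:n_test])
    train = [s for s in slides if group_key(s) not in test_groups]
    test = [s for s in slides if group_key(s) in test_groups]
    return train, test

# Hypothetical records: (slide_id, animal_id), 20 slides from 5 animals.
slides = [(f"slide_{i}", f"animal_{i % 5}") for i in range(20)]
train, test = group_split(slides, group_key=lambda s: s[1])

# No animal appears in both sets, preserving test-set independence.
assert {a for _, a in train}.isdisjoint({a for _, a in test})
```

The same logic extends to splitting by study or by staining laboratory rather than by animal: only the `group_key` changes, which is why stating the grouping unit explicitly in a Materials and Methods section matters.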
Even with the best possible standards, performance estimates reported in different articles should not be compared naively due to the many possible sources of variation originating from both data sets and methodology. Therefore, articles repeating previous works or comparing approaches from several articles should be highly welcomed. Other application fields of AI have already developed publicly available benchmark data sets which can be used to improve comparability of results across different papers—something which would be desirable to have in toxicologic pathology as well, and which can be enabled by a large public repository that is forthcoming as part of the IMI (discussed above). This issue of reproducibility could be made a subject for a forthcoming community challenge in computational (toxicologic) pathology, akin to previous tissue image analysis challenges such as CAMELYON, TUPAC, or ACDC-LungHP. 28–30
The task of finding and selecting the most appropriate data sets for a computational pathology study is complicated by the fact that currently histopathology data are often not fully organized according to FAIR data principles: Findable, Accessible, Interoperable, and Reusable. 31
Along with detailed information about the data, articles should also contain details about the annotations performed by pathologists. For example, were annotations performed on whole slides or only within regions of interest (ROIs)? In the latter case, how were ROIs selected? On which magnification level were the annotations made? Were smaller patches generated from the full slides, and if so, what was the exact procedure?
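The questions above about patch generation can be answered unambiguously when the tiling procedure is stated as an explicit rule. The following sketch enumerates patch coordinates over a region of interest; the function name and the dimensions are hypothetical examples of what such a reported procedure might look like.

```python
def tile_coordinates(roi_width, roi_height, patch_size, stride):
    """Enumerate top-left (x, y) coordinates of square patches tiled over
    an ROI; patches that would extend past the ROI edge are skipped."""
    coords = []
    for y in range(0, roi_height - patch_size + 1, stride):
        for x in range(0, roi_width - patch_size + 1, stride):
            coords.append((x, y))
    return coords

# Hypothetical 1024 x 768 px ROI at a stated magnification level,
# non-overlapping 256 px patches (stride equal to patch size).
patches = tile_coordinates(1024, 768, patch_size=256, stride=256)
assert len(patches) == 4 * 3  # 4 columns x 3 rows
```

Reporting patch size, stride (overlap), magnification level, and edge handling in this explicit form lets a computational pathologist regenerate exactly the same patches from the same whole-slide images.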
The special issue editors (Figure 1) hope that this compilation of articles on digital toxicologic pathology and related topics will be useful to readers of Toxicologic Pathology in learning about this continuously developing field and its applications, as we expect it to impact how we conduct our profession not only now and in the near future, but for generations to come.

Figure 1. The guest editors of this special issue. (A) Oliver Turner BSc (Hons), BVSc, MRCVS, PhD, DACVP, DABT; (B) Famke Aeffner, DVM, PhD, DACVP; (C) Tobias Sing, PhD.
Footnotes
Acknowledgments
The authors would like to thank all colleagues involved in the peer review process of the papers included in this issue, many of whom were willing to provide expert feedback on more than one submission.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article. F.A. is employed by Amgen Inc. and holds shares in the company. O.T. and T.S. are employed by Novartis and hold shares in the company.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
