Utilizing Whole Slide Images for Pathology Peer Review and Working Groups

Abstract

This article describes the results of comparisons of digitally scanned whole slide images (WSIs) and glass microscope slides for diagnosis of tissues under peer review by the National Toxicology Program. Findings in this article were developed as a result of the data collected from 6 pathology working groups (PWGs), 1 pathology peer review, and survey comments from over 25 participating pathologists. For each PWG, 6–14 pathologists examined 10–143 tissues per study from 6- and 9-month perinatal studies and 2-year carcinogenicity studies. Overall it was found that evaluation of WSIs is generally equivalent to using glass slides. Concordance of PWG consensus diagnoses based upon review of WSIs versus glass slides ranged from 74% to 100% (median 86%). The intra- and interobserver diagnostic variation did not appear to influence the conclusions of any study. Based upon user opinions collected from surveys, WSIs may be less optimal than glass slides for evaluation of subtle lesions, large complex lesions, small lesions in a large section of tissue, and foci of altered hepatocytes. These results indicate that, although there may be some limitations, the use of WSIs can effectively accomplish the objectives of a conventional glass slide review and definitely serves as a useful adjunct to the conduct of PWGs.

Keywords

digital images, digital pathology, digital slides, pathology working group, peer review, whole slide images, whole slide imaging

Introduction

The rapid technical advances recently achieved in digital scanning of glass microscope slides, image processing, and digital storage, combined with high speed networks and improvements in personal computers, are making examination of digitally imaged tissue sections an increasingly viable option for histopathology evaluation. The potential benefits resulting from this technology are seemingly limitless. Diagnostic evaluations and second opinions can be done remotely, worldwide, and without the additional expense and time associated with travel. Slides do not need to be shipped, thus avoiding loss, breakage, or shipping expenses. As other disciplines increasingly utilize the digital medium to advance their work, the use of this technology may be viewed as an essential progression for pathologists.

Many of the newer concepts, strategies, and approaches discussed by McCullough et al. (2004) in regard to the integration of digital imaging and light microscopy have come to fruition. However, it is imperative that digital technology not be automatically implemented in toxicologic pathology without due consideration of accuracy, advantages, and disadvantages in the regulated environment. Guidelines for validation of digital pathology systems in the regulated nonclinical environment or the clinical diagnostic environment are documented in Long et al. (2013) and Pantanowitz et al. (2013). As indicated by McCullough et al. (2004), there will be an expected reluctance to accept new or unproven methodology. Therefore, to improve the comfort level of pathologists and ensure adequacy, these new methods need to be compared with, and evaluated against, time-tested and accepted conventional methodologies.

The National Toxicology Program (NTP) staff began the transition from film-based technology to digital technology to capture images of lesions on microscopic slides in 1997. In 2002, the NTP, and later the National Center for Toxicologic Research (NCTR), began using whole slide scanning technology to acquire and store whole slide images (WSIs or, hereafter, WSI) of glass microscope slides. With the advent of this technology, the NTP has created a digital image database consisting of digitally converted 2 × 2 slides, original digital photographs and photomicrographs, magnetic resonance imaging images, ultrasound images, and digitally scanned WSI. There are currently over 80,000 digital images in the NTP database. The NTP’s digital images have been used in a variety of endeavors, including international nomenclature harmonization efforts and the development of the online NTP nonneoplastic lesion atlas (Cesta et al. 2014).

The NTP has recently evaluated the use of WSI for pathology peer reviews and pathology working groups (PWGs). The current peer review process for NTP studies entails a multilevel review of the findings initially generated by the study pathologist (SP) at a contract laboratory, including a pathology data review, an audit of pathology specimens, a pathology quality assessment (QA) peer review, and finally a PWG. The PWG review, the last stage of the peer review process for NTP studies, is typically a face-to-face meeting in which a panel of pathologists, including experts in the topic area of the PWG, reviews slides and then discusses and votes on the diagnoses of challenging lesions. The majority vote is considered the consensus diagnosis. The SP and QA pathologist attend the PWG in person or via tele- or videoconference.

There has been interest in using WSI for reviews currently performed by NTP staff and NCTR support personnel, with the goals of increasing efficiency and decreasing travel time and expenses while involving the SP and more outside expert opinions. After individual pathologists have evaluated the WSI remotely, they can then use web conferencing to discuss and determine a consensus diagnosis with other pathologists while the images are being viewed in “real time” by a PWG. To this end, the NTP carried out 2 initial “virtual” PWGs in 2007 using WSI. The diagnoses and conclusions for these studies were subsequently confirmed by PWGs using the original glass microscope slides. Since these initial PWGs in 2007, WSI have been used in some manner in all NTP PWGs. Typically, the consensus opinions are derived from direct examination of the tissue sections on glass microscope slides. The WSI have largely been utilized to supplement the PWG process by illustrating findings, pointing out areas of interest during discussion of diagnoses, resolving questions, and ensuring that all PWG members observe key diagnostic features of a finding and are considering the same tissue or region of tissue as intended by the PWG coordinator. This article describes a series of evaluations to define and refine the use of this technology in the PWG and peer review settings.

The objective of this series of exercises was to compare diagnoses made using digitally scanned WSI in PWGs (hereafter referred to as a digital PWG) with diagnoses made based on glass slides (hereafter referred to as a conventional glass PWG) to evaluate the utility of WSI for pathology peer review. This study also presents the diagnoses and opinion poll results of over 25 participating pathologists evaluating the experience of using WSI for peer review and PWGs. The overall goal was to assess the sufficiency of using WSI for data interpretation in a regulated nonclinical environment for pathology peer review and PWGs.

Materials and Methods

This article describes a series of 7 exercises used to determine whether the examination of digitally scanned WSI would result in the same outcome as the direct examination of the glass microscope slides. Studies included 6- and 9-month perinatal studies and 2-year carcinogenicity bioassays. The exercises included proliferative and nonproliferative findings. Most changes under review were proliferative and included those in a continuum of preneoplastic lesions, hyperplasias, and neoplasms. All slides and images were relabeled prior to scanning so the participating pathologists were blinded to any treatment or control designations. Diagnostic reproducibility, observational data by the authors, and comments from participating pathologists were documented. For exercises in which the same pathologists participated in both the digital and conventional glass PWGs/reviews, the time between the 2 PWGs/reviews ranged from 2 to 7 weeks. The first 4 exercises with ongoing conventional glass slide PWGs compared the diagnoses between the conventional glass and digital PWGs. In one instance, a QA peer review was conducted at a contract research organization using WSI (exercise 3). A larger group of pathologists with no prior knowledge of individual diagnoses from the original conventional glass PWGs participated in the final 3 exercises, one of which focused on relatively subtle findings in common target organs (exercise 7). The results of these comparisons were used to evaluate the usefulness of WSI. In all cases, the glass slide evaluations remained the official NTP and NCTR data.

Results

Exercise No. 1: Comparison of Conventional Glass and Digital PWGs on Proliferative Lesions in p53 (+/−) Transgenic Mice in 6- and 9-month Perinatal Studies

An initial digital PWG was followed by a conventional glass PWG 3 weeks later. Forty-seven slides were scanned at 40× magnification representing the following issues: differences of opinion between the SP and the QA pathologist that were not resolved following the QA review; potential treatment-related proliferative changes in the liver, spleen, lymph nodes, and thymus; and unusual neoplasms. PWG participants reviewed the WSI and were instructed to record their diagnoses on worksheets for each image prior to the digital PWG. The digital PWG was held via teleconference with the PWG coordinator and the 6 participants located in North Carolina, Arkansas, and the state of Washington. During the digital PWG teleconference, each image was reviewed and available for viewing via projection. The pathologists discussed the diagnosis(es) for each image and reexamined images if necessary. Consensus diagnoses of the PWG were reached when at least 4 of the 6 PWG participants were in agreement. The same 6 pathologists reviewed the same 47 slides in a conventional glass PWG held at the NCTR in Arkansas 3 weeks later. This exercise compared concordance of the consensus diagnoses from the digital PWG to those of the conventional glass slide PWG.
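The consensus rule described above (at least 4 of 6 participants in agreement) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `consensus`, the `threshold` parameter, and the example votes are hypothetical and are not study data or NTP tooling.

```python
from collections import Counter


def consensus(votes, threshold=4):
    """Return the diagnosis chosen by at least `threshold` voters, else None."""
    if not votes:
        return None
    # most_common(1) gives the single most frequent diagnosis and its count.
    diagnosis, count = Counter(votes).most_common(1)[0]
    return diagnosis if count >= threshold else None


# Hypothetical votes for one case from a 6-member panel (not study data).
votes = ["hepatocellular adenoma"] * 4 + ["basophilic cell focus"] * 2
print(consensus(votes))  # hepatocellular adenoma

# An even 3-3 split does not reach the 4-of-6 threshold.
split = ["hepatocellular adenoma"] * 3 + ["hepatocellular carcinoma"] * 3
print(consensus(split))  # None
```

With a threshold of 4 out of 6, no two diagnoses can both qualify, so checking only the most frequent diagnosis is sufficient.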

Exercise no. 1 results

Overall, there was concordance between the consensus diagnoses on 46 (98%) of 47 cases between the digital PWG and the conventional glass PWG held 3 weeks apart with the same participants. This single discrepant diagnosis, which had no effect on the study conclusions, was in the consensus diagnosis for the liver of 1 animal (hepatocellular adenoma in the digital PWG, basophilic cell focus in the conventional glass PWG). In the digital PWG, only 3 cases out of 47 did not have a consensus concerning the diagnosis prior to further discussion. These involved the differentiation between foci of altered hepatocytes and hepatocellular adenoma and between hepatocellular adenoma and carcinoma. Participants noted that it was very difficult to differentiate between normal splenic tissue and minimal degrees of lymphoid hyperplasia using WSI in the digital PWG and between lymphoid hyperplasia and lymphoma using glass slides in the conventional PWG. A variety of distinctive tumors and lesions were easily confirmed, including olfactory neuroblastoma, alveolar bronchiolar adenoma, yolk sac carcinoma, and seminiferous tubule degeneration.

Exercise No. 2: Comparison of Conventional Glass PWG after a Digital Review of Carcinogenicity Bioassays in F344 Rats and B6C3F1 Mice

The PWG review included a 2-phase process with the same 7 pathologists and PWG coordinator in each phase. In the first phase, the pathologists remotely examined the scanned images at their work locations in North Carolina, Arkansas, and the state of Washington. The pathologists were instructed to complete their digital review of the scanned images using the web viewer and submit their recorded diagnoses to the PWG coordinator. The slides were scanned at 20× magnification. Slides selected for rats included degenerative lesions in the mesenteric lymph node as well as proliferative lesions in the forestomach and large intestine (100 total rat cases). Slides selected for mice included proliferative lesions of the Harderian gland, large intestine, glandular stomach, liver, and pituitary gland (48 total mouse cases). The second phase was a conventional glass PWG held 2 weeks later at the NCTR. This exercise compared concordance of the consensus diagnoses based on the review of WSI to those of the conventional glass slide PWG.

Exercise no. 2 results: F344 rat

Agreement between the digital and glass slide consensus diagnoses for the rat lesions was 89%. The final PWG conclusion would have been the same whether using WSI or glass slides. Of the 100 rat cases evaluated using both methods, there were 11 differences between the digital versus glass slide consensus diagnoses, and they often involved the distinction between hyperplasia, adenoma, and carcinoma (Table 1). Individual pathologist agreement of WSI diagnosis versus glass slide diagnosis (intraobserver variability) for the 6 of 7 pathologists that completed the rat cases ranged from 55% to 91% (mean 74% and median 73%).

Table 1.

Discordant diagnoses between digital images and glass slides for rats (11/100 cases, exercise no. 2).

Digital consensus	Glass consensus	No. of cases
Forestomach—hyperplasia	Forestomach—papilloma	1
Large intestine—adenoma	Large intestine—autolysis	1
Large intestine—adenoma	Large intestine—carcinoma	1
Large intestine—adenoma	Large intestine—hyperplasia	2
Large intestine—carcinoma	Large intestine—adenoma	1
Large intestine—carcinoma	Large intestine—hyperplasia	1
Large intestine—hyperplasia	Large intestine—adenoma	1
Large intestine—hyperplasia	Large intestine—carcinoma	3
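As a check on the arithmetic, the rows of Table 1 can be tallied to recover the reported 89% agreement and to count how many discordant cases fell within the hyperplasia, adenoma, and carcinoma continuum. The sketch below transcribes Table 1; the variable names and the choice to treat only those three diagnoses as "the continuum" are assumptions made for illustration.

```python
# (organ, digital consensus, glass consensus, number of cases), from Table 1.
discordant = [
    ("forestomach", "hyperplasia", "papilloma", 1),
    ("large intestine", "adenoma", "autolysis", 1),
    ("large intestine", "adenoma", "carcinoma", 1),
    ("large intestine", "adenoma", "hyperplasia", 2),
    ("large intestine", "carcinoma", "adenoma", 1),
    ("large intestine", "carcinoma", "hyperplasia", 1),
    ("large intestine", "hyperplasia", "adenoma", 1),
    ("large intestine", "hyperplasia", "carcinoma", 3),
]
TOTAL_CASES = 100  # rat cases evaluated by both methods
CONTINUUM = {"hyperplasia", "adenoma", "carcinoma"}  # assumed definition

n_discordant = sum(n for _, _, _, n in discordant)
agreement_pct = round(100 * (TOTAL_CASES - n_discordant) / TOTAL_CASES)

# Discordant cases in which both diagnoses lie on the assumed continuum.
n_continuum = sum(
    n for _, digital, glass, n in discordant if {digital, glass} <= CONTINUUM
)

print(n_discordant, agreement_pct, n_continuum)  # 11 89 9
```

Under this definition, 9 of the 11 discordant rat cases involved the distinction between hyperplasia, adenoma, and carcinoma.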

Exercise no. 2 results: B6C3F1 mouse

Agreement of the consensus diagnoses based on mouse WSI with those based on glass slides was 75%. In the mouse review, the final PWG conclusions and study interpretation would also have been the same regardless of methodology. Of the 48 cases reviewed using WSI and glass slides, there were 12 differences (Table 2). Individual pathologist agreement of WSI diagnosis versus glass slide diagnosis (intraobserver variability) for the 6 of 7 pathologists who completed the mouse cases ranged from 67% to 89% (mean 83% and median 84%). Lack of concordance between the WSI and glass slide review consensus diagnoses for both the rat and mouse lesions usually centered on distinguishing minimal lesions from normal and on borderline lesions along the proliferative continuum (Tables 1 and 2).

Table 2.

Discordant diagnoses between digital images and glass slides for mice (12/48 cases, exercise no. 2).

Digital consensus	Glass consensus	No. of cases
Colon—goblet cell hyperplasia	No remarkable lesion	1
Harderian gland—adenoma	Harderian gland—carcinoma	1
Hepatocellular adenoma	Hepatocellular carcinoma	1
No consensus	Glandular stomach—epithelial hyperplasia	2
No consensus	Hepatocellular adenoma	2
No consensus	Pituitary gland, pars distalis—hyperplasia	1
No remarkable lesion	Glandular stomach—epithelial hyperplasia	2
Pituitary gland, pars distalis—hyperplasia	No remarkable lesion	1
Small intestine—adenoma	Small intestine—carcinoma	1

Note: Because only 6 of the 7 pathologists completed the review of the mouse digital images, no consensus was established if the diagnoses were evenly divided between the 2 diagnostic choices.

Exercise No. 3: Focused Digital Review of Brain and Spinal Cord Proliferative Lesions

This exercise describes our experience with a relatively large QA peer review of digitally scanned slides. Rat and mouse brain and spinal cord slides (1,916 slides) from a 2-year carcinogenesis bioassay were scanned at 20× magnification, and the WSI were remotely examined using a web viewer. One pathologist evaluated the WSI of the rat and a second pathologist reviewed the WSI of the mouse. Both were asked to review for proliferative lesions only. The results were compared with the SP’s diagnoses.

Exercise no. 3 results

Although there were delays owing to limitations in the amount of storage available on the server, the time required to scan and view the large number of slides, and the slow image loading in the web viewer, the QA review process using WSI was completed sooner than could have been achieved by having the reviewing pathologists coordinate traveling to the NCTR to examine the glass microscope slides. Additionally, the digital QA review resulted in significant cost savings. This digital review, which focused on identification of proliferative lesions, was sufficient to identify 5 additional brain tumors and 3 cases of spinal cord gliosis. These new findings prompted a glass slide review, which verified the findings of the digital review.

Exercise No. 4: Comparison of Digital and Conventional Glass PWGs of a Chronic Bioassay in p53 (+/−) Transgenic Mice

This exercise compared consensus diagnoses from a conventional glass PWG held at the NCTR with those from a digital PWG held 50 days later. The 91 slides were scanned at 20× magnification and represented ranges of normal (notable variability in pancreatic islet size), spontaneous lesions (hydronephrosis), and diagnostic difficulty (lymphoreticular neoplasms). Images reviewed at the digital PWG were the same cases evaluated at the conventional glass PWG. All participants in the digital PWG reviewed the images ahead of time and submitted their diagnoses to the PWG coordinator. Five participants from the conventional glass PWG, plus 2 additional pathologists, participated in the digital PWG at their work locations in Arkansas, North Carolina, Maryland, and Connecticut via computer connections and teleconference.

Exercise no. 4 results

Overall, there was 85% agreement (77/91 cases) in consensus diagnoses made based on glass slides from the original conventional PWG compared with those made using WSI during the subsequent digital PWG. There was 100% agreement on the relatively straightforward diagnoses of pancreatic islet hyperplasia (20/20 cases), hydronephrosis (9/9 cases), and Harderian gland tumor (1/1 case). There was 83% agreement (5/6 cases) when distinguishing sarcoma, fibrosarcoma, leiomyosarcoma, neuroblastoma, and granuloma in various tissues. As one might expect, there was less agreement (80%, 8/10 cases) on lesions within the diagnostic continuum of basophilic focus of altered hepatocytes, hepatocyte hyperplasia, hepatocellular adenoma, and hepatocellular carcinoma. The lowest agreement (75%, 34/45 cases) occurred when distinguishing between lymphoid hyperplasia, malignant lymphoma, histiocytic sarcoma, and granulocytic leukemia.

Exercise No. 5: Comparison of Digital and Conventional Glass PWGs with Each Other and Previously Recorded PWGs

This exercise compared results from contemporaneous digital and glass PWGs with each other, and each was also compared with the original conventional glass NTP PWGs of 2-year carcinogenicity bioassays, using 14 pathologists who had no prior knowledge of the original conventional glass NTP PWG diagnoses. There were 17 rat slides and 11 mouse slides, all scanned at 20× magnification. Changes reviewed included proliferative lesions of the clitoral gland, larynx, liver, mammary gland, nose, pituitary gland, skin, small intestine, and thyroid gland. Nonproliferative lesions included inflammation of the larynx, metaplasia of the larynx, inflammation of the Harderian gland, and inflammation and atrophy in the nose. Two groups of 7 pathologists each reviewed both the glass slides and the WSI, with the 2 reviews separated by 1 month. All review answer sheets were submitted to and tabulated by the PWG coordinator. The PWGs in this exercise were conducted in the traditional “roundtable” manner where the participants reviewed the data, examined either the glass slides or the WSI, and discussed and voted on the lesions to achieve consensus (Figure 1). Questionnaires were distributed after the PWGs to evaluate the users’ experiences and opinions.

Figure 1.

Pathologists participating in a digital pathology working group.

Exercise no. 5 results

Concordance of consensus diagnoses was 100% for rats (17/17) and 91% for mice (10/11), for an average concordance of 96% between the digital and glass PWGs within this exercise. Interestingly, the consensus diagnoses from both the digital and the glass PWGs in this exercise differed more from those of the original glass NTP PWGs, which used different groups of pathologists, than they did from each other. The digital PWG consensus diagnoses in this exercise were the same as the original glass NTP PWGs 71% of the time. The glass PWG consensus diagnoses in this exercise were the same as the original glass NTP PWGs 75% of the time. Most disagreements were in diagnoses involving the diagnostic continuum of normal, hyperplasia, adenoma, and carcinoma. This suggests that intrinsic diagnostic variability may be a greater issue than whether lesions are evaluated using glass slides or WSI. A disadvantage of this experimental design was that most pathologists (10/14) reported remembering cases from the previous review 1 month earlier, whether glass or digital.

Participants noted that an important advantage of using WSI for a PWG included the ease of viewing critical areas of pertinent lesions on the screen by the group. The majority (12/14) of the participants agreed with the statement “digitized images best serve as an adjunct to traditional PWGs.” The majority (11/14) of the participants thought it was very useful or essential to have the ability to project WSI of problem cases during the past conventional NTP PWGs. The majority (12/14) of the participants agreed with the statement “experts on the organ of interest should be able to view images digitally and vote remotely.”

Opinion questionnaires captured the concerns of participants that WSI evaluation may be more difficult than glass slide evaluation for the following: large tissue sections (e.g., brain), complex large lesions, large benign neoplasms versus minimally invasive carcinomas, foci of altered hepatocytes (eosinophilic vs. mixed vs. basophilic), inflammatory processes with exuberant fibroplasia versus mesenchymal neoplasia, nuclear changes such as contracted condensed nuclei versus mitotic figures, subtle lesions, small lesions and/or intracellular changes for which 20× or 40× magnification is required, multifocal lesions such as cardiomyopathy, and tinctorial changes. Evaluation may be hampered by possible limitations in adjusting the focus depth in a tissue using WSI. These limitations perceived by the participants may have led to disagreement by the majority (9/14) of the participants with the statement “digital PWGs will replace conventional PWGs in the future.” Most participants (10/14) felt that the quality of WSI was worse than that of glass slides viewed on a microscope. It should be noted, however, that a limitation of this exercise and some of the other exercises herein was that slides were scanned at 20× magnification rather than 40×. The conventional glass slide PWG took approximately the same amount of time as the digital PWG in this exercise, but the majority of participants felt that the glass slide review was more accurate.

Exercise No. 6: Comparison of Digital Review with Previously Recorded Conventional Glass PWG

This exercise represents the largest comparison of WSI versus glass slide review of PWG slides done thus far by the NTP. A panel of 14 pathologists, none of whom were present at the original PWG, reviewed WSI scanned at 20× magnification of 122 cases for rats and 143 cases for mice. Participants were given about 1 month to finish the task and submit their diagnoses.

Exercise no. 6 results

Agreement of consensus diagnoses between the WSI review in this exercise and the original glass PWG was 86% for rat and 84% for mouse lesions (Tables 3 and 4). Only 9 of the 14 participants completed the task, for which there was no formal digital PWG conference. Participants took from 8 to >20 hr to complete the review. Lack of agreement usually centered on the distinction along the proliferative continuum of hyperplasia, adenoma, and carcinoma, especially for thyroid and clitoral glands in rats or for skin in mice.

Table 3.

Digital and glass diagnostic agreement in a chronic study in rats (exercise no. 6).

Original glass slide PWG diagnosis	No. of cases	Disagreements digital versus glass	% Agreement digital and glass
Spleen, liver—mononuclear cell leukemia	8	0	100
Testis, epididymis—mesothelioma	20	0	100
Zymbal’s gland carcinoma	7	0	100
Oral mucosa, forestomach, tongue—hyperplasia, papilloma	20	2	90
Miscellaneous tumors	14	2	86
Mammary gland tumors	11	2	82
Heart—cardiomyopathy, schwannoma	20	4	80
Thyroid gland—adenoma, carcinoma	12	3	75
Clitoral gland—hyperplasia, adenoma, carcinoma	10	4	60
Total	122	17	86

Note: PWG = pathology working group.

Table 4.

Digital and glass diagnostic agreement in a chronic study in mice (exercise no. 6).

Original glass slide PWG diagnosis	No. of cases	Disagreements digital versus glass	% Agreement digital and glass
Lung proliferative lesions	9	0	100
Various lesions	8	0	100
Mammary gland tumors	16	1	94
Forestomach proliferative lesions	27	2	93
Ovary tumors	8	1	88
Harderian gland tumors, eye—cataract	34	5	85
Miscellaneous tissues/tumors for which SP and QAP had disagreed	10	3	70
Skin/subcutaneous tumors	31	11	65
Total	143	23	84

Note: PWG = pathology working group; SP = study pathologist; QAP = quality assessment review pathologist.
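The percent agreement columns and totals in Tables 3 and 4 follow directly from the case and disagreement counts. The sketch below recomputes them as a consistency check; the row labels are abbreviated from the tables, and the helper names `pct_agreement` and `check` are hypothetical.

```python
def pct_agreement(cases, disagreements):
    """Percent of cases on which digital and glass diagnoses agreed."""
    return round(100 * (cases - disagreements) / cases)


def check(rows):
    """Return (total cases, total disagreements, overall %, mismatched rows)."""
    cases = sum(r[1] for r in rows)
    disagreements = sum(r[2] for r in rows)
    # Rows whose recomputed percentage differs from the tabulated value.
    mismatches = [r[0] for r in rows if pct_agreement(r[1], r[2]) != r[3]]
    return cases, disagreements, pct_agreement(cases, disagreements), mismatches


# (label, no. of cases, disagreements, tabulated % agreement)
rats = [  # Table 3
    ("Mononuclear cell leukemia", 8, 0, 100),
    ("Mesothelioma", 20, 0, 100),
    ("Zymbal's gland carcinoma", 7, 0, 100),
    ("Oral mucosa, forestomach, tongue", 20, 2, 90),
    ("Miscellaneous tumors", 14, 2, 86),
    ("Mammary gland tumors", 11, 2, 82),
    ("Heart", 20, 4, 80),
    ("Thyroid gland", 12, 3, 75),
    ("Clitoral gland", 10, 4, 60),
]
mice = [  # Table 4
    ("Lung proliferative lesions", 9, 0, 100),
    ("Various lesions", 8, 0, 100),
    ("Mammary gland tumors", 16, 1, 94),
    ("Forestomach proliferative lesions", 27, 2, 93),
    ("Ovary tumors", 8, 1, 88),
    ("Harderian gland tumors, cataract", 34, 5, 85),
    ("SP and QAP disagreements", 10, 3, 70),
    ("Skin/subcutaneous tumors", 31, 11, 65),
]

print(check(rats))  # (122, 17, 86, [])
print(check(mice))  # (143, 23, 84, [])
```

An empty mismatch list means every tabulated percentage matches the value recomputed from its counts.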

Exercise No. 7: Comparison of Digital Review of Minimal to Mild Lesions in Common Target Organs with Previously Recorded Conventional Glass PWGs

The final exercise was designed to evaluate the use of WSI in the diagnosis of subtle, usually minimal to mild, lesions in 6 common target organs: liver, kidney, lung, skin, brain, and nasal cavity. A total of 22 slides with 31 diagnoses from previously recorded conventional PWGs were scanned at 40× magnification. A panel of 9 pathologists with no prior knowledge of the original diagnoses recorded their diagnoses on a worksheet with predetermined multiple-choice options to select from for each diagnosis. Results were compared with the diagnoses from previously conducted conventional glass PWGs. The pathologists were asked to rate tissues in order of ease of evaluation using a grading scale ranging from 1 (easiest) to 6 (most difficult). Following this exercise, participants completed a questionnaire evaluating their experiences and confidence in evaluating WSI.

Exercise no. 7 results

Overall, the individual diagnoses of the participating pathologists agreed with the original glass PWG diagnoses 74% of the time (range 58–84%). The lowest diagnostic concordance with an original PWG diagnosis (22%) was for a mixed cell focus in a mouse liver. The highest diagnostic concordances (100%) involved diagnoses of hematopoietic cell proliferation and hepatocellular hypertrophy in the liver and papillary necrosis and tubular necrosis in the kidney in both rats and mice.

Of the 6 tissue types evaluated, skin (mean score = 1.8) was considered as the least difficult tissue to evaluate using WSI, followed by lung (2.1). Nasal cavity (3.2) and kidney (4.4) were considered intermediate in difficulty. Liver (4.9) was considered the most difficult tissue to evaluate digitally, followed by brain (4.6).
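The mean difficulty scores above can be ordered and checked for internal consistency. The sketch below assumes that each pathologist assigned each rank from 1 to 6 exactly once (the text does not state this explicitly), in which case the six mean scores should sum to 1 + 2 + ... + 6 = 21, within rounding.

```python
import math

# Mean difficulty scores reported for exercise 7 (1 = easiest, 6 = most difficult).
mean_scores = {
    "skin": 1.8,
    "lung": 2.1,
    "nasal cavity": 3.2,
    "kidney": 4.4,
    "brain": 4.6,
    "liver": 4.9,
}

# Tissues ordered from easiest to most difficult to evaluate using WSI.
ranked = sorted(mean_scores, key=mean_scores.get)
print(ranked)

# Under the complete-ranking assumption the means sum to 21; each mean is
# rounded to one decimal, so allow up to 6 * 0.05 = 0.3 of rounding error.
expected_sum = sum(range(1, 7))
consistent = math.isclose(sum(mean_scores.values()), expected_sum, abs_tol=0.3)
print(consistent)  # True
```

The reported means sum to 21.0, which is consistent with complete rankings but does not prove that this is how the scores were collected.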

All participants agreed that the process was simple and easy to learn and use, but that the WSI quality was slightly inferior to glass slides. Participants were most confident in using WSI to identify lesions that could be easily diagnosed at low magnification. They were least confident when using digital technology to detect subtle intracellular changes (e.g., hyaline droplet accumulation in the kidney) as well as slight tinctorial differences. Some participants expressed the opinion that large organs were especially challenging and difficult to examine. Users strongly disagreed that WSI would be effective for use by the SP to evaluate an entire 90-day subchronic or 2-year chronic study.

A few participants commented that they were dissatisfied and frustrated with some of the technical features and limitations of the image-viewing technology (server access, refresh rate, restricted fine focus, etc.). Several participants commented on the difficulty in achieving optimal image focus (particularly at high magnification). Inefficient viewing was stated to result from protracted “refresh/resolution rates,” which led participants to describe the process as “too slow” or “not good/quick enough.” Several also mentioned that the time required to review WSI as compared with glass slides was generally excessive. Despite the technical limitations, participants stated that remote access to WSI, the capability of sharing images at multiple sites, and the ease of recording and archiving images were highly beneficial applications.

Discussion

This series of exercises suggests that the review of WSI is a useful, reliable, potentially cost-saving, and productive method in the PWG and/or peer review process. In general, rodent lesion diagnoses in toxicologic pathology made by viewing digitally scanned WSI were as accurate as those made using glass slides.

Accuracy of WSI for Peer Review and PWGs

There was generally a high degree of diagnostic concordance between rodent lesion diagnoses based on glass slides and WSI evaluated during the PWG stage of the NTP peer review process (range 74–100%, mean 87%, and median 86%). This range is remarkably similar to the concordance range of 73–98% reported in a literature review of studies validating WSI for clinical diagnostic use (Pantanowitz et al. 2013). The percentage of discordance may be less important than the type of discordance. In studies comparing concordance of diagnoses based on WSI and glass slides in the clinical setting of human medicine, concordance is often classified as no discrepancy (complete agreement between the 2 diagnoses), minor discrepancy (a difference in the 2 diagnoses with no effect on clinical care or prognosis), and major discrepancy (a difference that affects clinical care or prognosis; e.g., see Jones et al. 2015). In the exercises described herein, many of the discrepancies were along the proliferative continuum (e.g., hyperplasia, adenoma, and carcinoma). Such discrepancies would certainly have clinical ramifications (e.g., a patient with a precursor lesion vs. benign vs. malignant tumor). However, all proliferative findings along the continuum are taken into account in the final carcinogenic activity calls in NTP studies (i.e., clear evidence, some evidence, equivocal evidence, or no evidence of carcinogenic activity), so the final outcomes in the nonclinical, toxicologic pathology setting would generally not be affected. Although diagnoses based on the glass slides remain the official NTP data, it is important to note that the study interpretations as a result of PWGs based on using WSI would not have differed from those based on examination of conventional glass slides.
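The summary statistics quoted above (range 74–100%, mean 87%, median 86%) can be reproduced from the concordance values reported in the Results. Which values were pooled is an assumption here: the sketch uses the nine percentages listed below (rats and mice counted separately for exercises 2, 5, and 6; exercise 3 reported no percentage; exercise 7 is agreement of individual rather than consensus diagnoses), and these happen to match the stated figures.

```python
from statistics import mean, median

# Concordance (%) of WSI-based versus glass-slide-based diagnoses, as reported
# in the Results. The choice of these nine values is an assumption.
concordance = {
    "exercise 1": 98,
    "exercise 2 (rat)": 89,
    "exercise 2 (mouse)": 75,
    "exercise 4": 85,
    "exercise 5 (rat)": 100,
    "exercise 5 (mouse)": 91,
    "exercise 6 (rat)": 86,
    "exercise 6 (mouse)": 84,
    "exercise 7": 74,
}

values = list(concordance.values())
print(min(values), max(values))  # 74 100
print(round(mean(values)))       # 87
print(median(values))            # 86
```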

The results seem to indicate that evaluation of WSI is adequate for obtaining an accurate diagnosis for most types of lesions. Discrepant diagnoses in many exercises frequently centered on distinguishing minimal lesions from normal; the diagnostic continuum of hyperplasia, adenoma, and carcinoma; and between lymphoid hyperplasia, lymphoma, and lymphoma types. However, difficulties in these diagnoses are also typical with conventional glass slide reviews. It appeared that for many of the exercises, different pathologists might render different opinions due to the differences in the normal range of diagnostic interpretation, irrespective of whether glass slides or WSI are used. In exercise no. 5, there was excellent agreement (96%) between the digital and glass consensus diagnoses made by the same pathologists, but lesser agreement (71% and 75%, respectively) when compared with the original previously recorded PWG diagnoses made by a different group of pathologists. Intrinsic diagnostic variability may thus be a greater issue than methodology in some instances.

In some cases, however, effective use of WSI may be somewhat limited. Based on user opinions, WSI may not be optimal for evaluation of subtle lesions, large complex lesions, small lesions in a large section of tissue, foci of altered hepatocytes, or cardiomyopathy, or when focus on varying depths within a tissue is required. Pathologists had more confidence using WSI for lesions that could be diagnosed at lower magnification than for lesions that required higher magnification and distinction of intracellular details (mitotic figures, cellular pleomorphism, intracellular accumulations, etc.). There was a lesser degree of agreement on distinguishing malignant lymphoma, histiocytic sarcoma, and granulocytic leukemia, lesions whose diagnosis sometimes depends on a more careful, detailed cytologic evaluation. When viewing WSI, participants frequently mentioned the limited ability to differentiate subtle color variation and intracellular changes. These challenges were likely reflected in poor correlative scores for commonly observed lesions such as hyaline droplet accumulation in the kidney, foci of altered hepatocytes, and necrosis in the brain, lesions whose accurate diagnosis depends on tinctorial contrast.

Other factors that may have contributed to diagnostic disagreements include the absence of a concurrent control (e.g., sebaceous hyperplasia in skin), marginal image quality as a result of either poor staining or the age of the section, and user experience and technical ability with the image viewer. Certain measures can be taken to ensure data quality when evaluating WSI. In general, slides should be scanned at the maximum magnification (40×). In 5 of the 7 exercises herein, slides were scanned at 20× magnification, which may have contributed to difficulty ascertaining intracellular detail, decreased diagnostic correlation, and user dissatisfaction. In addition, measures should be in place to ensure that all tissues on the glass slide are present in the WSI, the tinctorial quality of the glass slide is accurately reflected, and the image is otherwise a true representation of the glass slide (Long et al. 2013; Pantanowitz et al. 2013).

The time between digital and conventional glass PWGs by the same participants in this article ranged from 2 to 7 weeks. Most pathologists in exercise 5 mentioned that they had some memory of their previous diagnoses when the PWGs were separated by about 1 month. However, the time intervals in these exercises are in line with the evidence-based guideline of the College of American Pathologists for validating whole slide imaging systems for diagnostic purposes, which recommends a “washout period” of at least 2 weeks between viewing WSI and glass slides (Pantanowitz et al. 2013).
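As a small illustration of the washout criterion, the check below tests whether two review dates are separated by at least 2 weeks. It is only a sketch under stated assumptions: the function name `washout_ok`, the constant `MIN_WASHOUT`, and the dates are hypothetical and do not come from the exercises.

```python
from datetime import date, timedelta

# Minimum interval between WSI and glass slide review cited in the text.
MIN_WASHOUT = timedelta(weeks=2)


def washout_ok(first_review: date, second_review: date) -> bool:
    """True if the two reviews are separated by at least the minimum washout."""
    return abs(second_review - first_review) >= MIN_WASHOUT


# Hypothetical dates chosen only for illustration.
print(washout_ok(date(2014, 3, 3), date(2014, 3, 17)))  # exactly 2 weeks apart
print(washout_ok(date(2014, 3, 3), date(2014, 3, 10)))  # 1 week apart
```

A check of this kind addresses only the scheduling requirement; as noted above, some participants still recalled earlier diagnoses after about 1 month.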

Advantages of WSI in Peer Review and PWGs

WSI are clearly beneficial to the PWG process in many ways. Use of WSI allows accurate measurement of a lesion, which can be important in the diagnosis of hyperplastic lesions versus adenomas, for example. Having WSI available to review in advance of the PWG can aid in developing familiarity with a study being reviewed. Participants can preview the slides at their convenience, spend as little or as much time as preferred, and prepare for active discussion of the lesions being reviewed. Consequently, the PWG can function more efficiently and expeditiously.

Having the WSI available during the PWG is generally very helpful for demonstrating lesions and guiding pathologists to a critical area of a lesion or key features to aid in diagnosis, and annotating WSI is generally easier and more accurate than annotating glass slides. The ability to display WSI during the PWG discussion and voting phase ensures that all pathologists are viewing the correct area of the slide or lesion for cases that are controversial or show a lack of consensus. This assures that each PWG participant observes and considers the same morphologic features that form the basis for a specific diagnosis, thereby facilitating consensus building.

A distinct advantage of the use of WSI in the peer review and PWG processes is the flexibility it allows. Digital pathology systems allow remote participation by the SP at the contract laboratory or outside experts who can contribute significantly to the PWG review and discussion. Most participants supported the concept that experts on the tissue of interest should be able to evaluate WSI and vote remotely and that WSI are an excellent adjunct to PWGs. A valuable part of the PWG experience is the face-to-face roundtable discussion atmosphere. Participants preferred having face-to-face interactions in PWGs to foster discussion and resolve discrepancies in the diagnoses. Technology such as videoconferencing can enable remote pathologists to demonstrate specific controversial lesions and help maintain the roundtable discussions of the PWG.

Disadvantages of WSI in Peer Review and PWGs

In general, although the diagnostic accuracy of the pathology results of these rodent findings made by using WSI may have been equivalent to using glass slides, the time required to evaluate WSI was generally regarded as excessive. Processing delays associated with the scanning of the tissue sections and data storage also slowed the overall review process, but this is improving as the technology advances. Completion of WSI evaluation prior to a formal PWG meeting may be limited by individual time constraints. It is notable that 5 of the 14 pathologists selected to do exercise no. 6 did not complete the task when given a 1-month turnaround time. However, there was no formal teleconference that otherwise may have prompted completion of the task.

Because of technical limitations, pathologists in several exercises noted that image loading and refresh times in the viewing software made viewing of large areas of tissues very tedious and slower than if one was evaluating the glass slide. Monitor size and resolution and computer, server, and connectivity speeds all influence the WSI viewing experience. Some of these limitations likely led the participants to list the liver and brain as the most difficult organs to evaluate. It should be noted that advances in digital pathology technology since commencement of these exercises, in addition to diminution of data storage and connectivity limitations, have resulted in decreased slide scanning time and faster loading and viewing of high-resolution WSI.

Pathologists commented that it took some time to learn how to navigate the use of the digital pathology system and this increased the time spent viewing WSI as compared with glass slides. Some participants noted that time spent evaluating WSI decreased with practice. Therefore, a recommendation to increase efficiency, apart from technological advances, would be to have formal training on image viewing technology for participants. Adequate training of pathologists in the use of digital pathology technology has been shown in a literature review to result in greater accuracy of WSI interpretation (95% with training vs. 79% without training), better concordance between WSI and glass slide diagnoses (89% with training vs. 84% without training), and shorter interpretation time (4.9 min with training vs. 11.5 min without training; Pantanowitz et al. 2013).

Conclusions

The quantitative results of these exercises, as well as the subjective survey comments by participants, indicate that WSI evaluation is equivalent to glass slide evaluation for the PWG and pathology peer review processes, with some limitations in time, image quality, and lesion types. Concordance between diagnoses based on WSI and glass slide evaluation in PWGs ranged from 74% to 100%. While it is difficult to specify what percentage of concordance is acceptable or unacceptable, the ultimate question in these cases is whether or not the outcomes of the PWGs, which lead ultimately to the conclusions of the studies, would have differed based on WSI versus glass slide evaluation. Since the conclusions of the studies would have been the same regardless of methodology, evaluations of WSI or glass slides were equivalent in the PWGs described herein. The use of WSI for digital PWGs has met the goals of involving remote participants such as the original SP and outside experts, improving the quality of PWG discussions and consensus building, and saving resources and costs by eliminating or limiting travel as well as (potentially) slide shipping expenses. The use of WSI has to some degree met the goals of saving pathologists’ time and providing high-quality images with convenient access and easily navigable viewing software. Furthermore, it is expected that digital imaging technology will continue to develop finer image quality and faster refresh/resolution rates, which should result in a more user-friendly experience. This should permit the evaluation of larger numbers of images as well as enhance the diagnostic confidence of the user.

Given its current state of development, this technology is an effective tool to review, discuss, and acquire valuable input on a limited number of images and/or lesions. Changes along a diagnostic continuum (e.g., normal, hyperplasia, adenoma, and carcinoma) may be diagnostically challenging regardless of whether glass slides or WSI are used. If the WSI cannot be duly interpreted, the glass microscope slide should be used for pathology peer review or the PWG (Toumari et al. 2007). It is up to the pathologist to judge whether the WSI is of sufficient quality to render a diagnosis, as holds true for light microscopy. In conclusion, although technical limitations may curb user confidence in the evaluation of subtle lesions as well as large sections of tissue, digital pathology systems equal glass slide evaluation in the accuracy of diagnosis of rodent lesions in toxicologic pathology and have multiple useful applications for the peer review and PWG processes.

Footnotes

Acknowledgments

The authors thank all participating pathologists in the exercises reported herein: Amy Brix, Karen Cimon, Mark Cesta, John Cullen, Gordon Flake, Sabine Francke-Carroll, Ronald Herbert, Georgette Hill, Mark Hoenerhoff, Brian Knight, Linda Kooistra, John Latendresse, Robert Maronpot, Paul Mellick, Steven Mog, Rebecca Moore, James Morrison, Todd Painter, Cynthia Shackleford, Robert Sills, and Jerrold Ward. The authors also thank Ann Chavis, Lorri Ezedin, Julie Foley, Carrie Prince, Maureen Puccini, Annette Shambley, Emily Singletary, Alan Warbritton, and Lisa Wiley for technical support. We thank Arun Pandiri and Vivian Chen for their reviews of this article.

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the NIH, National Institute of Environmental Health Sciences.

Authors’ Contribution

Authors contributed to conception or design (DM, GW, CW, EA, GO, WW, SE, RM, JH, TC, and MB); data acquisition, analysis, or interpretation (DM, GW, CW, EA, GO, WW, SE, RM, JH, TC, and MB); drafting the manuscript (DM, GW, CW, and RM); and critically revising the manuscript (DM, GW, CW, EA, GO, WW, SE, RM, JH, TC, and MB). All authors gave final approval and agreed to be accountable for all aspects of work in ensuring that questions relating to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

References

Cesta, M. F., Malarkey, D. E., Herbert, R. A., Brix, Hamlin, M. H., II, Singletary, Sills, R. C., Bucher, J. R., and Birnbaum, L. S. (2014). The National Toxicology Program web-based nonneoplastic lesion atlas: A global toxicology and pathology resource. Toxicol Pathol 42, 458–60. Accessed May 3, 2015. http://ntp.niehs.nih.gov/nnl/.

Jones, N. C., Nazarian, R. M., Duncan, L. M., Kamionek, Lauwers, G. Y., Tambouret, R. H., C. L., Nielsen, G. P., Brachtel, E. F., Mark, E. J., Sadow, P. M., Grabbe, J. P., and Wilbur, D. C. (2015). Interinstitutional whole slide imaging teleconsultation service development: Assessment using internal training and clinical consultation cases. Arch Pathol Lab Med 139, 627–35.

Long, R. E., Smith, Machotka, S. V., Chlipala, Cann, Knight, Kawano, Ellin, and Lowe (2013). Scientific and regulatory policy committee (SRPC) paper: Validation of digital pathology systems in the regulated nonclinical environment. Toxicol Pathol 41, 115–24.

McCullough, Ying, Monticello, and Bonnefoi (2004). Digital microscopy imaging and new approaches in toxicologic pathology. Toxicol Pathol 32, 49–58.

Pantanowitz, Sinard, J. H., Henricks, W. H., Fatheree, L. A., Carter, A. B., Contis, Beckwith, B. A., Evans, A. J., Otis, C. N., Lal, and Parwani, A. V. (2013). Validating whole slide imaging for diagnostic purposes in pathology: Guideline from the College of American Pathologists Pathology and Laboratory Quality Center. Arch Pathol Lab Med 137, 1710–22.

Toumari, D. L., Kemp, R. K., Sellers, Yarrington, J. T., Geoly, Fouillet, X. L. M., Dybdal, Perry, and Long (2007). Society of toxicologic pathology position paper on pathology image data: Compliance with 21 CFR parts 58 and 11. Toxicol Pathol 35, 450–55.