Abstract
Agreement among pathologists interpreting histologic specimens is an area of interest within human pathology, but little work in this area has been reported in the veterinary literature. This study examined agreement among pathologists evaluating routine histologic sections of amputated digits from cats and dogs submitted to multiple diagnostic centers. Histologic sections from surgical specimens were reviewed in a blinded fashion by two pathologists, and their diagnosis was compared with the original diagnosis stated in the diagnostic report. Of the 513 cases reviewed, complete agreement was reached in 409 (79.7%). Of the 104 instances of disagreement, 77 (74.0%) were considered clinically significant. The diagnosis of keratoacanthoma was disputed in 19 of 21 instances (90.5%); no other individual diagnosis was disputed to a similar degree. The overall level of disagreement is substantial and similar to that reported in human pathology, and it suggests that further study of this issue would be useful in veterinary pathology.
Histopathologic interpretation is a commonly used method for detecting and identifying neoplastic and inflammatory disease in veterinary medicine and, in many cases, may be the definitive or only tool used to determine an animal's prognosis and treatment. Histopathology is also commonly used as the standard against which new diagnostic tests are evaluated. 6,21 However, histopathology is known to be an imperfect means of diagnosis. 4,13 Unlike most diagnostic tools, it has not been subjected to formal validation. The reliability of a diagnostic test depends on the reproducibility of its interpretation. Histopathology is a test in which the analysis of a visual image and its subjective interpretation may lead to variance or disagreement in the reported results. Histopathology is particularly amenable to retrospective agreement studies, because the pathologists' decisions are recorded and stored, and the slides and blocks on which those decisions were based remain available for review. 13
Diagnostic agreement is an area of increasing interest in human pathology, 5,7,14,18,21 but relatively little has been published on it in the veterinary literature. Diagnostic agreement differs from diagnostic accuracy in that the actual disease state of the animal is not considered. Rather, diagnostic agreement concerns whether separate interpretations of the same data arrive at the same conclusion. 1
There are many possible reasons for examining diagnostic agreement: to determine diagnostic repeatability, to assess variation arising from training or institutional bias, to improve the consistency of diagnosis, to identify areas where pathologists disagree and attempt to resolve them, and to assess the degree to which interobserver variability may affect research trials. Numerous studies in human pathology have assessed agreement in different settings, and they reveal considerable variation in diagnosis, even among expert groups. 3,8,19 Little comparable literature exists for veterinary medicine; veterinary pathology studies include the grading of mast-cell tumors 10,11 and the evaluation of the degree of inflammation in intestinal biopsies. 23 These studies suggest that the degree of disagreement among veterinary pathologists is similar to that among human pathologists.
Many different types of retrospective agreement studies have been performed, each with its own strengths and weaknesses. 22 Blinded case review, in which the reviewing pathologists do not have access to the previous diagnosis or the clinical history of the animal, eliminates some of the bias involved in case-selection studies. As an example of this potential bias, in a human cytopathology study, a review of previously diagnosed slides had a false-negative rate of 73.2%, which was attributed to the fact that knowing a slide had previously been diagnosed as negative reduces the vigilance of the second observer. 15
Diagnostic veterinary laboratories routinely receive formalin-fixed digits amputated from cats and dogs for histopathologic evaluation and diagnosis. Because amputation of a digit often serves as both a means of diagnosis and a treatment, the degree of diagnostic agreement should be of interest to practitioners.
The purpose of this study was to use blinded review of a series of consecutively received surgical biopsy specimens of amputated digits from multiple diagnostic laboratories to assess whether substantial interobserver variability exists and whether it differs between cat and dog diagnoses.
A retrospective review was performed on a group of surgical specimens originally collected for another purpose. These consisted of 513 surgical biopsy specimens of amputated digits from 85 cats and 428 dogs. Specimens were obtained from diagnostic laboratories in the United States and Canada: Colorado State University Veterinary Diagnostic Laboratory (CSUVDL) in Fort Collins, Colorado, and Prairie Diagnostic Services (PDS) in Regina and Saskatoon, Saskatchewan. Submissions consisting of an amputated digit from a cat or a dog were identified by computer-based record searches covering the 9.5-year period from January 1, 1995, to June 30, 2004, inclusive, at the PDS Saskatoon laboratory; the 7-year period from January 1, 1996, to December 31, 2002, inclusive, at the PDS Regina laboratory; and the 6-year period from January 1, 1996, to December 31, 2001, inclusive, at CSUVDL. To be included in this study, a specimen had to contain some portion of bone. The original diagnostic reports represented 12 different pathologists from CSUVDL, 9 from PDS in Saskatoon and the Western College of Veterinary Medicine (WCVM), and 3 from PDS in Regina.
The hematoxylin and eosin-stained histologic slides used by the original pathologists were retrieved and examined by an experienced veterinary pathologist currently involved in diagnostic work (A.L.A.) and by a veterinary pathology graduate student (B.K.W.), who viewed the slides together at a multiheaded microscope to reach a diagnosis (the second diagnosis). In 11 cases, the original slide was not available, and a recut section from the same histologic block was used instead. The number of slides reviewed per case ranged from 1 to 7. Slides were examined consecutively, without prior knowledge of the clinical history, the signalment (apart from species), or the original diagnosis. Diagnoses were reached by using whatever published reference material the pathologists would normally consult in their regular diagnostic duties, except that, if the lesions were judged to be solely inflammatory, the term “inflammation” was considered sufficient for this study. No additional diagnostic testing, such as polymerase chain reaction or immunohistochemistry, was performed in any of these cases for either the original or the second diagnosis.
The original diagnosis, as stated in the diagnostic report, and the second diagnosis were compared by one of the authors (B.K.W.), and agreement was recorded. If the original and second diagnoses differed, the same slides were reviewed in the same blinded fashion by another experienced veterinary pathologist certified by the American College of Veterinary Pathologists (ACVP) (B.A.K.) to obtain a third diagnosis. This third diagnosis was compared with the original and second diagnoses to determine whether disagreement could be attributed to bias in the second diagnosis.
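The review protocol above can be summarized as a small decision procedure. The sketch below is illustrative only: the function names (`normalize`, `adjudicate`) and the example diagnoses are hypothetical, and the crude string normalization merely stands in for the study's allowance for minor differences in terminology, which in the study was a pathologist's judgment rather than a rule.

```python
from collections import Counter
from typing import Optional


def normalize(dx: str) -> str:
    """Hypothetical stand-in for matching 'allowing minor differences in
    terminology': lower-case, treat hyphens as spaces, collapse whitespace."""
    return " ".join(dx.lower().replace("-", " ").split())


def adjudicate(original: str, second: str, third: Optional[str] = None) -> str:
    """Classify one case under the blinded review protocol.

    Concordant original/second diagnoses need no further review; a discordant
    case is reviewed a third time, and we record which earlier diagnosis (if
    any) the third diagnosis matches.
    """
    if normalize(original) == normalize(second):
        return "agree"
    if third is None:
        raise ValueError("a discordant case requires a third diagnosis")
    if normalize(third) == normalize(original):
        return "third matches original"
    if normalize(third) == normalize(second):
        return "third matches second"
    return "third matches neither"


# Hypothetical cases: (original, second, third-or-None).
cases = [
    ("Squamous cell carcinoma", "squamous-cell carcinoma", None),
    ("Keratoacanthoma", "Squamous cell carcinoma", "Squamous cell carcinoma"),
    ("Melanoma", "Inflammation", "Melanoma"),
    ("Inflammation", "Osteosarcoma", "Soft tissue sarcoma"),
]
tally = Counter(adjudicate(*case) for case in cases)
print(dict(tally))
```

Tallying the third-review outcomes this way yields the three categories (matches original, matches second, matches neither) that the results report for the 104 discordant cases.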
Table 1. Agreement between the original diagnosis and the second diagnosis by veterinary diagnostic laboratory of origin and species of animal.
Agreement was considered to have occurred when the 2 diagnoses matched, allowing for minor differences in terminology. A clinically significant disagreement was defined as one in which the change in diagnosis would likely affect treatment or prognosis, as judged by 1 author (B.K.W.), who has considerable prior clinical experience. Agreement was analyzed by pairwise comparison of the original and second diagnoses by using a chi-square test. The potential influence of the institution of origin and of the species (cat or dog) on agreement was assessed by comparing rates of agreement between the original and second diagnoses across institutions and across species, again by using a chi-square test.
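For readers who want to see what the chi-square comparison involves, the following is a minimal sketch of a Pearson chi-square test on a 2x2 table of agreement versus disagreement by group, using only the Python standard library. The function name is hypothetical, and the sketch assumes no continuity correction; the paper does not state which variant was used, so output from this sketch should not be taken as a reproduction of the reported P values. The example compares laboratories: 81 of 105 Saskatchewan cases agreed, and the CSUVDL counts (328 agreements in 408 cases) are obtained by subtracting these from the overall totals of 409 agreements in 513 cases.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table

        [[a, b],    group 1: agree, disagree
         [c, d]]    group 2: agree, disagree

    Returns (statistic, P value) with 1 degree of freedom.
    """
    observed = ((a, b), (c, d))
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    n = a + b + c + d
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of group and agreement.
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# CSUVDL: 328 agree, 80 disagree; Saskatchewan: 81 agree, 24 disagree.
stat, p = chi_square_2x2(328, 408 - 328, 81, 105 - 81)
print(f"chi-square = {stat:.3f}, P = {p:.3f}")
```

The same function applies unchanged to the cat-versus-dog comparison by substituting the species counts.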
All 513 surgical specimens of amputated digits from cats and dogs (408 from CSUVDL [344 dogs and 64 cats] and 105 from Saskatchewan [84 dogs and 21 cats]) were reviewed in a blinded fashion to determine a diagnosis. Overall, there was complete agreement in 409 of the 513 cases (79.7%) (see Table 1). Agreement was 328 of 408 (80.4%) for CSUVDL submissions and 81 of 105 (77.1%) for submissions from the Saskatchewan laboratories; the difference between the 2 laboratory groups was not statistically significant (P = 0.31). Agreement was similar for cat (71/85 [83.5%]) and dog (338/428 [79.0%]) specimens.
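The subgroup figures can be cross-checked against the overall totals. In the sketch below (variable and function names are hypothetical), the CSUVDL and dog agreement counts are derived by subtraction from the overall 409 agreements in 513 cases, so the assertions confirm that each partition of the data adds back up to those totals before the percentages are printed.

```python
TOTAL_CASES, TOTAL_AGREE = 513, 409

# (agreements, cases) for each subgroup. Saskatchewan and cat counts are taken
# from the results; CSUVDL and dog counts are derived from the overall totals.
by_lab = {"Saskatchewan": (81, 105)}
by_lab["CSUVDL"] = (TOTAL_AGREE - 81, TOTAL_CASES - 105)
by_species = {"cat": (71, 85)}
by_species["dog"] = (TOTAL_AGREE - 71, TOTAL_CASES - 85)


def pct(agree: int, cases: int) -> float:
    """Percentage agreement, rounded to one decimal place as in the text."""
    return round(100 * agree / cases, 1)


for label, groups in (("laboratory", by_lab), ("species", by_species)):
    # Each partition of the 513 cases must add back up to the overall totals.
    assert sum(a for a, _ in groups.values()) == TOTAL_AGREE
    assert sum(n for _, n in groups.values()) == TOTAL_CASES
    for name, (a, n) in groups.items():
        print(f"{label}: {name} {a}/{n} = {pct(a, n)}%")

print(f"overall: {TOTAL_AGREE}/{TOTAL_CASES} = {pct(TOTAL_AGREE, TOTAL_CASES)}%")
```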
Of the 104 instances of disagreement, 77 (74.0%) were judged to be clinically significant. Examples of the types of disagreement are shown in Table 2. A third diagnosis was obtained for all 104 disagreements. This third diagnosis agreed with the original diagnosis 41 times (39.4%), with the second diagnosis 39 times (37.5%), and with neither 24 times (23.1%). The third diagnosis did not agree significantly more often with the original diagnosis than with the second (P = 0.82).
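The P = 0.82 comparison is consistent with a chi-square goodness-of-fit test of the 41 versus 39 matches against an even split, leaving out the 24 cases in which the third diagnosis matched neither earlier diagnosis. The paper does not name the test, so this reading is an assumption; the sketch below (function name hypothetical) shows that it recovers the reported value.

```python
import math


def even_split_test(k1: int, k2: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test of two counts against a 50:50 split.

    Returns (statistic, P value) with 1 degree of freedom.
    """
    expected = (k1 + k2) / 2
    stat = ((k1 - expected) ** 2 + (k2 - expected) ** 2) / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Third diagnosis matched the original 41 times and the second 39 times.
stat, p = even_split_test(41, 39)
print(f"chi-square = {stat:.3f}, P = {p:.2f}")  # chi-square = 0.050, P = 0.82
```

A split this close to 50:50 is what would be expected if neither the original nor the second review were systematically favored by the third reviewer.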
No pattern of agreement between the original and second diagnoses based on the type of diagnosis was evident, with the exception of keratoacanthoma. Keratoacanthoma was given as either the original or the second diagnosis 21 times, and there was disagreement with this diagnosis 19 times (90.5%), a rate significantly higher than for any other diagnosis (P < 0.001).
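The keratoacanthoma comparison can be illustrated with a 2x2 table of disagreement versus agreement for keratoacanthoma and for all other cases. This sketch assumes that the 21 keratoacanthoma diagnoses correspond to 21 distinct cases and that the comparison group is the remaining 492 cases (85 disagreements); the paper does not specify the table. Because one expected cell count is small (about 4.3 expected disagreements among the 21 keratoacanthoma cases), the sketch substitutes Fisher's exact test for the chi-square test named in the methods. The function name is hypothetical; SciPy users could call `scipy.stats.fisher_exact` instead.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the P value sums the probabilities of every possible
    table that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - (n - row1)), min(row1, col1) + 1)
    # A tiny relative tolerance keeps floating-point ties from being dropped.
    return sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-9))


# Rows: keratoacanthoma, all other cases; columns: disagree, agree.
ka_disagree, ka_total = 19, 21
other_disagree, other_total = 104 - 19, 513 - 21
p = fisher_exact_two_sided(
    ka_disagree, ka_total - ka_disagree,
    other_disagree, other_total - other_disagree,
)
print(f"two-sided P = {p:.1e}")
```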
In this study, veterinary pathologists who reviewed histologic sections from 513 different amputated-digit submissions in a blinded fashion agreed with the original diagnosis approximately 80% (409/513) of the time. This degree of agreement is similar to that found in many human pathology agreement studies 2,7,12,17,20,23 and in a previous veterinary study on the grading of mast-cell tumors. 11 It may be surprising that experienced diagnostic pathologists reviewing the same case material reached different diagnoses 20% of the time, and perhaps even more so that the disagreement was clinically significant nearly three quarters of the time. However, in 1 human study of the grading of melanomas by experts in the diagnosis of cutaneous melanoma, there was unanimous agreement in only 13 of 37 so-called classic cases. 3
Table 2. Examples of the most common instances of disagreement between the original diagnosis and the second diagnosis in amputated canine digits.
Institutional bias is a concern in human pathology, but no evidence of such an influence was found in the present study. In the human pathology literature, studies have shown that institutions may differ considerably from one another in the diagnoses they give when reviewing the same case material. 5,19 However, institutions from only 2 areas were assessed in this study, and the involvement of more institutions over a wider geographic area may well reveal that the same variability exists among veterinary pathologists.
Why does disagreement occur, and is it a concern for clinicians and pathologists? The very essence of a histopathologic diagnosis is the analysis of complex visual images and an interpretation based on that analysis. This process involves an unavoidable degree of subjective interpretation, which increases the possibility of variance. Studies in human pathology show that this variance increases with the complexity of the histologic sample and for tissues that are not assessed on a regular basis. 21 The digits of dogs and cats are certainly complex to interpret histologically, because a wide variety of tissues are present (bone, joint, tendon, fibrous tissue, adipose tissue, nail-bed epithelium, vessels, glands, and haired skin), all in close apposition, with frequent ulceration and secondary inflammation. The quality of the histologic section is also frequently less than ideal, given the inherent difficulty of trimming digits consistently when they differ greatly in size and degree of pathology. Further, an individual slide contains a myriad of microscopic fields, and both the choice of fields and the weight given to any particular pathologic change will vary from observer to observer. In addition, in a small number of cases, the original slides were not available, and a recut section from the original histologic block was examined instead. Although these sections would be within a few microns of the original slide, some features may have differed between the original slide and the recut section, and this may have contributed to disagreement.
In reports of diagnostic agreement among human and veterinary pathologists, cases are commonly reviewed in a completely blinded manner. 2,5,7,14,16,18,20,21,23 No knowledge of the prior diagnosis, the clinical history, or the signalment of the patient is provided. To make this study as comparable as possible to those reports, blinding was done in the same way. This raises the possibility that some disagreement was created by the blinded nature of this study. However, surgical pathologists would be expected to base their diagnoses on what they see in the surgical material submitted to them. Particularly in the case of amputated digits, histories are often remarkably similar across diseases, both neoplastic and inflammatory, given the limited range of responses displayed by lesions of the digit; as such, the history may be a less useful aid to diagnosis than it is for submissions from other sites on the body.
Surgical reporting is prone to ordinary human error, and, in some cases, the diagnosis is not clear-cut. In 1 paper on histopathologic diagnosis, the investigator suggests that reasons for disagreement include the frequent presence of only minor morphologic changes in submitted specimens and an incomplete understanding of the natural history of the underlying disease process. 9 This also holds true for veterinary pathology: although the distinction between neoplasia and inflammation may seem straightforward to the clinician, in reality it can be very challenging for the veterinary pathologist. 23
The high rate of disagreement in the diagnosis of keratoacanthoma suggests that this diagnosis is problematic for pathologists. The most common alternative diagnosis to keratoacanthoma was well-differentiated squamous-cell carcinoma. Whether this reflects a lack of obvious diagnostic features, a reluctance to interpret a destructive lesion as a benign neoplasm, or a lack of knowledge is unclear.
The degree of disagreement in this study raises the question of what factors contribute to a diagnosis and how its reliability can be improved. The reference material used to assist pathologists in making a diagnosis will certainly vary among pathologists and change over time. Studies on the grading of mast-cell tumors show that use of a standardized grading reference considerably improves the repeatability of grading. 10,11 The Fascicles for the Histologic Classification of Tumors of Domestic Animals (World Health Organization) represent an attempt to standardize the classification of neoplasms. They are limited, however, in the space they devote to any particular neoplasm, and, with the rapid growth in knowledge, reclassifications and new grading or prognostic schemes appear more quickly than the fascicles can be updated. As such, there is no universally accepted classification or grading scheme for neoplasia in domestic animals, and this likely represents a considerable impediment to interobserver agreement.
Terms used to describe neoplasms often include implicit criteria that may be interpreted differently by other pathologists, as also happens in human pathology. 17 Standardization of terms would likely increase interobserver agreement. Peer review of diagnoses is commonly used in human pathology and may have a place in veterinary pathology as well. Review introduces redundancy into the diagnostic process as an error-reduction mechanism; the drawback is the additional time and expense it would create. Ideally, direct meetings among pathologists at a multiheaded microscope or via video microscopy would allow for maximal discussion and may help standardize the understanding of implicit descriptive terms, criteria, and diagnoses.
Further studies that assess the repeatability of diagnoses made by individual pathologists (intraobserver agreement), blinded reviews of photomicrographs (to reduce variability in field choice), and assessments of agreement for other anatomic sites would help quantify the variability in histopathologic diagnoses made by veterinary pathologists and identify areas where improvements may be made.
