Abstract

The Kenney–Doig scale is a histopathology categorization (grading) system often used as the standard for assessing endometrial disease and communicating prognostic fertility information for equine breeding prospects. We investigated how Kenney–Doig categories compared within the same institution and across different institutions to determine if observer variability may contribute to category frequencies. We conducted a retrospective analysis of all equine endometrial submission records between 1998 and 2018 at the Western College of Veterinary Medicine (WCVM) and Prairie Diagnostic Services (PDS). Of 726 biopsies, we found the following category distribution: 46 of 726 (6.3%) I, 307 of 726 (42.3%) IIA, 326 of 726 (44.9%) IIB, and 47 of 726 (6.5%) III. We also conducted a review of the literature and included 6 studies reporting Kenney–Doig category distributions. Chi-square analysis showed significant differences between the category distribution found at WCVM and PDS and the category distributions reported in the 6 studies. To account for differences in mare populations, individual category distributions were generated for 5 pathologists at the WCVM and PDS. The Fisher exact test among these 5 pathologists' category distributions revealed significant differences in category tendencies, suggesting that observer variation may affect the use of the scale. Our results suggest that there is a need for prospective inter-rater and intra-rater agreement studies of the repeatability of the Kenney–Doig scale.

Keywords

endometrial biopsy equine Kenney–Doig observer variation retrospective

The equine breeding industry is a unique field in the scope of large animal medicine. Although the majority of livestock species have been bred to improve their reproductive efficiency, most modern horse breeds have instead been artificially selected to improve aesthetics and sport performance; such breeds are often expected to reproduce only after retiring from a successful performance career at an older average breeding age than most domestic species. As a result, horses have one of the lowest reproductive efficiencies of domestic species, and often struggle with infertility.^24,46 Equine clinicians are tasked with diagnosing the cause(s) of infertility in these mares and directing therapy to successfully breed them and produce live foals.

A breeding soundness exam is used to diagnose disease and guide treatment of such problem mares. This exam includes obtaining a mare’s reproductive history and evaluating the reproductive tract via rectal palpation, ultrasonography, hysteroscopy, uterine cytology, uterine culture, and uterine biopsy. Although uterine cytology and culture have become the predominant tests used to identify endometrial inflammation, given that they are less invasive and more convenient, endometrial biopsy is considered the gold standard when looking for endometrial disease.^17,23,30 Since the 1980s, the histologic changes seen in endometrial biopsies have been used to classify the mare in 1 of 3 categories proposed by Kenney and Doig,¹⁷ with the middle category split into IIA and IIB, resulting in 4 functional categories. These categories are thought to be correlated with a percent chance of the mare producing a live foal, enabling clinicians to give clients a more definitive answer regarding the mare’s reproductive potential.

The Kenney–Doig scale is built upon categorizing different histologic lesions, such as inflammation or fibrosis, as absent, mild, moderate, or severe based on semiquantitative guidelines. The severity of each histologic lesion and the combination of these lesions, in conjunction with the mare’s age, reproductive history, and other clinical findings, is used to determine which category an endometrial biopsy will receive.¹⁷ Ultimately, the scale is meant to evaluate the type and quality of histologic lesions present, whether the endometrium reflects the stage of the sexual cycle the mare was in at the time of biopsy, and how likely the endometrial histologic lesions will improve with appropriately guided therapy. This information is communicated to the submitting clinician along with the final Kenney–Doig category and its associated fertility prognosis.

Although the Kenney–Doig scale is considered the industry standard for categorizing endometrial biopsies, criticism involving the subjectivity of the category guidelines has surfaced in the literature.^9,31,32,43 The guidelines set by Kenney and Doig to determine whether the inflammation or fibrosis present in a biopsy qualifies as absent, mild, moderate, or severe are potentially vague and not defined specifically. In particular, the 2 middle categories of IIA and IIB include a wide range of qualifying lesions with potential overlap between the categories. Given this overlap, and the wide prognostic range associated with each category, these middle ranks may act as a “catch-all” for borderline biopsies. In other histopathology grading systems, observers have been noted to categorize predominantly within the middle categories of scales and avoid the extremes, possibly because of reluctance to upgrade a borderline biopsy to the next more severe category.^4,19,29,44

Aside from the criticism regarding subjective guidelines, other concerns regarding the Kenney–Doig scale have been raised. New endometrial diseases have been identified that are not included in the scale guidelines.^38–40 The use of endometrial biopsy as part of the breeding soundness exam is no longer considered routine by many equine clinicians and theriogenologists, and is now reserved as an additional test when other, less-invasive tests, such as transrectal palpation, ultrasonography, uterine cytology, and uterine bacterial culture, have already been done.^20,21,36 Given the continuous evolution of minimally contaminated breeding techniques and new uterine therapies, foaling rates are continuing to improve, specifically for biopsies within the 2 middle categories. Therefore, less emphasis is being placed on the foaling rates originally proposed by Kenney and Doig, and overall categorization of biopsies has become less important in the theriogenology world, in which theriogenologists are now trained to read their own biopsies. Instead, the focus has shifted to a more qualitative assessment of the endometrial disease present and its ability to respond to treatment, regardless of numeric category.

Despite this shift in equine endometrial biopsy evaluation within other specialties, pathologists across Canada and the United States are routinely tasked with grading an endometrial biopsy. Given the criticisms involving the semiquantitative and possibly subjective guidelines of the Kenney–Doig system, and its continued use in diagnostic pathology, we investigated Kenney–Doig category distributions reported in the literature and within our own institution to determine the incidence of specific Kenney–Doig categories and if observer variation may explain differences in categories assigned. To investigate the relative incidence of Kenney–Doig categories in Western Canada, we performed a retrospective evaluation of the Western College of Veterinary Medicine (WCVM; University of Saskatchewan, Saskatchewan, Canada) and Prairie Diagnostic Services (PDS; Saskatchewan, Canada) equine endometrial biopsy submissions to generate an institution-wide category distribution curve. To determine if the incidence of Kenney–Doig categories at the WCVM and PDS is a repeatable finding among other institutions, we identified 6 studies from the literature and used them for comparison purposes, with a discussion regarding how differences in mare populations may or may not account for any differences in Kenney–Doig category distributions. To decrease variability in mare population differences and look for preliminary evidence of observer variation within a similar mixed mare population within the same geographic region, individual category distributions from 5 pathologists at the WCVM and PDS were compared.

Our observations are limited by the different mare populations and biopsies categorized between studies and pathologists; however, we offer an in-depth discussion of endometrial disease and the potential influence of mare demographics and observer variability. This publication is the first in a 2-part series, and it illustrates the need for a prospective reliability study concerning the current use of the Kenney–Doig scale by diagnostic pathologists, the results of which are published in our second manuscript.

Materials and methods

Retrospective analysis of endometrial biopsies at the WCVM and PDS

Endometrial biopsy records were collected from the Veterinary Diagnostic Services software (1998–2014) and the Prairie Diagnostic Services Casebook 2 (2014–2018), the institution-wide computerized databases used for pathology submission recordkeeping at the WCVM and PDS, respectively. A free-text search of the record history, final diagnosis, and comments was performed on all equine surgical biopsy submissions from 1998 to 2018, using terms including “endometrial biopsy,” “uterine biopsy,” “endometrium,” and “Kenney.” The signalment, submission history, final diagnosis including Kenney–Doig category, histopathology comments, and the reporting pathologist were recorded. The final biopsy categories were then plotted on a bar graph to show the institution-wide distribution of categories of endometrial biopsies. Separate distribution curves were generated for the 5 pathologists who had each contributed ≥50 biopsy submissions over the study period.

Retrospective review of Kenney–Doig categories reported in the literature

To collect a database of similar published material, the following inclusion criteria were set: the study must have 1) categorized endometrial biopsies according to the scale proposed by Kenney and Doig, 2) reported the grading distribution in the publication (either in total counts or in frequency proportions), and 3) used a mare sample size ≥150.¹⁷ If studies reported multiple category distributions, such as comparing distributions based on the use of barren history or before and after uterine therapy, the distribution that best represented the situation observed at the WCVM and PDS was used for more accurate comparison. If studies reported between-category diagnoses, for example a category I–IIA, these were excluded and only discrete Kenney–Doig categories were used. If the category distributions were reported as raw count data, these were recorded, and if reported only as a percentage, the raw count data were calculated from the overall sample size and each category percentage. If the authors did not report percentage frequencies for their Kenney–Doig distribution data, these were calculated from the raw counts and overall sample size, and then used for descriptive comparison.
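The conversion between raw counts and percentages described above is simple arithmetic. As a minimal sketch (written in Python purely for illustration; the function names are hypothetical and are not part of the study's workflow), using the WCVM and PDS counts reported in this paper:

```python
def counts_to_percentages(counts):
    """Convert raw category counts to percentages of the total (1 decimal place)."""
    total = sum(counts.values())
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}


def percentages_to_counts(percentages, sample_size):
    """Recover approximate raw counts from reported percentages and sample size."""
    return {cat: round(sample_size * pct / 100) for cat, pct in percentages.items()}


# WCVM and PDS category counts as reported in this paper (n = 726).
wcvm_counts = {"I": 46, "IIA": 307, "IIB": 326, "III": 47}

wcvm_percentages = counts_to_percentages(wcvm_counts)
recovered_counts = percentages_to_counts(wcvm_percentages, sample_size=726)
```

Counts recovered from rounded percentages can be off by a biopsy or so in other data sets, so it is worth checking that they sum to the reported sample size.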

Statistical analysis

Raw counts for the category distributions for the 5 WCVM pathologists were uploaded into the statistical software R (v.4.0.2, http://www.R-project.org). An initial Fisher exact test was performed on all 5 grading distributions, followed by separate Fisher exact tests for pairwise comparison of each pathologist’s category distribution. A Bonferroni adjustment was made to calculate a new p-value threshold for significance, to account for the increased possibility of observing a significant difference by chance across the 10 pairwise comparisons (0.05/10 = acceptable p-value threshold <0.005).

Similarly, raw counts for the grading distribution of each of the 6 published studies, as well as the counts for the category distribution found at the WCVM and PDS, were input into R. Chi-square analysis was done on all 7 category distributions, followed by separate pairwise chi-square tests between each study’s distribution and that found at the WCVM and PDS. Again, a Bonferroni-adjusted significance threshold was calculated to account for the 6 pairwise comparisons and the increased possibility of error (0.05/6 = acceptable p-value threshold <0.008).
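The Bonferroni thresholds can be checked with a few lines of arithmetic. The sketch below is in Python for illustration only (the analysis itself was run in R), and the helper name is hypothetical. It assumes that every pair among the 5 pathologists is compared once, and that each of the 6 published studies is compared once against the WCVM and PDS distribution, which matches the thresholds of 0.005 and 0.008 quoted in the Results.

```python
from math import comb


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni adjustment."""
    return alpha / n_comparisons


# Assumption: every pair among the 5 pathologists is compared once, C(5, 2) pairs.
n_pathologist_pairs = comb(5, 2)
pathologist_threshold = bonferroni_threshold(0.05, n_pathologist_pairs)

# Assumption: each of the 6 studies is compared once against WCVM and PDS.
n_study_comparisons = 6
study_threshold = bonferroni_threshold(0.05, n_study_comparisons)
```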

Results

We identified 755 biopsy submissions; however, 29 submissions had been assigned “between” Kenney–Doig categories, for example, between a category I and a category IIA. We included mare age range, breed distribution, and reproductive histories from 726 equine endometrial biopsy submissions in the final distribution analysis of the WCVM and PDS databases (Table 1).

Table 1.

Sample size, age range, breeds, reproductive history, and geographic regions from 6 studies of equine endometrial biopsy Kenney–Doig categorization in the literature and at the Western College of Veterinary Medicine (WCVM) and Prairie Diagnostic Services (PDS).

Study	Sample size	Age range (y)	Breed	Reproductive history	Geographic region
31	530	3–23	Thoroughbred (81.3%), cross-bred (11.1%), Arabian (3%), pony (2.3%), Heavy horse (2.1%), Standardbred (0.2%)	Problem	United Kingdom
45	192	4–24	European Warmblood (41.7%), Thoroughbred (28.6%), light draft horse (14.1%), Standardbred (8.3%), Arabian (6.8%)	Mixed	Switzerland
28	154	4–21	Thoroughbred (100%)	Unspecified	Japan
18	164	3–23	Retired performance mares (100%)	Mixed	Germany
16	816	20–32	Various	Mixed	Germany
35	8,795	1–30	Various	Mixed	Germany
WCVM and PDS	726	3–30	Quarter Horse (39.9%), Thoroughbred (11.6%), unspecified/unknown (12.3%), unspecified Warmblood (11.0%), Arabian (4.5%), Clydesdale (2.6%), Hanoverian (2.5%), American Paint (2.1%), Percheron (1.9%), Morgan (1.8%), Appaloosa (1.7%), Miniature Horse (1.4%), Standardbred (1.1%), Welsh Pony (1.0%), American Saddle Horse (0.7%), Belgian (0.6%), Tennessee Walker Horse (0.6%), Friesian (0.6%), Andalusian (0.4%), Cleveland Bay (0.4%), Holsteiner (0.3%), Trakehner (0.3%), pony (0.3%), Fjord (0.3%), Westphalian (0.1%), Shire (0.1%), Pinto (0.1%)	Mixed	Western Canada

Mixed = mare population consisting of various reproductive histories including those considered healthy with successful foaling history to those with known fertility issues (if 100% of the population did not meet the problem definition, the reproductive history was labeled as mixed); problem = mares with known fertility issues including history of barrenness, subfertility, and unsuccessful breeding attempts; unspecified = samples used for endometrial biopsy categorization were from mares that had a negative uterine culture and determined “non-infectious,” however, no other specific reproductive history was given.

Endometrial biopsy categories were distributed as follows: 46 of 726 (6.3%) category I, 307 of 726 (42.3%) category IIA, 326 of 726 (44.9%) category IIB, and 47 of 726 (6.5%) category III (Fig. 1). Most (87.2%) of the biopsies were categorized as either category IIA or IIB. Stratifying mares by either age (<20-y-old vs. ≥20-y-old) or breed (comparing the 2 predominant breed groups, Quarter Horses and Thoroughbreds) did not significantly change the shape of Kenney–Doig category distributions, still resulting in a higher incidence of categories IIA and IIB, with fewer biopsies assigned to categories I and III.

Figure 1.

Graphical representation of the Western College of Veterinary Medicine and Prairie Diagnostic Services institution-wide Kenney–Doig category distribution of 726 equine endometrial biopsies submitted between 1998 and 2018.

Although the WCVM and PDS are teaching institutions, and many of these biopsies may have been initially evaluated by pathology residents, every histology report was reviewed and finalized by a senior diagnostic pathologist. In total, 48 different senior pathologists and 2 board-certified theriogenologists contributed to the grading of these submissions; pathology residents were not included.

Five pathologists were identified who had categorized ≥50 endometrial biopsies, and individual distribution curves were generated for each of them (Fig. 2). All 5 pathologists ranked the majority of their categories in one of the middle-ranked categories IIA and IIB. The initial Fisher exact test performed on all 5 category distributions of pathologists A–E revealed significant differences within the group (p < 0.001). Additional Fisher exact tests comparing the individual category distributions of pathologists A–E in a pairwise fashion found that all pathologists’ category distributions were significantly different from each other (p < 0.001, with Bonferroni adjusted threshold at 0.005) except for pathologists A and C (p = 0.006) and pathologists A and E (p = 0.065).
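To illustrate the kind of pairwise comparison described above, the following is a simplified sketch only. The study ran Fisher exact tests in R on full tables covering all 4 categories; this Python example instead collapses the comparison to a 2 × 2 table (IIA vs. IIB for 2 pathologists). The counts are hypothetical, invented for illustration, and are not taken from the WCVM and PDS data; the function name is also an assumption.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact test p-value for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, col1, total = a + b, a + c, a + b + c + d

    def table_probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    observed = table_probability(a)
    lowest = max(0, row1 + col1 - total)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# HYPOTHETICAL counts (IIA, IIB) for two pathologists, invented for illustration.
pathologist_x = [40, 20]
pathologist_y = [15, 45]

p_value = fisher_exact_2x2([pathologist_x, pathologist_y])
is_significant = p_value < 0.005  # Bonferroni-adjusted threshold quoted above
```

For tables larger than 2 × 2, such as the 5 × 4 table of all pathologists by all categories, a general implementation such as R's `fisher.test` would be needed.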

Figure 2.

Kenney–Doig category distributions for 5 pathologists at the Western College of Veterinary Medicine and Prairie Diagnostic Services. Criteria for inclusion of pathologists for analysis included grading ≥50 equine endometrial biopsies in the period of study. Number of endometrial biopsy categories assigned per pathologist: A = 104, B = 59, C = 128, D = 65, E = 61. Fisher exact test revealed significant differences (p < 0.001) between the category distributions of pathologist A vs. B, A vs. D, B vs. C, B vs. D, B vs. E, C vs. D, C vs. E, and D vs. E.

Six studies from the literature met our inclusion criteria (Table 1).^{16,18,28,31,35,45} Four of the 6 studies had most (>50%) biopsies categorized within the middle-ranked categories, but with variable splits between IIA and IIB as the most commonly assigned category (Fig. 3). An initial chi-square analysis on all 6 Kenney–Doig category distributions from the literature and that of WCVM and PDS found significant differences between grading trends (χ² = 1,628.2; df = 18; p < 0.001). Individual chi-square tests comparing each distribution from the literature against that at WCVM and PDS found that every identified Kenney–Doig category distribution reported was significantly different from the distribution at WCVM and PDS (p < 0.001, with Bonferroni adjusted threshold at 0.008).
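The chi-square statistic and its degrees of freedom can be sketched as follows. This is a minimal Python illustration, not the R code used in the study. The WCVM and PDS row uses the counts reported in this paper; the second row is hypothetical, invented only to make the example run, and the function name is an assumption.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if category assignment were independent of the source.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Row 1: WCVM and PDS counts for categories I, IIA, IIB, III (from this paper).
# Row 2: HYPOTHETICAL counts for a comparison study, invented for illustration.
wcvm = [46, 307, 326, 47]
hypothetical_study = [20, 60, 90, 30]

statistic, dof = chi_square_statistic([wcvm, hypothetical_study])

# Degrees of freedom for the overall test: 7 distributions x 4 categories.
dof_all_seven = (7 - 1) * (4 - 1)
```

Turning the statistic into a p-value requires the chi-square distribution, which is not in the Python standard library; R's `chisq.test` or SciPy's `scipy.stats.chi2_contingency` would handle that step.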

Figure 3.

Frequency distributions of Kenney–Doig categories in 6 published studies and that found at the Western College of Veterinary Medicine (WCVM) and Prairie Diagnostic Services (PDS). 1) Ricketts and Alonso³¹ (n = 530); 2) Waelchli⁴⁵ (n = 192); 3) Nambo et al.²⁸ (n = 154); 4) Kilgenstein et al.¹⁸ (n = 164); 5) Kabisch et al.¹⁶ (n = 816); 6) Schilling³⁵ (n = 8,795); 7) WCVM and PDS (n = 726). Chi-square analysis revealed that all 6 studies had significantly different (p < 0.001) Kenney–Doig distributions compared to the WCVM and PDS individually.

Discussion

The Kenney–Doig category distribution generated from the WCVM and PDS equine endometrial biopsy databases suggests that categories IIA and IIB are assigned predominantly compared to the extreme categories of I and III. The category distributions in 6 published studies were significantly different from the category distribution in the WCVM and PDS review.^{16,18,28,31,35,45} To critically evaluate these significant differences between category distributions in the literature and the distribution at the WCVM and PDS, multiple variables that may affect the Kenney–Doig category assigned to a given biopsy must be considered. Two overarching factors determine the Kenney–Doig category: the presence and nature of endometrial disease in the biopsy presented for grading evaluation, and the accurate quantification of histologic lesions representative of said disease. To put it simply, the mare and the observer determine the Kenney–Doig category; however, a variety of confounding variables may influence the mare or the observer and ultimately the assigned category.

Each study assigned Kenney–Doig categories to a different mare population, which makes direct comparison of Kenney–Doig categories difficult. Therefore, certain factors must be discussed such as the age of the mare, reproductive history, breed, and performance use.

It is well established that the incidence of endometrial fibrosis increases as mares age.^{9,11–13,16,18,32,45} Mare populations that consist predominantly of older animals may therefore be expected to have a higher incidence of endometrial fibrosis and more severe Kenney–Doig categories. One retrospective analysis was conducted on 816 biopsies submitted from exclusively mares that were ≥20-y-old.¹⁶ Without taking into account the history of barrenness, the majority of biopsies were classified as either category IIB or category III for both mares 20–24-y-old (83%) and mares 25–32-y-old (93%).¹⁶ Another study found a similar trend in which mares ≥20-y-old were assigned mostly to categories IIB or III; younger mares, such as those 1–5-y-old, were assigned mostly to categories I or IIA.³⁵ At the WCVM and PDS, younger mares did not tend to have a higher incidence of less severe biopsy categories, and older mares did not tend to have a higher incidence of more severe biopsy categories. When broken into age groups, we found that both mares ≥20-y-old and mares <20-y-old had higher incidences of the categories IIA and IIB, with relatively smaller proportions of category I and category III biopsies. Age did not change the relative incidence of Kenney–Doig categories at the WCVM and PDS.

In addition to accounting for differences in ages, the reproductive histories of the population must also be considered. In one study, all mares were biopsied because of a history of subfertility or genital abnormality.³¹ Of the initial Kenney–Doig categories assigned before targeted uterine therapy was given, 87% fell within category IIB, which the authors attributed to the expected endometrial histologic lesions of barren mares at the end of an unsuccessful breeding season.³¹ In contrast, the mares submitted to the WCVM and PDS were from varied reproductive backgrounds including some pre-purchase or pre-breeding biopsies from relatively healthy mares without a history of barrenness or subfertility. Given the variety of reproductive histories submitted, and that many of the submission histories were missing vital reproductive information, we were unable to accurately compare an adequate number of “healthy” mares to “problem” mares at our institution.

Reproductive histories should also include whether a mare was barren in the previous breeding season and if so, how many years have passed since her last pregnancy. There is ample evidence in the literature that the longer a mare remains barren, the higher the likelihood of endometrial fibrosis and issues with subfertility.^{7,16,17,22,42,45} Given the significance of this finding, Kenney and Doig¹⁷ included barrenness as an important modifier when grading biopsies; barrenness for >2 y automatically moved a mare into the next higher category of endometrial disease. This modification, however, is dependent on the submitting clinician receiving and providing an accurate reproductive history. Again, the submission histories received at the WCVM and PDS often did not contain adequate information regarding a mare’s history of barrenness; hence, accurate comparison between mares that had been barren for a given number of years was not possible. When comparing against another study,¹⁶ we chose the category distribution provided that did not account for barrenness.

Aside from age and reproductive history, a mare’s breed may also influence the likelihood of endometrial disease. Differences in breeds regarding the amount of inbreeding are considered to have a negligible effect on fertility; however, the exclusive career use and associated management for specific breeds is a unique variable. The most common example of this feature is the Thoroughbred racing industry. Many Thoroughbred fillies are subjected to high levels of athletic performance early in life, experiencing rigorous training schedules that result in lean body conditions that deplete fat reserves throughout the body, resulting in sinking and sloping of the vulva and poor perineal conformation.¹⁴ This compromises the vulvar seal and predisposes these mares to vaginal and uterine contamination leading to conditions such as pneumovagina and endometritis.¹⁴ Racing Thoroughbred fillies commonly have poor perineal conformation, and broodmares often undergo vulvoplasties in an attempt to decrease air and fecal contamination of the reproductive tract. It may be argued that any breed with a commonly associated sport, such as Thoroughbreds and flat track racing, Standardbreds and pacing, or Warmbloods and eventing, may have specific conformational changes associated with their high-level career that could influence the incidence and severity of endometrial disease present.

One study used Thoroughbred mares exclusively when categorizing 154 endometrial biopsies according to Kenney and Doig.²⁸ Although it was not specified whether these mares were currently racing or had previously retired off the track, one might assume that at least a portion of the population sampled had some form of current or previous racing career. This is in contrast to the breed demographics seen at the WCVM and PDS where the most common breed is the Quarter Horse followed by a mixture of lighter horse breeds such as Thoroughbreds and Arabians and various heavier draft breeds. When comparing the Kenney–Doig category distribution found in the Thoroughbred study in which most samples were category IIA (70.8%) to that of the WCVM and PDS where category IIA only comprised 42.3%, this difference in breed representation must be considered as it may affect the incidence and degree of endometrial disease seen.²⁸ However, separate Kenney–Doig category distribution curves for the Quarter Horse and Thoroughbred breeds at the WCVM and PDS followed a bell-shaped curve similar to the overall institutional distribution, with most biopsies falling within categories IIA and IIB, suggesting that breed differences were not present in our database of endometrial biopsies.

Regardless of breed, the level of physical exercise and type of training regime a mare is engaged in may also affect the incidence of endometrial disease. Any mare, regardless of breed, may experience loss of perineal fat reserves in response to higher levels of training resulting in poor vulvar conformation and increased risk of endometritis.¹⁴ Exogenous progesterone supplementation, often used on performance mares to suppress estrus behavior during the show season, has also been associated with higher susceptibility to endometritis and possible hormonal dysfunction affecting the endometrium.^3,8,18 Performance mares also may contend with higher levels of both psychologic and physical stress, higher body temperatures for prolonged periods of time during exercise, and higher incidences of corticosteroid use for joint maintenance than the average pleasure horse, all of which may contribute to altered hormonal status and changes in endometrial glandular function.^3,18 Additionally, performance mares are often bred only after proving themselves successful in their given sport, resulting in an older average breeding age and increased likelihood of endometrial fibrosis. 
In one study, endometrial biopsies were examined from a group of exclusively retired performance horses.¹⁸ Although the incidence of endometritis and endometrial fibrosis appeared similar to other studies using pleasure mares, the authors found a much higher incidence of endometrial maldifferentiation in these sport mares.^18,38–41 Endometrial maldifferentiation is a phenomenon characterized by an irregular pattern of differentiation in endometrial glandular epithelium, which is postulated to affect glandular secretions and overall fertility.^38–41 Although the Kenney–Doig scale does not include endometrial maldifferentiation, it does take into account inflammatory lesions such as those caused by endometritis and generalized non-physiologic glandular atrophy, both changes that performance mares may be more prone to developing.^3,8,14 In sum, drawing parallels between the retired performance mares¹⁸ and the mares of the WCVM and PDS may be confounded by the effects of strenuous exercise, competition, and different breeding practices regarding sport mares. We were unable to stratify our own data according to mare “use” (i.e., performance vs. pleasure career), because most submission histories did not contain this information.

The mare population in one study⁴⁵ does more closely resemble that of the WCVM and PDS in regard to age range, breed representation, and reproductive histories (Table 1). Although these demographics arguably best reflect the mare population seen at the WCVM and PDS, the Kenney–Doig category distributions reported for both populations were still significantly different. The former study reported most (52.1%) biopsies as category I, more than 8 times the 6.3% of category I biopsies classified at the WCVM and PDS.⁴⁵ Although the former study had a similar incidence of categories IIA and III biopsies, the study only reported 14.1% as category IIB, much less than the 44.9% of category IIB biopsies assigned at the WCVM and PDS.⁴⁵ Despite the relative similarities in mare demographics, the former study was completed in Switzerland on mares in a distinctly different geographic region than the mares seen in Western Canada at the WCVM and PDS. Regional differences in local environmental pathogens and the incidence of endometritis, the prevalence of certain breeds and specific high-level sports in the area, common breeding management, and veterinary practices may also influence the Kenney–Doig categories assigned, and this must be considered for the other studies identified from the literature as well.

Given all of these differences in mare population demographics among the 6 studies identified in the literature and that of the WCVM and PDS, it is difficult to conclude whether the higher incidence of category IIA and IIB is an expected finding at our institution based on a normal distribution of a local population of mares. However, when we stratified our data into young (<20-y-old) and old (≥20-y-old) mares, we did not see changes in Kenney–Doig category distributions as reported by others.¹⁶ We also did not see any differences in Kenney–Doig categories based on breed when comparing the Quarter Horse population to the Thoroughbred population seen at our institution. This may suggest that a second factor is influencing the incidence of Kenney–Doig categories at the WCVM and PDS, namely, observer variation regarding the interpretation and application of the scale guidelines.

When using histopathology grading scales, it has been shown that observers often gravitate towards assigning middle ranks.^4,19,29,44 This may occur as a result of differing views among observers in determining the threshold between histopathology categories, hesitation to increase the category severity when only a small portion of the biopsy is affected, or the inherent subjectivity of the grading system.^2,19,29 Given the vague and subjective guidelines of the Kenney–Doig system, particularly of the 2 middle categories, observer bias and variation may contribute to the higher proportion of categories IIA and IIB seen at the WCVM and PDS. To investigate this observation, individual category distributions from 5 contributing pathologists at the WCVM and PDS were generated and compared by the Fisher exact test, allowing analysis of different grading tendencies within the same local mare population.

The individual grading distributions generated from the top 5 contributing pathologists in our retrospective review illustrate obvious differences in grading tendencies between pathologists when compared in a pairwise fashion. All 5 pathologists assigned more of either one or both of the middle categories than the categories I or III. This may reflect the population demographics each pathologist happened to evaluate, a tendency to avoid either end of the scale, or differences in training regarding the Kenney–Doig scale itself. Of the 5 pathologists examined, 4 completed their training at WCVM; 1 pathologist completed their residency in the United States. Another source of variation among categories assigned may be the clinical history available for certain cases, and whether pathologists incorporated the number of years barren as a modifier for the final Kenney–Doig category assigned. Although many of the submissions in our database did not contain enough information to make this modification, given that only 3.3% of submissions made specific mention of number of years barren, the final reports were often unclear as to whether the category was assigned based on histology alone or in conjunction with clinical history and the stage of sexual cycle. Finally, the differences observed in these category distributions may also be the result of ambiguity regarding what should differentiate a category I or III from the category IIA or IIB. When looking at the shapes of the distribution curves between pathologists, there are obvious differences in whether a category IIA or IIB was assigned, which could offer evidence that observer variation is occurring.

Although these pathologists were grading biopsies submitted from the same local mare population, and were likely to see a similar mix of histologic lesions during their diagnostic duties, we cannot definitively conclude that differences in grading tendencies were in part the result of observer variation, given that no 2 pathologists categorized the same group of biopsies. We believe that the high incidence of categories IIA and IIB found at the WCVM and PDS, both overall and for individual pathologists, warrants further investigation of the inter-rater and intra-rater agreement achieved with the Kenney–Doig scale, to determine whether observer variation is in part responsible for the different category distributions reported in the literature and at our institution.

Inter-rater agreement (whether separate individuals agree on a diagnosis) and intra-rater agreement (whether the same individual agrees on the diagnosis of a given biopsy at different times) are fundamental to the reliability of a grading system. Without reproducibility between separate observers, clinicians cannot accurately interpret Kenney–Doig categories assigned by different diagnostic laboratories, or by different pathologists within each laboratory. Although inter- and intra-observer agreement studies have become popular in testing the reproducibility of histologic grading of certain types of cancer or degenerative conditions in both human and veterinary medicine, no studies exist on observer agreement for the Kenney–Doig scale.^1,4–6,10,15,19,25–27,29,33,34,37,42 If the Kenney–Doig system proves to be unreliable among diagnostic pathologists, this information will add to the growing evidence that categorization of equine endometrial biopsies is less important than critically evaluating the type of endometrial lesion and its response to treatment, as is currently practiced in the theriogenology field. Additionally, it may highlight the importance of interdisciplinary communication between pathologists and theriogenologists regarding equine reproductive pathology, and possibly the need to submit such biopsies to equine reproduction–oriented pathologists or theriogenologists for more qualitative and clinically practical advice regarding mare management. Based on the preliminary work done in our review, we designed a prospective inter-rater and intra-rater agreement study evaluating the Kenney–Doig categories assigned by multiple pathologists to the same set of endometrial biopsy slides, the results of which are published in part 2 of this manuscript series.
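To make the notion of inter-rater agreement concrete, the sketch below computes unweighted Cohen's kappa, one common chance-corrected agreement statistic, for 2 raters grading the same slides. This is an assumption made for illustration only: the text does not state which agreement statistic is used in part 2, the function name `cohens_kappa` is illustrative, and the ratings shown are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for 2 raters grading the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must grade the same non-empty set of items.")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same category.
    observed = sum(x == y for x, y in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement, from each rater's category frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[cat] * count_b[cat] for cat in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical Kenney-Doig categories (NOT study data) assigned by
# 2 pathologists to the same 8 biopsy slides.
rater_1 = ["I", "IIA", "IIA", "IIB", "IIB", "IIB", "III", "IIA"]
rater_2 = ["I", "IIA", "IIB", "IIB", "IIA", "IIB", "III", "IIB"]
print(f"Cohen's kappa = {cohens_kappa(rater_1, rater_2):.2f}")
```

The same function applied to one pathologist's first and second readings of the same slides gives an intra-rater estimate. Because unweighted kappa treats a IIA versus IIB disagreement the same as a I versus III disagreement, a weighted kappa may be preferable for an ordinal scale such as this one.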

Footnotes

Acknowledgements

We are grateful to Brian Chelak and PDS for support in completing this retrospective review. We thank our other committee members, Drs. Claire Card, Elemir Simko, and Andy Allen, for continuing support throughout the project.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

Our project was supported by the Townsend Equine Health Research Fund and the Interprovincial Graduate Student Fund.

ORCID iD

Jane Westendorf

References

1. Bergeron, et al. A multicentric European study testing the reproducibility of the WHO classification of endometrial hyperplasia with a proposal of a simplified working classification for biopsy and curettage specimens. Am J Surg Pathol 1999;23:1102–1108.

2. Brothwell, et al. Observer agreement in the grading of oral epithelial dysplasia. Community Dent Oral Epidemiol 2003;31:300–305.

3. Burger, et al. Managing a mare for breeding and sport. Pferdeheilkunde 2008;24:102–107.

4. Cross SS. Grading and scoring in histopathology. Histopathology 1998;33:99–106.

5. de Vet, et al. Interobserver variation in histopathological grading of cervical dysplasia. J Clin Epidemiol 1990;43:1395–1398.

6. de Vet, et al. Efforts to improve interobserver agreement in histopathological grading. J Clin Epidemiol 1995;48:869–873.

7. Doig, et al. The use of endometrial biopsy in the infertile mare. Can Vet J 1981;22:72–76.

8. Evans, et al. Clearance of bacteria and non-antigenic markers following intra-uterine inoculation into maiden mares: effect of steroid hormone environment. Theriogenology 1986;26:37–50.

9. Evans, et al. Morphometric analysis of endometrial periglandular fibrosis in mares. Am J Vet Res 1998;59:1209–1214.

10. Fadare, et al. The diagnosis of endometrial carcinomas with clear cells by gynecologic pathologists: an assessment of interobserver variability and associated morphologic features. Am J Surg Pathol 2012;36:1107–1118.

11. Flores, et al. Endometrosis in mares: incidence of histopathological alterations. Reprod Domest Anim 1995;30:61–65.

12. Hanada, et al. Histopathological characteristics of endometrosis in thoroughbred mares in Japan: results from 50 necropsy cases. J Equine Sci 2014;25:45–52.

13. Held, Rohrbach. Clinical significance of uterine biopsy in the maiden and non-maiden mare. J Reprod Fertil Suppl 1991;44:698–699.

14. Hurtgen JP. Pathogenesis and treatment of endometritis in the mare: a review. Theriogenology 2006;66:560–566.

15. Ishak, et al. Histological grading and staging of chronic hepatitis. J Hepatol 1995;22:696–699.

16. Kabisch, et al. Endometrial biopsies of old mares—what to expect?! Pferdeheilkunde Equine Med 2019;35:211–219.

17. Kenney, Doig. Equine endometrial biopsy. In: Morrow, ed. Current Therapy in Theriogenology: Diagnosis, Treatment, and Prevention of Reproductive Diseases in Small & Large Animals. 2nd ed. Saunders, 1986:723–729.

18. Kilgenstein, et al. Microscopic examination of endometrial biopsies of retired sports mares: an explanation for the clinically observed subfertility? Res Vet Sci 2015;99:171–179.

19. Kiupel, et al. Proposal of a 2-tier histologic grading system for canine cutaneous mast cell tumors to more accurately predict biological behavior. Vet Pathol 2011;48:147–155.

20. Köhne, et al. Diagnostic and treatment practices of equine endometritis—a questionnaire. Front Vet Sci 2020;7:547.

21. Leblanc MM. When to refer an infertile mare to a theriogenologist. Theriogenology 2008;70:421–429.

22. Lehmann, et al. Morpho-functional studies regarding the fertility prognosis of mares suffering from equine endometrosis. Theriogenology 2011;76:1326–1336.

23. Love. Techniques in reproductive examination: endometrial biopsy. In: McKinnon, et al., eds. Equine Reproduction. 2nd ed. Wiley Blackwell, 2011:1929–1939.

24. Mahon GAT, Cunningham EP. Inbreeding and the inheritance of fertility in the Thoroughbred mare. Livest Prod Sci 1982;9:743–754.

25. Matos AJF, et al. Prognostic studies of canine and feline mammary tumours: the need for standardized procedures. Vet J 2012;193:24–31.

26. Morris JA. Information and observer disagreement in histopathology. Histopathology 1994;25:123–128.

27. Munkedal DLE, et al. Significant individual variation between pathologists in the evaluation of colon cancer specimens after complete mesocolic excision. Dis Colon Rectum 2016;59:953–961.

28. Nambo, et al. Influence of age and endometrial biopsy score on the expression of lactoferrin in the uterus of mares. J Equine Vet Sci 2014;34:142.

29. Northrup, et al. Variation among pathologists in the histologic grading of canine cutaneous mast cell tumors with uniform use of a single grading reference. J Vet Diagn Invest 2005;17:561–564.

30. Overbeck, et al. Comparison of three diagnostic methods to identify subclinical endometritis in mares. Theriogenology 2011;75:1311–1318.

31. Ricketts, Alonso. Assessment of the breeding prognosis of mares using paired endometrial biopsy techniques. Equine Vet J 1991;23:185–188.

32. Ricketts, Alonso. The effect of age and parity on the development of equine chronic endometrial disease. Equine Vet J 1991;23:189–192.

33. Robbins, et al. Histological grading of breast carcinomas: a study of interobserver agreement. Hum Pathol 1995;26:873–879.

34. Scheuer PJ. Chronic hepatitis: what is activity and how should it be assessed? Histopathology 1997;30:103–105.

35. Schilling. Die endometriumbiopsie bei der stute - eine analyse der histologischen befunde zwischen 1992-2012 am Leipziger Institut für Veterinär-Pathologie [The endometrial biopsy in the mare—an analysis of the histological findings between 1992–2012 at the Leipzig Institute for Veterinary Pathology]. Dr. Med. Vet. Dissertation. 2017. German. https://ul.qucosa.de/api/qucosa%3A15630/attachment/ATT-0/

36. Schlafer DH. Equine endometrial biopsy: enhancement of clinical value by more extensive histopathology and application of new diagnostic techniques? Theriogenology 2007;68:413–422.

37. Scholten, et al. Prognostic significance and interobserver variability of histologic grading systems for endometrial carcinoma. Cancer 2004;100:764–772.

38. Schöniger, Schoon H-A. The healthy and diseased equine endometrium: a review of morphological features and molecular analyses. Animals (Basel) 2020;10:625.

39. Schoon, et al. The endometrial biopsy in the mare with regard to clinical correlations. Pferdeheilkunde 1997;13:453–464.

40. Schoon, et al. “Endometrial maldifferentiation”—a clinically significant diagnosis in equine reproduction? Pferdeheilkunde 1999;15:555–559.

41. Schoon, et al. Functional disturbances in the endometrium of barren mares: a histological and immunohistological study. J Reprod Fertil Suppl 2000;56:381–391.

42. Silcocks PB. Measuring repeatability and validity of histological diagnosis—a brief review with some practical examples. J Clin Pathol 1983;36:1269–1275.

43. Snider, et al. Equine endometrial biopsy reviewed: observation, interpretation, and application of histopathologic data. Theriogenology 2011;75:1567–1581.

44. Thomas, et al. Observer variation in the histological grading of rectal carcinoma. J Clin Pathol 1983;36:385–391.

45. Waelchli RO. Endometrial biopsy in mares under nonuniform breeding management conditions: prognostic value and relationship with age. Can Vet J 1990;31:379–384.

46. Woodward, et al. Susceptibility to persistent breeding-induced endometritis in the mare: relationship to endometrial biopsy score and age, and variations between seasons. Theriogenology 2012;78:495–501.

IIB or not IIB, part 1: retrospective evaluation of Kenney–Doig categorization of equine endometrial biopsies at a veterinary diagnostic laboratory and comparison with published reports
