Abstract
Although quantitative assessment of margins is recommended for describing excision of cutaneous malignancies, there is poor understanding of limitations associated with this technique. We described and quantified histologic artifacts in inked margins and determined the association between artifacts and variance in histologic tumor-free margin (HTFM) measurements based on a novel grading scheme applied to 50 sections of normal canine skin and 56 radial margins taken from 15 different canine mast cell tumors (MCTs). Three broad categories of artifact were 1) tissue deformation at inked edges, 2) ink-associated artifacts, and 3) sectioning-associated artifacts. The most common artifacts in MCT margins were ink-associated artifacts, specifically ink absent from an edge (mean prevalence: 50%) and inappropriate ink coloring (mean: 45%). The prevalence of other artifacts in MCT skin was 4–50%. In MCT margins, frequency-adjusted kappa statistics found fair or better inter-rater reliability for 9 of 10 artifacts; intra-rater reliability was moderate or better in 9 of 10 artifacts. Digital HTFM measurements by 5 blinded pathologists had a median standard deviation (SD) of 1.9 mm (interquartile range: 0.8–3.6 mm; range: 0–6.2 mm). The intraclass correlation coefficient (ICC) demonstrated good inter-pathologist reliability in HTFM measurement (ICC = 0.81). Spearman rank correlation coefficients found negligible correlation between artifacts and HTFM SDs (r ≤ 0.3). These data confirm that although histologic artifacts commonly occur in inked margin specimens, artifacts are not meaningfully associated with variation in HTFM measurements. Investigators can use the grading scheme presented herein to identify artifacts associated with tissue processing.
Introduction
Histopathology is widely used for determining adequacy of excision in cutaneous malignancies. 12 In 2011, an American College of Veterinary Pathologists consensus committee recommended margin quantification (“histologic tumor-free margin” or HTFM) as one means of communicating excisional status. 5 Although this recommendation removes ambiguity associated with descriptive margin interpretation, the objective value of a HTFM is poorly defined. As yet, there is incomplete understanding of inter-pathologist HTFM variation, and the impact of unavoidable artifacts introduced during routine histologic processing.
Histologic specimens undergo considerable structural changes beginning immediately following surgical incision and ending after mounting on a glass slide. 12 Tissue length reduction of normal skin (“shrinkage”) is the most widely reported structural change.10,17,18,25 Mean overall tissue shrinkage of 15.6% from pre-excision to 24-h formalin fixation (mean ± SE overall change in diameter of 6.2 ± 0.7 mm) was reported in one study. 25 Microscopically visible artifacts in tumor-bearing margin specimens are less clearly defined. Although pathologists commonly recognize processing artifacts in diagnostic specimens, their impact on quantification of HTFM width is unknown.
Margin measurement requires identification of 1) the peripheral tumor edge, and 2) inked surgical margin. 12 Both are subject to differences in pathologist interpretation and may be influenced by histologic artifacts. In canine cutaneous mast cell tumors (MCTs), distinguishing tumor cells from reactive mast cells can be a challenge.2,9,20 Differences in ink performance can also influence the reliability with which a surgical margin is discerned. 11 There are data suggesting that pathologists may differ in interpretation of margin status 16 ; however, variation in HTFM measurements has not been critically examined in a larger cohort of tumors.
To address these problems, we designed a study with 3 aims: 1) to create a novel grading scheme to describe microscopically visible artifacts in routine histologic sections of canine skin; 2) to determine if the grading scheme could be applied to sections of normal and tumor-bearing skin, with acceptable reliability within and between pathologists; and 3) to quantify variation in HTFM measurements and determine if processing artifacts impact HTFM quantification. Using routinely inked margin specimens from grade II/low-grade MCT, we hypothesized that artifacts would have a moderate or higher positive association with HTFM standard deviations (SDs) between pathologists.
By defining and quantifying artifacts in histologic specimens, pathologists could use our grading scheme to identify the frequency of artifacts associated with processing. Furthermore, an improved understanding of HTFM measurement variability will directly inform clinicians and researchers about the clinical relevance of quantitative margin evaluation.
Materials and methods
Tissue specimens
Normal canine skin
Slides of normal, inked canine skin were selected from a previously published, independent study. 11 Violet-inked samples were excluded because of demonstrated poor performance. 11 A subset of 50 slides was chosen at random (Excel Random Number Generation, Microsoft, Redmond, WA). Routinely available, commercial tissue inks from 5 manufacturers were used (Cancer Diagnostics, Morrisville, NC; Davidson Marking System dyes, Bradley Products, Bloomington, MN; Platinum Line tissue marking dye, Mercedes Medical, Sarasota, FL; Path Mark tissue margin ink, BBC Biochemical, Mount Vernon, WA; and TMD tissue marking dye, Triangle Biomedical Sciences, Durham, NC). Ink colors included black (10 slides), blue (7), green (11), orange (6), red (9), and yellow (7).
Margin measurement from specimens containing MCT
Specimens for margin measurement were taken from a population of dogs prospectively recruited for a larger clinical trial published elsewhere. 13 All patients had been presented to Lois Bates Acheson Veterinary Teaching Hospital at Oregon State University (Corvallis, OR) over a consecutive 10-mo period (August 2014–June 2015). Inclusion criteria were dogs undergoing planned surgical excision of a cytologically diagnosed cutaneous or subcutaneous MCT, with en bloc removal of a grossly normal surgical margin of ≥1 mm (i.e., no grossly visible tumor at the surgical margins). Exclusion criteria included previous surgical excision of the mass, neoadjuvant treatment (e.g., preoperative chemotherapy, including corticosteroids, or preoperative radiation therapy), excision requiring amputation, and grade III or “high”-grade tumors based on post-excision histopathology.6,15 Institutional Animal Care and Use Committee approval was obtained for the study, and all owners signed an approved informed client consent form.
Surgical margins of the fresh unfixed biopsies were inked immediately following excision by certified veterinary technicians trained in uniform ink application. Five different directions were identified (e.g., cranial, caudal, dorsal, ventral, and deep) with commercial margin inks, using technique identical to that used on normal canine skin samples. 11 Specimens were then immersed in 10% neutral-buffered formalin and submitted for routine processing. Fixation time was variable and determined by the size of the sample.
Fixed specimens were sectioned radially to capture the 4 inked circumferential margins and 1 inked deep margin. For fixed excisional specimens <2.5 cm, all circumferential margins could be captured with 3 sections (1 complete cross-section through the mass and 2 additional perpendicular sections). Tumors with margins >2.5 cm were bisected and captured on 2 slides. Sections were processed routinely, sectioned at 4 μm, stained with hematoxylin and eosin, and coverslipped. The histologic diagnosis and tumor grade were assigned by a single board-certified veterinary anatomic pathologist (DS Russell).
Creation of grading scheme
We developed a grading scheme to describe and quantify processing artifacts in histologic sections of canine skin. First, artifacts were identified from a preliminary review of normal canine skin. Artifacts were only considered for inclusion if capable of influencing either absolute margin interpretation in tumor biopsies (i.e., positive or negative) or HTFM measurements. Prior to implementation, 2 pilot versions of the grading scheme were tested on sections of normal skin for inter- and intra-observer reliability (data not shown). Review of the pilot grading schemes identified variable “severity” of artifact within each subcategory. To optimize rater reliability and ensure clinical relevance, criteria were intentionally designed as binary variables (i.e., present or absent), excluding “minor” artifacts unlikely to affect final margin interpretation or HTFM measurement. The final grading scheme was generated by consensus among 2 board-certified veterinary anatomic pathologists (DSR, CV Löhr) and 1 veterinary anatomic pathologist in training (PK Kiser).
Artifacts were divided into categories of tissue deformation, ink-associated artifacts, and sectioning artifacts, with criteria defined for each category (Table 1, Figs. 1–3). Two board-certified veterinary anatomic pathologists (DSR, CVL) and 1 veterinary anatomic pathologist in training (PKK) uniformly applied the final grading scheme to circumferential margins of 50 sections of normal canine skin and 56 MCT circumferential margins from 15 different tumors, taken from 12 dogs. Ten tumors were cutaneous MCTs (i.e., involving dermis and possibly subcutis) and 5 tumors were subcutaneous (i.e., not involving overlying dermis). Pathologists’ interpretations were blinded. To determine intra-rater reliability, pathologists performed a single repetition within 1 wk of initial reading (10 randomly selected normal skin margins and 10 randomly selected MCT margins). To ensure consistent application of the grading scheme, the same microscope was used for all analyses (Eclipse 80i, Nikon Instruments, Melville, NY). The mean frequency of each artifact was quantified as the sum of positive artifacts noted by each independent rater, divided by 3.
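The mean frequency described above (positive calls summed over the independent raters, then divided by 3) can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study; the function name and the ratings are hypothetical.

```python
def mean_artifact_frequency(ratings_by_rater):
    """Sum of positive (1) calls across raters, divided by the number of raters.

    ratings_by_rater: one list per rater holding a 0/1 flag for each margin.
    """
    n_raters = len(ratings_by_rater)
    total_positive = sum(sum(flags) for flags in ratings_by_rater)
    return total_positive / n_raters


# Hypothetical data: 3 raters scoring one artifact on the same 8 margins.
ratings = [
    [0, 1, 0, 0, 1, 0, 0, 0],  # rater 1: 2 positive calls
    [0, 1, 0, 0, 0, 0, 0, 0],  # rater 2: 1 positive call
    [0, 1, 0, 1, 1, 0, 0, 0],  # rater 3: 3 positive calls
]

mean_count = mean_artifact_frequency(ratings)        # (2 + 1 + 3) / 3
prevalence_pct = 100 * mean_count / len(ratings[0])  # as a percentage of margins
```

Dividing the mean count by the number of margins scored gives a prevalence on the percentage scale used in the Results.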
Table 1. A grading scheme for defining and quantifying processing artifacts in histologic specimens with inked margins.

Figure 1. Tissue deformation in histologic sections of normal canine skin.

Figure 2. Ink-associated artifacts in histologic sections of normal canine skin.

Figure 3. Sectioning artifacts in histologic sections of normal canine skin.
Measurement of HTFM on MCT samples
HTFM were quantified on all MCT samples. Only circumferential margins were considered for evaluation. Blinded digital measurements were taken by 3 board-certified veterinary anatomic pathologists (DSR, CVL, ST Spagnoli) and 2 veterinary anatomic pathologists in training (PKK, D Meritet). Measurements were captured with calibrated digital cameras attached to microscopes (BX46 with SC100 camera and CellSens entry software, Olympus, Waltham, MA; Eclipse E400 with Digital Sight DS-Fi1 camera and NIS-Elements BR3.2 software, Nikon; Eclipse 50i with Lumenera Infinity 1 camera and Lumenera Infinity Analyze software, Nikon). The HTFM was defined as the shortest length of non-neoplastic tissue between the inked surgical margin and MCT, measured to 0.1 mm. The polyline feature was used to measure HTFM in a plane approximately parallel to the epidermis or deep connective tissue. For each individual margin, the 5 HTFM measurements were expressed as the median, interquartile range, and range (mm).
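As an illustration of the measurement summary described above, the sketch below computes a polyline length from calibrated vertex coordinates and then summarizes the readings for one margin. It is not the vendor software's algorithm. The coordinates and readings are hypothetical, the quartile convention (`method="inclusive"`) and the use of the sample SD are assumptions, and `None` is used here to mark a pathologist who could not measure a margin.

```python
import math
import statistics


def polyline_length_mm(points_mm):
    """Length of a polyline: the sum of its straight segment lengths (mm)."""
    return sum(math.dist(a, b) for a, b in zip(points_mm, points_mm[1:]))


def summarize_htfm(measurements_mm):
    """Median, interquartile range, range, and sample SD for one margin.

    Entries equal to None (no measurement recorded) are skipped.
    """
    values = sorted(v for v in measurements_mm if v is not None)
    # Assumption: inclusive quartiles; other conventions give other cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "median": median,
        "iqr": (q1, q3),
        "range": (values[0], values[-1]),
        "sd": statistics.stdev(values),
    }


# Hypothetical polyline vertices (mm) traced from the inked margin to the tumor.
trace = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
htfm = round(polyline_length_mm(trace), 1)  # reported to 0.1 mm

# Hypothetical readings of one margin by 5 pathologists (mm).
summary = summarize_htfm([9.8, 10.2, 10.5, 11.0, 12.1])
```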
Statistical analyses
All grading scheme analyses were performed using R software (https://www.R-project.org/), utilizing the packages epiR (https://goo.gl/dXfrfr) and irr (https://goo.gl/JTraez). The prevalence of artifacts in normal and MCT skin was expressed as the mean ± standard error of the mean. The presence of artifacts was tested for inter- and intra-rater reliability using prevalence-adjusted kappa given the expected low frequencies of artifact. Inter-rater reliability was tested with Randolph kappa, and intra-rater reliability was tested with the prevalence-adjusted, bias-adjusted kappa.1,26 Reliability was considered slight (0.01–0.2), fair (0.21–0.4), moderate (0.41–0.6), substantial (0.61–0.8), almost perfect (0.81–0.99), and perfect (1.0) as categorized previously. 8 Given expected differences in agreement within subcategories of the grading scheme, 3 pathologists met to reach consensus on discordant interpretations (DSR, PKK, CVL). Consensus interpretation was used for subsequent correlations.
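The two kappa statistics named above can be sketched in a few lines. The study itself used R packages. The code below is an independent illustration of the standard formulas (Randolph free-marginal kappa with chance agreement fixed at 1/k, and PABAK as 2 × observed agreement − 1 for a binary rating), with hypothetical function names and data. The labeling helper follows the bands listed in the text, and rounding to 2 decimals before labeling is an assumption.

```python
def randolph_kappa(ratings, n_categories=2):
    """Free-marginal multirater kappa.

    ratings: one row per margin, each row holding every rater's label.
    """
    per_item = []
    for row in ratings:
        n = len(row)
        counts = [row.count(label) for label in set(row)]
        agreeing_pairs = sum(c * (c - 1) for c in counts)
        per_item.append(agreeing_pairs / (n * (n - 1)))
    p_obs = sum(per_item) / len(per_item)
    p_chance = 1 / n_categories  # chance agreement is fixed, not estimated
    return (p_obs - p_chance) / (1 - p_chance)


def pabak(first_read, second_read):
    """Prevalence-adjusted, bias-adjusted kappa for two binary readings."""
    agree = sum(a == b for a, b in zip(first_read, second_read))
    return 2 * (agree / len(first_read)) - 1


def reliability_label(kappa):
    """Map kappa to the bands given in the text (rounded to 2 decimals)."""
    k = round(kappa, 2)
    if k >= 1.0:
        return "perfect"
    if k >= 0.81:
        return "almost perfect"
    if k >= 0.61:
        return "substantial"
    if k >= 0.41:
        return "moderate"
    if k >= 0.21:
        return "fair"
    if k >= 0.01:
        return "slight"
    return "below the listed bands"


# Hypothetical data: 4 margins scored present (1) or absent (0) by 3 raters.
inter = randolph_kappa([[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0]])
# Hypothetical data: one rater's first and repeat reading of 5 margins.
intra = pabak([1, 0, 1, 0, 0], [1, 0, 0, 0, 0])
labels = (reliability_label(inter), reliability_label(intra))
```

Because chance agreement is fixed at 1/k rather than estimated from the marginal totals, both statistics remain interpretable when an artifact is rare, which is the stated reason for choosing prevalence-adjusted kappa.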
The strength and significance of the correlation between artifact and HTFM SD were determined by Spearman rank correlation coefficients (r) to account for possible non-normality. Correlation was performed for each individual subcategory (possible range: 0–1), sum of artifacts within each category (possible range: 0–2, 0–5, and 0–3), and the sum of total artifacts (possible range: 0–10). The level of significance was set at an alpha of 0.05 (p < 0.05). The intraclass correlation coefficient for inter-pathologist HTFM measurements was calculated based on a single-rating, absolute-agreement, 2-way random-effects model. 7
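The rank correlation described above can be illustrated with a short sketch. It ranks both variables, assigning tied values their average rank (artifact scores are small integers, so ties are common), and then computes the Pearson correlation of the ranks. The function names and data are hypothetical, and no p-value is computed here; a statistics library such as SciPy (`scipy.stats.spearmanr`) would be one way to obtain one.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman coefficient: Pearson correlation of the average ranks.

    Undefined (division by zero) if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical data: total artifact score (0-10) and HTFM SD (mm) per margin.
artifact_sum = [2, 5, 3, 0, 4, 2, 6, 1]
htfm_sd_mm = [1.9, 0.8, 3.6, 0.4, 2.2, 6.2, 1.1, 0.0]
r = spearman_r(artifact_sum, htfm_sd_mm)
```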
Results
In normal skin, ink dissection was the most common (60 ± 4%) artifact, and tissue folding (5 ± 2%) was the least common. Artifacts in the MCT biopsies were less frequent overall: ink absent from the tissue edge was most common (50 ± 4%), and tissue folding (4 ± 1%) was least common (Fig. 4).

Figure 4. Mean prevalence of processing artifacts in normal skin and mast cell tumor (MCT) radial sections. Data are expressed as mean ± standard error of the mean.
Inter- and intra-rater reliability was calculated for normal canine skin samples and MCT radial margins (Table 2). For all 4 kappa values, reliability was best (substantial to perfect) for tissue folding (0.79–1.0). Reliability for differential contraction, inappropriate coloring, and ink at an incorrect edge was moderate or better (≥0.49 for all tests). For other criteria, inter- and intra-rater reliability was variable. In both normal and MCT skin, intra-rater reliability was better than inter-rater reliability for all subcategories within ink-associated artifacts and sectioning artifacts. Inter-rater reliability was slight for dissociated tissue (0.17–0.19); all other criteria were fair or better (≥0.21 for both normal skin and MCT skin).
Table 2. Kappa statistics for normal canine skin and radial mast cell tumor (MCT) margins.
Inter-rater reliability is calculated with Randolph kappa; intra-rater reliability is calculated with the prevalence-adjusted, bias-adjusted kappa (PABAK). Reliability is considered slight (0.01–0.2), fair (0.21–0.4), moderate (0.41–0.6), substantial (0.61–0.8), almost perfect (0.81–0.99), and perfect (1.0). 8
Individual HTFM measurements and variation among 5 pathologists are displayed in Figure 5. For 3 cases, HTFM was measured by only 3 or 4 pathologists (pathologists were unable to identify ink in the section). The median HTFM was 10.2 mm (interquartile range: 6.2–14.5 mm; range: 0–24.0 mm). The median SD was 1.9 mm (interquartile range: 0.8–3.6 mm; range: 0–6.2 mm). In 7 of 56 margins, 1 or more pathologists disagreed between complete (HTFM > 0 mm) and incomplete (HTFM = 0 mm) excision. The median HTFM for these margins was 3.0 mm (interquartile range: 0–6.0 mm; range: 0–10.4 mm). The intraclass correlation coefficient for inter-pathologist HTFM reliability was 0.81, indicating good reliability among pathologists for quantifying HTFM. 7

Figure 5. Box-and-whisker plot of histologic tumor-free margin (HTFM) measurements on 56 individual mast cell tumor margins by 5 blinded pathologists. The box extends from the 25th to 75th percentiles; the line in the middle is the median for each margin. Whiskers represent the range of HTFM measurements.
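The intraclass correlation coefficient reported for inter-pathologist HTFM reliability is based on a single-rating, absolute-agreement, 2-way random-effects model. A minimal sketch of that estimator from the usual 2-way ANOVA mean squares is shown below. It assumes a complete margins-by-pathologists table, and how the margins with missing measurements were handled in the study is not stated. The function name and data are hypothetical.

```python
def icc_2way_random_single_absolute(table):
    """ICC for single ratings, absolute agreement, 2-way random effects.

    table: one row per margin, one column per pathologist, no missing values.
    """
    n = len(table)     # number of margins (subjects)
    k = len(table[0])  # number of pathologists (raters)
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-margin mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical HTFM readings (mm): 4 margins, each measured by 5 pathologists.
readings = [
    [10.2, 9.8, 11.0, 10.5, 12.1],
    [3.0, 0.0, 4.1, 2.8, 3.5],
    [14.5, 15.2, 13.9, 16.0, 14.8],
    [6.2, 5.9, 7.4, 6.8, 6.0],
]
icc = icc_2way_random_single_absolute(readings)
```

Because the between-rater mean square appears in the denominator, systematic offsets between pathologists lower this coefficient; a consistency-type ICC would ignore them.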
Spearman rank correlation coefficients were determined to evaluate the relationship between artifacts and variation (SD) in HTFM measurement. There was negligible correlation between HTFM SD, individual artifacts, sum of artifacts within each category, and total sum of artifacts (all r ≤ 0.30; Table 3). Although statistical significance was reached in 3 categories (differential contraction, ink dissection, tissue defects), the magnitude of the correlation was low, suggesting negligible clinical relevance.
Table 3. Spearman rank correlation coefficients for artifacts and standard deviations of histologic tumor-free margin (HTFM) measurements.
Correlation was performed for each individual subcategory (possible range: 0–1), sum of artifacts within each category (possible range: 0–2, 0–5, and 0–3), and the sum of total artifacts (possible range: 0–10).
Discussion
HTFM length measurements convey objective information about a surgical margin, and have been used to address research questions related to clinical outcomes.14,20,21 In response to growing interest in the clinical relevance of HTFM, a number of studies have evaluated factors capable of influencing this measurement.4,18,19,24 Herein, we sought to quantify inter-pathologist variation in HTFM measurements, and to determine if variation is associated with artifacts in routinely processed histologic specimens. Although processing artifacts commonly occur in normal and MCT-bearing skin sections, we demonstrate no meaningful association between artifacts and HTFM variance.
Through the application of our grading scheme to normal canine skin, 8 of 10 artifacts occurred with a frequency of ≥30%; in MCT-bearing diagnostic specimens, 4 of 10 artifacts occurred at a frequency of ≥30%. Our data indicate that processing artifacts are common in spite of standard operating procedures for creating diagnostic-quality histopathology slides. Although determination of artifact causation was beyond the scope of our study, this grading scheme might be implemented by histology laboratories that wish to identify and mitigate specific problems at different stages of tissue processing (i.e., sectioning, fixation, processing, microtomy). Strategies for minimizing these artifacts are discussed elsewhere.3,22,23
Ink-associated artifacts are diagnostically relevant because microscopic identification of ink is a prerequisite for identification of a surgical margin. 12 A pathologist’s ability to define a surgical margin is negatively impacted by ink that is absent, ink that has dissected into tissue, or ink that is otherwise in an inappropriate location (i.e., ink present at sites not intended to represent the surgical margin). Ink on an unintentional edge may influence a pathologist’s ability to discern the margin direction of interest (i.e., caudal vs. deep; lateral vs. cranial). Our data show that ink-associated artifacts are particularly common (25–60% in normal skin; 19–50% for MCT specimens), despite using purpose-designed, commercial products, applied in a standard fashion according to manufacturers’ guidelines. Between 45% and 50% of margins lacked some degree of ink from the tissue edge, indicating either suboptimal adhesion to the surgical margin or incomplete microtomy. This suggests that the margin edge is not consistently indicated by presence of ink. Ink dissection occurred with 24–60% prevalence, thereby demonstrating that inked tissue does not necessarily equate to a true surgical margin. Continuous circumferential suturing of post-excisional specimens has been shown to improve alignment of tissue planes. 19 Improved tissue alignment and cohesion between different tissue layers might also minimize microscopic ink dissection, although this was not investigated in our study.
Tissue deformation at the inked edge captured changes in contour specifically at the margin. These data find deformation artifacts in 11–36% of margins. Contour changes at the inked edge are most likely linked to surgical excision, formalin fixation, and/or tissue trimming. Both excision and fixation are capable of producing dramatic changes in histologic tissue dimensions.17,25 Dimensional changes are influenced by tissue composition: samples with a muscle layer have less contraction as compared to samples without muscle. 17 Differential contraction at a surgical margin (11% in MCT margins and 36% in normal skin) captured those margins in which length reductions varied between different microscopic skin compartments (i.e., dermis vs. panniculus vs. skeletal muscle). Deflection of tissue, to the point where tissue is apposed to the radially inked margin, occurred in 16–34% of margins. These artifacts are relevant because margin deformation would directly impact length measurement.
Sectioning artifacts defined those changes that occurred in direct association with microtomy or tissue processing. Artifacts within this broad category occurred in 4–47% of margins. Tissue defects (i.e., splitting or cavitation) were most common in both normal skin (47%) and MCT samples (32%). In our samples, cavities were commonly identified within the panniculus, likely reflecting inherent difficulties associated with processing adipose tissue. 3 Artifactual tissue spaces impede a pathologist’s ability to detect tumor microemboli; furthermore, a linear measurement across these spaces may not be an accurate representation of HTFM length.
Grading scheme performance, as defined by inter- and intra-rater reliability, was quantified by 2 prevalence-adjusted kappa statistics. Inter-rater reliability was fair or better for 9 of 10 artifacts in normal and MCT skin. Intra-rater reliability was generally greater than inter-rater reliability: in both tissue sets, reliability was moderate or better in 9 of 10 categories. Overall, the imperfect kappa statistics reflect challenges inherent to quantifying morphologic tissue alterations in a repeatable fashion. Specific challenges included creating definitions that adequately captured the magnitude of a given artifact, while still having detail necessary to allow resolution between co-localizing artifacts (i.e., incomplete inking and tissue defects; differential contraction and tissue deflection).
By quantifying variation in inter-pathologist HTFM measurements, our data will inform future research that uses HTFM as an indicator of excision in cutaneous malignancies. Even though inter-pathologist reliability was good (ICC = 0.81), there was a median SD of 1.9 mm and a maximum SD of 6.2 mm. This degree of variation may not be clinically relevant in wide excisions, but could have impact in excisions with minimal margins. In 13% of margins, one or more pathologists disagreed between complete and incomplete excision. Within this subset, the median HTFM measurement was 3.0 mm, although the maximum recorded measurement was 10.4 mm. One study of diagnostic disagreements in a variety of canine and feline oncologic specimens found discordant margin interpretations in 4 of 52 prospective second-opinion biopsies. 16 Interpretive differences in canine MCT margins might partially be influenced by inherent difficulties distinguishing reactive and neoplastic mast cells at the tumor periphery. 12 In our study, variation in HTFM measurements is likely attributable to perceived differences in where the tumor ends, where the margin begins, and the narrowest distance between these 2 points.
Spearman rank correlation coefficients found negligible associations between artifacts and HTFM SDs (r ≤ 0.3 for all categories), despite reaching statistical significance in 3 categories. As such, we reject our hypothesis that presence of artifacts would have a moderate or higher positive correlation with inter-pathologist variation in HTFM measurements. This might reflect a tendency among pathologists to “read through” artifacts when measuring HTFM. Failure to detect an association might also reflect inherent limitations of a binary, quantitative (as opposed to a descriptive) approach to artifacts, which might not adequately capture the severity of a given artifact. It is also possible that artifacts have disproportionately greater impact on HTFM measurements falling within the range of expected SD, or where distinction between complete (i.e., HTFM > 0 mm) and incomplete (i.e., HTFM = 0 mm) excision is ambiguous.
Footnotes
Acknowledgements
We acknowledge the technical expertise of Kay Fischer, Misty Corbus, and Renee Norred who generated histologic slides for this study. Laura Kelly and Patrick Mayne of Confluent Insights performed statistical analysis.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This work was funded by a grant from the Department of Biomedical Sciences, Oregon State University.
