Abstract
A number of studies have investigated the potential toxicity of the analgesic agent diclofenac (DCF) in various fish species under a diverse array of experimental conditions. Reported evidence of toxicity in these investigations is often strongly reliant on morphologic end points such as histopathology, immunohistochemistry, and transmission electron microscopy. However, it may be challenging for scientists who perform environmental hazard or risk determination to fully appreciate the intricacies of these specialized endpoints. Therefore, the purpose of the current review was to critically assess the quality of morphologic data in 14 papers that described the experimental exposure of fish to DCF. Areas of focus during this review included study design, diagnostic accuracy, magnitude of reported changes, data interpretation and presentation, and the credibility of individual reported findings. Positive attributes of some studies included robust experimental designs, accurate diagnoses, and straightforward and transparent data reporting. Issues identified in certain articles included diagnostic errors, failure to account for sampling and/or observer bias, failure to evaluate findings according to sex, exaggeration of lesion severity, interstudy inconsistencies, unexplained phenomena, and incomplete or ambiguous data presentation. It is hoped that the outcome of this review will be of value for personnel involved in regulatory decision-making.
Introduction
Diclofenac (DCF) is a commercially available nonsteroidal anti-inflammatory and analgesic medication derived from phenylacetic acid. Diclofenac is thought to alleviate pain and inflammation primarily by inhibiting prostaglandin synthesis, which is accomplished via inhibition of cyclooxygenase 2, although additional mechanisms of action (MOAs) may contribute to its therapeutic activity. 1 Diclofenac is available in oral, injectable, suppository, and topical formulations, and common therapeutic indications in humans and domestic animals involve the mitigation of musculoskeletal pain caused by a variety of etiologies. 2 A frequent route by which DCF enters aquatic environments is through wastewater treatment plants (WWTP). 3 Diclofenac has a relatively short 8-day half-life in freshwater and has been detected at concentrations as high as 0.99 μg/L in WWTP effluents. 4
Concerns regarding the presence of DCF in the natural environment arose in the early 2000s when serious pathological effects and population declines in Asian vultures (
A number of investigations have attempted to determine the potential toxicity of DCF in various fish species (reviewed in Sathishkumar et al). 12 One rationale for conducting such studies is based on the frequent detection of DCF in sewage effluents and surface waters across the globe. 12
However, unlike the relationship demonstrated in
To establish guidelines for the regulation of anthropogenic substances in the environment, the potential for various chemicals to adversely affect wildlife and humans, and the exposure concentrations at which adverse effects occur, must be determined. This requires a thorough, detailed, and critical review of the applicable scientific literature. Not surprisingly, the reliability of hazard or risk assessment results is heavily dependent on the quality of the published data that form the basis for such determinations. 17,18 In efforts to standardize procedures for evaluating the quality of toxicological data to be used for hazard/risk assessment and to diminish the inherent degree of subjectivity associated with these assessments, various weighted grading procedures have been developed, 2 of the most widely recognized of which are Klimisch scoring 19 and the ToxRTool. 20 Although unquestionably useful, these tools are not a panacea, in part because they lack criteria for auditing specialized endpoints such as pathology; consequently, articles that receive high Klimisch or ToxRTool scores, indicative of superior overall quality, can still have serious deficiencies in terms of the HP data. 13 Such deficiencies, especially those involving diagnostic accuracy, may not be obvious to risk assessment professionals, many of whom have not received comprehensive training in comparative anatomic pathology.
The primary goal of this article is to systematically review reported morphologic findings in available toxicological studies of DCF in fish to establish the credibility and reliability of these published findings for use in weight-of-evidence determinations of environmental hazard or risk. In addition, the evidence for DCF-induced effects in fish based on the current review will be summarized. It is hoped that these efforts will highlight the case-relevant need for endpoint-specific audits of data generated by specialized investigative approaches such as HP, IHC, and TEM.
Materials and Methods
A recent survey of the literature (Google Scholar, PubMed, Microsoft Academic; key words = DCF, fish, toxicity, histology, electron microscopy) yielded 14 papers (13 peer-reviewed journal articles and 1 doctoral thesis) that incorporated morphological findings in fish as a consequence of experimental DCF exposure (Table 1). 21 –34 Morphologic findings included macroscopic observations recorded in-life, at necropsy, or during gross tissue trimming, and pathologic diagnoses based on tissue examinations made via light microscopy (LM), IHC, TEM, and/or quantitative morphometric evaluations of histologic specimens (ie, image analysis or stereology). Although each document was read in its entirety, the current review was focused primarily on those morphologic endpoints and as such included elements of the study design, relevant portions of the tissue collection and preparation methods, lesion descriptions and data, photomicrographs and ultrastructural images, interpretations of study outcomes, and cited references. Information compiled ad hoc for each study/paper included an overview of the experimental design, pertinent methodology, a summary of the salient results, strengths of the research, major and minor weaknesses of the research, and conclusions concerning the overall credibility of the work in terms of the morphologic endpoints.
Study Design Elements and Reported LOECs From 14 Reviewed Papers.
Abbreviations: F, female; d, day; LOEC, lowest observed effect concentration; m, month; M, male; U, sexes undetermined or combined; w, week; y, year.
Additionally, individual morphologic findings from each paper were tabulated and scored for credibility according to the following scale adapted from Wolf and Maack 13 : 5 = highly credible, 4 = credible, 3 = equivocal credibility, 2 = dubious credibility, and 1 = no credibility. Whereas Wolf and Maack used this scale to score entire articles, in this case the system was applied instead to individual findings, similar to the approach used in Wolf and Wheeler. 14 The rationale for this more granular approach is addressed in the Discussion section of the current report. The credibility of a particular finding was dependent on a variety of factors, the most important of which included the accuracy of diagnostic interpretation (as assessed from figure images), the scientific plausibility of the finding, the robustness of the experimental design and methodology, and the degree to which procedures were used to mitigate sampling or observer bias. Although these factors were not assigned weighted numerical values per se, it was nevertheless true that certain factors received higher priority than others. For example, clear instances of diagnostic inaccuracy were always considered to be more important in credibility determinations than the failure to control for bias. Further illustrative examples of credibility scoring are presented in the Results section.
For 3 of the studies in the current review, 25 –27 the original slides, associated HP data, and pathology reports, as available, had been reviewed previously in a blinded fashion by a panel of expert fish pathologists, and the results of that pathology working group (PWG) audit were later published. 35 Salient conclusions from that paper are integrated into the current review.
To get a rough idea of the percentage increase in the volume of epithelial lifting artifact in Figure 1A of the current review as compared to the subepithelial space volume in figure 99A in the study by Birzle, 22 the formula for calculating the volume of a cylinder was used as a crude approximation:
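As a hedged sketch of the cylinder approximation referred to here, the standard cylinder volume and the resulting percentage increase can be written as follows. The symbols $r$ (radius), $h$ (height), $V_{1A}$ (estimated volume of the lifting artifact in Figure 1A), and $V_{99A}$ (estimated subepithelial space volume in figure 99A of Birzle) are illustrative names introduced for this sketch, not notation recovered from the original equation, and no measured values are implied.

```latex
V = \pi r^{2} h, \qquad
\text{percentage increase} = \frac{V_{1A} - V_{99A}}{V_{99A}} \times 100
```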

Examples of non-lesions and pathologic findings unrelated to DCF exposure, included for reference. A, Epithelial lifting artifact in the gills of a laboratory-reared adult male fathead minnow (
Results
For ease of description and discussion, each of the 14 reviewed documents will henceforth be referred to by its paper number in Table 1, as opposed to the author names or article title. Review of paper 2, a thesis, was dependent on an English translation of the original German document. Paper 13 was an ultrastructural investigation of the same light microscopic study reported in paper 12. Paper 14 presented essentially the same DCF study results as paper 13 and added a visual quantitation of renal tubular protein using image analysis. Papers 2, 11, 12, 13, and 14 were inter-related by virtue of shared coauthorship and/or institutional affiliations. The 14 papers reported a total of 42 distinctly different morphologic effects attributed to DCF exposure (Figure 2). The number of effect types (in parentheses) reported for each tissue was as follows: kidney (16), gill (11), liver (7), skin (3), intestine (3), eye (1), whole fish (mortalities; 1), and testis (0).
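The tissue-level tallies above can be cross-checked against the stated total with a few lines of Python. This is only an arithmetic sketch: the counts are copied from the preceding sentence, and the names `effect_types_by_tissue`, `total_effect_types`, and `kidney_share` are introduced here for illustration.

```python
# Counts of distinct effect types per tissue, as listed in the text.
effect_types_by_tissue = {
    "kidney": 16,
    "gill": 11,
    "liver": 7,
    "skin": 3,
    "intestine": 3,
    "eye": 1,
    "whole fish (mortalities)": 1,
    "testis": 0,
}

# Sum across tissues; this should match the 42 effect types cited in the text.
total_effect_types = sum(effect_types_by_tissue.values())

# Share of all reported effect types that involved the kidney.
kidney_share = effect_types_by_tissue["kidney"] / total_effect_types

print(total_effect_types)
print(round(100 * kidney_share, 1))
```

Under these counts, the kidney accounts for a little over a third of all reported effect types.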

List of morphologic findings from reviewed papers, with credibility scores. Credibility scores: 5 (dark green) = highly credible, 4 (light green) = credible, 3 (yellow) = equivocal credibility, 2 (orange) = dubious credibility, 1 (red) = no credibility. Approaches: LM = light microscopy, EM = electron microscopy, IA = image analysis, IH = immunohistochemistry, NE = necropsy, PWG = pathology working group assessment. Other notations: “-” = finding not reported, ! = finding reported as not present or not treatment related; a = finding discovered by PWG panel; b = finding not treatment related according to supplemental data; c = possible treatment-related finding (misdiagnosed originally); blank gray cell = tissue not examined.
Rather than address the merits of each paper individually, these 14 documents will be discussed in the context of several broad areas of concern that emerged during this review, which are described under the headings of experimental design, diagnostic accuracy, magnitude of effect, data interpretation and presentation, interstudy consistency, and the credibility of morphologic findings.
Experimental Design
Experimental designs varied markedly among the 14 studies, in terms of the types of fish species tested (8 species total), the age of fish at study onset (range: egg to adult), the sexes tested (male, female, or sex undetermined), study duration (96 hours to 95 days), exposure system (flow-through or static-renewal), the number of tissues evaluated per study (1-5), and the tested concentrations of DCF (0.1-5000 μg/L). Only 1 study (paper 7) was conducted in accordance with Good Laboratory Practice (GLP) guidelines.
For 13 of the papers, the number of fish examined microscopically or ultrastructurally per test concentration (treatment group) varied from 3 to 27 (median: 10). In paper 10, the number of fish per group was reported as 100, but it is unclear what percentage of those animals was processed and evaluated microscopically. The examined fish were divided among replicate tanks (2-4 containers) in 8 of the 14 papers. As far as could be discerned from the text, the remaining studies (papers 3, 5, and 12-14) did not incorporate replicates into the study design. Replication is considered to be an important experimental design feature in fish toxicology studies because of the very real potential for container-specific effects (eg, subtle tank-related differences in water parameters, lighting, temperature, proximity to human activity, etc) to bias the study outcome in a treatment-apparent manner. 36 Ostensibly, paper 2 employed a form of pseudoreplication, as it appeared that males and females were divided into separate tanks, but the resulting data were not reported according to sex (ie, the sexes were combined). In addition to pseudoreplication, the combining of sexes can be problematic for other reasons. In fish (as in other vertebrate taxa), the normal histologic appearance of parenchymal organs (eg, liver and kidney) of subadult to adult animals can vary substantially depending on sex in terms of coloration, the degree of cytoplasmic vacuolation, and the relative sizes of certain cell types, as examples. 15,37,38
Consequently, when findings are not recorded according to sex, or the sexes are pooled for reporting purposes, there is an increased possibility that sex-related differences may be misinterpreted as treatment effects, especially if the numbers of males and females in each group are not identical or the sex ratios are unknown. For example, figure 2 of paper 11 suggests that sex-related differences in liver morphology may have confounded the study results. In that figure, the less vacuolated, more basophilic hepatocytes of DCF-exposed brown trout (
Although several of the 14 papers described methods used to minimize potential sampling or observer bias, at least half did not (Table 2). In histopathological or ultrastructural investigations, sampling bias involves the existence of uncontrolled variables that occur during the specimen selection process that may impact the study results, while observer (observational) bias involves the slide or image examination phase. 15 Either form of bias can lead to the generation of false positive or negative results (ie, type 1 or type 2 errors, respectively). Although it can be difficult to establish that sampling or observer bias necessarily occurred in a given study, the absence of measures used to control bias lessens confidence in the experimental results. Histopathology procedures that are susceptible to sampling bias include animal collection, tissue harvesting, and the subselection of tissue areas for examination. The last is especially important for ultrastructural and morphometric studies in which the amount of tissue actually evaluated may represent a minute fraction of the available specimen. For example, in paper 13, the authors stated that an area of approximately 20 hepatocytes was examined ultrastructurally per fish. Meanwhile, it has been determined stereologically that the average number of hepatocytes in the liver of an adult brown trout is approximately 1.5 billion. 39 Therefore, the fraction of liver examined per fish in that study represented approximately 0.0000013% of the total organ. Because of the ability to survey and compare much larger areas of tissue, LM is a far superior tool for reliably assessing the frequency of focal lesion occurrence in tissues, whereas TEM is better used to further characterize changes observed initially via LM. 40
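The sampled fraction quoted above follows directly from the two figures given in the text. A minimal sketch of that arithmetic is shown here; the variable names are illustrative, and both inputs are the approximate values cited in the text rather than new data.

```python
# Approximate values quoted in the text; variable names are illustrative.
hepatocytes_examined_per_fish = 20   # area examined ultrastructurally per fish (paper 13)
hepatocytes_per_liver = 1.5e9        # stereological estimate for an adult brown trout

# Fraction of the organ's hepatocytes that was actually examined.
fraction_examined = hepatocytes_examined_per_fish / hepatocytes_per_liver
percent_examined = 100 * fraction_examined

print(f"{percent_examined:.7f}%")  # prints 0.0000013%
```

Put another way, roughly 1 hepatocyte in every 75 million was examined per fish under these figures.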
In TEM investigations, sampling bias may occur when the tissue regions to be assessed are selected subjectively, which is typically based on having previously viewed the specimens at far lower magnification in semithin sections. Another common example of sampling bias can involve the selection of individual fish from a larger population by netting. When this is performed arbitrarily (as opposed to selection based on actual mathematical randomization), bias may occur because fish that are debilitated (for whatever reason) may be netted preferentially. Sampling bias can be mitigated by selection of specimens in a mathematically randomized and standardized manner (eg, as much as possible, the same regions of an organ are collected for each animal, which has been selected randomly). In paper 1, the density of developing nephrons (DNs) in fathead minnow (
Various Evaluated Aspects of 14 Reviewed Papers.
Abbreviations: Y, yes; N, no; n/a, not available; ±, equivocal.
a One or more morphologic effects reported as moderate.
b One or more morphologic effects reported as severe.
Observer bias primarily involves awareness of the treatment group status of individual test subjects during histologic slide examination, the assessment of TEM images, or the acquisition of on-slide or image-based measurements. Procedures used to avoid observer bias include blinding of the examiner to the treatment status during the initial slide examination phase, or blinding during a re-examination of positive findings, and randomization of the order of slide evaluation. For example, aspects of the PWG review 35 of papers 5 and 6 suggest that observer bias likely occurred in those 2 studies. In paper 5, zero findings were reported for the untreated control fish, while DCF-exposed fish were listed as having 19 different types of pathologic findings. When the PWG panelists subsequently examined the histologic sections from the control fish, they found that the frequency and severity of findings in control and treated fish were comparable. Paper 6 reported monotonic dose-responsive effects involving 6 different kidney and intestinal findings in DCF-treated trout, despite the fact that all of those diagnoses were later determined to be artifacts or nonexistent changes by the PWG panel. It seems highly unlikely that those 6 near-perfect dose-response patterns could have occurred by chance, that is, without some element of inadvertent or subconscious observer bias.
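To give a rough feel for why chance is an implausible explanation, the sketch below computes the probability that findings with no true effect would each line up in a strictly increasing dose order. It rests on assumptions that are not taken from paper 6 or the PWG report: the number of treatment groups (`n_groups`, set here to a hypothetical 4), equal likelihood of every ordering of group scores, no ties, and independence among findings. It is an illustration only, not a re-analysis of the study.

```python
from math import factorial


def prob_all_monotonic(n_groups: int, n_findings: int) -> float:
    """Chance that n_findings independent findings each show a strictly
    increasing pattern across n_groups dose groups, assuming every ordering
    of the group scores is equally likely and there are no ties."""
    # Exactly one of the n_groups! possible orderings is strictly increasing.
    p_single = 1 / factorial(n_groups)
    # Independence assumption: multiply the per-finding probabilities.
    return p_single ** n_findings


# Hypothetical design: 4 groups (control plus 3 concentrations), 6 findings.
p = prob_all_monotonic(n_groups=4, n_findings=6)
print(f"{p:.2e}")
```

With these assumed inputs the probability is on the order of 5 in a billion. Correlated findings or tied scores would raise that figure, so it should be read as an order-of-magnitude indication at best.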
In the stereological assessments conducted in paper 2, randomization steps were used to prevent sampling bias, but there is no indication that specimens were blinded or randomized during the measurement phase to minimize observer bias. Conversely, the authors of papers 1, 4, and 11 each performed their slide evaluations (or re-evaluations) in a blinded manner, thus mitigating observer bias, but similar care was not taken in those studies to prevent sampling bias (or if appropriate measures were used, they were not reported). One of the few articles that attempted to account for both sampling and observer bias was paper 9. Although the method used to select the 19 to 27 fish per group for HP (vs the 2 to 3 used to measure whole-body DCF concentrations) was not described, selection of the parasagittal histological sections used to score the posterior kidney was standardized to some degree by examining the largest of 3 acquired levels. Following an initial examination of the kidneys with awareness of the treatment group status, the slides were masked, coded, and scored independently by 2 pathologists, in order to mitigate potential observer bias.
Diagnostic Accuracy
Diagnostic accuracy is a critical aspect of toxicological studies. For the purpose of the current review, the term diagnostic accuracy refers to the correct identification and interpretation of a pathologic finding reported to occur as a consequence of DCF exposure, when that finding is assessed against figure images (photomicroscopic or ultrastructural), annotations, and legends published concurrently in the same paper. Although the ideal approach for assessing diagnostic accuracy is to examine the original histologic sections on glass slides (pathology peer review), such reviews are seldom conducted in ecotoxicology for various reasons (eg, failure to retain the original study slides and raw data, reluctance of investigators to participate, and/or unwillingness of stakeholders to bear the added cost of the review process). A disadvantage of relying on journal articles to gauge diagnostic accuracy is that the incorporated figure images may not be truly representative of the histopathologic changes, and images may not have been provided for all types of described (and undescribed) effects that occurred in the study.
A number of diagnostic inaccuracies encountered during this review have been reported previously. Some of these involve the aforementioned PWG review of histologic slides 35 from studies described in papers 5, 6, and 7. Among those 3 papers, a total of 17 distinct histopathologic effects of DCF exposure were originally reported to occur in the gills, kidney, liver, and/or intestine. During their review, the PWG panelists determined that only one of the 17 originally reported effects was valid: thickened lamellar tips in the gills (paper 7). The PWG panelists additionally discovered a second previously unreported exposure-related finding (decreased hepatocyte vacuolation) in the paper 7 study. Inaccurate diagnoses in those 3 articles generally fell into 3 categories: normal anatomical structures or artifacts that were misinterpreted as pathologic changes (eg, renal interstitial hyaline droplets in paper 5); findings for which the prevalence and/or severity were determined to be comparable in treated fish as compared to controls (eg, gill chloride cell hyperplasia in paper 7); or lesions that could not be confirmed to exist at any degree in the examined histologic sections (eg, renal tubular necrosis in paper 6). Another publication that discovered inaccurate diagnoses in 3 of the other 13 articles was paper 9 of the current review. Diagnostic errors recognized (correctly) by the authors of paper 9 included the misidentification of normal mucous cells as hyperplastic chloride cells in paper 4, the mischaracterization of constituent renal hematopoietic tissue as interstitial nephritis (ie, inflammation) in paper 12, and the misinterpretation of artifacts as renal glomerular and tubular inflammation in paper 1.
Among the 14 papers in the current review, there exist many additional examples of inaccurate diagnoses that have heretofore remained undocumented. Three patent examples involve claims of DCF-induced liver necrosis and degeneration in papers 3, 8, and 11, neither of which is apparent in the histologic figure images provided in those papers. In paper 3 (figure 3), there is no visible evidence of “general necrosis” in any of the 3 poor quality photomicrographs of hematoxylin and eosin-stained liver sections from DCF-treated fish. Likewise, in paper 8 (figures 3 and 4), the livers of DCF-exposed fish appear essentially healthy, and annotation arrows meant to indicate pyknotic (condensed dying) nuclei actually point to robust-looking hepatocyte nuclei with prominent nucleoli. By the same token, putative degenerating liver nuclei in paper 11 (figure 3) appear in fact to be the nuclei of sinusoidal macrophages rather than hepatocytes, and comparable cells are evident in the image of liver from an untreated hatchery control fish. Additionally, in figure 3 there is no substantive increase in “intercellular” space in the livers of DCF-treated fish compared to controls, and some of the cells labeled as “blood cells” (erythrocytes?) are more likely sinusoidal macrophages. It is also curious that the authors of paper 11 chose to compare liver images from DCF-treated laboratory fish against those of hatchery controls in figure 2, despite the existence of untreated laboratory controls in that study. Consequently, the alleged decrease in hepatocyte vacuolation (glycogen) reported for DCF-exposed fish in that study could easily have been caused by husbandry differences (eg, diet, feeding regimes, stocking density, etc) between laboratory and hatchery fish as opposed to treatment.
Among the 14 papers, questionable diagnoses were also made for organs other than the liver and were not limited to the interpretation of histologic specimens. In paper 8 (figure 1C), a segment of the gills in which the lamellae are out of the section plane (a tissue preparation artifact) was misinterpreted as disappearance of the secondary lamellae. Another finding described as “degeneration of the kidney” in the same paper (figure 2D) appears instead to be a granuloma (focal inflammatory lesion characterized by macrophages). Epithelial lifting in the gills was a predominant effect of DCF exposure as reported in paper 4, in which the authors maintained that this finding was unlikely to represent artifact because they used Bouin's fixative (rather than formalin, for example). However, Figure 1A of the current review illustrates profound epithelial lifting artifact in the gills of an untreated control fish from an unrelated toxicological study. The gills of that particular fish were fixed in modified Davidson solution, another recommended fixative for gills, 38 which suggests that the choice of fixation solution is not the only contributing factor to the formation of epithelial lifting artifacts. For example, tank-related differences in epithelial lifting caused by edema can occur when fish awaiting sacrifice are placed in small containers for differing lengths of time. 44 In paper 12 (figure 3B), the lesion described as degeneration and necrosis of pillar cells in the gills appears to consist of fibrin and karyorrhectic nuclear debris and more likely represents a resolving capillary thrombus. Capillary thrombi are often sequelae of lamellar telangiectasis (lamellar aneurysms), which can be a common finding in control fish. 38
“Single cases” of severe telangiectasis were reported to occur in the paper 12 study, although in their report the authors did not provide data concerning the relative frequencies of capillary telangiectasis (or thrombosis) in DCF-treated fish versus negative controls. Paper 13 (figures 9-11) describes ultrastructural changes in renal glomeruli that purportedly include DCF-induced findings such as partial retraction of foot processes (pedicels), thickening of basal lamina, and the presence of desmosomes between pedicels and endothelial cells. However, there is no clear evidence of pedicel retraction, the ostensibly thickened basal lamina can be attributed to relative differences in location along the capillary loop (ie, distance from the glomerular mesangium), and although the authors suggest that the presence of desmosomes is indicative of glomerular damage, these structures have been observed in apparently normal glomeruli from multiple fish species. 45 Furthermore, structures such as endothelial cells and basement membrane (more likely mesangial matrix) appear to be misidentified in figure 11.
Based on figure images included in the articles, there were also a number of accurately diagnosed histopathologic findings among the 14 papers. Examples include developing renal nephrons in paper 1; lamellar epithelial hyperplasia of the gills in paper 4; increased hematopoietic tissue in paper 9; decreased hepatic glycogen vacuolation in paper 13; and the presence of renal tubular hyaline droplets in papers 2, 8, 11, 12, and 13. Diagnostic accuracy does not imply that the aforementioned findings were necessarily the result of DCF exposure, however. Relationship to exposure requires further evidence, such as clear demonstration of increased lesion prevalence and/or severity in DCF-treated fish relative to controls, concentration-dependent pathological responses, biological and toxicological plausibility, elimination of potential confounding variables and biases, and so on.
Figure 1A-F of the current article provides several previously unpublished image examples of artifacts, nonlesions, and pathological findings in the gill, liver, and kidney of fish, none of which were related to DCF exposure.
Magnitude of Effect
For hazard or risk assessment, it is important to not only consider the strength of relationship to treatment but also the magnitude of purported treatment-related effects and the likelihood for such effects to adversely impact animal health. 46,47 It should also be recognized that in toxicological bioassays, histopathologic findings of minor severity are not necessarily considered adverse even if they are clearly attributable to the test article. 47,48 Based on the published figures, there was a tendency among many of the reviewed papers to exaggerate the severity of histopathologic findings, by classifying relatively minor alterations as moderate or severe (Table 2). With several exceptions, the majority of diagnostically accurate findings in this review represent morphologic changes that appear to be of only minimal to mild severity according to common standards of toxicologic pathology, for example, proportion of tissue affected, percentage change from control, pathological nature of the lesion, or perceived functional impairment. 49,50 Examples of diagnoses that were reported as severe but do not match that description in the published photomicrographs include increased mucous (“mucosal”) cells in the gills in paper 8, accumulations of granulocytes in the kidney in paper 5, hyaline droplet degeneration of renal tubules in paper 12, and epithelial lifting in the gills in paper 13. Additional instances of exaggerated severity involve a variety of misdiagnosed lesions, plus histopathologic findings that were not evident in the figure photomicrographs to any degree (see Diagnostic Accuracy section). Examples of reported findings that could conceivably be considered representative of moderate severity were the macroscopic eye lesions in paper 2, the gill lamellar fusion in paper 8, the increased renal hematopoietic tissue in paper 9, and the presence of renal tubular hyaline droplets in paper 12.
Further evidence of exaggerated effects is provided by comparing the results of the companion papers 12 and 13. Paper 13 reported ultrastructural findings in DCF-exposed fish from the same study as the light microscopic investigation described in paper 12. A number of such findings were categorized as severe in paper 13 but were not reported to have occurred to any degree in paper 12; these included chloride cell hypertrophy and hyperplasia in the gills, epithelial lifting in the gills, degenerative and inflammatory changes in the liver, glomerular lesions in the kidney, and tubular vacuolation in the kidney. This is problematic for 2 reasons. First, distinct tissue alterations such as chloride cell hyperplasia and hypertrophy, epithelial lifting, and liver inflammation should have been appreciable at the light microscopic level, especially if these occurred at a magnitude considered pathological. Second, although ultrastructural changes that are not at all detectable by LM may in some instances be toxicologically important, it is debatable whether such changes can legitimately be considered severe in terms of severity grading. Another issue with these 2 papers is that the investigators created a numerical scoring system that was prone to producing inflated results. Most reviewed papers used typical 0 to 3, 0 to 4, or 0 to 5 grading scales (Table 2), in which grade 0 represented no pathologic findings, the highest grade (ie, 3, 4, or 5) was reserved for severe lesions, and the intervening grade numbers were assigned labels such as minimal, mild, and/or moderate. In contrast, papers 12 and 13 employed a 1 to 3 grading system which was described as follows: “grade 1: no pathological alterations, grade 2: focal mild to moderate changes, and grade 3: extended severe histopathological alterations.” Consequently, according to that scheme, any change greater than the least observable change (ie, grade 2) would automatically receive a score of 3 and therefore qualify as “severe.” Similarly, the authors of paper 8 regarded changes greater than 20% to be severe, which is a very low bar indeed, considering that this is in the realm of mild changes for many recognized scoring systems. 49
In papers 12 and 13, reported findings were further amplified in graphical representations of the HP data (paper 12: figures 2 and 4; paper 13: figures 7, 19, and 25). Despite the fact that the lowest possible severity score value in those studies was 1 (equivalent to no alterations in their system), the
Paper 2 attempted to avoid the potential subjectivity of semiquantitative lesion scoring by performing morphometric measurements using a sophisticated approach known as stereology. Stereological methodology provides for the estimation of 3-dimensional measurements such as volume, surface area, and connectivity, in addition to measurements traditionally captured via 2-dimensional image analysis such as object counts, linear distances, and areas. Additionally, modern design-based stereology incorporates rigorous procedures intended to minimize sampling bias. Using this quantitative methodology, the author of paper 2 was able to establish quite elegantly that DCF-exposure caused minimal morphologic effects in the gills, kidney, and skin of rainbow trout (
Percentage Change From Control for Quantitative Measurements Reported in Paper 2.
Abbreviation: DCF, diclofenac.
Data Interpretation and Presentation
For most of the 14 papers in the current review, the authors’ interpretation and presentation of the published data were straightforward, transparent, and complete. Such papers provided the prevalence and severity of treatment-associated histopathologic findings (diagnoses) summarized according to treatment group (in either graphical or tabulated format), and the number of animals examined per group was readily appreciable. Noteworthy examples of well-presented HP data include paper 5 (table 2), paper 7 (Figure S1), and paper 9 (figure 5). Data presentation was also considered sufficient in papers 2, 4, and 6. Papers 3 and 10 did not describe any methods used to score the microscopic findings, nor were any HP data reported. This is somewhat understandable for paper 10, because the HP data are often omitted from published reports in which there were no treatment-related morphologic effects. However, it is difficult to confirm the absence of treatment effects when the HP data and representative figure images are not provided. The authors of paper 8 presented the group level severity of various purported DCF-induced effects, without providing at least a summary of the within-group lesion prevalence. In that scenario, it is not possible to verify that the reported effects were truly attributable to DCF treatment, and therefore the utility of such data for hazard or risk assessment is questionable. As mentioned above, paper 3 did not furnish any HP data, and only described qualitatively the purported exposure-related effects in the text; that practice should be considered unacceptable for modern ecotoxicological challenge studies, and it renders the experimental outcome of little value for regulatory decision-making.
There were several reports in this review (papers 11, 12, 13, and 14), in which the interpretation and presentation of HP data were arguably not straightforward, transparent, and/or complete. Instead of simply reporting the summarized prevalence and severity data, the authors of papers 12 to 14 opted to create an unnecessarily opaque scoring index for each organ they termed “mean assessment values (MAV).” Those papers described MAV as a semiquantitative “ranking” system (technically the lesions were assigned scores but were not actually ranked) of organ lesion severity in which “grade 1 = no pathological alterations, grade 2 = focal mild to moderate changes, and grade 3 = extended severe pathological alterations.” Although the authors mention the “calculation” of an MAV score for each organ (ie, gill, kidney, and liver) and provide a reference for this methodology, 52 none of those documents describes how MAV values were actually calculated. Consequently, the reader cannot determine the incidence and severity of various histopathological findings in each exposure group, which are key criteria for determining the existence of treatment-related effects. 53 Without that level of detail, it is not possible to discern whether, for example, the MAV score for a given group was influenced inordinately by one particular type of finding or by only 1 or 2 animals within that group. For example, in paper 12 (figure 2), it is possible that the primary driver of the comparatively increased kidney scores in DCF-exposed fish was interstitial nephritis, which, as discussed above (see Diagnostic Accuracy section), was an incorrect diagnosis. Similarly, in paper 4 the authors combined 3 different gill diagnoses when they reported the total number of affected gill lamellae per treatment group. 
As previously discussed (see Diagnostic Accuracy section in the current review), 2 of those 3 gill findings were inaccurate (chloride cell hypertrophy) or most likely artifactual (epithelial lifting), and therefore the potential relationship of the third finding (epithelial cell proliferation) to DCF exposure cannot be determined. Paper 11 employed a slightly different approach to lesion severity scoring. In their main article, they used a 1 to 5 scoring system (grade 1 = control, grade 2 = slight reaction, grade 3 = medium reaction, grade 4 = strong reaction, grade 5 = destruction), and reported these severity scores (called “histological categories”) for each organ (gill, liver, and kidney). The authors then concluded in their figure 4 legend that “particularly for the liver, there is a trend for increasing severity of pathological alterations at DCF exposure.” Although the prevalence and severity of individual findings cannot be ascertained from the data as presented in figure 4, the authors of paper 11 did publish more detailed HP data as Table 5 in a supplement to the main article. In supplemental Table 5, findings were scored for severity according to an entirely different scale that offered 3 possible grades instead of 5: “not detected,” “detected in moderate frequency/severity,” or “detected in high frequency/severity.” Table 4 of the current review presents the prevalence and severity data from the supplement Table 5, for the 8 gill, kidney, and liver findings that were described in the text as being associated with DCF exposure. Close examination of Table 4 leads to 3 important observations. First, the data clearly demonstrate the absence of effects (ie, meaningful differences in lesion prevalence or severity) in fish exposed to DCF in the laboratory as compared to unexposed laboratory controls. 
Second, it is interesting to note that the prevalence and/or severity of certain findings (renal tubular hyaline inclusions, kidney necrosis, and integrity of the hematopoietic tissue) did in fact differ substantially between hatchery control fish and the laboratory controls. Third, and perhaps most problematic, is the apparent lack of obvious correspondence between the HP data as presented in figure 4 of paper 11 and the data in supplemental Table 5. Inexplicably, the authors failed to describe how the more detailed data in supplemental Table 5, which used a completely different scoring system, were transformed into the results summarized per organ in figure 4. Because of the unexplained discordance between these 2 HP data sets and the lack of treatment-related findings in the detailed supplemental Table 5 data, it is difficult to consider any of the alleged microscopic effects reported in paper 11 to be reliable.
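The concern raised above about group-level summary scores can be made concrete with a small sketch. Because the reviewed papers do not describe how MAV values were calculated, the sketch simply assumes a plain arithmetic mean of per-animal grades on the quoted 1 to 3 scale; the two groups of grades are invented for illustration and are not data from any reviewed study.

```python
from statistics import mean

# Hypothetical per-animal grades on the 1-3 scale quoted in the text
# (1 = no alterations, 2 = focal mild to moderate, 3 = extended severe).
# Invented values for illustration only; not data from any reviewed paper.
group_a = [2, 2, 2, 2, 1, 1, 1, 1, 1, 1]  # 4 of 10 animals mildly affected
group_b = [3, 3, 1, 1, 1, 1, 1, 1, 1, 1]  # 2 of 10 animals severely affected


def summarize(grades):
    """Return the mean grade, the prevalence (fraction above grade 1), and the maximum grade."""
    affected = [g for g in grades if g > 1]
    return {
        "mean": mean(grades),  # assumed stand-in for a group-level summary score
        "prevalence": len(affected) / len(grades),
        "max_grade": max(grades),
    }


summary_a = summarize(group_a)
summary_b = summarize(group_b)
print(summary_a)
print(summary_b)
```

Under these assumptions both groups yield the same mean grade, although one reflects a mild change in 4 animals and the other a severe change in only 2. A summary score alone cannot distinguish the two situations, which is why per-group prevalence and severity data are needed.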
Prevalence and Severity of Reported DCF Effects From Supplemental Table 5 in Paper 11.
Abbreviations: DCF, diclofenac; hematop, hematopoietic.
Interstudy Consistency
Figure 2 lists 42 different types of macroscopic, microscopic, and ultrastructural findings (diagnoses) that were reported as effects of DCF exposure. During this review, slight differences in terminology were reconciled, and essentially similar findings from different studies were aligned, so that none of the resulting 42 diagnoses are explicitly redundant, synonymous, or interchangeable. It is important to examine the data in this type of granular fashion, so as not to conflate unrelated treatment effects. For example, the toxicologic mechanisms underlying renal hematopoietic hyperplasia are likely to be very different from those responsible for degenerative changes involving renal glomeruli, and chemicals that cause chloride cell hypertrophy in the gills would not necessarily also cause lamellar telangiectasis. Consequently, it is overly simplistic and misguided to lump disparate diagnoses together as merely “gill effects” or “liver effects,” if the intrinsic nature of the lesions and/or pathogenesis are dissimilar, especially when such data are being used for determinations of lowest observed effect concentration or in risk assessment analyses.
Figure 2 illustrates the high degree of diagnostic inconsistency and contradictory findings that exist among the 14 reviewed papers (papers 13 and 14 reported the same qualitative results and are thus combined in this table). Of the 42 types of exposure-related findings, 23 were reported in only 1 study, 8 were reported in 2 studies, 6 were reported in 3 studies, and 4 were reported in 4 studies. Only one type of finding (renal tubular hyaline droplets) was reported in 5 of the 14 studies, which represented the maximum level of interstudy agreement. Some degree of inconsistency can be attributed to the fact that not all organs were evaluated in all 14 papers; however, 34 of 42 reported findings (81%) occurred in the gill, kidney, and liver, and the percentage of studies that examined those organs was 71%, 86%, and 79%, respectively. Other sources of inconsistency could potentially include differences in study design and/or execution; however, the PWG review of 3 DCF studies 35 suggests that issues of diagnostic accuracy likely played a major role. Interestingly, although 7 of 14 studies were conducted using trout, there appeared to be little diagnostic uniformity even among that subset. It is also important to note that for 17 of the 42 types of reported diagnoses, the authors (or in the case of papers 5, 6, and 7, the PWG panel) stated specifically that these reported findings were either not related to DCF exposure or were not evident in the examined histologic sections to any extent. The overall high degree of inconsistency among the 14 reviewed papers is more striking given the fact that 5 of the papers were related by virtue of shared authorship and/or institutional affiliations, and the possibility that the later studies may have been biased to some degree by the authors’ awareness of previously reported findings.
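The tallies in the preceding paragraph can be cross-checked with simple arithmetic. The sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
# Counts from the text: number of finding types reported in exactly k studies.
types_by_study_count = {1: 23, 2: 8, 3: 6, 4: 4, 5: 1}

# Total number of distinct finding types.
total_types = sum(types_by_study_count.values())

# Total number of individual reported findings (each type counted once per study).
total_reports = sum(k * n for k, n in types_by_study_count.items())

# Share of finding types reported in a single study only.
single_study_pct = round(100 * types_by_study_count[1] / total_types)

# Share of finding types located in gill, kidney, or liver (34 of 42 per the text).
main_organ_pct = round(100 * 34 / total_types)

print(total_types, total_reports, single_study_pct, main_organ_pct)
```

The categories sum to the 42 finding types, and weighting each category by its study count gives 78 individual findings, which agrees with the total cited in the Credibility of Morphologic Findings section. More than half of the finding types were reported in only a single study.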
Credibility of Morphologic Findings
Parameters used to assess the credibility of reported findings in this review included (but were not limited to) the robustness of the experimental design, diagnostic accuracy, the strength of association to treatment, historical incidence, biological/toxicological plausibility, and the level of support (if any) from other nonmorphologic study end points (eg, biochemical data).
Among the 14 papers, a total of 78 morphologic findings (representing 42 different types of findings) were reported as effects of DCF exposure. Of those 78 findings, the current review identified a combined total of 66 (85%) that were considered to be of equivocal, dubious, or no credibility (Figure 2). Findings with no credibility were either clearly inaccurate based on the figure images (see Diagnostic Accuracy section in the current review) or, in the case of papers 5, 6, and 7, were deemed by the PWG panel 35 to be artifact, nonexistent, or unrelated to treatment. In one instance (paper 11), the reported relationship to treatment was contradicted conclusively by the paper’s own detailed prevalence and severity data (see Data Interpretation and Presentation section in the current review).
Examples of dubious findings generally included morphologic changes for which the diagnostic accuracy was difficult to assess due to poor quality images or suboptimal histological section quality, changes whose relationship to DCF exposure was considered weak according to the presented prevalence and severity data, and implausible results. As an example of the last, the authors of paper 8 reported DCF-induced hepatic fibrosis following only 96 hours of chemical exposure, which represents a near impossibility given the inherently chronic nature of tissue scarring. 54
Findings considered equivocal often involved studies that had weak experimental designs (eg, too few animals per treatment group or no attention given to potential sampling/observer bias), papers in which the described methodology or presented results were inadequate, or findings that conflicted with the results of other studies. For example, paper 1 reported increased developing nephrons as a consequence of DCF exposure, but that particular change was not confirmed in 8 other DCF studies that examined kidney. Additionally, the PWG panel 35 was unable to confirm increased developing nephrons in the only additional paper to report that observation (paper 6), and the authors of paper 9 stated specifically that such regenerating nephrons were not increased in their study. Furthermore, there is little concrete evidence that damage to urinary tissues, which is the acknowledged stimulus for increased developing nephrons, 55 occurred in any of the 11 studies in which kidney was examined microscopically. Microscopically, overt damage to nephrons is most readily visualized in the renal tubules, and in fish as in mammals, typical findings include epithelial cell necrosis, vacuolation, casts, exfoliated cells, and/or atrophic/attenuated tubules. 38 Based on prior studies in which fishes were exposed to known or suspected nephrotoxic compounds, treatment-induced increases in developing nephrons occurred as a consequence of nephron destruction that was clearly evident in the histologic sections. 56–60
Finally, morphologic findings rated as credible or highly credible included mortalities, certain well-described and illustrated macroscopic observations, most stereologically determined measurements, findings confirmed or discovered during the PWG expert panel review, and increased renal hematopoietic tissue in paper 9. There are a number of reasons why the outcome of paper 9 was considered credible: water concentrations of DCF were assayed; robust numbers of animals were examined histopathologically in the treatment and control groups; there was a blinded independent re-evaluation of findings by 2 pathologists; detailed description of lesion severity scoring and HP methods was provided; diagnoses appear accurate; high quality photomicrographs were furnished; there was transparent and appropriate presentation of HP results with full data available in a supplemental spreadsheet; statistical evaluation of differences between replicate tanks was incorporated; and a plausible explanation was provided for the major histopathologic finding of increased renal hematopoietic tissue. The last involves documented stimulation of such tissue by nonsteroidal anti-inflammatory drugs (NSAIDs) in rodents. 61
Discussion
General Observations
In terms of their morphologic data, each of the reviewed papers had clear strengths and weaknesses. Noted examples of strengths in certain papers included the acquisition of test specimens from a reliable culture source; test concentrations intended to mimic environmental exposures; inclusion of replicate tanks in the study design, with sufficient numbers of fish per replicate; the recording and reporting of results according to sex (or the use of a single sex to eliminate sex as a variable); the incorporation of measures used to minimize bias; accurate diagnoses (as far as could be determined from figure images); the inclusion of high quality figure images representative of the major morphologic findings; straightforward data presentation; and conclusions supported by the data. Common weaknesses involved elements of experimental design and methodology, diagnostic interpretation, and the published portrayal of data. Specific examples included failure to adequately account for potential sampling and/or observer bias, failure to record and report data according to sex (particularly for adult fish), failure to assess control animals with the same rigor as those exposed to the test compound, questionable or patently inaccurate morphologic diagnoses (eg, normal tissues or artifacts mistaken for pathologic findings), exaggeration of lesion severity, incomplete data, and data presentation in the form of graphs or tables that ranged from ill-advised to potentially misleading.
Only one study in the current review, paper 7, was conducted in accordance with GLP guidelines as recognized by regulatory agencies such as the Organization for Economic Cooperation and Development and US Environmental Protection Agency. Consequently, for that particular study alone, reviewers can have confidence that standard operating procedures involving data recording and documentation procedures were followed, in-process experimental procedures were monitored, tissue accountability was ensured, study data and reports were audited, and raw data and study materials were archived and remain retrievable. The same degree of rigor cannot be established for the other papers in this review. Only one of the studies, paper 9, incorporated measures intended to counteract both sampling and observer bias. Data presentation was a significant problem for several of the published papers. Specific examples of problems included visual exaggeration of lesion severity scores caused by elongation of the
The relatively high degree of interstudy inconsistency among the 14 DCF reports may not be all that surprising given the overt differences in study designs that employed a diverse array of species, DCF dose ranges, age and sex combinations, exposure durations, and investigative approaches. Somewhat disheartening, however, was the low overall credibility of findings (Figure 2). Unfortunately, this is to some extent consistent with recent reports which suggest that the reliability of published HP data from ecotoxicology studies is frequently suboptimal. 13,14
Morphologic Effects of DCF in Fish
Based on the morphologic evidence, it appears that DCF is capable of causing mild gill irritation when fish are exposed under certain defined experimental conditions. Thickened lamellar tips in DCF-exposed rainbow trout were confirmed during a blinded review of the original slides from paper 7 by the PWG expert panel. 38 The PWG panel graded the severity of that finding, which was limited to trout exposed to 1000 μg/L DCF, as minimal to mild. Using stereological methods, paper 2 quantitatively demonstrated very minor increases in filament and lamellar thickness, and in subepithelial space (potentially indicative of edema), in DCF-exposed rainbow trout. However, background infections of
There seems to be convincing preliminary morphologic evidence that DCF exposure may lead to an increased abundance of renal hematopoietic tissue. This result was demonstrated by blinded semiquantitative grading of histologic sections from 3-spined stickleback (
None of the 14 papers provides reliable morphologic evidence that DCF has direct cytotoxic effects on renal urinary tissue in fish. Despite the fact that the kidneys were examined in greater than 1000 DCF-treated fish among the 14 reviewed studies, there were no credible demonstrations of treatment-associated nephron damage in any of the publications. Additionally, the authors of papers 1, 2, and 9 stated specifically that tubular necrosis was not evident in their studies, and the PWG panel did not confirm the original diagnosis of renal tubular necrosis reported in paper 6. The finding of minimally increased creatinine (up to approximately 15% in the 100 μg/L group relative to control) in paper 2 would seem to support the author’s contention that a DCF-induced loss of nephrons may affect urinary function by decreasing the glomerular filtration rate (GFR); however, that hypothesis ignores the well-documented ability of certain medications, including certain NSAIDs, to elevate creatinine artifactually in mammals without actually causing renal damage or affecting the GFR, either via interference with diagnostic creatinine assays in vitro or by affecting tubular creatinine secretion in vivo. 63,64
Comparing their study of DCF in trout kidneys to the established renal effects of DCF in
There is little dependable evidence among the reviewed papers that DCF is directly toxic to the fish liver. The sole credible DCF-related finding was decreased hepatic glycogen vacuolation (graded as minimal to severe) in rainbow trout exposed to 1000 μg/L DCF in the paper 7 study (identified by the PWG panel review). Hepatic glycogen depletion in rainbow trout as determined ultrastructurally was also reported in papers 13/14; however, because decreased glycogen vacuolation was not reported in the light microscopic evaluation from that same study (paper 12), the validity of that finding is somewhat questionable. Although decreased hepatic glycogen can be caused by chemical insult, it is not a specific indicator of liver toxicity and may be caused alternatively by a treatment-related decrease in hepatic energy storage, due to inanition, for example. 14,37,38 Other reported DCF-induced liver effects in the reviewed studies (dilated intercellular spaces, monocyte infiltration/inflammation, increased melanomacrophages, hepatocyte degeneration/necrosis, hepatocyte hypertrophy, and fibrosis) appeared to have equivocal to zero credibility, depending on the strength of evidence provided in the relevant papers.
Unexplained Phenomena
The body of work that comprises the current review generates almost as many questions as it does answers. None of the studies included a recovery sacrifice (ie, one or more groups in which exposure to DCF was discontinued for a period of time prior to study termination); therefore, it is unknown if reported treatment-related effects should be considered transient or persistent following cessation of treatment. Toxicological investigations conducted in mammalian test subjects routinely examine as many as 30 to 40 different tissue types, and yet among the reviewed fish papers, the median number of organs or tissues that were investigated morphologically per experiment was 3 (range: 1-5; Table 2). Therefore, it was not possible in such studies to determine the potential effects of DCF, or the existence of concurrent background disease (eg, caused by infectious agents), in unexamined organs or additional anatomic sites. Only in paper 2 was attention given to the possibility of background infections via health checks of fish prior to the study onset and surveys of selected tissues for parasites. Incomplete sampling for HP was especially important for studies in which uninvestigated mortalities occurred (papers 9 and 11) and for those in which the mechanisms of reported treatment effects remain inconclusive (eg, paper 9). As the authors of paper 9 admit, it is possible that in their wild-caught fish, the stress of treatment could have exacerbated pre-existing infectious disease in unexamined tissues, which could have accounted for the hematopoietic tissue hyperplasia. Or, as an NSAID, continuous high levels of DCF might have caused enteric ulceration, which could explain both the mortalities and increased hematopoiesis; however, that possibility could not be investigated because the viscera were not examined. The causes of treatment-associated mortalities were similarly uninvestigated and consequently unexplained in paper 11.
It is difficult to understand why treatment-related deaths occurred in the studies reported in papers 9 and 11, in which the highest DCF exposure concentrations and durations were 271 μg/L for 28 days (3-spined stickleback) and 200 μg/L for 25 days (brown trout), respectively, while there was no treatment-associated mortality when rainbow trout were exposed to 1000 μg/L for 95 days (paper 7). Other unexplained DCF-induced phenomena in various individual studies include the behavioral aggression wounds in paper 11, the jaw ulcers in paper 9, and the ocular lesions in paper 2 (the highest tested DCF concentration in that last study was 100 μg/L in rainbow trout exposed for 28 days). It should be noted that many of these tested concentrations were well above environmentally relevant levels that have been measured in surface waters across the globe, which generally range from < 0.1 to > 1 μg/L. 12 It also seems incongruous that a variety of purported DCF-induced lesions that did not occur at 1000 μg/L in the paper 7 study were reported in other papers at concentrations as low as 5 μg/L (papers 5, 9, and 12), 1 μg/L (papers 6 and 13/14), and even 0.1 μg/L (papers 2 and 4). Regarding paper 2, the very low lowest observed effect concentration (LOEC) could possibly be explained by the high sensitivity of the employed stereological techniques combined with the small magnitude of reported changes. However, it is likely that the LOEC values in other reviewed papers were artifactually low as a result of one or more of the credibility issues already discussed.
Conclusions
It must be acknowledged that the morphologic results are not the only data to be considered in toxicological hazard or risk assessment, and that the quality of the morphologic data in a study cannot be ascertained definitively by examining the published journal account. It is possible that a study may actually be more robust than its published report suggests, if for example, the experimental results were not communicated effectively. It is also possible, and perhaps more common, that fundamental flaws in the design and/or performance of a study are not fully apparent in the associated journal article. For example, it is a poorly kept secret that published photomicrographs and ultrastructural images do not always accurately reflect the quality of the tissue preparation or provide an accurate representation of the range of findings in a given treatment/exposure group. This was exemplified during the PWG review of the original histologic slides from the paper 6 study, which revealed for the first time that urinary tract tissue was not actually present in 40% of the kidneys evaluated for DCF-induced renal effects. 35 This type of pathology peer review and PWG process (the conduct of which was regarded as fair by the authors of paper 4) is currently used routinely as a key quality control measure in mammalian preclinical toxicology research but is still far less commonly employed in ecotoxicology studies. Given the numerous examples of diagnostic inaccuracy and data inconsistency that were identified in the current literature review and the lack of a cohesive profile of morphologic effects attributable to DCF exposure in fish, a thorough re-examination of the original materials from these studies is recommended as a means for determining the suitability of such data for weight-of-evidence determinations that may affect hazard/risk assessment and regulatory action.
If 13 of the 14 reviewed papers were to be accepted at face value, it would seem that the experimental exposure of fish to DCF at a wide range of test concentrations (ranging from very low to many fold higher than environmentally relevant levels) is capable of causing a plethora of severe morphologic changes that involve multiple organ systems. However, the current review of these papers identified substantive shortcomings in experimental design (including failure to account for potential sampling and observer bias), in diagnostic accuracy, and in the interpretation and presentation of data, along with unexplained phenomena and a high degree of interstudy inconsistency. Consequently, many reported morphologic findings were found to have questionable, dubious, or zero credibility. Taken as a whole, the results of this review suggest that chronic continuous exposure to relatively high concentrations of DCF under certain experimental conditions can cause minor proliferative changes in gill and renal hematopoietic tissue and decreased energy storage in the liver. The mechanism(s) by which these changes occur have not been established, nor has the potential reversibility of such findings following discontinuation of exposure been investigated. Consequently, it is debatable whether the findings of increased renal interstitial tissue, minor gill proliferation, or decreased hepatic glycogen should necessarily be considered adverse. Meanwhile, morphologic evidence of cytotoxic damage to renal urinary tissue or hepatocytes varies from weak to nonexistent.
In weight-of-evidence determinations, it is important to evaluate the quality and integrity of the available published data, not just the quantity, and specialized end points may require a more detailed review of the relevant data by contributing experts. The current review demonstrates that a targeted evaluation of morphologic data from published ecotoxicology studies can be a useful, if not essential, procedure for determinations of environmental hazard or risk associated with chemical exposures to wildlife.
Declaration of Conflicting Interests
The author declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The author is employed by an independent, privately owned, contract research laboratory.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding for the time and effort required to conduct this review and prepare the manuscript, and any manuscript submission fees, was provided by GlaxoSmithKline plc.
