Abstract

Thank you for your kind comments on our article. We fully agree with the main thrust of your argument: in the early stages of examining slides to find putative treatment-related effects (your “identification stage” of a two-stage examination strategy), a Score Method is often appropriate. Our article is intended to inform readers about how to gather and analyze data to answer the question posed later, in the subsequent “confirmatory” examination: are these groups different enough to ascribe the difference to treatment (i.e., to reject the null hypothesis)? If we gave the impression that we are advising pathologists to use an Ordering Method at the early identification stage of their investigations, this was an unintended mistake on our part, for the several good reasons that you give. We make no recommendations for or against any method at the early stage of examination; how one looks for differences is a very personal process, and we think each individual should be free to work in the way that suits him or her best.
However, although we fully accept your arguments for using the Score Method in the initial exploratory/identification phase of the examination, we would not recommend it for the later confirmatory examination. The problems with using the Score Method at this later stage are as we outlined: it is sensitive to increased dispersion as well as to a change in location, its power is substantially affected by nonresponders, analysis of sparse contingency table data is difficult, combining sparse tabular data for a factorial analysis (including sex as a factor, for example) is difficult, it can be affected by diagnostic drift, it is difficult to peer-review, and it lacks power/sensitivity. We would not claim that it is always wrong to use the Score Method, just that there are usually better methods available. We fully accept that currently, Score Method data are “entrenched in the psyche of the toxicological pathologist” (and other groups, too), but we credit our colleagues with the acumen to change their methods when they see the merit in alternative techniques.
We fully agree with your premise that the histological examination is a two-part process (an initial identification examination to find possible treatment effects, followed by a confirmatory examination), and we have published to that effect (Holland 2010; Holland and Holland 2011). In these articles, we clearly state that the datasets generated by both the initial identification examination (which may well be score and grade data) and the later confirmatory examination (used in formally testing tightly framed hypotheses, for which we recommend ordered/ranked data if feasible) should be recorded. We agree with you that currently confirmatory examinations are commonly done “informally” (sic) and that the method, data, or analysis are then not included in the report, a common defect in toxicologic pathology reporting practice that has been clearly identified in the literature (Holland 2010; Holland 2011).
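As an illustration only, the following minimal Python sketch shows one way that ordered/ranked data from a blind confirmatory examination might be tested formally. It assumes a Mann-Whitney-type rank statistic with an exact permutation p-value; the choice of test, the function names, and the slide labels are our assumptions for this example and are not prescribed by the cited articles.

```python
from itertools import combinations


def rank_sum_u(ordering, treated):
    """Mann-Whitney U for the treated group (assumed test, for illustration).

    ordering: slide labels from least to most severe, with no ties.
    treated:  set of labels belonging to the treated group.
    Returns the number of (treated, control) pairs in which the treated
    slide was ranked as more severe than the control slide.
    """
    u = 0
    controls_seen = 0
    for label in ordering:
        if label in treated:
            u += controls_seen  # this treated slide outranks every control seen so far
        else:
            controls_seen += 1
    return u


def exact_p_upper(ordering, treated):
    """One-sided exact p-value: the proportion of all possible group
    assignments whose U is at least as large as the observed U."""
    observed = rank_sum_u(ordering, treated)
    count = 0
    total = 0
    for combo in combinations(ordering, len(treated)):
        total += 1
        if rank_sum_u(ordering, set(combo)) >= observed:
            count += 1
    return count / total


# Hypothetical ordering of six slides, least to most severe.
ordering = ["C1", "C2", "T1", "C3", "T2", "T3"]
treated = {"T1", "T2", "T3"}
u_observed = rank_sum_u(ordering, treated)
p_value = exact_p_upper(ordering, treated)
```

Full enumeration is practical only for small groups; for larger studies, an established statistics package would be the more sensible route.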
The eminently practical method that you recommend for randomizing slides from two or more groups is very similar to the technique used by one of us (TH) and informally called the “Mahjong Shuffle.” We do the same steps, but in a slightly different order. First, the slides’ labels are obscured with cut-down Post-Its. Then the slides are placed face-up on the bench and shuffled around (as in mahjong). Then the slides are drawn out one at a time and numbered (sequentially, but that is unimportant). They are now ready for a blind examination. After ordering, the Post-Its are transferred to the far end of the slide and the results recorded. We used to use a more formal randomization procedure (using random number tables), but in our experience that approach adds tedious complexity for no practical advantage, and it has to be done by a third party to be valid.
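For readers who keep their slide records electronically, the same steps can be sketched in software. The following Python example is a rough analogue of the bench procedure, not a description of what we actually do: the function names, the slide labels, and the use of a pseudorandom shuffle in place of physical shuffling are all assumptions made for illustration.

```python
import random


def mahjong_shuffle(slide_labels, seed=None):
    """Shuffle slides and assign sequential blind codes.

    Returns (codes, key): codes are the numbers 1..n that the examiner
    sees; key maps each code back to the obscured original label and
    should stay hidden until the ordering has been completed.
    """
    rng = random.Random(seed)      # seed is optional, for a reproducible record
    shuffled = list(slide_labels)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)          # stands in for shuffling slides on the bench
    key = {code: label for code, label in enumerate(shuffled, start=1)}
    return list(key), key


def unblind(ranking, key):
    """Translate the examiner's ordering of blind codes into original labels."""
    return [key[code] for code in ranking]


# Hypothetical labels: two control and two treated slides.
codes, key = mahjong_shuffle(["C1", "C2", "T1", "T2"], seed=2024)
# The examiner orders the blind codes by severity, e.g. [3, 1, 4, 2],
# and only then is the key consulted.
ordered_labels = unblind([3, 1, 4, 2], key)
```

Keeping the key separate from the codes corresponds to leaving the Post-Its in place until the ordering is finished.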
The concern that “less-than-fully-informed persons or agencies will seize upon” these issues can be easily allayed. If we, as a body, demonstrate an open debate of these issues, state clearly our agreements and differences, and where there is consensus then adjust our practices in the light of the accepted evidence, then nobody will be able to justify seizing anything. To this end, we thank you for your constructive criticisms, and we thank the editor and the editorial board of this journal for allowing us to share our views so widely.
