Abstract

That a fourth edition of this book has been published indicates that it appeals to a wide audience. Indeed, there are not many texts that are aimed at statistical applications in toxicology, though this is by no means the only one.
The stated goal of this edition is to be “a source and textbook for both practicing and student toxicologists with the central objective of equipping them for regular statistical analysis of experimental data. Starting with the assumption of only basic mathematical skills and knowledge, it provides a complete and systematic yet practical introduction to the statistical methodologies that are available for, and utilized in, the discipline.”
It should be understood up front that the reviewer is a statistician, not a toxicologist, who has been working in mammalian and environmental toxicology for over 18 years. I am not the intended audience for this book, but I am in a position to evaluate the book both for statistical correctness and for the applicability of the methods presented to the design and analysis of toxicology experiments.
The text is quite ambitious, as the introduction states that the material in chapters 14–18 “reviews all the major possible data analysis applications in toxicology (LD50/LC50 calculations, clinical chemistry, reproduction studies, behavior studies, risk assessment, SARs, etc).” To begin to assess how well the author has achieved the stated goals, a list of chapter headings is given below. After that, detailed comments will be given on each chapter, followed by a summary of this review.
There are also tables of common logarithms, probit transform values, chi-square, Kruskal-Wallis critical values, Mann-Whitney U, t-test critical values, F distribution, Z scores, Table for Calculation of Median Effective Dose by Moving Averages, and Wilcoxon Rank Sum Test critical values. Finally, there is a short appendix of practice problems, as well as an appendix devoted to definitions of terms, and one for abbreviations and symbols (including the Greek alphabet). Each chapter contains a list of references, which makes it easy for the reader to do supplementary reading. One very useful feature of the book is the examples complete with SAS code to carry out the analyses that have been discussed.
1. Introduction
This chapter is only four pages long and gives a general motivation for the text, a very brief indication of how statistics in toxicology has evolved, and a basic discussion of the difference between biological and statistical significance, which should be useful to someone just entering the field.
2. Basic Principles
This includes an introduction to descriptive statistics (mean, variance, standard error, median, coefficient of variation, outlier, etc.), and two of three concepts identified in Chapter 1 as essential for understanding of the subject matter, namely the nature and value of different types of data, and causality (the third essential concept being the difference between biological and statistical significance, which was introduced in Chapter 1). Of special interest are four figures giving statistical decision trees covering selection of statistical procedure, selection of hypothesis testing procedure, selection of modeling procedure, and selection of reduction of dimensionality procedure. The first of these is well placed in such an introductory chapter. The others refer to numerous concepts that have not been introduced at this point. Once a student has worked through the entire book, it probably will be convenient to have all four decision trees collected in one place.
3. Experimental Design
Four basic concepts of experimental design are introduced: replication, randomization, concurrent control, and balance. Four basic types of experimental design are also introduced, namely randomized block, Latin square, factorial, and nested designs. More concrete examples would have been useful in these discussions. It is not made clear how a simple design with one factor of interest (dose or concentration) and cages or tanks of multiple animals within each dose fits into the four designs. Although this clearly falls under the heading of nested designs, the student being introduced to these concepts will have little idea of that at this point. There is a brief but welcome discussion of the important concept of the experimental unit that could have been linked more closely to experimental design.
There are tables of formulas for calculating sample sizes for experiments involving proportions. Although such tables are useful, they seem out of place and without motivation or justification here before the statistical tests on proportions are discussed. There is also a brief discussion of designs with unequal sample sizes, though it is disappointing that the only examples provided are where more subjects are allocated to the highest dose or more to the lowest dose. There is an extensive literature on the utility of allocating more subjects to the control in various settings and it would have been valuable to introduce that idea here.
4. Computational Devices and Software
This chapter contains a very brief and not very useful overview of the capabilities of various computing options, ranging from programmable calculators to major software packages such as SAS, SPSS, and Minitab. The author makes the interesting claim that the latter are less flexible than the former. It is difficult to understand a reasonable meaning for the term “flexible” under which this claim is true.
5. Methods for Data Preparation and Exploration
There is a brief introduction to the plotting of data prior to analysis. The only specific type of plot discussed is the scatterplot, although many other types of plot could have been introduced at this stage in the text. There is a discussion of Bartlett’s test for variance homogeneity that indicates some of its weaknesses. It is surprising that, except for the F-test, no other test for variance homogeneity, such as Levene’s test, is even mentioned. There is also a misnamed section on goodness-of-fit tests. Although there is a passing reference to tests for normality to detect mixtures of populations, this section is really an introduction to maximum likelihood estimates.
There is a real need in risk assessment (for example in determining what species sensitivity distribution [SSD] to use) and in modeling, to be able to identify the underlying distribution. Such tests are available in some of the statistical computing packages mentioned in Chapter 4 and are easy to apply. Where they are not readily available in software, they are straightforward to program. It would have been useful, for example, to observe that if X is a continuous random variable with distribution function F, then F(X) is a random variable with a uniform distribution on (0, 1) and Φ⁻¹(F(X)) is standard normal, so that the standard tests for normality can be applied to test whether the distribution F is correct.
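The transform just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses SciPy, the sample is simulated rather than real toxicity data, and the helper names `normal_scores` and `check_candidate` are hypothetical. Note also that the Shapiro-Wilk test checks the shape of the transformed values rather than their mean and variance, and that the p-value is only approximate when the parameters of F are estimated from the same data.

```python
import numpy as np
from scipy import stats

def normal_scores(x, dist):
    """Return Phi^{-1}(F(x)) for a fully specified candidate distribution F."""
    u = dist.cdf(np.asarray(x, dtype=float))   # uniform on (0, 1) if F is correct
    eps = np.finfo(float).eps
    u = np.clip(u, eps, 1.0 - eps)             # keep the normal quantile finite
    return stats.norm.ppf(u)                   # standard normal if F is correct

def check_candidate(x, dist):
    """Shapiro-Wilk statistic and p-value for the transformed sample."""
    return stats.shapiro(normal_scores(x, dist))

# Simulated sample (illustrative only): lognormal with log-mean 1.0, log-sd 0.5.
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=1.0, sigma=0.5, size=40)

right = stats.lognorm(s=0.5, scale=np.exp(1.0))   # the generating distribution
wrong = stats.expon(scale=sample.mean())          # a deliberately different candidate

_, p_right = check_candidate(sample, right)
_, p_wrong = check_candidate(sample, wrong)
print(f"lognormal candidate:   p = {p_right:.3f}")
print(f"exponential candidate: p = {p_wrong:.3f}")
```

A small p-value for a candidate suggests that it does not describe the data. With the small samples typical of SSD work, a large p-value is weak evidence at best.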
The text also mentions a suggested requirement of 200 or more observations to test goodness-of-fit. Although such a large number would be welcome, it is quite common in some toxicology experiments and in risk assessment to have much smaller sample sizes. It would have been helpful to discuss what can or should be done in these cases. However, this gets into the concept of goodness-of-fit rather than the actual topic discussed (MLEs).
6. Nonparametric Hypothesis Testing of Categorical and Ranked Data
The tests considered are chi-square, Kruskal-Wallis, Mann-Whitney U, Wilcoxon, and log-rank tests together with a general multiple comparison procedure that can be used. There is a welcome, if unspecific and unreferenced, statement about the desirable power properties of these tests when the assumptions underlying the standard parametric tests are violated.
It is disappointing that the only multiple comparison test discussed is one that protects against all possible pairwise comparisons. In many toxicological studies, the only comparisons of interest are of treatment against control. There are several procedures that can be applied that are specific to this type of comparison, such as the work of Dunn. Indeed, a simple Bonferroni-Holm adjustment to multiple Mann-Whitney comparisons to control would be more powerful than the test suggested. There are also step-down trend tests (e.g., Jonckheere-Terpstra for continuous responses and Cochran-Armitage for incidence responses) that can be used in single-factor experimental designs, provided there is an (assumed) monotone dose-response. Such tests are not mentioned anywhere in this text. Further comments are provided with regard to Chapter 9.
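As a rough illustration of the Bonferroni-Holm alternative mentioned above, the following Python sketch compares each treatment group with the control by a Mann-Whitney test and then applies the Holm step-down adjustment. It assumes SciPy is available. The response values are invented for illustration, and the helper names `holm_adjust` and `compare_to_control` are hypothetical.

```python
from scipy.stats import mannwhitneyu

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)   # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted

def compare_to_control(control, treatments, alternative="two-sided"):
    """Raw and Holm-adjusted Mann-Whitney p-values for each treatment vs control."""
    raw = [mannwhitneyu(t, control, alternative=alternative).pvalue
           for t in treatments]
    return raw, holm_adjust(raw)

# Invented responses for a control and three dose groups (illustration only).
control = [12.1, 11.8, 12.6, 12.3, 11.9]
doses = [
    [12.0, 12.4, 11.7, 12.2, 12.5],   # low dose
    [11.2, 11.6, 11.0, 11.9, 11.4],   # mid dose
    [10.1, 10.6, 9.8, 10.4, 10.9],    # high dose
]

raw, adjusted = compare_to_control(control, doses)
for label, p, p_adj in zip(("low", "mid", "high"), raw, adjusted):
    print(f"{label:>4}: raw p = {p:.4f}, Holm-adjusted p = {p_adj:.4f}")
```

Only the comparisons with the control enter the adjustment, so the multiplicity penalty covers three tests here rather than all six pairwise comparisons.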
Another topic not mentioned is nonparametric analysis of covariance (ANCOVA). These tests are needed when the requirements for parametric ANCOVA are not met and no suitable transformation can be found. Such tests do exist and can be programmed with little difficulty in SAS and other statistical packages with programming languages.
7. Hypothesis Testing: Univariate Parametric Tests
This chapter perpetuates a flaw in the statistical analysis of toxicology data. The author states that multiple comparison tests, such as Duncan, Dunnett, and Williams, should only be used if there is a significant ANOVA F-test. Although that would be appropriate for the wildly liberal Duncan test, it is by no means appropriate for the Dunnett and Williams tests. The ANOVA F-test protects against many comparisons of no interest in a typical toxicology experiment. It is designed to guard against false positives from any linear contrast. In most toxicology studies, the only comparisons of interest are comparisons against the control. Thus, a significant F-test can arise from a comparison of no toxicological relevance and an F-test can fail to be significant when there is a significant effect in a treatment group compared to the control. Both the Dunnett and Williams tests are designed to protect against false positives among just these comparisons. To require a previous significant F-test before applying the Dunnett or Williams tests significantly reduces the power of these tests and invalidates the nominal level of significance of these tests.
The tests discussed are Student’s t-test, the Cochran t-test, F-test, Duncan, Dunnett, Scheffe, and Williams’ tests, and ANOVA and ANCOVA. Of these the Duncan and Scheffe tests are of little value in toxicology. Duncan’s test has already been discussed. There are far better alternatives than Scheffe’s test that have been understood for decades and are widely available in statistical computer packages. These include the Tukey honest significant difference (HSD) test, Sidak, and GT2, among many others. The interested reader is referred to Westfall et al. (1999)1 for what is available in the SAS software package and to Hochberg and Tamhane (1987)2 for a very good general discussion of multiple comparison procedures for both parametric and nonparametric cases. Although it is unreasonable to expect a general text such as the one under review to have the level of detail in these two references, it is surprising to find that this chapter, except for some of the SAS code, could have been written 30 years ago. There has been a lot of development in this field that is not reflected in this text.
8. Modeling and Extrapolation
For LD50/LC50 estimation of quantal data, the only techniques discussed are probit analysis with dose log-transformed, and moving average. It would have been helpful to include logistic and Weibull models and possibly others. Probit, although a very useful model, will not always provide a suitable fit to the data and moving average often does not provide good confidence bounds on the estimate.
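To make the comparison of link functions concrete, here is a minimal Python sketch that fits probit and logistic models to quantal data by maximum likelihood on log10 dose and reports the LD50 under each. It assumes SciPy is available. The dose-mortality counts are invented, the helper `fit_ld50` is hypothetical, and no confidence bounds are computed.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import norm

def fit_ld50(dose, n, dead, link="probit"):
    """Fit P(death) = G(a + b * log10(dose)) by maximum likelihood.

    G is the standard normal CDF (probit) or the logistic function (logit).
    Returns the LD50 estimate and the minimized negative log-likelihood.
    """
    x = np.log10(np.asarray(dose, dtype=float))
    n = np.asarray(n, dtype=float)
    dead = np.asarray(dead, dtype=float)
    inverse_link = norm.cdf if link == "probit" else expit

    def neg_log_lik(theta):
        a, b = theta
        p = np.clip(inverse_link(a + b * x), 1e-12, 1.0 - 1e-12)
        return -np.sum(dead * np.log(p) + (n - dead) * np.log(1.0 - p))

    result = minimize(neg_log_lik, x0=[0.0, 1.0], method="Nelder-Mead")
    a, b = result.x
    # G(0) = 0.5 for both links, so the LD50 solves a + b * log10(dose) = 0.
    return 10.0 ** (-a / b), result.fun

# Invented quantal data (illustration only): 10 animals per dose group.
dose = [1, 2, 4, 8, 16]
n = [10, 10, 10, 10, 10]
dead = [0, 2, 5, 8, 10]

ld50_probit, nll_probit = fit_ld50(dose, n, dead, link="probit")
ld50_logit, nll_logit = fit_ld50(dose, n, dead, link="logit")
print(f"probit: LD50 = {ld50_probit:.2f}, -logL = {nll_probit:.3f}")
print(f"logit:  LD50 = {ld50_logit:.2f}, -logL = {nll_logit:.3f}")
```

The two links often agree closely near the LD50 and differ more in the tails, which matters when low-effect concentrations are extrapolated. A Weibull fit could be added in the same way by changing the inverse link.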
For continuous responses, there is a section on linear regression and a very limited two-page section (including a page of plots) on nonlinear regression. A document recently published by OECD3 contains numerous nonlinear models of special interest in toxicology and it is disappointing to find so little here. It is probably too much to expect any coverage of PK models in an introductory text such as this.
9. Trend Analysis
This was one of the more disappointing chapters to this reviewer. As a proponent of step-down trend tests for simple dose-response experimental designs, I found the chapter heading a source of excitement, which turned to dismay upon actually reading the chapter. There is a brief discussion of the role of trend tests, generally in the context of trends over time. There is no indication of how such tests can be employed to determine a NOAEL. Although the Cochran-Armitage test is given, its use is very restricted and the conceptual motivation is lacking. Other tests are not mentioned. There is a brief, dismissive section on Williams’ test in Chapter 7, which the author claims is rarely applicable in toxicology. In my 18 years of statistical analysis of toxicology data, I have found the Williams, Jonckheere-Terpstra, and Cochran-Armitage tests to be extremely useful and widely applicable, and there is ample toxicologically relevant literature to support that view. Other statisticians in this field, but by no means all, share that view. The OECD document cited previously contains a detailed discussion of such tests with numerous references.
10. Methods for the Reduction of Dimensionality
Table 10.1 lists 55 types of graphical displays that can be helpful in organizing and presenting data. This is an interesting list. Unfortunately, of these 55 types, only 7 are actually discussed in the text, and only 2 are illustrated in this chapter. Other techniques given brief discussion are multidimensional and nonparametric scaling, and cluster analysis. The main impact of this chapter is to mention key words that might apply to an analysis.
11. Multivariate Methods
The primary contribution of this chapter is to introduce the notions of a multivariate distribution, the variance-covariance matrix, and partial correlation coefficients, and to discuss the concept of an outlier in a multivariate context. There is only one trivial example provided.
12. Meta-Analysis
This is a limited, but interesting, presentation of a few of the basic concepts in meta-analysis with, unfortunately, only trivial examples. This is an important topic in the pharmaceutical industry and a more thorough treatment would have been valuable.
13. Bayesian Analysis
Bayes’ theorem is given an extended discussion, followed by some discussion of the notion of prior and posterior distributions and some unmotivated formulas for the Bayesian approach to estimating the mean and standard deviation of a normal distribution. An example is given to illustrate the use of these formulas. The main contribution of this chapter is to introduce the reader to the existence of Bayesian analysis, not to equip the reader to carry it out. Although a thorough treatment of Bayesian analysis would be beyond the scope of an introductory text such as this, it might have helped to include the beta-binomial distribution, developed in context, as a simple and easily understood example.
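The beta-binomial example suggested above fits in a few lines of Python. In this sketch a Beta(a, b) prior on an incidence rate is combined with a binomial observation to give a Beta posterior. SciPy is assumed, the prior and the counts are invented, and the helper name `beta_binomial_update` is hypothetical.

```python
from scipy.stats import beta

def beta_binomial_update(a_prior, b_prior, responders, n):
    """Posterior Beta parameters after observing `responders` out of `n`."""
    return a_prior + responders, b_prior + (n - responders)

# Uniform Beta(1, 1) prior; invented data: 3 affected animals out of 20.
a_post, b_post = beta_binomial_update(1.0, 1.0, responders=3, n=20)

posterior_mean = a_post / (a_post + b_post)
lower, upper = beta.ppf([0.025, 0.975], a_post, b_post)   # 95% credible interval

print(f"posterior: Beta({a_post:g}, {b_post:g})")
print(f"posterior mean incidence = {posterior_mean:.3f}")
print(f"95% credible interval = ({lower:.3f}, {upper:.3f})")
```

In a toxicology setting the prior could instead encode historical control incidence, which is one natural way to develop the example in context.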
14. Data Analysis Applications in Toxicology
The chapter opens with the following statement: “Having reviewed basic principles and provided a set of methods for statistical handling of data, the remainder of this book will address the practical aspects and difficulties encountered in day-to-day toxicological work.” What follows is an often useful, general discussion of issues in various types of toxicology studies rather than the detailed examples that might have been expected. The author notes that “Debane and Heller (1985) recently reviewed” issues around LD50 estimation. That 1985 is considered recent is an indication of the dated nature of much of the discussion, which applies throughout the chapter (and, indeed, the book). Although I do not disagree with the claim of Weil (1975) that use of probit curves and the like to estimate LCx for x = 35 and below is questionable, this should have been added to the list of controversies in Chapter 20, as this is precisely what many in the field are now advocating.
There is a highly questionable practice indicated in the section on body and organ weights, where it is stated that “With smaller sample sizes, the normality of the data become increasingly uncertain and nonparametric methods such as Kruskal-Wallis may be more appropriate.” Kruskal-Wallis (and Mann-Whitney and Wilcoxon tests, the other nonparametric tests considered) have low power for small samples, so this advice is highly questionable at best. Furthermore, sample size has nothing to do with whether data are normally distributed. It is certainly true that tests for normality have low power for small samples. However, in a toxicology experiment, normality should not be determined for each treatment group separately, as this inflates the false-positive rate. Rather, the residuals from an ANOVA are assessed by an appropriate test, such as the Shapiro-Wilk or Anderson-Darling. Even when each treatment group has only four or five subjects, if there are also three to five treatment groups and a concurrent control, then that brings the number of residuals up to 16 to 30, which is usually large enough to give adequate power to tests of normality. Also, apart from formal tests, QQ-plots are helpful and, more importantly, knowledge of what is usual for a particular type of response can be used as a prior to help decide whether a test that assumes normality is appropriate.
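The pooled-residual approach described above can be sketched as follows. This Python illustration subtracts each group mean, pools the residuals from a one-way layout, and applies the Shapiro-Wilk test once to the pooled set. SciPy is assumed, the organ-weight-like values are simulated, and the helper name `anova_residuals` is hypothetical. Because the residuals sum to zero within each group they are not fully independent, so the p-value should be read as approximate.

```python
import numpy as np
from scipy.stats import shapiro

def anova_residuals(groups):
    """Residuals from a one-way ANOVA: each observation minus its group mean."""
    return np.concatenate(
        [np.asarray(g, dtype=float) - np.mean(g) for g in groups]
    )

# Simulated data (illustration only): a control and three dose groups of five.
rng = np.random.default_rng(1)
group_means = (30.0, 29.0, 27.5, 25.0)
groups = [rng.normal(loc=mu, scale=2.0, size=5) for mu in group_means]

residuals = anova_residuals(groups)      # 20 residuals instead of 4 sets of 5
statistic, p_value = shapiro(residuals)
print(f"n = {residuals.size}, W = {statistic:.3f}, p = {p_value:.3f}")
```

A single test on the pooled residuals avoids the inflated false-positive rate that comes from testing each treatment group separately.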
15. Carcinogenesis
This chapter is very similar in spirit to the previous one. There are useful, though sometimes dated, discussions of general issues, such as the use of historical controls, experimental and observational units, treatment of missing data, and age- and mortality-adjusted analyses. Less useful is a discussion of the NCI Bioassay method of 1981.
16. Risk Assessment and Low-Dose Extrapolation
Risk assessment is here interpreted as “based on experimental results in a nonhuman species at some relatively high dose or exposure level, from which an attempt is made to predict the level of impact in humans at much lower levels.” In the present age, this is a rather narrow understanding of risk assessment. However, given that understanding and the dated nature of the references, this is one of the strongest chapters in the book. It has a good case study, where several analysis techniques are illustrated in an informative way. There are useful developments of the one-hit and multistage models and some development of PK models.
There is a nice treatment of the 2002 FDA approach to determining the maximum recommended starting dose (MRSD) for risk assessment of a new therapeutic agent, going through NOAEL determination, human equivalent dose (HED) calculation, selection of most appropriate species, application of safety factor, and consideration of the pharmacologically active dose (PAD). This is an entirely deterministic approach and does not consider the newer probabilistic risk assessment methods.
17. Epidemiology
This chapter gives a quick (15 pages plus references) overview of basic concepts in epidemiology. These include measurement of exposure and study design (historical cohort studies, proportional mortality studies, prospective cohort studies, case-control studies, cross-sectional studies, etc). There are few details and no examples, but the discussion is interesting.
18. Structure Activity Relationships
Structure-activity relationships (SARs) “seek to predict the adverse biological effects of chemicals based on their structure.” The chapter touches on both qualitative and quantitative (QSAR) models. The author notes that “A detailed review of even the major methodologies available for SAR/QSAR modeling in toxicology is beyond the scope of this book.” The purpose of this very brief chapter is to introduce the basic ideas.
19. Good Laboratory Practice
This is one of the two new chapters in the book, chapter 20 being the other. To paraphrase the text, GLP is concerned with standardized documentation and procedure compliance monitoring to assure the quality and integrity of nonclinical test data. Anyone who works in this field understands the importance of proper documentation. This reviewer has significant experience in complying with GLP requirements for statistical software development and can testify from personal experience that a good grasp of GLP requirements is essential for keeping a laboratory in business. Useful and up-to-date references are provided.
20. Areas of Controversy in Statistics as Used for Toxicology
Four pages are devoted to areas of controversy and hardly touch on areas of current controversy. The issue of one-sided versus two-sided tests is not a pressing issue today, nor are censoring, unbalanced designs, or the use of computerized statistical packages. These are the only “controversial” topics mentioned. Of far greater interest today are (1) the issue of whether hypothesis testing (NOAEL) should be replaced by regression analysis (ECx); (2) whether the very complex biologically based models of S.A.L.M. Kooijman and his school are sufficiently well developed to enter the mainstream; (3) what distribution should be used for SSDs and, indeed, whether the SSD approach to risk assessment is a good tool for environmental risk assessment; (4) how to distinguish between variability and uncertainty in modeling and, more generally, in risk assessment, and how to quantify these two concepts; and (5) the role of Bayesian methods in analysis, especially in the area of risk assessment.
Summary
As an introduction to basic ideas in toxicology for the pharmaceutical industry, this is an interesting reference. For toxicologists in the chemical industry, this is less interesting, but still applicable. It is not really a how-to book on statistics or toxicology, though it certainly introduces numerous topics from both areas of knowledge.
However, it is out of date even in this new edition. It will certainly not satisfy a statistician, nor does it claim to be directed at that audience. My concern as a statistician in the field of toxicology is that it will in some instances give a false sense of what statistical tools are appropriate and this could lead to unhelpful interactions with statistical personnel. In my 18-year career in this field, I have run into this sort of issue on several occasions. Although that is an occupational hazard, the toxicologist could have been better served by a more up-to-date treatment. I would conclude that it does not meet either its stated goal or ambition quoted at the beginning of this review.
Footnotes
1. Westfall, P. H., R. D. Tobias, D. Rom, R. D. Wolfinger, and Y. Hochberg. 1999. Multiple comparisons and multiple tests. Cary, NC: SAS Institute.
2. Hochberg, Y., and A. C. Tamhane. 1987. Multiple comparison procedures. New York: Wiley.
