Abstract
One way in which we learn new information is to read the medical literature. Whether or not we do primary research, it is important to be able to read the literature critically. A seemingly simple part of that reading is interpreting p values. For most of us, if we find a p value that is <.05, we take the conclusion to heart and quote it at every opportunity. If the p value is >.05, we discard the paper and look elsewhere for useful information. Unfortunately, this approach is too simplistic. The real utility of p values lies in considering them within the context of the experiment being performed. Defects in study design can make the interpretation of a p value useless. One has to be wary of type I errors (seeing a “statistically significant” difference just because of chance) and type II errors (failing to see a difference that really exists). Publication bias and the performance of multiple analyses are common sources of the former; the latter typically arises when a trial is too small to demonstrate the difference. Finding significant differences in surrogate or intermediate endpoints may not help us unless we know that those endpoints reflect the behavior of clinical endpoints. Selectively citing studies that find significant differences and disregarding those that do not is inappropriate. Small differences, even if they are statistically significant, may require too much resource expenditure to be clinically useful. This article explores these problems in depth and attempts to put p values in the context of studies.
