Abstract

1 Marginal spike histograms
Some intriguing examples given by Harrell (2015) prompt experiment with adding marginal spike histograms to quantile and (empirical) cumulative distribution (function) plots. This tip explains why this might be helpful and gives sample code using the official command
The point is at least twofold.
Pedagogic. Such additions help clarify what (for example) a quantile plot means and how it can be explained as stacking values in terms of their associated cumulative probabilities. Every kind of graph becomes comfortable only with increased familiarity. I have encountered students and colleagues evidently long past their first histogram who struggle a little on their first experience with quantile plots. A little help does no harm.
Practical. In principle, marginal histograms add no more information to a quantile plot. In practice, they offer a complementary view of each distribution, affording another way to think about distribution level, spread, and shape—and indeed fine structure too.
Discussion here is phrased entirely in terms of official Stata commands, with the
As usual, the idea has longer roots. See Galton (1889, 38) for (in modern terms) a quantile plot and a histogram sharing the same vertical magnitude axis.
2 Enhancing quantile plots
The name “quantile plot” goes back at least to Wilk and Gnanadesikan (1968), but the device is much older, being used several times in the 19th century and early 20th century. The main idea is just to plot ordered values, often but not invariably on the y axis, against their associated cumulative probabilities on the other axis. More generally, ordered values are plotted against corresponding quantiles of some relevant distribution. Small print aside, associated cumulative probabilities are just quantiles of a uniform (rectangular, flat) distribution on the interval from 0 to 1. Cumulative distribution plots are—again, often but not invariably—plotted with those axes reversed.
There are many small variations in both terminology and format. Quantile–quantile plot and empirical cumulative distribution function plot are some other terms. As if in apology or compensation for such long-windedness, concise but more cryptic labels such as q–q plots and ECDF plots can be found. As a matter of curious convention rather than compelling logic, quantile plots commonly use markers or point symbols for each distinct value, while cumulative distribution plots commonly use connected lines. That is a distinction without a difference if any distribution is such that points merge into lines, as may well be true with large sample sizes.
Let’s start with a simple quantile plot. The official Stata command

Quantile plot of miles per gallon from the auto data
As implemented in
Because one axis is a probability scale, that is compatible in principle with a marginal histogram showing probabilities in each distinct bin. There are various fairly simple ways to get that directly. Here’s one.
Counting the sample size explicitly does no harm and is needed if there are missing values (not the case in this example but often true) or if you want to look at a subset of your data (again, not the case here but also often true).
A first attempt just adds spikes to the vertical axis, as in figure 2.

Quantile plot of miles per gallon from the auto data with marginal spike histogram
That is a good start, but we would be better off flipping the histogram so that it lies clear of the quantile plot. Negation is easy enough. See figure 3.

Quantile plot of miles per gallon from the auto data with marginal spike histogram, now flipped over or reflected in the vertical axis
The point about adding
In this example, we were lucky: There are 21 distinct values, so the mean probability in each bin is about 0.05. The maximum probability may naturally be much higher, but here taking probabilities literally (which means numerically) seems to work fine.
In other examples, we might have to do more work and make some decisions to get an extra histogram that is informative yet restrained. Let’s look at a larger dataset. Figure 4 shows a distribution of wages on a natural logarithm scale.

Quantile plot of logarithm of wage from the
Here there are 8,173 distinct values, and so the mean probability in each bin is about 0.0001. The maximum probability again can be much higher, but here taking probabilities literally (which means numerically) means that the histogram degenerates to a rug plot. If you want a rug plot, you can always get one (for example, Cox [2004]), but that is not the goal here.
There are various ways to move forward. One is to decide how much space to give the histogram and to scale accordingly. Then the probability scale for the quantile plot and that for the quantile plot are different, which needs to be explained somewhere. Figure 5 is an example. Another is to use a square-root scale for probabilities, which often works well to make structure clearer. Figure 6 is an example. The two ideas can be combined.
Figure 5 scales the histogram to cover 10% of the horizontal extent of the quantile plot.

Quantile plot of logarithm of wage from the
Figure 6 shows a square-root scale.

Quantile plot of logarithm of wage from the
3 Square-root scales for probability or frequency
If the idea of using a square-root scale is unfamiliar, let’s spell that out. Square-root scales stretch back at least a century. See discussion and references in Cox (2012) and especially also Perrin (1913, 198; 1916, 131).
Histograms using square-root scales—rootograms!—go back at least to John W. Tukey circa 1965. The idea is to show not bin frequencies but their square roots. Frequencies, as counted variables, tend to have variability that is stabilized by a root transformation, at least approximately. It is now harder to see bins with higher frequency and much easier to see bins with lower frequency. Bins that are empty remain no problem because the square root of zero is zero. Note also that the square root of a normal or Gaussian density is a multiple of another normal or Gaussian density. Hence, if the normal is a reference distribution, we are looking for the same shape on a rootogram, and experience in assessing histograms for approximate normality can be applied directly in assessing rootograms. However, taking the root is only the first step in Tukey’s procedure, and we do not implement his hanging or suspended rootograms. See Tukey (1986, 1972, 1977, chap. 17), Tukey and Wilk (1965), or Velleman and Hoaglin (1981).
4 A bimodal example
As suggested by Frank Harrell (personal communication), we close with an example that shows bimodal structure. We use the famous iris dataset of Edgar Anderson, as publicized by Fisher (1936) and often used as a sandbox in multivariate statistics and machine learning exercises. Stebbins (1978) gave an appreciation of Anderson, a distinguished and idiosyncratic botanist, and comments on the scientific background to distinguishing three species of the genus Iris. Kleinman (2002) surveys Anderson’s graphical contributions with statistical flavor.
A Stata-readable copy of the iris data is bundled with the media for this issue. This dataset includes 150 measurements of the width and length of petals and sepals for 50 flowers each of the species Iris setosa, Iris versicolor, and Iris virginica. Using the same recipe as above, figure 7 shows the data for petal length.

Quantile plot and marginal spike histogram for petal length from the iris data. Note the bimodal structure.
The bimodal structure is, I suggest, clearly shown. It is no discovery and easily related to which species is being measured.
5 Graphics choices
In graphics, choices small and large abound, and taste and circumstance should determine your own choices. The device in this tip is for use whenever it is helpful. Other possibilities include offsetting the histogram slightly by introducing some space between it and the display of quantiles, using a marginal dot or strip plot instead, and smoothing the quantiles slightly because sometimes fine structure is just noise.
Supplemental Material
Supplemental Material, sj-zip-1-stj-10.1177_1536867X211045583 - Stata tip 141: Adding marginal spike histograms to quantile and cumulative distribution plots
Supplemental Material, sj-zip-1-stj-10.1177_1536867X211045583 for Stata tip 141: Adding marginal spike histograms to quantile and cumulative distribution plots by Nicholas J. Cox in The Stata Journal
Footnotes
6 Acknowledgments
I thank Frank Harrell and Antony Unwin for encouraging and helpful correspondence.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
