Abstract

In this article, I propose a new test for power-law behavior. The statistical test, pwlaw, is locally optimal if the possible alternative distributions are contained in the Pareto type (IV) family. After deriving the test, I examine four classical datasets: the frequency of unique words in an English text (Moby Dick); the human populations of U.S. cities; the frequency of U.S. family names; and the peak gamma-ray intensity of solar flares. I show that in the first case there is no indication of any power-law behavior and that in the second and fourth cases there is evidence in that regard.

Keywords

st0610, pwlaw, power-law distribution, Pareto law, Zipf’s law, Pareto (IV) distribution

1 Introduction

A continuous random variable is said to have a power-law (PWL) behavior if it has a probability density of the form

g (x) = \frac{α}{μ} {(\frac{x}{μ})}^{- (α + 1)}

where α > 0 is the shape parameter that determines the tail of the distribution and µ > 0 is the location parameter where the “heavy tail” starts.¹ The law given above is due to Pareto (1897), but in the particular case of α = 1, it is occasionally named after Zipf (1949).

The surveys by Brock (1999), Mitzenmacher (2004), and Newman (2005) make clear that the assumption of PWL behavior has been routinely made over the years by many authors in a number of disciplines. Also, in most of that literature, the use of PWL distributions is typically defended by simply pointing out the approximate linear nature of a doubly logarithmic plot or by reporting a simple regression. However, as noted by, for example, Clauset, Shalizi, and Newman (2009) and Urzúa (2011), those arguments are flawed.

Fortunately, there are also in the literature better procedures to test for PWL behavior. The most common, which can be traced back at least to Quandt (1964), is to estimate, for a given dataset, not only the Paretian law but also some competing alternative distributions. Once those estimations are made, the resulting fits are ranked according to some statistical criteria. For instance, in their rather influential article on the empirics of PWL behavior, Clauset, Shalizi, and Newman (2009) suggest as alternatives to that distribution four others: the lognormal, the exponential, the stretched exponential, and the PWL with cutoff.

Other testing procedures have been proposed as well. In particular, using a methodology that is quite common in the econometrics literature, Urzúa (2000) presents a test for Zipf’s law that is derived after considering a more general distribution that nests the density given in (1). Afterward, a locally optimal test for Zipf’s law is derived by means of Rao’s (1948) score test, also known in econometrics as the Lagrange multiplier test.²

The more general density posited in that article is of the form

f (x) = \frac{α}{σ} \left(1 + \frac{x - μ}{σ}\right)^{- (α + 1)} \quad x > μ

where, aside from the two parameters introduced earlier, σ > 0 is the scale parameter. This distribution is known in the literature by several names, but here it will be simply called Pareto (II), following the terminology of Arnold (see the reference in Arnold [2015]); likewise, Pareto’s basic law, given in (1), will be called from now on Pareto (I).

Using the Pareto (II) distribution, Urzúa (2000) derives a test for Zipf’s law by testing the statistical hypothesis H ₀ : σ = µ, α = 1. Rao’s score test is used because it requires estimation only under the null hypothesis. Note that if the null is reduced to H ₀ : σ = µ, then, as Goerlich (2013) shows, one can even produce a test for the Pareto (I) distribution. Nevertheless, that test would not be appropriate in the case of testing for PWL behavior, because, as I will show in the next section, the Pareto (I) and Pareto (II) densities in (1) and (2) have similar heavy-tail behavior.

2 A test for PWL distributions

Following again the terminology in Arnold (2015), this article assumes that the alternatives to the Pareto (I) distribution belong more generally to the Pareto (IV) family, whose density is given by

f (x) = \frac{α}{γ σ} \left[1 + \left(\frac{x - μ}{σ}\right)^{1 / γ}\right]^{- (α + 1)} \left(\frac{x - μ}{σ}\right)^{1 / γ - 1} \quad x > μ

where γ > 0 is the inequality parameter. Note that the Pareto (II) density is obtained from this density when γ = 1, and the Pareto (I) density in (1) when σ = µ and γ = 1. The Pareto (IV) distribution function is given by

F (x) = 1 - \left[1 + \left(\frac{x - μ}{σ}\right)^{1 / γ}\right]^{- α} \quad x > μ
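Because the distribution function has a closed form, it can be inverted by hand, which gives a simple way to simulate Pareto (IV) data. The following is a minimal Python sketch, not part of pwlaw; the function names `pareto4_cdf` and `pareto4_quantile` and the parameter values are hypothetical choices made for illustration.

```python
import math

def pareto4_cdf(x, mu, sigma, gamma, alpha):
    """Pareto (IV) distribution function F(x), valid for x > mu."""
    z = ((x - mu) / sigma) ** (1.0 / gamma)
    return 1.0 - (1.0 + z) ** (-alpha)

def pareto4_quantile(u, mu, sigma, gamma, alpha):
    """Solve F(x) = u for x, with 0 < u < 1 (inverse-transform sampling)."""
    return mu + sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** gamma

# Round trip: F(quantile(u)) should return u up to rounding error.
for u in (0.1, 0.5, 0.9, 0.999):
    x = pareto4_quantile(u, mu=1.0, sigma=2.0, gamma=0.7, alpha=1.5)
    print(u, pareto4_cdf(x, 1.0, 2.0, 0.7, 1.5))

# With sigma = mu and gamma = 1 the Pareto (I) law 1 - (x/mu)^(-alpha) is recovered.
x = 3.0
print(pareto4_cdf(x, 1.0, 1.0, 1.0, 2.0), 1.0 - (x / 1.0) ** -2.0)
```

Feeding uniform random numbers into `pareto4_quantile` would produce a simulated sample from the alternative family.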

For purposes of this article, the parameter µ, which is present in the three densities given above, is assumed to be fixed by statistical design. This assumption is quite reasonable from a practical point of view, aside from the fact that it is theoretically required.³ Thus, after defining w = x − µ, we can rewrite the Pareto (IV) density in a more compact form as

f (w) = \frac{α}{γ σ} \left[1 + \left(\frac{w}{σ}\right)^{1 / γ}\right]^{- (α + 1)} \left(\frac{w}{σ}\right)^{1 / γ - 1} \quad w > 0

Note that the tail behavior of a Pareto (IV) distribution is determined not only by α, as is the case for the Pareto (I) and (II) distributions, but also by the new parameter γ. This is so because, using L’Hôpital’s rule and the density of w just given,

\begin{aligned} \lim_{w \to \infty} \frac{1 - F (w)}{w^{- α / γ}} &= \lim_{w \to \infty} \frac{γ f (w)}{α w^{- α / γ - 1}} \\ &= \lim_{w \to \infty} \left[1 + \left(\frac{w}{σ}\right)^{1 / γ}\right]^{- (α + 1)} \left[\left(\frac{σ}{w}\right)^{1 / γ}\right]^{- (α + 1)} σ^{α / γ} \\ &= σ^{α / γ} \end{aligned}

Thus, on the right tail, the survival function 1 − F(w) of the Pareto (IV) family behaves like a constant times w^{−α/γ}. This implies that, for a given PWL parameter α, the tail of the distribution can become as short as desired by reducing the value of γ from one to some small but positive value. Likewise, for a given α, the tail can be made even heavier by setting the value of γ greater than one.
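The limit can also be checked numerically. The sketch below is only an illustration with arbitrarily chosen parameter values and hypothetical function names; it evaluates the ratio of the survival function to w^{−α/γ} for increasing w and compares it with σ^{α/γ}.

```python
def survival(w, sigma, gamma, alpha):
    """Pareto (IV) survival function 1 - F(w) for w = x - mu > 0."""
    return (1.0 + (w / sigma) ** (1.0 / gamma)) ** (-alpha)

def tail_ratio(w, sigma, gamma, alpha):
    """Ratio (1 - F(w)) / w^(-alpha/gamma), which should approach sigma^(alpha/gamma)."""
    return survival(w, sigma, gamma, alpha) / w ** (-alpha / gamma)

sigma, gamma, alpha = 2.0, 0.5, 1.5       # illustrative values only
limit = sigma ** (alpha / gamma)          # here 2^3
for w in (1e2, 1e4, 1e6):
    print(w, tail_ratio(w, sigma, gamma, alpha), limit)
```

With γ = 0.5 the tail exponent α/γ is 3 rather than 1.5, which shows how γ shortens the tail for a fixed α.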

Let θ = (θ ₁ , θ ₂ , θ ₃) = (σ, γ, α) be the vector of parameters, and let $\{w_{i}\}_{i = 1}^{n}$ be a random sample. Then, the corresponding log-likelihood function is

l (θ) = n \log \left(\frac{α}{γ σ}\right) - (α + 1) \sum_{i = 1}^{n} \log \left[1 + \left(\frac{w_{i}}{σ}\right)^{1 / γ}\right] + (1 / γ - 1) \sum_{i = 1}^{n} \log \left(\frac{w_{i}}{σ}\right)

For later use, it is necessary to find its gradient and the expected value of its Hessian. The elements of the gradient are given by

\frac{\partial l}{\partial σ} = - \frac{n}{γ σ} + \frac{α + 1}{γ σ} \sum_{i = 1}^{n} \frac{{(w_{i} / σ)}^{1 / γ}}{1 + {(w_{i} / σ)}^{1 / γ}}

\frac{\partial l (θ)}{\partial γ} = - \frac{n}{γ} - \frac{1}{γ^{2}} \sum_{i = 1}^{n} \log (w_{i} / σ) + \frac{α + 1}{γ^{2}} \sum_{i = 1}^{n} \frac{{(w_{i} / σ)}^{1 / γ} \log (w_{i} / σ)}{1 + {(w_{i} / σ)}^{1 / γ}}

\frac{\partial l (θ)}{\partial α} = \frac{n}{α} - \sum_{i = 1}^{n} \log \left[1 + {(w_{i} / σ)}^{1 / γ}\right]
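A quick way to guard against sign slips in these derivatives is to compare them with central finite differences of the log likelihood. The following Python sketch does that on a small made-up sample; the data, the evaluation point, and the function names are all hypothetical. The derivative with respect to γ is coded with the term −(1/γ²) Σ log(w_i/σ), which is what differentiating (1/γ − 1) Σ log(w_i/σ) yields.

```python
import math

def loglik(w, sigma, gamma, alpha):
    """Pareto (IV) log likelihood for w_i = x_i - mu > 0."""
    n = len(w)
    s1 = sum(math.log1p((wi / sigma) ** (1.0 / gamma)) for wi in w)
    s2 = sum(math.log(wi / sigma) for wi in w)
    return (n * math.log(alpha / (gamma * sigma))
            - (alpha + 1.0) * s1 + (1.0 / gamma - 1.0) * s2)

def gradient(w, sigma, gamma, alpha):
    """Analytic derivatives of the log likelihood with respect to (sigma, gamma, alpha)."""
    n = len(w)
    z = [(wi / sigma) ** (1.0 / gamma) for wi in w]
    logu = [math.log(wi / sigma) for wi in w]
    r = [zi / (1.0 + zi) for zi in z]
    d_sigma = -n / (gamma * sigma) + (alpha + 1.0) / (gamma * sigma) * sum(r)
    d_gamma = (-n / gamma - sum(logu) / gamma ** 2
               + (alpha + 1.0) / gamma ** 2 * sum(ri * li for ri, li in zip(r, logu)))
    d_alpha = n / alpha - sum(math.log1p(zi) for zi in z)
    return d_sigma, d_gamma, d_alpha

def numeric_gradient(w, theta, h=1e-6):
    """Central finite differences of loglik, one parameter at a time."""
    grads = []
    for k in range(3):
        up, down = list(theta), list(theta)
        up[k] += h
        down[k] -= h
        grads.append((loglik(w, *up) - loglik(w, *down)) / (2.0 * h))
    return tuple(grads)

w = [0.3, 1.2, 0.7, 5.0, 2.4, 0.05, 11.0]   # made-up data
theta = (1.5, 0.8, 2.0)                      # (sigma, gamma, alpha), arbitrary
print(gradient(w, *theta))
print(numeric_gradient(w, theta))
```

The two printed triples should agree to several decimal places.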

The second-order derivatives are also easy to find, but they are omitted here for lack of space. A less obvious task is to calculate their expected values. Fortunately, they have already been derived in the meticulous article by Brazauskas (2002). Thus, the elements of the information matrix

I_{i j} (θ) = - E_{θ} \left[\frac{\partial^{2} l (θ)}{\partial θ_{i} \partial θ_{j}}\right] \quad i, j = 1, 2, 3

are also known.

The stage is now set to devise a test for the PWL distribution. This is accomplished here by employing Rao’s score test in the case of the null hypothesis H ₀ : σ = µ, γ = 1. As a first step to that end, we find the restricted maximum-likelihood estimate for the nuisance parameter α by setting the derivative of the log likelihood with respect to α to zero, imposing σ = µ and γ = 1, and replacing w_i with x_i − µ:

\hat{α} = \left[\frac{1}{n} \sum_{i = 1}^{n} \log (x_{i} / μ)\right]^{- 1}

Consequently, the restricted score vector is given by $\hat{d} = (d_{1}, d_{2}, 0)^{'}$, where the first two components are found by substituting $\hat{α}$ into the derivatives with respect to σ and γ and imposing σ = µ and γ = 1. After some simple algebra, they are found to be

d_{1} = \frac{n \hat{α}}{μ} - (\hat{α} + 1) \sum_{i = 1}^{n} \frac{1}{x_{i}}

d_{2} = - n + \hat{α} \sum_{i = 1}^{n} \log (\frac{x_{i} - μ}{μ}) - (\hat{α} + 1) \sum_{i = 1}^{n} \frac{μ}{x_{i}} \log (\frac{x_{i} - μ}{μ})

Before presenting the corresponding information matrix, I need to introduce some notation. Let $Γ (a) = \int_{0}^{\infty} t^{a - 1} e^{- t} d t$, $ψ (a) = Γ^{'} (a) / Γ (a)$, and ψ′(a) = dψ(a)/da be, respectively, the gamma, digamma, and trigamma functions. Also, to have cleaner expressions in what follows, I define p(a) ≡ ψ(a) − ψ(1) − 1 and q(a) ≡ ψ′(a) + ψ′(1). Given that notation, and after imposing σ = µ and γ = 1 in the information matrix for the Pareto (IV) distribution in Brazauskas (2002, 165), we obtain the following restricted information matrix:

\hat{I} = n \begin{pmatrix} \frac{\hat{α}}{μ^{2} (\hat{α} + 2)} & - \frac{\hat{α} p (\hat{α}) + 1}{μ (\hat{α} + 2)} & - \frac{1}{μ (\hat{α} + 1)} \\ - \frac{\hat{α} p (\hat{α}) + 1}{μ (\hat{α} + 2)} & \frac{\hat{α} [p (\hat{α})^{2} + q (\hat{α})] + 2 [p (\hat{α}) + 1]}{\hat{α} + 2} & \frac{p (\hat{α})}{\hat{α} + 1} \\ - \frac{1}{μ (\hat{α} + 1)} & \frac{p (\hat{α})}{\hat{α} + 1} & \frac{1}{\hat{α}^{2}} \end{pmatrix}
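Since an information matrix equals the expected outer product of the per-observation scores, its entries can be checked by numerical integration. The sketch below is an independent check under stated assumptions, not a reproduction of Brazauskas's derivation: it fixes µ = σ = 1, γ = 1, and α = 2, for which p(2) = 0 and q(2) = π²/3 − 1 (because ψ(2) = ψ(1) + 1, ψ′(2) = ψ′(1) − 1, and ψ′(1) = π²/6), and it takes the two off-diagonal entries involving σ to be negative and the matrix to be symmetric, as the scores imply. All function names are hypothetical.

```python
import math

def scores(z, alpha):
    """Per-observation scores for (sigma, gamma, alpha) at sigma = 1, gamma = 1; z = w/sigma."""
    r = z / (1.0 + z)
    s_sigma = -1.0 + (alpha + 1.0) * r
    s_gamma = -1.0 - math.log(z) + (alpha + 1.0) * r * math.log(z)
    s_alpha = 1.0 / alpha - math.log1p(z)
    return (s_sigma, s_gamma, s_alpha)

def information_by_quadrature(alpha, steps=100_000):
    """Midpoint rule over u in (0, 1), with z the Pareto (II) quantile of u."""
    info = [[0.0] * 3 for _ in range(3)]
    for k in range(steps):
        u = (k + 0.5) / steps
        z = math.expm1(-math.log1p(-u) / alpha)   # (1 - u)^(-1/alpha) - 1
        s = scores(z, alpha)
        for i in range(3):
            for j in range(3):
                info[i][j] += s[i] * s[j] / steps
    return info

def information_closed_form(alpha, p, q):
    """Restricted information matrix per observation with mu = 1."""
    i_sg = -(alpha * p + 1.0) / (alpha + 2.0)
    i_sa = -1.0 / (alpha + 1.0)
    i_ga = p / (alpha + 1.0)
    i_gg = (alpha * (p * p + q) + 2.0 * (p + 1.0)) / (alpha + 2.0)
    return [[alpha / (alpha + 2.0), i_sg, i_sa],
            [i_sg, i_gg, i_ga],
            [i_sa, i_ga, 1.0 / alpha ** 2]]

numeric = information_by_quadrature(2.0)
exact = information_closed_form(2.0, p=0.0, q=math.pi ** 2 / 3.0 - 1.0)
for row_n, row_e in zip(numeric, exact):
    print([round(v, 3) for v in row_n], [round(v, 3) for v in row_e])
```

The midpoint rule is crude near the endpoints, where the scores have logarithmic singularities, so agreement should be expected only to a few decimal places.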

Finally, keeping in mind that $\hat{d} = (d_{1}, d_{2}, 0)^{'}$ collects the restricted scores d_1 and d_2 given above and that $\hat{I}$ is the restricted information matrix just presented, we define the score test for PWL distributions as

PWL = {\hat{d}}^{'} {\hat{I}}^{- 1} \hat{d} \overset{a}{\sim} χ_{2}^{2}

Under the null, PWL is asymptotically distributed as a chi-squared with two degrees of freedom. The statistic can be readily computed using the pwlaw command, with the following syntax:

pwlaw varname [in] [, mu(#)]

Note that the program automatically drops any x_i ≤ µ.
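For readers who want to see the arithmetic behind the statistic outside Stata, the following Python sketch computes α̂, d_1, d_2, and PWL from a list of observations. It is not the pwlaw source code, and every function name in it is hypothetical. It assumes that the information matrix is symmetric with negative (σ, γ) and (σ, α) entries, it uses simple series approximations for the digamma and trigamma functions, and it exploits the zero third score component by inverting only the 2 × 2 Schur complement of the α block.

```python
import math

def digamma(x):
    """psi(x), x > 0: shift x upward by recurrence, then use an asymptotic series."""
    total = 0.0
    while x < 10.0:
        total -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return total + math.log(x) - 0.5 / x - series

def trigamma(x):
    """psi'(x), x > 0, by the same recurrence-plus-series approach."""
    total = 0.0
    while x < 10.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0))
    return total + inv + 0.5 * inv2 + series

def pwl_test(x, mu):
    """Return (alpha_hat, PWL, asymptotic p-value); observations x_i <= mu are dropped."""
    tail = [xi for xi in x if xi > mu]
    n = len(tail)
    a = n / sum(math.log(xi / mu) for xi in tail)          # restricted MLE of alpha
    logu = [math.log((xi - mu) / mu) for xi in tail]
    d1 = n * a / mu - (a + 1.0) * sum(1.0 / xi for xi in tail)
    d2 = (-n + a * sum(logu)
          - (a + 1.0) * sum(mu / xi * lu for xi, lu in zip(tail, logu)))
    p = digamma(a) - digamma(1.0) - 1.0
    q = trigamma(a) + trigamma(1.0)
    i_ss = a / (mu ** 2 * (a + 2.0))
    i_sg = -(a * p + 1.0) / (mu * (a + 2.0))
    i_sa = -1.0 / (mu * (a + 1.0))
    i_gg = (a * (p * p + q) + 2.0 * (p + 1.0)) / (a + 2.0)
    i_ga = p / (a + 1.0)
    i_aa = 1.0 / a ** 2
    # Schur complement of the alpha block: its inverse is the (sigma, gamma) block of I^{-1}.
    j11 = n * (i_ss - i_sa * i_sa / i_aa)
    j12 = n * (i_sg - i_sa * i_ga / i_aa)
    j22 = n * (i_gg - i_ga * i_ga / i_aa)
    det = j11 * j22 - j12 * j12
    stat = (j22 * d1 * d1 - 2.0 * j12 * d1 * d2 + j11 * d2 * d2) / det
    return a, stat, math.exp(-0.5 * stat)                  # chi-squared(2) tail probability

# Illustrative data: evenly spaced quantiles of a Pareto (I) law with mu = 2, alpha = 1.5.
mu, n = 2.0, 500
x = [mu * (1.0 - (i + 0.5) / n) ** (-1.0 / 1.5) for i in range(n)]
print(pwl_test(x, mu))
```

Rescaling both the data and µ by the same constant leaves the statistic unchanged, which is a convenient sanity check on any implementation.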

How well does the test statistic behave in the case of small samples? Table 1 presents evidence that the asymptotic critical values can be safely used when 1 ≤ α ≤ 3, the typical range for the shape parameter, and n ≥ 100, as is the case for most applications. Otherwise, the significance points in table 1 can be used as approximations, or for sharper results, the pwlaw command can be used to perform a specific Monte Carlo simulation.

Table 1. Significance points for PWL, by shape parameter α, significance level, and sample size n

α	Level	n = 20	n = 25	n = 30	n = 40	n = 60	n = 100	n = ∞
1	5%	5.64	5.70	5.77	5.81	5.85	5.93	5.99
	10%	4.19	4.28	4.35	4.40	4.46	4.53	4.61
1.5	5%	5.51	5.60	5.71	5.74	5.84	5.88	5.99
	10%	4.06	4.18	4.25	4.32	4.41	4.48	4.61
2	5%	5.44	5.54	5.63	5.70	5.80	5.86	5.99
	10%	3.96	4.08	4.17	4.26	4.36	4.45	4.61
2.5	5%	5.39	5.49	5.60	5.68	5.79	5.83	5.99
	10%	3.89	4.01	4.10	4.21	4.32	4.41	4.61
3	5%	5.35	5.45	5.55	5.65	5.78	5.85	5.99
	10%	3.83	3.95	4.05	4.16	4.28	4.39	4.61

NOTE: Simulations using 100,000 replications.
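The asymptotic column of table 1 needs no simulation: for a chi-squared variable with two degrees of freedom the survival function is exp(−t/2), so the critical value at a given level is −2 log(level). A short Python illustration, with a hypothetical function name:

```python
import math

def chi2_2_critical(level):
    """Upper critical value of a chi-squared(2) variable: solves exp(-t/2) = level."""
    return -2.0 * math.log(level)

for level in (0.05, 0.10):
    print(f"{level:.0%}: {chi2_2_critical(level):.2f}")
```

The two values, 5.99 and 4.61, are the entries in the n = ∞ column.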

3 Applications

We now exemplify the use of the PWL test in the case of four interesting datasets studied by Newman (2005) and Clauset, Shalizi, and Newman (2009).⁴ They are listed in table 2, together with seven values associated with each of them. Regarding the columns, note that although x_max is clearly the maximum value in each sample, ${\hat{x}}_{min}$ is not the minimum value but rather an estimated lower bound on PWL behavior.⁵ Given the value of that bound, the next smaller number in the sample corresponds to µ (to enforce x > µ), while n_tail is the effective sample size.

Table 2. Tests of PWL behavior

Dataset	n	x_max	${\hat{x}}_{min}$	µ	n_tail	PWL	p-value
Count of word use	18,855	14,086	7	6	2,958	189	0.00
Population of cities	19,447	8,008,654	52,457	52,360	580	0.16	0.92
Frequency of surnames	2,753	2,502,020	111,919	109,432	239	3.32	0.19
Solar flare intensity	12,773	231,300	323	322	1,711	0.87	0.65
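The p-values in table 2 follow directly from the PWL column, because the asymptotic null distribution is chi-squared with two degrees of freedom and its upper tail probability is exp(−PWL/2). The Python lines below recompute them from the rounded statistics printed in the table; the dictionary and function names are hypothetical.

```python
import math

def chi2_2_pvalue(stat):
    """Upper tail probability of a chi-squared(2) variable at stat."""
    return math.exp(-0.5 * stat)

pwl_values = {
    "Count of word use": 189.0,
    "Population of cities": 0.16,
    "Frequency of surnames": 3.32,
    "Solar flare intensity": 0.87,
}
for name, stat in pwl_values.items():
    print(f"{name}: {chi2_2_pvalue(stat):.2f}")
```

Rounded to two decimals, the results agree with the last column of the table.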

The first dataset in table 2 refers to the number of times that unique words occur in a classical piece of English literature, the novel Moby Dick by Herman Melville. Our statistic can be computed with the command pwlaw x, mu(6). As shown in the table, the value of the statistic thus obtained is quite large in that particular case, and hence the associated p-value is quite small. Thus, according to the PWL test, the null hypothesis of PWL behavior is strongly rejected. This result is at odds with many previous studies on word frequencies, ranging from Zipf (1949) to Clauset, Shalizi, and Newman (2009).

The second case corresponds to the human populations of U.S. cities as recorded by the U.S. Census Bureau in 2000. As shown in the table, and according to the proposed test, there is convincing evidence of PWL behavior. This result is in line with most of the literature on the size of cities. Note also that this finding does not contradict the claim in Urzúa (2000) that U.S. cities do not follow Zipf’s law, because the latter is just a particular case of a PWL distribution (when α = 1).

The third sample is made of the frequencies of U.S. family names in the 1990 U.S. Census. As reported in the table, there is moderate evidence of PWL behavior, because the null hypothesis would not be rejected at the typical significance levels. A very similar finding is reported by Clauset, Shalizi, and Newman (2009).

Finally, the fourth sample listed in table 2 is made of the peak gamma-ray intensity of solar flares, measured from orbit between 1980 and 1989 (see Newman [2005] for more details). Given the p-value reported in the table, there is good evidence of PWL behavior in this case. Note that a theoretical model that renders precisely that behavior is due to Lu and Hamilton (1991).

4 Concluding remark

The Pareto (IV) distribution used in this article belongs to the Feller–Pareto family (see Arnold [2015]), whose general density is of the form

f (x) = \frac{Γ (γ_{1} + γ_{2})}{γ σ Γ (γ_{1}) Γ (γ_{2})} \left[1 + \left(\frac{x - μ}{σ}\right)^{1 / γ}\right]^{- (γ_{1} + γ_{2})} \left(\frac{x - μ}{σ}\right)^{γ_{2} / γ - 1} \quad x > μ

where γ ₂ > 0 and, to use the notation given earlier, γ ₁ = α. Thus, following the same procedure given before, we can derive a new, more general test statistic. But because of its algebraic intricacy, it is not presented here. In any case, the gain from that generalization is not totally apparent, because, as can be checked in the same way as in (3), the Feller–Pareto and Pareto (IV) distributions have similar tail behavior.

Supplemental Material

Supplemental Material, st0610 for A simple test for power-law behavior by Carlos M. Urzúa in The Stata Journal

5 Acknowledgments

I appreciate the very helpful comments made by a referee on an earlier version of the article. I also thank Erick Rosas for his research assistance.

6 Programs and supplemental materials

To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type

References

Arnold, B. C. 2015. Pareto Distributions. 2nd ed. Boca Raton, FL: CRC Press.

Bera, A. K., and Bilias. 2001. Rao’s score, Neyman’s C(α) and Silvey’s LM tests: An essay on historical developments and some new results. Journal of Statistical Planning and Inference 97: 9–44. https://doi.org/10.1016/S0378-3758(00)00343-8.

Brazauskas. 2002. Fisher information matrix for the Feller–Pareto distribution. Statistics & Probability Letters 59: 159–167. https://doi.org/10.1016/S0167-7152(02)00143-8.

Brock, W. A. 1999. Scaling in economics: A reader’s guide. Industrial and Corporate Change 8: 409–446. https://doi.org/10.1093/icc/8.3.409.

Clauset. 2019. Power-law distributions. http://tuvalu.santafe.edu/~aaronc/datacode.htm.

Clauset, Shalizi, C. R., and Newman, M. E. J. 2009. Power-law distributions in empirical data. SIAM Review 51: 661–703. https://doi.org/10.1137/070710111.

Goerlich, F. J. 2013. A simple and efficient test for the Pareto law. Empirical Economics 45: 1367–1381. https://doi.org/10.1007/s00181-012-0654-5.

Lu, E. T., and Hamilton, R. J. 1991. Avalanches and the distribution of solar flares. Astrophysical Journal 380: L89–L92. http://doi.org/10.1086/186180.

Mitzenmacher. 2004. A brief history of generative models for power law and lognormal distributions. Internet Mathematics 1: 226–251. https://doi.org/10.1080/15427951.2004.10129088.

Newman, M. E. J. 2005. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics 46: 323–351. https://doi.org/10.1080/00107510500052444.

Pareto. 1897. Cours d’Économie Politique. Vol. 2. Lausanne: F. Rouge.

Quandt, R. E. 1964. Statistical discrimination among alternative hypotheses and some economic regularities. Journal of Regional Science 5(2): 1–23. https://doi.org/10.1111/j.1467-9787.1964.tb01462.x.

Rao, C. R. 1948. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society 44: 50–57. https://doi.org/10.1017/S0305004100023987.

Rao, C. R. 1973. Linear Statistical Inference and Its Applications. 2nd ed. New York: Wiley.

Silvey, S. D. 1959. The Lagrangian multiplier test. Annals of Mathematical Statistics 30: 389–407. https://doi.org/10.1214/aoms/1177706259.

Urzúa, C. M. 2000. A simple and efficient test for Zipf’s law. Economics Letters 66: 257–260. https://doi.org/10.1016/S0165-1765(99)00215-3.

Urzúa, C. M. 2011. Testing for Zipf’s law: A common pitfall. Economics Letters 112: 254–255. https://doi.org/10.1016/j.econlet.2011.05.049.

Zipf, G. K. 1949. Human Behavior and the Principle of Least Effort. Cambridge, MA: Addison–Wesley.
