New mixture distributions for modelling count data

Abstract

A class of new 1-parameter underdispersed distributions is introduced. Mixed with Poisson distributions, they generate 2- and 3-parameter discrete distributions that generalize the Poisson distribution and can be both under- and overdispersed. Probabilities are easy to compute, and moments and random number generation are tractable. The distributions are described, and they are fitted to some underdispersed and overdispersed datasets. We show how inference for the effect of covariates sharpens on moving from the Poisson model. The fits compare favourably with two benchmarks, the COM-Poisson distribution and the weighted Poisson distribution.

Keywords

Mixture distribution; Poisson distribution; underdispersion; covariates

1 Introduction

1.1 Previous work on underdispersed distributions

The Poisson distribution is the standard distribution for modelling count data, and often gives an approximate fit. Its variance equals its mean, and so departures from Poissonness can be categorized as overdispersion (variance exceeds mean) or underdispersion (variance less than mean). Here we propose some new models for non-equidispersed data.

There is some probabilistic basis for modelling overdispersed distributions with the negative binomial model (e.g., Hilbe, 2011). There one assumes that the Poisson mean itself is drawn from a gamma distribution.

For underdispersion, with which we are mainly concerned, there is no convincing probabilistic model. One could consider an extended Poisson process or Markov birth process, where the rate of event occurrence scales up or down by a factor γ after each event. This would yield under- or overdispersed models, but the probabilities are very complicated.

Instead of using probabilistic models, practitioners therefore rely on a host of ad-hoc models that could fit the data, such as the Conway-Maxwell-Poisson (COM) distribution; see, for example, Shmueli et al. (2005). Here the Poisson probability $P_{x} = μ^{x} \exp (- μ) / x!$ for $x \in \{0, 1, 2, \dots\}$ becomes $Q_{x} = μ^{x} / (Z (μ, ν) {(x!)}^{ν})$, where ν > 1 gives underdispersion, ν < 1 gives overdispersion and Z must be found by summing all probabilities. This is computationally expensive, particularly when μ is a function of covariates, in which case Z must be computed for each combination of covariates. We use this distribution as a benchmark because it can model both overdispersion and underdispersion, and is widely used (e.g., Ben Mzoughia et al., 2018).
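
To make the computational cost concrete, the following is a minimal Python sketch of the COM pmf in which Z is approximated by a truncated sum. The function name `com_poisson_pmf` and the truncation point `x_max` are assumptions of this illustration, not part of the original computations.

```python
import math

def com_poisson_pmf(mu, nu, x_max=200):
    """Return [Q_0, ..., Q_x_max] for the COM distribution.

    Z(mu, nu) is approximated by truncating the infinite sum at x_max
    (an assumption of this sketch; x_max must cover the bulk of the mass).
    """
    # log of the unnormalized terms mu^x / (x!)^nu, via lgamma(x + 1) = log(x!)
    log_terms = [x * math.log(mu) - nu * math.lgamma(x + 1)
                 for x in range(x_max + 1)]
    shift = max(log_terms)                    # guards against overflow
    weights = [math.exp(t - shift) for t in log_terms]
    z = sum(weights)                          # truncated normalizing sum
    return [w / z for w in weights]

probs = com_poisson_pmf(mu=3.0, nu=1.0)       # nu = 1 recovers the Poisson pmf
```

With covariates, the whole sum has to be repeated for every distinct value of μ, which is the expense referred to above.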

Another class of models comprises gamma count models (e.g., Winkelmann, 1996) and Weibull count models (e.g., McShane et al., 2008; Boshnakov et al., 2017). The Poisson process has exponential intervals between events, and these can be replaced with Weibull or gamma-distributed intervals. These models can be thought of either as ad-hoc models or as having some probabilistic basis.

There are many other models for underdispersed data, such as weighted Poissons, for example, Castillo and Pérez-Casany (2005). There are many weighted Poisson distributions, but the one used here as a benchmark has probability mass function (pmf)

P_{k} = \frac{θ^{k} {(k + a)}^{r}}{k! C (θ, r, a)},

where C must be found by summing all probabilities, and a ≥ 0, r ≥ 0.

Currently, no model is both tractable (for example, for likelihood computation) and able to model considerable underdispersion.

1.2 Overview of the new models

This is a summary to preface the more technical content below.

By mixing the Poisson distribution with one of a class of underdispersed distributions, one can obtain distributions that generalize the Poisson distribution and are underdispersed or mildly overdispersed. They can have 2 or 3 parameters, and are tractable in that probabilities are simple to write down, so that fitting to data is computationally fast. Also, their moments can be derived in terms of the model parameters. The mean in particular is needed, as we often wish to see how it depends on covariates.

Random numbers are easily generated. These are needed, for example, for Monte-Carlo simulations and some data analyses. This tractability compares favourably with our two benchmarks, the COM-Poisson and the weighted Poisson distributions; for these distributions all probabilities must be summed numerically to obtain the normalizing constant and moments.

However, the crucial measure of usefulness is whether or not the new distributions can fit data. Economists, healthcare workers and others need to do this primarily to evaluate the effect of covariates, for example, does fertility (number of offspring) vary with educational level?

Underdispersed datasets are not plentiful, but we studied eight, three with covariates, and taken from a range of subject areas. Overall, the new distributions performed well, and it is possible to recommend one in particular that would be a good starting point for fitting underdispersed data.

It has been mentioned that existing underdispersed distributions sometimes have a probabilistic basis, but the most useful ones, such as the COM-Poisson and weighted Poisson distributions, do not. The distributions proposed here are also functional forms designed to be underdispersed, but with no strong probabilistic basis.

This article aims to introduce the new distributions and give their properties, and to demonstrate their usefulness with some fits to data.

1.3 Usage of the proposed new models

A class of 1-parameter underdispersed distributions is introduced, and the model comprises a mixture of these with the Poisson distribution. Admixture with the Poisson distribution is needed so that any amount of underdispersion can be modelled. The models have two parameters: a parameter η linked closely to the mean, and the mixing parameter ϕ. They can be made more flexible by assigning the location parameter η to the Poisson component of the mixture, and η^′ to the underdispersed component, giving 3-parameter models. This extra parameter enables even bimodality to be modelled.

The mixture distribution has pmf

S_{x} = (1 - ϕ) P_{x} + ϕ Q_{x},

(1.1)

where P denotes the Poisson probability, and Q the corresponding probability from the underdispersed distribution. The variance is then

var (X) = (1 - ϕ) σ_{p}^{2} + ϕ σ_{q}^{2} + ϕ (1 - ϕ) {(μ_{p} - μ_{q})}^{2},

(1.2)

where $μ_{p}, μ_{q}$ denote means and $σ_{p}^{2}, σ_{q}^{2}$ variances. Suppose that $σ_{q}^{2} < σ_{p}^{2}$. Then if the means $μ_{p}, μ_{q}$ are similar, we have underdispersion ($var (X) < σ_{p}^{2}$) if ϕ > 0. However, if ϕ < 0, giving a negative mixture, we would have overdispersion from (1.2). This would also result if the two means are very different.
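
As a numerical check of (1.2), the sketch below mixes a Poisson pmf with an underdispersed pmf and compares the variance computed directly from the mixture with the formula. The binomial(10, 0.3) component is purely a hypothetical stand-in for Q, chosen because its mean (3) matches the Poisson mean and its variance (2.1) is smaller; all function names are illustrative.

```python
import math

def poisson_pmf(eta, n_max):
    """Poisson probabilities P_0..P_n_max, computed on the log scale."""
    return [math.exp(x * math.log(eta) - eta - math.lgamma(x + 1))
            for x in range(n_max + 1)]

def mean_var(pmf):
    """Mean and variance of a pmf given as a list indexed by x."""
    mean = sum(x * p for x, p in enumerate(pmf))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    return mean, var

def mixture_variance(phi, mean_p, var_p, mean_q, var_q):
    """Right-hand side of (1.2)."""
    return ((1 - phi) * var_p + phi * var_q
            + phi * (1 - phi) * (mean_p - mean_q) ** 2)

n_max = 80
p = poisson_pmf(3.0, n_max)
# Hypothetical underdispersed component: binomial(10, 0.3), padded with zeros.
q = [math.comb(10, x) * 0.3 ** x * 0.7 ** (10 - x) if x <= 10 else 0.0
     for x in range(n_max + 1)]

phi = 0.5
s = [(1 - phi) * a + phi * b for a, b in zip(p, q)]   # mixture pmf (1.1)
direct = mean_var(s)[1]                               # variance from the pmf
formula = mixture_variance(phi, *mean_var(p), *mean_var(q))
print(direct, formula)
```

With equal means the last term of (1.2) vanishes, so a positive ϕ simply interpolates between the two variances.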

Negative mixture distributions do not have a probabilistic basis in the way that positive mixtures do, but are sometimes used as models. Thus the hypoexponential distribution with pdf

f (t) = \frac{λ_{1} λ_{2} \{\exp (- λ_{1} t) - \exp (- λ_{2} t)\}}{λ_{2} - λ_{1}}

that generalizes the Erlang(2) distribution can be regarded as a negative mixture of exponentials.

With negative mixtures there is the danger that the probabilities S_x could become negative. If the non-Poisson distribution were longer-tailed than the Poisson, this would happen for any negative value of ϕ. However, when the other distribution is underdispersed, there is no such tail problem, and positive probabilities can exist for small negative values of ϕ. Hence the models proposed here can always be overdispersed. They were not particularly intended for this, because the negative binomial distribution usually fits well, has a probabilistic motivation, and so is preferable. However, sometimes one of a series of underdispersed datasets may turn out to be slightly overdispersed, and then the ability to model overdispersion is useful. Note that we must also have ϕ ≤ 1 to avoid negative probabilities.

The class of underdispersed probability models used is introduced next, then their main properties are given, followed by some fits of the mixed models to data. For readability, many technical details are relegated to appendices. All datasets are in the public domain and are referenced, except for the ‘yellow card’ dataset, which is given in this article. They are available in the supplementary material for this article.

2 The models

2.1 The simplest example: A distribution from the cosh function

The Poisson pmf $P_{x} = (η^{x} / x!) \exp (- η)$ can be regarded as the expansion of exp(η), normalized to unity. Expanding the cosh function instead, we have that

\cosh (2 η) = \sum_{n = 0}^{\infty} {(2 η)}^{2 n} / (2 n)! .

Here only the even powers of η remain. The terms are now relabelled to be sequential probabilities, and so from the cosh expansion,

Q_{x} = \frac{{(2 η)}^{2 x} sech (2 η)}{(2 x)!},

(2.1)

is a pmf. The scaling of η is done so that η will be the mean when it is large. This relabelling is the opposite approach from that of Bermúdez et al. (2017), who used Poisson probabilities to represent heaped count data, where, for example, all counts must be even.

The probability generating function (pgf) $G (z) = E (z^{X})$ is given by

G (z) = \cosh (2 η z^{1 / 2}) / \cosh (2 η) .

Moments are generated most easily by differentiating the moment generating function (mgf) $M (t) = G (\exp (t))$ and setting t = 0. It follows that

\begin{matrix} E (X) = η \tanh (2 η), \\ var (X) = (η / 2) \tanh (2 η) + η^{2} {sech}^{2} (2 η) . \end{matrix}

As $η \to \infty, E (X) \to η$ and $var (X) \to η / 2$ . This is therefore an underdispersed distribution, to be used in a mixture as per (1.1).
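
The pmf (2.1) and the moment formulas can be checked numerically. The following Python sketch (function name and truncation point `x_max` are illustrative assumptions) evaluates the probabilities on the log scale and compares the numerical mean and variance with the closed forms above.

```python
import math

def rzp02_pmf(eta, x_max=100):
    """Q_x = (2*eta)^(2x) * sech(2*eta) / (2x)! for x = 0..x_max, as in (2.1).

    Uses math.cosh directly, so it is intended for moderate eta only.
    """
    log_norm = math.log(math.cosh(2 * eta))
    return [math.exp(2 * x * math.log(2 * eta) - math.lgamma(2 * x + 1) - log_norm)
            for x in range(x_max + 1)]

eta = 2.5
q = rzp02_pmf(eta)
mean = sum(x * p for x, p in enumerate(q))
var = sum((x - mean) ** 2 * p for x, p in enumerate(q))

t = math.tanh(2 * eta)
mean_exact = eta * t                               # E(X) = eta tanh(2 eta)
var_exact = (eta / 2) * t + eta ** 2 * (1 - t ** 2)  # sech^2 = 1 - tanh^2
print(mean, mean_exact, var, var_exact)
```

Already at η = 2.5 the mean is very close to η and the variance close to η/2.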

2.2 The meaning of the parameters

The parameter ϕ is the mixing parameter; when ϕ = 0 we have a Poisson distribution and when ϕ = 1 we have one of the new underdispersed distributions. Negative ϕ corresponds to overdispersion. In the 3-parameter case, we have η, the mean of the Poisson distribution, and η^′, the parameter of the underdispersed distribution. As mentioned, the mean of the underdispersed distribution tends to η^′ when η^′ is large, so η^′ is close to that mean. One could instead take the mean of the underdispersed distribution as the parameter, but then it would be necessary to solve the mean equation (for RZP02, $μ_{q} = η^{'} \tanh (2 η^{'})$) numerically for η^′. A practical point is that when fitting datasets with covariates we usually make η and η^′ proportional to $\exp (\sum_{i = 1}^{m} β_{i} x_{i})$, where the $x_{i}$ are covariates.

Economists and clinicians often want to know how the mean response μ depends on the covariates. For simplicity, take one covariate. The mean of the mixture distribution is

μ = (1 - ϕ) η_{0} \exp (β X) + ϕ η_{0}^{'} \exp (β X) \tanh (2 η_{0}^{'} \exp (β X)),

where η takes the value η₀ when X = 0. Differentiating this, when X = 0,

\partial μ / \partial X = \{(1 - ϕ) η_{0} + ϕ η_{0}^{'} \tanh (2 η_{0}^{'}) + 2 ϕ {η_{0}^{'}}^{2} {sech}^{2} (2 η_{0}^{'})\} β .

Thus the change in mean when X varies can be approximated. This calculation is messy but is more difficult for the benchmark distributions.
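
The slope can also be obtained numerically, which avoids the algebra altogether. The sketch below assumes the Poisson/RZP02 mixture with one covariate, with both η and η^′ scaled by exp(βX); the function names and the parameter values are hypothetical.

```python
import math

def mixture_mean(x, phi, eta0, eta0_prime, beta):
    """Mean of the Poisson/RZP02 mixture at covariate value x.

    Both location parameters are assumed to scale with exp(beta * x).
    """
    scale = math.exp(beta * x)
    eta, eta_prime = eta0 * scale, eta0_prime * scale
    return (1 - phi) * eta + phi * eta_prime * math.tanh(2 * eta_prime)

def slope_at_zero(phi, eta0, eta0_prime, beta, h=1e-5):
    """Central finite-difference approximation to d(mu)/dX at X = 0."""
    return (mixture_mean(h, phi, eta0, eta0_prime, beta)
            - mixture_mean(-h, phi, eta0, eta0_prime, beta)) / (2 * h)

# Illustrative parameter values only
slope = slope_at_zero(phi=0.4, eta0=2.3, eta0_prime=2.0, beta=0.06)
print(slope)
```

The same finite-difference approach would work for the benchmark distributions, at the cost of recomputing their normalizing constants at each evaluation.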

2.3 General properties of these distributions

In general, one can analytically sum series where m ≥ 0 terms are zero at regular intervals, giving a class of distributions. These must be named; a possible naming scheme is to give the numbers of the first and second nonzero terms. Thus the distribution derived from the cosh function would be RZP02, that is, ‘relabelled zeroed Poisson with terms 0, 2, 4, …’. The Poisson distribution itself would be RZP01.

In general for RZP0m distributions, we can write $G (z) = Z (m η z^{1 / m}) / Z (m η)$ , for some function Z that can be found analytically in terms of hyperbolic and trigonometric functions. The mgf is

M (t) = Z (m η \exp (t / m)) / Z (m η) .

(2.2)
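
One standard way to evaluate Z numerically is series multisection with the m-th roots of unity, $Z (u) = m^{- 1} \sum_{k = 0}^{m - 1} \exp (u ω^{k})$ with $ω = \exp (2 π i / m)$. This is a well-known identity and is not necessarily the route taken in appendix A; the function names below are illustrative.

```python
import cmath
import math

def z_series(u, m):
    """Sum of u^(m n) / (m n)! over n >= 0, via the m-th roots of unity."""
    total = sum(cmath.exp(u * cmath.exp(2j * math.pi * k / m)) for k in range(m))
    return (total / m).real          # imaginary parts cancel in exact arithmetic

def rzp0m_pmf(eta, m, x_max=60):
    """RZP0m probabilities Q_x = (m*eta)^(m x) / ((m x)! * Z(m*eta))."""
    u = m * eta
    log_z = math.log(z_series(u, m))
    return [math.exp(m * x * math.log(u) - math.lgamma(m * x + 1) - log_z)
            for x in range(x_max + 1)]

print(z_series(3.0, 2), math.cosh(3.0))          # m = 2 gives cosh
print(sum(rzp0m_pmf(2.5, 8)))                    # should be close to 1
```

For m = 1 the function returns exp(u), and for m = 4 it returns (cosh u + cos u)/2, so the Poisson and the cosh-based distributions are covered by the same code.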

A way to find Z to normalize these probabilities and to find the other distributions RZPj, m + j for 0 < j < m from them is given in appendix A. Random number generation for these distributions and also for mixture distributions is described in appendix B.
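
One simple generator, which may differ from the methods of appendix B, follows directly from the relabelling: if Y is Poisson with mean mη, then Y/m conditional on Y being a multiple of m has the RZP0m distribution. The Python sketch below uses rejection on that condition, with Knuth's multiplication method as the Poisson generator; all names are illustrative, and the mixture sampler assumes 0 ≤ ϕ ≤ 1.

```python
import math
import random

def poisson_knuth(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def rzp0m_sample(eta, m, rng):
    """Draw X = Y/m, where Y ~ Poisson(m*eta) conditioned on m dividing Y."""
    while True:
        y = poisson_knuth(m * eta, rng)
        if y % m == 0:
            return y // m

def mixture_sample(eta, phi, m, rng):
    """Positive mixture: RZP0m with probability phi, otherwise Poisson(eta)."""
    if rng.random() < phi:
        return rzp0m_sample(eta, m, rng)
    return poisson_knuth(eta, rng)

rng = random.Random(0)
draws = [rzp0m_sample(2.5, 2, rng) for _ in range(20000)]
print(sum(draws) / len(draws))      # compare with eta * tanh(2 * eta)
```

The acceptance rate is roughly 1/m for large η, so the method stays cheap for the small values of m considered here.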

For RZP0m distributions, the means approach η quickly and monotonically (Figure 1), while the variances and coefficients of dispersion also quickly approach their asymptotic value, but may oscillate (Figure 2). That the means are monotonic can be proved, but the proof is omitted for brevity. It proceeds by differentiating (2.2), whence $d μ / d η = m σ^{2} / η > 0$ .

Figure 1.

The means of these distributions as a function of η; note that the slopes of the means can oscillate but the means are monotonic functions of η.

Figure 2.

The coefficients of dispersion σ²/μ of these distributions as a function of the mean μ.

Here the use of evaluating these exponential series with only every mth term present has been to construct underdispersed distributions. However, they can also be used, without relabelling probabilities, to model spiky or ‘heaped count’ data. Bermúdez et al. (2017) model the frequencies of work disability days, which are often rounded to 5 or 7 days, and a mixture of the distributions here could also be used for that.

Some other useful models are now described.

2.4 The RZP04 distribution: A distribution from cosh and cosine functions

An even more underdispersed distribution can be derived from the expansion

\{\cosh (4 η) + \cos (4 η)\} / 2 = \sum_{n = 0}^{\infty} {(4 η)}^{4 n} / (4 n)!,

so that $Q_{x} = \frac{2 {(4 η)}^{4 x} / (4 x)!}{\cosh (4 η) + \cos (4 η)}$ . The pgf is

G (z) = \frac{\cosh (4 η z^{1 / 4}) + \cos (4 η z^{1 / 4})}{\cosh (4 η) + \cos (4 η)} .

Hence

\begin{matrix} E (X) = η \frac{\sinh (4 η) - \sin (4 η)}{\cosh (4 η) + \cos (4 η)}, \\ var (X) = E (X) / 4 + \frac{2 η^{2} \sinh (4 η) \sin (4 η)}{{(\cosh (4 η) + \cos (4 η))}^{2}} . \end{matrix}

For large η, the variance is a quarter of the mean. It undergoes decaying oscillations as η increases.

Figure 3 shows the three distributions introduced so far and the Poisson distribution.

Figure 3.

The Poisson pmf (impulses), RZP02, RZP13 and RZP04 distributions, all for η = 2.5.

2.5 A very underdispersed distribution: RZP08

We give without proof the fact that Q_x is a pmf, where

Q_{x} = \frac{{(8 η)}^{8 x} / (8 x)!}{\{\cosh (8 η / \sqrt{2}) \cos (8 η / \sqrt{2}) + (\cosh (8 η) + \cos (8 η)) / 2\} / 2} .

Similarly to previous cases, the pgf is

G (z) = \frac{\cosh (8 η z^{1 / 8} / \sqrt{2}) \cos (8 η z^{1 / 8} / \sqrt{2}) + (\cosh (8 η z^{1 / 8}) + \cos (8 η z^{1 / 8})) / 2}{\cosh (8 η / \sqrt{2}) \cos (8 η / \sqrt{2}) + (\cosh (8 η) + \cos (8 η)) / 2} ,

and the mean is therefore

\begin{array}{l} μ = η \{[\sinh (8 η / \sqrt{2}) \cos (8 η / \sqrt{2}) - \cosh (8 η / \sqrt{2}) \sin (8 η / \sqrt{2})] / \sqrt{2} \\ + (\sinh (8 η) - \sin (8 η)) / 2\} / \{\cosh (8 η / \sqrt{2}) \cos (8 η / \sqrt{2}) \\ + (\cosh (8 η) + \cos (8 η)) / 2\} . \end{array}

As $η \to \infty, μ \to η,$ and $var (X) \to E (X) / 8$ . The mean and coefficient of dispersion are shown in Figures 1 and 2. There is little point going to distributions even more underdispersed than this, because if η is small, say 2, the variance would be 1/4, so this distribution is already narrow enough to increase or decrease a single probability when mixed with the Poisson distribution.

2.6 The RZP13 distribution: A distribution from the sinh function

We have that

\sinh (2 η) = \sum_{n = 0}^{\infty} {(2 η)}^{2 n + 1} / (2 n + 1)!,

hence $Q_{x} = \{{(2 η)}^{2 x + 1} / (2 x + 1)!\} cosech (2 η)$ is a probability mass. The pgf is given by

G (z) = z^{- 1 / 2} \sinh (2 η z^{1 / 2}) / \sinh (2 η) .

It follows that

\begin{matrix} E (X) = η \coth (2 η) - 1 / 2, \\ var (X) = η \coth (2 η) / 2 - η^{2} {cosech}^{2} (2 η) . \end{matrix}

As $η \to \infty, E (X) \to η - 1 / 2$ and $var (X) \to η / 2$ . This is therefore also an underdispersed distribution.

3 Asymptotics

As $η \to \infty$ these distributions become normal, as does the Poisson. To see this, in general, from appendix A, for the RZP0m distribution, the mgf $M (t) \to \exp (m η (\exp (t / m) - 1))$ . Expanding the exponential,

M (t) = \exp (η t + η t^{2} / (2 m) + η t^{3} / (6 m^{2}) + η t^{4} / (24 m^{3}) + \dots) .

(3.1)

For large η, only infinitesimal values of t are relevant, so we have the mgf of a normal distribution with mean η and variance η/m. These distributions approach normality faster than does the Poisson. For example, the skewness is initially greater than for the Poisson distribution, but it soon becomes smaller. It can also execute damped oscillations, as can the kurtosis. The presence of cosines in Z(η) alerts us to this possibility.

From (3.1), we can see that $E \{{(X - η)}^{3}\} = η / m^{2}, E \{{(X - η)}^{4}\} = \frac{3 η^{2}}{m^{2}} + \frac{η}{m^{3}}$ , hence the skewness $γ = 1 / \sqrt{m η}$ , and (excess) kurtosis $κ = 1 / (m η)$ . This compares to $1 / \sqrt{η}$ and $1 / η$ respectively for the Poisson distribution. This again shows how these distributions approach normality faster than the Poisson.
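
The asymptotic skewness and kurtosis can be checked numerically. The sketch below (illustrative names; normalization is done numerically rather than through Z) computes both for an RZP04 distribution with a large η and prints them next to $1 / \sqrt{m η}$ and $1 / (m η)$.

```python
import math

def rzp0m_pmf(eta, m, x_max=400):
    """RZP0m probabilities, normalized numerically rather than through Z."""
    u = m * eta
    logs = [m * x * math.log(u) - math.lgamma(m * x + 1) for x in range(x_max + 1)]
    shift = max(logs)                          # guards against overflow
    w = [math.exp(v - shift) for v in logs]
    total = sum(w)
    return [v / total for v in w]

def skew_kurt(pmf):
    """Skewness and excess kurtosis from central moments."""
    mean = sum(x * p for x, p in enumerate(pmf))
    c2, c3, c4 = (sum((x - mean) ** k * p for x, p in enumerate(pmf))
                  for k in (2, 3, 4))
    return c3 / c2 ** 1.5, c4 / c2 ** 2 - 3

eta, m = 50.0, 4
skew, kurt = skew_kurt(rzp0m_pmf(eta, m))
print(skew, 1 / math.sqrt(m * eta))   # skewness vs asymptotic value
print(kurt, 1 / (m * eta))            # excess kurtosis vs asymptotic value
```

Rerunning with small η shows the oscillatory departures from these limits mentioned above.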

4 Fitting datasets

4.1 Estimating parameters of mixture distributions

The parameters to be estimated are η, ϕ and any covariate regression coefficients β_i modelled as $η = η_{0} \exp (\sum_{i = 1}^{m} β_{i} x_{i})$ . This can be done through likelihood-based methods, such as maximum likelihood estimation (MLE). Because the two distributions in the mixture are different, there is no problem of identifiability. However, for the 3-parameter distributions, when $ϕ \to 0$ or $ϕ \to 1, η^{'}$ or η respectively are undefined.

When ϕ < 0 for a negative mixture, the function minimizer might move to a large negative value of ϕ that would make probabilities negative. One can either find the bound on ϕ by requiring $(1 - ϕ) P_{x} + ϕ Q_{x} > 0 \forall x$ , or simply reset negative probabilities to tiny positive probabilities. This will impose a penalty function that will drive the minimizer away from the impossible region of ϕ.
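
A minimal sketch of the objective function for the 2-parameter Poisson/RZP02 mixture is given below, using the away-team yellow-card counts of Table 1. The function names and the floor value are assumptions of this illustration; any general-purpose minimizer could be applied to `neg_log_lik`.

```python
import math

# Away-team yellow-card counts from Table 1: index = number of cards.
COUNTS = [1453, 3279, 3806, 2845, 1404, 517, 170, 49, 9, 2]

def poisson_pmf(x, eta):
    return math.exp(x * math.log(eta) - eta - math.lgamma(x + 1))

def rzp02_pmf(x, eta):
    return math.exp(2 * x * math.log(2 * eta) - math.lgamma(2 * x + 1)
                    - math.log(math.cosh(2 * eta)))

def neg_log_lik(eta, phi, counts=COUNTS, floor=1e-300):
    """Minus log-likelihood of the 2-parameter mixture (1.1)."""
    nll = 0.0
    for x, n in enumerate(counts):
        s = (1 - phi) * poisson_pmf(x, eta) + phi * rzp02_pmf(x, eta)
        nll -= n * math.log(max(s, floor))   # floor acts as a penalty if s <= 0
    return nll

n_total = sum(COUNTS)
eta_start = sum(x * n for x, n in enumerate(COUNTS)) / n_total   # sample mean
print(neg_log_lik(eta_start, 0.0))    # Poisson model at the sample mean
print(neg_log_lik(eta_start, 0.12))   # a trial positive value of phi
```

Resetting non-positive probabilities to the floor makes each affected observation contribute a very large penalty, so the minimizer is pushed back towards admissible values of ϕ.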

The meaning of the model parameters is that η is a measure of location closely related to the mean, and ϕ interpolates between variances, as in (1.2). Hence ϕ can be taken as a measure of dispersion.

It is also possible to model the dependence of variance on covariates by using the mixing parameter ϕ as a proxy for variance, for example, for the RZP04 distribution, $σ^{2} = (1 - 3 ϕ / 4) η$ . This modelling is sometimes required (e.g., Faddy and Smith, 2011).

4.2 Examples

4.2.1 Datasets and models used

In general, probability models will not fit every dataset well, and their usefulness must be evaluated over a range of datasets. Eight datasets were therefore used, with a spread of sample sizes and from a variety of subject areas, such as healthcare, biology, economics, biometrics and sport. A major purpose of fitting models is to estimate the effect of covariates, and 3 of the datasets had covariates.

The first dataset is the completed fertility dataset from the second (1985) wave of the German Socio-Economic Panel, described in Winkelmann (1995). It contains the number of children (0–11) and 10 demographic covariates for 1243 women. The count distribution is slightly underdispersed, and becomes more so after regressing on the covariates. Ridout and Besbeas (2004) cite data on the number of outbreaks of strikes in the UK coal mining industry in successive four-week periods in the years 1948–1959, originally given by Kendall. Faddy and Bosch (2001) give (mainly) underdispersed data for number of foetal implants in mice. Out of implant distributions for 7 doses of a herbicide, we used three, the zeroth, first and final (sixth) dose levels. Some new data on yellow cards ‘awarded’ during Premier League football matches was used. Data for away teams was used here (Table 1). Data on takeover bids from Cameron and Trivedi (2013) was used; without covariates the dataset is slightly overdispersed, but with covariates it is underdispersed. Finally, a large dataset of doctor visits with covariates was used. The dataset was first used in Adesina et al. (2019). This dataset only has entries when there was at least one visit, and hence the model probabilities must be zero-truncated, so that $P_{k} \to P_{k} / (1 - P_{0})$ for k > 0. The dataset is underdispersed when the covariates are included.
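
The zero-truncation step can be sketched in a few lines of Python. The helper name `zero_truncate` is hypothetical, and the Poisson pmf is used only as an example input.

```python
import math

def zero_truncate(pmf):
    """Map [P_0, P_1, ...] to the zero-truncated pmf [0, P_1/(1-P_0), ...]."""
    p0 = pmf[0]
    return [0.0] + [p / (1 - p0) for p in pmf[1:]]

eta = 3.0
poisson = [math.exp(x * math.log(eta) - eta - math.lgamma(x + 1))
           for x in range(80)]
truncated = zero_truncate(poisson)
print(sum(truncated))     # should be close to 1
```

The same function applies unchanged to the mixture probabilities S_k.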

Table 1.

The away-team yellow card count data.

Number	Probability	Count
0	0.1074	1453
1	0.2423	3279
2	0.2812	3806
3	0.2102	2845
4	0.1037	1404
5	0.0382	517
6	0.0126	170
7	0.0036	49
8	0.0007	9
9	0.0001	2
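
The summary statistics for this dataset can be recomputed from the counts above. The short sketch below assumes the usual n − 1 divisor for the sample variance.

```python
counts = [1453, 3279, 3806, 2845, 1404, 517, 170, 49, 9, 2]   # Table 1, 0..9 cards

n = sum(counts)
mean = sum(x * c for x, c in enumerate(counts)) / n
var = sum((x - mean) ** 2 * c for x, c in enumerate(counts)) / (n - 1)
dispersion = var / mean          # coefficient of dispersion
print(n, round(mean, 2), round(var, 2), round(dispersion, 2))
```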

Table 2 shows the characteristics of the datasets, and Table 3 shows the results of fitting 7 models: the Poisson distribution, the COM distribution, the weighted Poisson distribution, the RZP02, RZP13 and RZP04 distributions, and the RZP02 distribution used as a 3-parameter distribution by allowing η^′ to vary. The Poisson distribution has 1 parameter; the weighted Poisson distribution and the new distribution with variable η^′ have 3 parameters; all the others have 2.

Table 2.

Datasets used, with sample size, mean, variance and coefficient of dispersion.

Dataset	Subject area	Sample size	Mean	Variance	Coeff σ²/μ
Fertility	Healthcare/economics	1243	2.38	2.33	0.98
Strikes	Economics	156	1.0	0.77	0.77
Faddy-0	Biology	698	11.55	9.83	0.85
Faddy-1	Biology	307	11.86	12.10	1.02
Faddy-6	Biology	83	11.05	9.15	0.83
Yellow cards	Sport	13,534	2.15	1.94	0.90
Takeover Bids	Economics	126	1.74	2.05	1.18
Doctor Visits	Healthcare	1647	3.39	1.60	3.42

Table 3.

Fits of models to datasets: minus log-likelihood, mixing parameter ϕ and degrees of freedom and chi-squared of fit. The model ‘RZP02 + η^′’ is the RZP02 distribution with η^′ fitted, that is, a 3-parameter model, as is the weighted Poisson distribution. The lowest-AIC fit is marked with an asterisk.

Dataset	Model	−ℓ	ϕ	Ndf	X²
Fertility	Poisson	2186.8	—	7	117.4
	COM	2182.6	—	6	90.3
	weighted	2175.0	—	5	78.4
	RZP02 mixture	2159.6	0.43	6	51.0
	RZP13 mixture	2165.8	0.49	6	67.0
	RZP04 mixture	2155.0	0.30	6	36.5
	RZP02 + η^′ mixture*	2146.6	0.73	6	53.9
Strikes	Poisson	49.87	—	1	2.34
	COM	49.35	—	1	0.97
	weighted	49.01	—	1	0.38
	RZP02 mixture	48.94	0.56	1	0.32
	RZP13 mixture	49.25	0.71	1	0.89
	RZP04 mixture*	48.81	0.26	1	0.11
Faddy-0	Poisson	1837.8	—	16	193.5
	COM	1837.8	—	16	194.2
	weighted	1837.8	—	15	193.6
	RZP02 mixture	1827.4	0.42	14	151.4
	RZP13 mixture	1828.2	0.42	14	150.2
	RZP04 mixture	1812.1	0.39	14	128.7
	RZP02 + η^′ mixture*	1736.4	0.03	16	26.6
Faddy-1	Poisson	850.5	—	14	178.8
	COM	846.3	—	13	127.9
	weighted	850.4	—	13	128.7
	RZP02 mixture	842.5	−1.8	13	127.9
	RZP13 mixture	841.4	−1.99	14	129.2
	RZP04 mixture	845.0	0.30	12	103.7
	RZP02 + η^′ mixture*	787.4	0.90	15	29.9
Faddy-6	Poisson	215.2	—	9	21.6
	COM	215.1	—	8	19.6
	weighted	215.2	—	8	21.6
	RZP02 mixture	214.1	0.41	7	13.8
	RZP13 mixture	214.2	0.41	7	14.0
	RZP04 mixture	211.9	0.42	7	11.9
	RZP02 + η^′ mixture*	201.4	0.91	7	7.6
Yellows	Poisson	23,168	—	9	74.7
	COM	23,138	—	7	18.0
	weighted	23,135	—	6	11.4
	RZP02 mixture	23,153	0.12	8	45.2
	RZP13 mixture	23,147	0.20	8	35.4
	RZP04 mixture	23,147	0.08	8	35.4
	RZP02 + η^′ mixture*	23,131	0.2	6	2.2
Bids	Poisson	201.6	—	4	27.08
	COM	201.4	—	4	40.03
	weighted	194.9	—	2	7.27
	RZP02 mixture	194.8	0.61	3	10.9
	RZP13 mixture	197.7	0.61	3	17.3
	RZP04 mixture	196.3	0.43	3	10.3
	RZP02 + η^′ mixture*	185.4	0.90	2	4.16

4.2.2 Summary of results

First, note that the X² given for the datasets without covariates can be erratic, and a better measure of fit is minus the log-likelihood, which can be converted to the Akaike Information Criterion (to correct for number of parameters fitted) by multiplying by two and adding twice the number of parameters. Minimum-AIC is arguably the better measure of model fit.
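
The conversion to AIC is a one-line calculation. The sketch below applies it to some of the fertility fits in Table 3; the parameter counts (1 for the Poisson, 2 for the 2-parameter models, 3 for the model with η^′) follow the description of the models, and the function name is illustrative.

```python
def aic(minus_log_lik, n_params):
    """Akaike Information Criterion from minus the log-likelihood."""
    return 2 * minus_log_lik + 2 * n_params

# Fertility dataset without covariates: (minus log-likelihood, parameters)
fits = {
    "Poisson": (2186.8, 1),
    "COM": (2182.6, 2),
    "RZP04 mixture": (2155.0, 2),
    "RZP02 + eta' mixture": (2146.6, 3),
}
for name, (nll, k) in fits.items():
    print(name, aic(nll, k))
best = min(fits, key=lambda name: aic(*fits[name]))
print("lowest AIC:", best)
```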

Underdispersed models always improve the fit, sometimes greatly. Usually, the new models outperform the weighted Poisson and COM models; for the bids dataset with covariates, the weighted Poisson outperformed the RZP02 + η^′ mixture model by a small margin.

Our aim was to compare some of the new models against benchmarks and each other. The conclusion is that they do (generally) outperform the benchmarks. Among the new models, the 2-parameter RZP(02) model seems best, together with its 3-parameter form. For modelling the spikes that sometimes occur, RZP(04) and even RZP(08) models can be worth fitting. However, the RZP(13) and related models in general do not fit well, and can be disregarded.

4.2.3 Results in more detail

It can be seen from the X² values that the Poisson distribution usually gives a very poor fit. The fit is also poor for the strike data, where the small sample size means that better-fitting models do not have lower AIC. The COM distribution performs better, except for the first Faddy dataset where the fit is no better. The RZP02 distribution only performs worse than the COM model for the yellow-card data, otherwise it fits better. On average, the RZP02 distribution fits better than the RZP13, so it seems that distributions of the RZP(0m) type are more promising. Overall, the fits obtained were often still not good, but can be further improved using 3-parameter models, which allow flexible behaviour. An example of the use of the more flexible 3-parameter distributions is shown in Figure 4, the fit to the Faddy-6 dataset. Here the 3-parameter RZP02 mixture model allows a bump near 2 to be modelled, while the low-variance distribution models the main peak.

Figure 4.

The Faddy-6 dataset (impulses) showing the Poisson fit, and the fit of the 3-parameter RZP02 distribution. The fit of the latter has X²[7] = 7.56, that is, an acceptable fit.

Note the fit to the Faddy-1 dataset, which is slightly overdispersed, and where the RZP02 distribution fits with a negative value of ϕ. This shows the usefulness of a class of distributions that can also fit slightly overdispersed data. Similarly, the fit to the fertility dataset remains poor under all the models in Table 3. However, fitting the RZP08 mixture model gives a (minus) log-likelihood of 2133.5, and X²[5] = 8.3, p = .14. This is a good fit, obtained with $\hat{η} = 2.56 \pm 0.066, \hat{ϕ} = 0.267 \pm 0.028$ , and ${\hat{η}}^{'} = 1.863 \pm 0.076$ . What has happened here is that the very underdispersed RZP08 distribution has increased the peak size of the fertility at around 2 children, which is evidently the preferred number.

Often the purpose of model-fitting is to estimate regression coefficients of covariates. Here $η = η_{0} \exp (\sum_{i = 1}^{m} β_{i} x_{i})$ , where the $x_{i}$ are covariates. There are 10 covariates of fertility in the Winkelmann (1995) dataset. Table 4 shows the log-likelihoods for fitted models, and again this class of models outperforms the COM model. The latter is now also computationally expensive, because all probabilities must be evaluated for each pattern of covariates.

Table 4.

Minus log-likelihoods for various fitted models for the fertility dataset with 10 covariates fitted, the bids dataset with 3 covariates fitted, and the doctor visits dataset with 5 covariates fitted. The last model and the weighted Poisson model are 3-parameter models. The lowest-AIC model is marked with an asterisk.

Dataset	Model	−ℓ
Fertility	Poisson	2101.8
	COM	2077.9
	weighted	2074.12
	RZP02 mixture	2055.3
	RZP13 mixture	2063.2
	RZP04 mixture	2045.5
	RZP08 mixture	2044.4
	RZP04 mixture +η^′	2054.0
	RZP08 mixture +η^′*	2041.9
Bids	Poisson	188.31
	COM	185.20
	weighted*	172.84
	RZP02 mixture	180.52
	RZP13 mixture	184.93
	RZP04 mixture	180.53
	RZP02 mixture+η^′	176.29
Doctor visits	Poisson	2630.54
	COM	2595.14
	weighted	2582.03
	RZP02 mixture	2523.33
	RZP13 mixture	2506.35
	RZP04 mixture*	2409.36
	RZP02 mixture+η^′	2458.53

Table 5 shows parameter values and standard errors under the Poisson model, and Table 6 shows the fitted coefficients from the RZP04 mixture model. This compares closely with the Poisson model fit, except that nearly all covariates are slightly more significant. Crucially, rural location increases the number of children born, and this is statistically significant at p = .049, whereas under the Poisson model p = .12. This shows the benefit of good modelling: residual error is the yardstick by which we measure effects, and this is reduced in a well-fitting model.

Table 5.

Fitted values of covariates with standard errors, z scores and p values for the fertility dataset under the Poisson model.

Parameter	Value	SE	z	p value
η	2.323	0.044	—	—
German	−0.200	0.072	−2.78	.005
Years schooling	0.034	0.032	1.03	.302
Vocational training	−0.153	0.044	−3.58	.0005
University	−0.155	0.159	−0.98	.329
Catholic	0.218	0.071	3.08	.002
Protestant	0.113	0.076	1.49	.137
Muslim	0.548	0.085	6.43	<.0001
Rural	0.059	0.038	1.55	.121
Year of birth	0.002	0.002	1.01	.310
Age at marriage	−0.030	0.007	−4.68	<.0001

Table 6.

Fitted values of covariates with standard errors, z scores and p values for the fertility dataset under the RZP04 mixture model.

Parameter	Value	SE	z	p value
η	2.297	0.034	—	—
Mixing parameter ϕ	0.400	0.036	—	—
German	−0.212	0.058	−3.68	.0002
Years schooling	0.034	0.025	1.35	.177
Vocational training	−0.120	0.035	−3.42	.001
University	−0.138	0.124	−1.12	.263
Catholic	0.189	0.054	3.51	.0005
Protestant	0.087	0.058	1.48	.138
Muslim	0.553	0.067	8.23	<.0001
Rural	0.060	0.031	1.97	.049
Year of birth	0.001	0.002	0.57	.571
Age at marriage	−0.025	0.005	−4.86	<.0001

Table 4 also shows fits to the bids and doctor visits datasets with covariates. The weighted Poisson distribution does best for the bids dataset, fitting slightly better than the RZP02 + η^′ mixture distribution, while for the doctor visits dataset, the RZP02 + η^′ mixture distribution fits much better than the weighted Poisson and COM distributions. From this study of diverse datasets, the best-fitting model is the RZP04 mixture model, which can cope with strong underdispersion, whilst the RZP02 mixture model also does well and is simpler. When the fit is still poor, one can move to the 3-parameter model.

4.3 Implementation of the models: Computation and feasibility

As this is the first work on these new distributions, the computations were prototyped in Fortran-90, using the NAG compiler and numerical subroutines library. This will not be the platform of choice for many nowadays, but the computations are simple for those wishing to recode into Python or R. For example, the useful RZP02 model with 2 or 3 parameters has pmf specified by (1.1) and (2.1). For many datasets, a naïve computation of the log-likelihood can be done directly. Using accurate starting values for η and ϕ did not seem to be important, but one can start η off as the sample mean, and ϕ = 0, that is, from the Poisson distribution.

When counts can be large, the naïve computation can cause numerical overflow, as both $(2 η)^{2 n}$ and $(2 n)!$ may be very large. In fact, this problem only occurred with the RZP04 model, where we have $(4 η)^{4 n}$ and $(4 n)!$. A quick solution is to compute the logarithm of the probability Q_n and then exponentiate it. Numerical problems were then only found with the weighted Poisson distribution, where we can have $a \to \infty$ or $r \to \infty$ .
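
A sketch of the log-scale computation for the RZP04 probabilities is given below. It uses `math.lgamma` for the log-factorial and rewrites the normalizing term so that cosh(4η) is never formed explicitly; the function name is illustrative, and this is one possible arrangement rather than the original Fortran code.

```python
import math

def log_rzp04_pmf(n, eta):
    """log Q_n for RZP04 without forming (4*eta)^(4n), (4n)! or cosh(4*eta)."""
    u = 4 * eta
    # log{(cosh u + cos u)/2} = u - log 4 + log(1 + e^(-2u) + 2 e^(-u) cos u)
    log_z = (u - math.log(4)
             + math.log1p(math.exp(-2 * u) + 2 * math.exp(-u) * math.cos(u)))
    return 4 * n * math.log(u) - math.lgamma(4 * n + 1) - log_z

# A case where the naive formula overflows double precision: 800**800 / 800!
print(math.exp(log_rzp04_pmf(200, 200.0)))
```

Exponentiating only at the end keeps every intermediate quantity within floating-point range.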

The parameter ϕ must be less than 1, and is positive for underdispersion. When only underdispersion is to be modelled, an easy way to impose the bounds 0 < ϕ < 1 is to have the function minimizer work with the logit of ϕ and to back-transform it before computing the mixture probabilities.
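
The transformation can be sketched as follows; the function names are illustrative. The minimizer would work with the unconstrained value θ, and ϕ is recovered inside the likelihood function.

```python
import math

def to_unconstrained(phi):
    """logit: maps phi in (0, 1) to the real line."""
    return math.log(phi / (1 - phi))

def to_phi(theta):
    """Inverse logit, arranged to avoid overflow for large |theta|."""
    if theta >= 0:
        return 1 / (1 + math.exp(-theta))
    e = math.exp(theta)
    return e / (1 + e)

theta = to_unconstrained(0.3)
print(theta, to_phi(theta))     # round trip returns 0.3 up to rounding
```

Note that this parameterization excludes negative ϕ, so it is suitable only when overdispersion does not need to be modelled.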

5 Conclusions

A new class of underdispersed discrete distributions has been introduced that, when mixed with the Poisson distribution, can be underdispersed or overdispersed. Compared with current models such as the COM-Poisson and weighted Poisson distributions, they offer three practical advantages. First, probabilities needed for likelihood-based estimation can be simply computed, without the need to compute all probabilities in order to find the normalizing factor. Second, the moments can be found analytically. This means that the dependence of the mean on covariates is easily studied. Third, random numbers can be readily generated in several ways, given that one can generate Poisson-distributed random variables.

Note that the moments are more complicated than for the Poisson; as the moment generating function is simple, an algebra package can help with any laborious algebra.

Fits to underdispersed datasets and one overdispersed dataset compare favourably with the COM and weighted Poisson distributions. Three datasets were also fitted with covariates, and it was possible to see how a better-fitting model enabled sharper inference, so that more conclusions could be drawn from the data.

From the data-fitting, the two most useful distributions emerged as the RZP02 and RZP04 distributions. The former is simpler and can model most data encountered, while the latter can model severe underdispersion, down to a coefficient of dispersion of 1/4. Also, the RZP0m distributions show asymptotic behaviour earlier than others, which means that the Poisson and underdispersed components of 2-parameter mixtures will have almost the same mean. What is proposed is that the RZP02 distribution be used routinely for fitting underdispersed data, while the RZP04 and RZP08 distributions could be useful for more detailed modelling. These can give a good fit where distributions have a sharp peak; for example, the fertility dataset shows a peak at 2 children. Unusually, they can also fit bimodal data.

As these distributions are straightforward to fit to data and fit well, it is hoped that practitioners will find them useful.

Footnotes

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author received no financial support for the research, authorship and/or publication of this article.

References

Adesina, Agunblade, Oguntunde and Adesina (2019) Bayesian models for zero truncated data. Asian Journal of Probability and Statistics, 4, 1–12.

Ben Mzoughia, Borle and Limam (2018) MCMC approach for modeling customer lifetime behavior using the COM-Poisson distribution. Applied Stochastic Models in Business and Industry, 34, 113–27.

Bermúdez, Karlis and Santolino (2017) A finite mixture of multiple discrete distributions for modelling heaped count data. Computational Statistics and Data Analysis, 112, 14–23.

Boshnakov, Kharrat and McHale (2017) A bivariate Weibull count model for forecasting association football scores. International Journal of Forecasting, 33, 458–66.

Cameron and Trivedi (2013) Regression Analysis of Count Data, 2nd edition. Cambridge: Cambridge University Press.

Castillo and Pérez-Casany (2005) Overdispersed and underdispersed Poisson generalizations. Journal of Statistical Planning and Inference, 134, 486–500.

Faddy and Bosch (2001) Likelihood-based modeling and analysis of data underdispersed relative to the Poisson distribution. Biometrics, 57, 620–24.

Faddy and Smith (2011) Analysis of count data with covariate dependence in both mean and variance. Journal of Applied Statistics, 38, 2683–94.

Hilbe (2011) Negative Binomial Regression, 2nd edition. Cambridge: Cambridge University Press.

McShane, Adrian, Bradlow and Fader (2008) Count models based on Weibull interarrival times. Journal of Business and Economic Statistics, 26, 369–78.

Ridout and Besbeas (2004) An empirical model for underdispersed count data. Statistical Modelling, 4, 77–89.

Shmueli, Minka, Kadane, Borle and Boatwright (2005) A useful distribution for fitting discrete data: Revival of the Conway-Maxwell-Poisson distribution. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54, 127–42.

Winkelmann (1995) Duration dependence and dispersion in count-data models. Journal of Business & Economic Statistics, 13, 467–74.

Winkelmann (1996) A count data model for gamma waiting times. Statistical Papers, 37, 177–87.
