Abstract
A class of new 1-parameter underdispersed distributions is introduced. Mixed with Poisson distributions; they generate 2- and 3-parameter discrete distributions that generalize the Poisson distribution and can be both under and over-dispersed. Probabilities are easy to compute and moments and random number generation are tractable. The distributions are described, and they are fitted to some underdispersed and overdispersed datasets. We show how inference for the effect of covariates sharpens on moving from the Poisson model. The fits compare favourably to two benchmarks, the COM Poisson distribution and the weighted Poisson distribution.
Introduction
Previous work on underdispersed distributions
The Poisson distribution is the distribution for modelling count data, and often gives an approximate fit. Its variance equals its mean, and so departures from Poissonness can be categorized as overdispersion (variance exceeds mean) or underdispersion (variance less than mean). Here we propose some new models for non-equidispersed data.
There is some probabilistic basis for modelling overdispersed distributions with the negative binomial model (e.g., Hilbe, 2011). There one assumes that the Poisson mean itself is drawn from a gamma distribution.
For underdispersion, with which we are mainly concerned, there is no convincing probabilistic model. One could consider an extended Poisson process or Markov birth process, where the rate of event occurrence scales up or down by a factor γ after each event. This would yield under or overdispersed models, but the probabilities are very complicated.
Instead of using probabilistic models, practitioners therefore rely on a host of ad-hoc models that could fit the data, such as the Conway-Maxwell distribution (COM); see, for example, Shmueli et al. (2005). Here the Poisson probability
Another class of models are gamma count models, for example, Winkelmann (1996) and Weibull count models (e.g., McShane et al., 2008; Boshnakov et al., 2017). The Poisson process has exponential intervals between events, and these can be replaced with Weibull or gamma-distributed intervals. These models can be thought of either as ad-hoc models or as having some probabilistic basis.
There are many other models for underdispersed data, such as weighted Poissons, for example, Castillo and Pérez-Casany (2005). There are many weighted Poisson distributions, but the one used here as a benchmark has probability mass function (pmf)
where C must be found by summing all probabilities, and a ≥ 0, r ≥ 0.
It is currently the case that there are no models that are tractable, for example, for likelihood computation, and which can model considerable underdispersion.
Overview of the new models
This is a summary to preface the more technical content below.
By mixing the Poisson distribution with one of a class of underdispersed distributions, one can obtain distributions that generalize the Poisson distribution and are underdispersed or mildly overdispersed. They can have 2 or 3 parameters, and are tractable in that probabilities are simple to write down, so that fitting to data is computationally fast. Also, their moments can be derived in terms of the model parameters. The mean in particular is needed, as we often wish to see how it depends on covariates.
Random numbers are easily generated. These are needed, for example, for Monte-Carlo simulations and some data analyses. This tractability compares favourably with our two benchmarks, the COM-Poisson and the weighted Poisson distributions; for these distributions all probabilities must be summed numerically to obtain the normalizing constant and moments.
However, the crucial measure of usefulness is whether or not the new distributions can fit data. Economists, healthcare workers and others need to do this primarily to evaluate the effect of covariates, for example, does fertility (number of offspring) vary with educational level?
Underdispersed datasets are not plentiful, but we studied eight, three with covariates, and taken from a range of subject areas. Overall, the new distributions performed well, and it is possible to recommend one in particular that would be a good starting point for fitting underdispersed data.
It has been mentioned that existing underdispersed distributions sometimes have a probabilistic basis, but the most useful ones such as COM-Poisson and weighted Poisson, do not. The distributions proposed here are also functional forms designed to be underdispersed, but with no strong probabilistic basis.
This article aims to introduce the new distributions and give their properties, and to demonstrate their usefulness with some fits to data.
Usage of the proposed new models
A class of 1-parameter underdispersed distributions is introduced, and the model comprises a mixture of these with the Poisson distribution. Admixture with the Poisson distribution is needed so that any amount of underdispersion can be modelled. The models have two parameters: a parameter η linked closely to the mean, and the mixing parameter ϕ. They can be made more flexible by assigning the location parameter η to the Poisson component of the mixture, and η′ to the underdispersed component, giving 3-parameter models. This extra parameter enables even bimodality to be modelled.
The mixture distribution has pmf
where P denotes the Poisson probability, and Q the corresponding probability from the underdispersed distribution. The variance is then
where
Negative mixture distributions do not have a probabilistic basis in the way that positive mixtures do, but are sometimes used as models. Thus the hypoexponential distribution with pdf
that generalizes the Erlang(2) distribution can be regarded as a negative mixture of exponentials.
With negative mixtures there is the danger that the probabilities Sx could become negative. If the non-Poisson distribution were longer-tailed than the Poisson, this would happen for any negative value of ϕ. However, when the other distribution is underdispersed, there is no such tail problem, and positive probabilities can exist for small negative values of ϕ. Hence the models proposed here can always be overdispersed. They were not particularly intended for this, because the negative binomial distribution usually fits well, has a probabilistic motivation, and so is preferable. However, sometimes one of a series of underdispersed datasets may turn out to be slightly overdispersed, and then the ability to model overdispersion is useful. Note that we must also have ϕ ≤ 1 to avoid negative probabilities.
The class of underdispersed probability models used is introduced next, then their main properties are given, followed by some fits of the mixed models to data. For readability, many technical details are relegated to appendices. All datasets are in the public domain and are referenced, except for the ‘yellow card’ dataset, which is given in this article. They are available in the supplementary material for this article.
The simplest example: A distribution from the cosh function
The Poisson pmf
Here only the even powers of η remain. The terms are now relabelled to be sequential probabilities, and so from the cosh expansion,
is a pmf. The scaling of η is done so that η will be the mean when it is large. This relabelling is the opposite approach from that of Bermúdez et al. (2017), who used Poisson probabilities to represent heaped count data, where, for example, all counts must be even.
The probability generating function (pgf)
Moments are generated most easily by differentiating the moment generating function (mgf)
As
The parameter ϕ is the mixing parameter; when ϕ = 0 we have a Poisson distribution and when ϕ = 1 we have one of the new underdispersed distributions. negative ϕ corresponds to overdispersion. In the 3-parameter case, we have η, the mean of the Poisson distribution, and η′, the underdispersed distribution parameter. As mentioned, when η′ is large it tends to the mean, so it is clearly close to the mean. One could take the mean of the underdispersed distribution as the parameter, but then it would be necessary to solve A practical point is that when fitting datasets with covariates we usually make η and η′ proportional to
Economists and clinicians often want to know how the mean response μ depends on the covariates. For simplicity, take one covariate. The mean of the mixture distribution is
where η takes the value η0 when X = 0. Differentiating this, when X = 0,
Thus the change in mean when X varies can be approximated. This calculation is messy but is more difficult for the benchmark distributions.
General properties of these distributions
In general, one can analytically sum series where m ≥ 0 terms are zero at regular intervals, giving a class of distributions. These must be named; a possible naming scheme is to give the numbers of the first and second nonzero terms. Thus the distribution derived from the cosh function would be RZP02, that is, ‘relabelled zeroed Poisson with terms 0, 2, 4, …’. The Poisson distribution itself would be RZP01.
In general for RZP0m distributions, we can write
A way to find Z to normalize these probabilities and to find the other distributions RZPj, m + j for 0 < j < m from them is given in appendix A. Random number generation for these distributions and also for mixture distributions is described in appendix B.
For RZP0m distributions, the means approach η quickly and monotonically (Figure 1), while the variances and coefficients of dispersion also quickly approach their asymptotic value, but may oscillate (Figure 2). That the means are monotonic can be proved, but the proof is omitted for brevity. It proceeds by differentiating (2.2), whence
The means of these distributions as a function of η; note that the slopes of the means can oscillate but the means are monotonic functions of η.
The coefficients of dispersion σ2/μ of these distributions as a function of the mean μ.
Here the use of evaluating these exponential series with only every mth term present has been to construct underdispersed distributions. However, they can also be used, without relabelling probabilities, to model spiky or ‘heaped count’ data. Bermúdez et al. (2017) model the frequencies of work disability days, which are often rounded to 5 or 7 days, and a mixture of the distributions here could also be used for that.
Some other useful models are now described.
An even more underdispersed distribution can be derived from the expansion
so tha
Hence
For large η, the variance is a quarter of the mean. It undergoes decaying oscillations as η increases.
Figure 3 shows the three distributions introduced so far and the Poisson distribution.
The Poisson pmf (impulses), RZP02, RZP13 and RZP04 distributions, all for η = 2.5.
The Poisson pmf (impulses), RZP02, RZP13 and RZP04 distributions, all for η = 2.5.
We give without proof the fact that Qx is a pmf, where
Similarly to previous cases, the pgf is
and the mean is therefore
As
The RZP13 distribution: A distribution from the sinh function
We have that
hence
It follows that
As
Asymptotics
As
For large η, only infinitesimal values of t are relevant, so we have the mgf of a normal distribution with mean η and variance η/m. These distributions approach normality faster than does the Poisson. For example, the skewness is initially greater than for the Poisson distribution, but it soon becomes smaller. It can also execute damped oscillations, as can the kurtosis. The presence of cosines in Z(η) alerts us to this possibility.
From (3.1), we can see that
Estimating parameters of mixture distributions
The parameters to be estimated are η, ϕ and any covariate regression coefficients βi modelled as
When ϕ < 0 for a negative mixture, the function minimizer might move to a large negative value of ϕ that would make probabilities negative. One can either find the bound on ϕ by requiring
The meaning of the model parameters is that η is a measure of location closely related to the mean, and ϕ interpolates between variances, as in (1.2). Hence ϕ can be taken as a measure of dispersion.
It is also possible to model the dependence of variance on covariates by using the mixing parameter ϕ as a proxy for variance, for example, for the RZP04 distribution,
Examples
Datasets and models used
In general, probability models will not fit every dataset well, and their usefulness must be evaluated over a range of datasets. Eight datasets were therefore used, with a spread of sample sizes and from a variety of subject areas, such as healthcare, biology, economics, biometrics and sport. A major purpose of fitting models is to estimate the effect of covariates, and 3 of the datasets had covariates.
The first dataset is the completed fertility dataset from the second (1985) wave of the German Socio-Economic Panel, described in Winkelmann (1995). It contains the number of children (0–11) and 10 demographic covariates for 1243 women. The count distribution is slightly underdispersed, and becomes more so after regressing on the covariates. Ridout and Besboas (2004) cite data on the number of outbreaks of strikes in the UK coal mining industry in successive four-week periods in the years 1948–1959, originally given by Kendall. Faddy and Bosch (2001) give (mainly) underdispersed data for number of foetal implants in mice. Out of implant distributions for 7 doses of a herbicide, we used three, the zeroth, first and final (sixth) dose levels. Some new data on yellow cards ‘awarded’ during Premier League football matches was used. Data for away teams was used here (Table 1). Data on takeover bids from Cameron and Trivedi (2013) was used; without covariates the dataset is slightly overdispersed, but with covariates it is underdispersed. Finally, a large dataset of doctor visits with covariates was used. The dataset was first used in Adesina et al. (2019). This dataset only has entries when there was at least one visit, and hence the model probabilities must be zero-truncated, so that
The away-team yellow card count data.
The away-team yellow card count data.
Table 2 shows the characteristics of the datasets, and Table 3 shows the results of fitting 7 models: The Poisson distribution, the COM distribution, the weighted Poisson distribution, the RZP02, RZP13, RZP04 distributions, and the latter used as a 3-parameter distribution, by allowing η′ to vary. All distributions except the Poisson have 2 parameters, except for the weighted Poisson distribution and the new distributions with variable η′.
Datasets used, with sample size, mean, variance and coefficient of dispersion.
Fits of models to datasets: minus log-likelihood, mixing parameter ϕ and degrees of freedom and chi-squared of fit. The model ‘RZP02 + η′’ is the RZP02 distribution with η′ fitted, that is, a 3-parameter model, as is the weighted Poisson distribution. The lowest-AIC fit is marked with an asterisk.
First, note that the X2 given for the datasets without covariates can be erratic, and a better measure of fit is minus the log-likelihood, which can be converted to the Akaike Information Criterion (to correct for number of parameters fitted) by multiplying by two and adding twice the number of parameters. Minimum-AIC is arguably the better measure of model fit.
Underdispersed models always improve the fit, sometimes greatly. Usually, the new models outperform the weighted Poisson and COM models; for the bids dataset with covariates, the weighted Poisson outperformed the RZP02 + η′ mixture model by a small margin.
Our aim was to compare some of the new models against benchmarks and each other. The conclusion is that they do (generally) outperform the benchmarks. Among the new models, the 2-parameter RZP(02) model seems best, together with its 3-parameter form. For modelling the spikes that sometimes occur, RZP(04) and even RZP(08) models can be worth fitting. However, the RZP(13) and related models in general do not fit well, and can be disregarded.
Results in more detail
It can be seen from the X2 values that the Poisson distribution usually gives a very poor fit. The fit is also poor for the strike data, where the small sample size means that better-fitting models do not have lower AIC. The COM distribution performs better, except for the first Faddy dataset where the fit is no better. The RZP02 distribution only performs worse than the COM model for the yellow-card data, otherwise it fits better. On average, the RZP02 distribution fits better than the RZP13, so it seems that distributions of the RZP(0m) type are more promising. Overall, the fits obtained were often still not good, but can be further improved using 3-parameter models, which allow flexible behaviour. An example of the use of the more flexible 3-parameter distributions is shown in Figure 4, the fit to the Faddy-6 dataset. Here the 3-parameter RZP02 mixture model allows a bump near 2 to be modelled, while the low-variance distribution models the main peak.
The Faddy-6 dataset (impulses) showing the Poisson fit, and the fit of the 3-parameter RZP02 distribution. The fit of the latter has X2[7] = 7.56, that is, an acceptable fit.
The Faddy-6 dataset (impulses) showing the Poisson fit, and the fit of the 3-parameter RZP02 distribution. The fit of the latter has X2[7] = 7.56, that is, an acceptable fit.
Note the fit to the Faddy-1 dataset, which is slightly overdispersed, and where the RZP02 distribution fits with a negative value of ϕ. This shows the usefulness of a class of distributions that can also fit slightly overdispersed data. Similarly, the fit to the fertility dataset remains poor under all the models in Table 3. However, fitting the RZP08 mixture model gives a (minus) log-likelihood of 2133.5, and X2[5] = 8.3, p = .14. This is a good fit, obtained with
Often the purpose of model-fitting is to estimate regression coefficients of covariates. Here
Minus log-likelihoods for various fitted models for the fertility dataset with 10 covariates fitted, the bids dataset with 3 covariates fitted, and the doctor visits dataset with 5 covariates fitted. The last model and the weighted Poisson model are 3-parameter models. The lowest-AIC model is marked with an asterisk.
Table 5 shows parameter values and standard errors under the Poisson model, and Table 6 shows the fitted coefficients from the RZP04 mixture model. This compares closely with the Poisson model fit, except that nearly all covariates are slightly more significant. Crucially, rural location increases the number of children born, and this is statistically significant at p = .049, whereas under the Poisson model p = .12. This shows the benefit of good modelling: residual error is the yardstick by which we measure effects, and this is reduced in a well-fitting model.
Fitted values of covariates with standard errors, z scores and p values for the fertility dataset under the Poisson model.
Fitted values of covariates with standard errors, z scores and p values for the fertility dataset under the RZP04 mixture model.
Table 5 also shows fits to the bids and doctor visits datasets with covariates. The weighted Poisson distribution does best for the bids dataset, fitting slightly better than the RZP02 + η′ mixture distribution, while for the doctor visits dataset, the RZP02 + η′ mixture distribution fits much better than the weighted Poisson and COM distributions. From this study of diverse datasets, the best-fitting model is the RZP04 mixture model, which can cope with strong underdispersion, whilst the RZP02 mixture model also does well and is simpler. When the fit is still poor, one can move to the 3-parameter model.
As this is the first work on these new distributions, the computations were prototyped in Fortran-90, using the NAG compiler and numerical subroutines library. This will not be the platform of choice for many nowadays, but the computations are simple for those wishing to recode into Python or R. For example, the useful RZP02 model with 2 or 3 parameters has pmf specified by (1.1) and (2.1). For many datasets, a naïve computation of the log-likelihood can be done directly. Using accurate starting values for η and ϕ did not seem to be important, but one can start η off as the sample mean, and ϕ = 0, that is, from the Poisson distribution.
When counts can be large, the naïve computation can cause numerical overflow, as both (2η)2
n
and (2n)! may be very large. In fact, this problem only occurred with the RZP04 model where we have (4η)4
n
and 4n!. A quick solution is to compute the logarithm of the probability Qn and then exponentiate it. Numerical problems were then only found with the weighted Poisson distribution, where we can have
The parameter ϕ < 1 and is positive for underdispersion. An easy way to impose this bound is to have the function minimizer work with the logit of ϕ and to back-transform it before computing Qn.
Conclusions
A new class of underdispersed discrete distributions has been introduced that, when mixed with the Poisson distribution, can be underdispersed or overdispersed. Compared with current models such as the COM-Poisson and weighted Poisson distributions, they offer three practical advantages. First, probabilities needed for likelihood-based estimation can be simply computed, without the need to compute all probabilities in order to find the normalizing factor. Second, the moments can be found analytically. This means that the dependence of the mean on covariates is easily studied. Third, random numbers can be readily generated in several ways, given that one can generate Poisson-distributed random variables.
Note that the moments are more complicated than for the Poisson; as the moment generating function is simple, an algebra package can help with any laborious algebra.
Fits to underdispersed datasets, and one overdispersed dataset compare favourably with the COM and weighted Poisson distributions. Three datasets were also fitted with covariates, and it was possible to see how a better-fitting model enabled sharper inference, so that more conclusions could be drawn from the data.
From the data-fitting, the two most useful distributions emerged as the RZP02 and RZP04 distributions. The former is simpler and can model most data encountered, while the latter can model severe underdispersion down to coefficient of dispersion 1/4. Also, the RZP0m distributions show asymptotic behaviour earlier than others, which means that the Poisson and underdispersed components of 2-parameter mxtures will have almost the same mean. What is proposed is that the RZP02 distribution be used routinely for fitting underdispersed data, while the RZP04 and RZP08 distributions could be useful for more detailed modelling. This can give a good fit where distributions have a sharp peak; for example, the fertility dataset shows a peak at 2 children. Unusually, they can fit bimodal data.
As these distributions are straightforward to fit to data and fit well, it is hoped that practitioners will find them useful.
Footnotes
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author received no financial support for the research, authorship and/or publication of this article.
Supplemental material
Appendices
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
