We argue that measures of goodness of fit based on the value of the likelihood function should not be used when models are estimated by pseudo maximum likelihood. We illustrate this point by showing that when the dependent variable is not a count, some measures of goodness of fit for Poisson regression routinely reported by Stata commands depend on the scale of the data and are therefore uninformative.
With the impact of Santos Silva and Tenreyro’s (2006) work, Bill Gould’s 2011 post on the Stata Blog, and the availability of convenient community-contributed commands such as ppml (Santos Silva and Tenreyro 2011) and ppmlhdfe (Correia, Guimarães, and Zylkin 2020), the Poisson pseudo-maximum-likelihood (PPML) estimator introduced by Gouriéroux, Monfort, and Trognon (1984) has gained popularity and become the standard method to estimate multiplicative models. Santos Silva and Tenreyro (2022) present a brief survey of the use of the PPML estimator in economics and other fields.
In this note, we argue that measures of goodness of fit routinely reported by Stata commands should not be used when models are estimated by pseudo maximum likelihood. To illustrate our point, we consider the behavior of McFadden’s (1974) pseudo-$R^2$ and the behavior of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), proposed by Akaike (1973) and Schwarz (1978), respectively, when the model is estimated by PPML and the dependent variable is not a count. We show that, in this case, such measures of goodness of fit depend on the scale of the dependent variable and are therefore uninterpretable.
More broadly, we argue that measures of goodness of fit based on the value of the log-likelihood function are generally invalid unless the assumed likelihood is correctly specified. Hence, such statistics should not be used to assess the adequacy of models estimated by pseudo maximum likelihood. The point we make may seem trivial, but apparently, many researchers are not aware of it and misuse such measures of goodness of fit.
The remainder of this note is organized as follows. Section 2 shows how the PPML objective function behaves when the scale of the data changes and notes that a similar result can be obtained for the inverse Gaussian distribution as used in Stata’s glm command. Section 3 shows how scale dependence affects McFadden’s pseudo-$R^2$, and section 4 shows that neither the BIC nor the AIC is immune to changes in scale. Finally, section 5 discusses and summarizes our findings.
PPML and scale
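Consider a general multiplicative model in its exponential form for the sample $\{(y_i, x_i)\}_{i=1}^{n}$ (see, for example, Santos Silva and Tenreyro [2006]):
$$y_i = \exp(x_i'\beta)\,\eta_i \tag{1}$$
where $y_i$ is the outcome of interest, $x_i$ is a vector of explanatory variables where the first element is equal to 1, $\beta$ is a vector of associated parameters, and $\eta_i$ is an error term with $E(\eta_i \mid x_i) = 1$.
If $y_i$ is not a count, its scale is arbitrary. In this case, we can rescale the problem by multiplying both sides of (1) by a scalar $s > 0$, leading to
$$s y_i = \exp(x_i'\beta^*)\,\eta_i$$
where $\beta_1^* = \beta_1 + \ln s$ and $\beta_j^* = \beta_j$ for $j > 1$. Thus, changing the scale of the problem affects only the intercept of the model. For any nonnegative integer $a$, we have that $\ln(a!) = \ln\Gamma(a + 1)$, and so the Poisson (pseudo) log-likelihood function for the two problems can be written as
$$\ell(\beta) = \sum_{i=1}^{n} \left\{ -\exp(x_i'\beta) + y_i x_i'\beta - \ln\Gamma(y_i + 1) \right\} \tag{2}$$
$$\ell_s(\beta^*) = \sum_{i=1}^{n} \left\{ -\exp(x_i'\beta^*) + s y_i x_i'\beta^* - \ln\Gamma(s y_i + 1) \right\} \tag{3}$$
Equations (2) and (3) show that the values of $\ell(\beta)$ and $\ell_s(\beta^*)$ are different unless $s = 1$. Because the difference between (2) and (3) is not an additive constant, a likelihood-ratio (LR) test comparing two models with different specifications will depend on $s$, a result originally obtained by Yang and Hillberry (2023). Indeed, letting $\ell_r$ denote the log likelihood of the restricted model and writing the LR test statistic as
$$\Lambda(y) = 2\left\{ \ell(\hat\beta) - \ell_r(\hat\beta_r) \right\} \tag{4}$$
where $\hat\beta$ and $\hat\beta_r$ are the unrestricted and restricted estimates and $y$ is the vector of outcomes, we have
$$\Lambda(sy) = s\,\Lambda(y) \tag{5}$$
Therefore, small values of $s$ favor the null hypothesis and vice versa.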
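A short sketch of why the proportionality holds, assuming that the restricted and unrestricted models are both estimated by PPML and both contain an intercept. The symbols $\hat\mu_i$ and $\hat\mu_i^{r}$ (unrestricted and restricted fitted values for outcome $y_i$) are introduced here only for illustration. With outcome $s y_i$, the PPML first-order conditions $\sum_i (s y_i - \mu_i) x_i = 0$ are solved by $\mu_i = s\hat\mu_i$, so rescaling the outcome rescales the fitted values, and the terms that do not depend on the model cancel in the difference of log likelihoods:

$$
\begin{aligned}
\Lambda(sy) &= 2\sum_{i=1}^{n}\Big[\big\{-s\hat\mu_i + s y_i \ln(s\hat\mu_i) - \ln\Gamma(s y_i + 1)\big\} - \big\{-s\hat\mu_i^{r} + s y_i \ln(s\hat\mu_i^{r}) - \ln\Gamma(s y_i + 1)\big\}\Big] \\
&= 2s\sum_{i=1}^{n}\big\{-(\hat\mu_i - \hat\mu_i^{r}) + y_i(\ln\hat\mu_i - \ln\hat\mu_i^{r})\big\} = s\,\Lambda(y).
\end{aligned}
$$

The term $\ln\Gamma(s y_i + 1)$ cancels because it does not depend on the model; it matters again for statistics that use the level of the log likelihood rather than a difference.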
The scale dependence of the likelihood function and of the LR stems from the fact that the Poisson distribution has a single parameter and thus is not scale invariant.
In contrast, the likelihood function and the LR are scale invariant for estimators based on two-parameter distributions, such as the normal or gamma. Likewise, the pseudo-maximum-likelihood estimator based on the inverse Gaussian distribution should not be affected by this problem, because the inverse Gaussian distribution is scale invariant. However, Stata’s glm command for the inverse Gaussian family maximizes a likelihood function where the shape parameter is restricted to be 1, and it is easy to show that in this case, the likelihood function is also not scale invariant and
$$\Lambda^*(sy) = \frac{1}{s}\,\Lambda^*(y) \tag{6}$$
where $\Lambda^*(\cdot)$ denotes the LR based on the inverse Gaussian distribution. The fact that estimation based on the inverse Gaussian distribution is not scale invariant is particularly troubling because the inverse Gaussian distribution is often used to model continuous data for which scale is generally arbitrary.
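A sketch of the argument behind the inverse Gaussian result, under assumptions that are stated here and not taken from the text: the standard inverse Gaussian density with mean $\mu$ and shape $\lambda$, a log link, and $\lambda$ fixed at 1. The symbols $\ell^{IG}$, $\hat\mu_i$, and $\hat\mu_i^{r}$ are illustrative notation. The log likelihood is then

$$
\ell^{IG} = \sum_{i=1}^{n}\left\{-\tfrac{1}{2}\ln(2\pi y_i^{3}) - \frac{(y_i - \mu_i)^{2}}{2\mu_i^{2} y_i}\right\},
\qquad
\frac{\partial \ell^{IG}}{\partial \beta} = \sum_{i=1}^{n}\frac{y_i - \mu_i}{\mu_i^{2}}\,x_i .
$$

With outcome $s y_i$, the first-order conditions are solved by $\mu_i = s\hat\mu_i$, and the model-dependent term becomes

$$
\frac{(s y_i - s\hat\mu_i)^{2}}{2 (s\hat\mu_i)^{2}\, s y_i} = \frac{1}{s}\,\frac{(y_i - \hat\mu_i)^{2}}{2\hat\mu_i^{2} y_i},
\qquad\text{so that}\qquad
\Lambda^*(sy) = \frac{1}{s}\sum_{i=1}^{n}\left\{\frac{(y_i - \hat\mu_i^{r})^{2}}{(\hat\mu_i^{r})^{2} y_i} - \frac{(y_i - \hat\mu_i)^{2}}{\hat\mu_i^{2} y_i}\right\} = \frac{1}{s}\,\Lambda^*(y).
$$

The first term of the log likelihood does not depend on the model and cancels in the LR.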
The following example illustrates this behavior for the cases of the Poisson and inverse Gaussian distributions.
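The Python sketch below is not the Stata example referred to above; it is a minimal stand-alone check of the Poisson case only. It assumes a model with an intercept and one regressor, fitted by Newton's method, and uses hypothetical toy data; the names `fit_ppml`, `poisson_loglik`, and `lr_statistic` are illustrative.

```python
import math

# Hypothetical toy data: a positive, non-integer outcome and one regressor.
X = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
Y = [1.3, 0.9, 1.8, 1.6, 2.9, 2.2, 3.7, 4.1]


def fit_ppml(y, x, max_iter=100, tol=1e-12):
    """Fit E[y|x] = exp(b0 + b1*x) by Poisson PML (Newton's method); return fitted means."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector of the Poisson pseudo log likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Negative Hessian (2 x 2), inverted explicitly.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return [math.exp(b0 + b1 * xi) for xi in x]


def poisson_loglik(y, mu):
    """Poisson (pseudo) log likelihood, with ln Gamma(y + 1) in place of ln(y!)."""
    return sum(-mi + yi * math.log(mi) - math.lgamma(yi + 1.0) for yi, mi in zip(y, mu))


def lr_statistic(y, x, s):
    """LR statistic of the slope model against the constant-only model for outcome s*y."""
    ys = [s * yi for yi in y]
    mean = sum(ys) / len(ys)  # the constant-only PPML fit is the sample mean
    ll_restricted = poisson_loglik(ys, [mean] * len(ys))
    ll_full = poisson_loglik(ys, fit_ppml(ys, x))
    return 2.0 * (ll_full - ll_restricted)


base = lr_statistic(Y, X, 1.0)
for s in (0.1, 1.0, 10.0, 1000.0):
    print(f"s = {s:>7}: LR / LR(s=1) = {lr_statistic(Y, X, s) / base:.6f}")
```

If (5) holds, the printed ratio should equal the scale factor itself, so the same data can be made to produce an arbitrarily large or small LR statistic.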
In itself, the fact that the LR depends on the scale of the data is not particularly important, because it is well understood that LR tests are not valid in the context of pseudo-maximum-likelihood estimation (see, for example, White [1982]). Therefore, these tests are not used in this context. Indeed, Stata’s lrtest command cannot even be used if a robust covariance matrix is requested.
However, popular measures of goodness of fit, such as McFadden’s pseudo-$R^2$, the AIC, and the BIC, are implicitly based on LRs. Therefore, it is interesting to study their behavior in the context of the pseudo-maximum-likelihood estimator of exponential models for data that are not counts. Because of its popularity, we will focus on the case of PPML estimation, but similar results can be obtained for models estimated by glm using the inverse Gaussian family.
McFadden’s pseudo-$R^2$
McFadden’s pseudo-$R^2$ is a popular measure of goodness of fit for generalized linear models and is reported by poisson, Stata’s standard command for Poisson regression, as well as by the ppmlhdfe command of Correia, Guimarães, and Zylkin (2020), which is recommended by Santos Silva and Tenreyro (2022) to estimate models by PPML.
Letting $\ell_0$ represent the log likelihood of the “constant-only” model, McFadden’s pseudo-$R^2$ is defined as
$$R^2 = 1 - \frac{\ell(\hat\beta)}{\ell_0} = \frac{\Lambda(y)}{-2\ell_0} \tag{7}$$
where $\Lambda(y)$ is the LR statistic in (4) with the constant-only model as the restricted model.
It follows from (5) and (7) that for McFadden’s pseudo-$R^2$ to be invariant to rescaling, we would need $\ell_0$ to be proportional to $s$, which is not the case. Therefore, when the dependent variable is not a count, a model estimated by PPML can be made to have almost any value of $R^2$ simply by changing the scale of the data. This is illustrated by the following example, in which $R^2$ varies between 0.05 and 0.98.
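As a stand-alone illustration (again not the Stata example mentioned above), the following Python sketch computes McFadden's pseudo-R2 for the same outcome measured in three different units. The toy data, the two-parameter model, and the names `fit_ppml`, `poisson_loglik`, and `mcfadden_r2` are hypothetical.

```python
import math

# Hypothetical toy data: a positive, non-integer outcome and one regressor.
X = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
Y = [1.3, 0.9, 1.8, 1.6, 2.9, 2.2, 3.7, 4.1]


def fit_ppml(y, x, max_iter=100, tol=1e-12):
    """Fit E[y|x] = exp(b0 + b1*x) by Poisson PML (Newton's method); return fitted means."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return [math.exp(b0 + b1 * xi) for xi in x]


def poisson_loglik(y, mu):
    """Poisson (pseudo) log likelihood, with ln Gamma(y + 1) in place of ln(y!)."""
    return sum(-mi + yi * math.log(mi) - math.lgamma(yi + 1.0) for yi, mi in zip(y, mu))


def mcfadden_r2(y, x, s):
    """McFadden's pseudo-R2, 1 - ll / ll0, for the rescaled outcome s*y."""
    ys = [s * yi for yi in y]
    mean = sum(ys) / len(ys)  # constant-only fit
    ll0 = poisson_loglik(ys, [mean] * len(ys))
    ll = poisson_loglik(ys, fit_ppml(ys, x))
    return 1.0 - ll / ll0


R2 = {s: mcfadden_r2(Y, X, s) for s in (0.01, 1.0, 100.0)}
for s, value in R2.items():
    print(f"s = {s:>6}: McFadden pseudo-R2 = {value:.3f}")
```

The fitted conditional mean is the same in all three runs up to the factor `s`; only the unit of measurement of the outcome changes.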
In short, when the dependent variable of a model estimated by PPML does not have a natural scale, the value of McFadden’s pseudo-$R^2$ depends on the scale of the data and therefore is not informative. Some packages, such as FENmlm in R (see Bergé [2018]), report an adjusted pseudo-$R^2$ that is also sensitive to the scale of the dependent variable.
If the researcher wants information about the goodness of fit, an alternative is to compute the $R^2$ as the square of the correlation between the dependent variable and its fitted values, as in ppml (Santos Silva and Tenreyro 2011) and in Bergé’s (2018) FENmlm command in R; this has the advantage of depending only on the estimated conditional expectation, which is the only feature of the conditional distribution identified by a pseudo-maximum-likelihood estimator. See Tjur (2009) for more on the properties of this definition of $R^2$ and for other definitions that also do not vary with the scale of the dependent variable.
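A minimal Python sketch of this alternative, under the same hypothetical setup as before (toy data, an intercept plus one regressor, Newton's method). The function names are illustrative and this is not the code used by ppml or FENmlm.

```python
import math

# Hypothetical toy data: a positive, non-integer outcome and one regressor.
X = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
Y = [1.3, 0.9, 1.8, 1.6, 2.9, 2.2, 3.7, 4.1]


def fit_ppml(y, x, max_iter=100, tol=1e-12):
    """Fit E[y|x] = exp(b0 + b1*x) by Poisson PML (Newton's method); return fitted means."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return [math.exp(b0 + b1 * xi) for xi in x]


def squared_correlation(a, b):
    """Square of the sample correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov * cov / (var_a * var_b)


def r2_correlation(y, x, s):
    """R2 defined as corr(s*y, fitted values)^2 for the rescaled outcome."""
    ys = [s * yi for yi in y]
    return squared_correlation(ys, fit_ppml(ys, x))


for s in (0.01, 1.0, 100.0):
    print(f"s = {s:>6}: squared-correlation R2 = {r2_correlation(Y, X, s):.6f}")
```

Because the fitted values are rescaled by the same factor as the outcome and the correlation is unit free, the three printed values should coincide.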
Information criteria
The AIC and BIC are often reported by researchers when presenting estimation results obtained by PPML. These information criteria are reported by Stata’s glm command and by Bergé’s (2018) FENmlm command in R. Stata also offers the postestimation command estat ic, which can be used to compute the AIC and the BIC after various estimation commands. (To be precise, the glm command reports the AIC divided by $n$ and calculates the BIC as $D^2 - (n - k)\ln n$, where $D^2$ denotes the overall deviance and $k$ is the number of parameters. This does not change the scale-dependence issue. Additionally, when the irls option is used, glm reports the BIC but not the AIC.)
Rooted in information theory, these criteria assess how well a likelihood-based model fits the data while penalizing excessive model complexity. For both the AIC and the BIC, the model with the lowest value is the preferred one. Therefore, models can be ranked according to the value of the chosen information criterion (IC).
However, for models estimated by PPML, the ranking of different models based on the AIC and the BIC can be changed by changing the scale of the data.
To show this, we write a general IC for a Poisson regression as
$$\mathrm{IC} = -2\,\ell(\hat\beta) + p$$
where $p$ is a penalty term that does not depend on the data. Note that the ranking of models does not change if we use a modified IC that differs from IC only by a constant. Therefore, an equivalent ranking can be based on
$$\mathrm{IC}^* = \mathrm{IC} + 2\ell_0 = -\Lambda(y) + p \tag{8}$$
where $\ell_0$ denotes the Poisson (pseudo) log-likelihood function of the constant-only model and $\Lambda(y)$ is an LR statistic as in (4).
Recalling the results from section 2, (8) makes clear that changing the scale of the problem will change the relative contribution of the penalty to the value of the information criteria. Indeed, the penalty becomes less important as s grows. Therefore, larger models become preferred for large enough values of s. It follows from (6) that the reverse happens if the model is estimated by glm using inverse Gaussian pseudo maximum likelihood. Finally, we note that if the models being compared have the same number of parameters, their order is not affected by the scale of the data, because they have the same penalty.
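The reversal can be sketched numerically. The Python code below is a hypothetical illustration, not the article's example: it compares a constant-only model (1 parameter) with an intercept-plus-slope model (2 parameters) on toy data, using the textbook forms AIC = -2 ll + 2k and BIC = -2 ll + k ln n, which differ from the variants reported by glm. The names `ic_pair` and `switching_scale` are illustrative.

```python
import math

# Hypothetical toy data: a positive, non-integer outcome and one regressor.
X = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
Y = [1.3, 0.9, 1.8, 1.6, 2.9, 2.2, 3.7, 4.1]


def fit_ppml(y, x, max_iter=100, tol=1e-12):
    """Fit E[y|x] = exp(b0 + b1*x) by Poisson PML (Newton's method); return fitted means."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return [math.exp(b0 + b1 * xi) for xi in x]


def poisson_loglik(y, mu):
    """Poisson (pseudo) log likelihood, with ln Gamma(y + 1) in place of ln(y!)."""
    return sum(-mi + yi * math.log(mi) - math.lgamma(yi + 1.0) for yi, mi in zip(y, mu))


def ic_pair(y, x, s, penalty_per_parameter):
    """Return (IC of constant-only model, IC of slope model) for outcome s*y."""
    ys = [s * yi for yi in y]
    n = len(ys)
    mean = sum(ys) / n
    ll_restricted = poisson_loglik(ys, [mean] * n)          # 1 parameter
    ll_unrestricted = poisson_loglik(ys, fit_ppml(ys, x))   # 2 parameters
    return (-2.0 * ll_restricted + 1 * penalty_per_parameter,
            -2.0 * ll_unrestricted + 2 * penalty_per_parameter)


def switching_scale(y, x, penalty_per_parameter):
    """Scale at which both models have the same IC: s * LR(y) = extra penalty."""
    ic_r, ic_u = ic_pair(y, x, 1.0, penalty_per_parameter)
    lr = (ic_r - ic_u) + penalty_per_parameter  # LR statistic at s = 1
    return penalty_per_parameter / lr


AIC_PENALTY = 2.0
BIC_PENALTY = math.log(len(Y))
for name, penalty in (("AIC", AIC_PENALTY), ("BIC", BIC_PENALTY)):
    s_star = switching_scale(Y, X, penalty)
    for s in (0.5 * s_star, 2.0 * s_star):
        ic_r, ic_u = ic_pair(Y, X, s, penalty)
        preferred = "restricted" if ic_r < ic_u else "unrestricted"
        print(f"{name}, s = {s:.3f}: preferred model is {preferred}")
```

The switching scale follows from (5) and (8): the larger model is preferred exactly when s times the LR statistic at the original scale exceeds its additional penalty.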
Our point is illustrated in the following example, where the model favored by the AIC and the BIC depends on the scale of the data.
Conclusions
The main attraction of pseudo-maximum-likelihood estimators is that they provide a very robust method to estimate the conditional expectation of the outcome of interest. However, as the name suggests, estimation is based on a likelihood function that is not assumed to be correctly specified. Therefore, test statistics and measures of goodness of fit based on the value of the maximized pseudolikelihood function are generally uninterpretable. In this note, this problem was illustrated by studying the behavior of standard measures of goodness of fit for models estimated by PPML, which are shown to depend on the scale of the data.
However, the problem with using likelihood-based measures of goodness of fit in this context is deeper than the scale dependence we used as an illustration. Indeed, and more broadly, when performing pseudo-maximum-likelihood estimation, the value of the pseudolikelihood is not a reliable measure of fit because the likelihood is very likely to be misspecified. Therefore, likelihood-based statistics should not be used to evaluate models estimated by pseudo maximum likelihood even if estimation is based on distributions that are scale invariant, such as the normal and gamma distributions. To illustrate this, we note that in Example 3 of section 4 the restricted model is preferred if estimation is performed by gamma pseudo maximum likelihood, but the unrestricted model is preferred if Gaussian pseudo maximum likelihood is used. That is, the preferred model depends on the particular pseudo-maximum-likelihood estimator that is used, even if all of them identify the same set of parameters.
Although we focused on pseudo-$R^2$s and information criteria, the problem extends to other measures of goodness of fit. For example, estimation using poisson allows the use of the postestimation command estat gof, which produces two goodness-of-fit statistics whose values depend on the scale of the data. One is based on the deviance and behaves like the LR. The other is a measure of overdispersion that is also proportional to the scale, because changing the scale of the data by a factor of $s$ changes the mean by a factor of $s$ but changes the variance by a factor of $s^2$. Therefore, these goodness-of-fit statistics are not meaningful except when the empirical application requires the data to follow a Poisson distribution. In a related contribution, Bosquet and Boulhol (2015) note that the results of the negative binomial estimator are also scale dependent.
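A brief sketch of both proportionality claims, assuming that the two statistics are the usual Poisson deviance and Pearson statistics; the symbols $D(\cdot)$, $P(\cdot)$, and $\hat\mu_i$ (fitted values for outcome $y_i$) are introduced here for illustration. Because the fitted values for outcome $s y_i$ are $s\hat\mu_i$,

$$
D(sy) = 2\sum_{i=1}^{n}\left\{ s y_i \ln\frac{s y_i}{s\hat\mu_i} - (s y_i - s\hat\mu_i)\right\} = s\,D(y),
\qquad
P(sy) = \sum_{i=1}^{n}\frac{(s y_i - s\hat\mu_i)^{2}}{s\hat\mu_i} = s\,P(y).
$$

In the Pearson statistic, the numerator scales with $s^2$ and the denominator with $s$, which is the mean-variance argument made above.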
In summary, because the validity of the pseudo-maximum-likelihood estimators depends only on the correct specification of the conditional expectation, measures of goodness of fit for models estimated by that method should depend only on this feature of the conditional distribution. It would be useful if, in future versions, Stata changed the way the pseudo-$R^2$ is computed in poisson; changed the likelihood function used for glm with the inverse Gaussian family; and stopped reporting the values of AIC and BIC and disabled the use of commands such as estat gof and estat ic when a robust covariance matrix is used, as currently happens with lrtest.
Supplemental Material
Supplemental material, sj-txt-1-stj-10.1177_1536867X251341411, for this note in The Stata Journal.
Acknowledgments
We are grateful to Stephen Jenkins (handling editor), an anonymous referee, and Federico Martellosio for many helpful comments and suggestions. We retain sole responsibility for any errors or omissions.
To install the software files as they existed at the time of publication of this note, type
. net sj 25-2
. net get st0778 (to install ancillary files, if available)
About the authors
Nick Green is a student in the economics doctoral program at the University of Surrey. This work is based on Nick’s MRes dissertation at Surrey.
João M. C. Santos Silva is a professor of economics at the University of Surrey. João has worked on a variety of topics, including on models for nonnegative data with a mass point at zero, particularly on the estimation of the gravity equation for trade flows.
References
Akaike, H. 1973. “Information theory and an extension of the maximum likelihood principle”. In Second International Symposium on Information Theory, edited by B. N. Petrov and F. Csáki, 267–281. Budapest: Akadémiai Kiadó.
Bergé, L. 2018. Efficient estimation of maximum likelihood models with multiple fixed-effects: The R package FENmlm. CREA Discussion Paper Series 2018-13, Center for Research in Economic Analysis, University of Luxembourg. https://EconPapers.repec.org/RePEc:luc:wpaper:18-13.
Bosquet, C., and H. Boulhol. 2015. What is really puzzling about the “distance puzzle”. Review of World Economics 151: 1–21. https://doi.org/10.1007/s10290-014-0201-x.
Correia, S., P. Guimarães, and T. Zylkin. 2020. Fast Poisson estimation with high-dimensional fixed effects. Stata Journal 20: 95–115. https://doi.org/10.1177/1536867X20909691.
Gouriéroux, C., A. Monfort, and A. Trognon. 1984. Pseudo maximum likelihood methods: Theory. Econometrica 52: 681–700. https://doi.org/10.2307/1913471.
McFadden, D. 1974. “Conditional logit analysis of qualitative choice behavior”. In Frontiers in Econometrics, edited by P. Zarembka, 105–142. New York: Academic Press.
Santos Silva, J. M. C., and S. Tenreyro. 2006. The log of gravity. Review of Economics and Statistics 88: 641–658. https://doi.org/10.1162/rest.88.4.641.
Santos Silva, J. M. C., and S. Tenreyro. 2011. poisson: Some convergence issues. Stata Journal 11: 207–212. https://doi.org/10.1177/1536867X1101100203.
Santos Silva, J. M. C., and S. Tenreyro. 2022. The log of gravity at 15. Portuguese Economic Journal 21: 423–437. https://doi.org/10.1007/s10258-021-00203-w.
Schwarz, G. 1978. Estimating the dimension of a model. Annals of Statistics 6: 461–464. https://doi.org/10.1214/aos/1176344136.
Tjur, T. 2009. Coefficients of determination in logistic regression models—a new proposal: The coefficient of discrimination. American Statistician 63: 366–372. https://doi.org/10.1198/tast.2009.08210.
White, H. 1982. Maximum likelihood estimation of misspecified models. Econometrica 50: 1–25. https://doi.org/10.2307/1912526.
Yang, A., and R. Hillberry. 2023. Variable scaling and hypothesis testing in the gravity model. https://doi.org/10.2139/ssrn.4575424.