Abstract

Two-step approaches for synthesizing proportions in a meta-analysis require first transforming the proportions to a scale where their distribution across studies can be approximated by a normal distribution. Commonly used transformations include the log, logit, arcsine, and Freeman-Tukey double-arcsine transformations. Alternatively, a generalized linear mixed model (GLMM) can be fit directly on the data using the exact binomial likelihood. Unlike popular two-step methods, this accounts for uncertainty in the within-study variances without a normal approximation and does not require an ad hoc correction for zero counts. However, GLMMs require choosing a link function; we illustrate how the AIC can be used to choose the best fitting link when different link functions give different results. We also highlight how misspecification of the link function can introduce bias; using an empirical sandwich estimator for the standard error may not sufficiently avoid undercoverage due to link function misspecification. We demonstrate the application of GLMMs and choice of link function using data from a systematic review on the prevalence of fever in children with COVID-19.

Keywords

Generalized linear mixed model, meta-analysis, proportion, prevalence, AIC, link

What is already known

Generalized linear mixed models (GLMMs) allow for the synthesis of proportions from a meta-analysis using a one-step approach, where the model is fitted directly on the study proportions without first transforming them.

GLMMs incorporate uncertainty in the within-study variances without normal approximation and do not require an ad hoc correction for zero counts.

What is new

Misspecification of the link function can lead to bias and subsequent undercoverage, which cannot necessarily be negated by an empirical sandwich estimator for the variance of the fixed effect.

The probit and logit links tend to lead to similar results, while the cloglog differs; these differences are greater when the prevalence is high.

The AIC can be used to select the best-fitting link function; we demonstrate this process using an applied example about the prevalence of fever in children with COVID-19.

Potential impact for readers outside the authors’ field

GLMMs should be considered as a flexible alternative to two-step methods when synthesizing proportions in a meta-analysis; however, we recommend assessing the sensitivity of results to the choice of link function and subsequently choosing a link based on model selection criteria such as the AIC.

Background

Several different approaches exist for synthesizing proportions in a meta-analysis within a random effects framework. Two-step methods require first transforming the study proportions from a 0% to 100% scale to one where the distribution of proportions across studies can be approximated using a normal distribution. Commonly used transformations include the log, logit, arcsine, and Freeman-Tukey double-arcsine transformations.¹ A random effects model can then be fitted on these transformed proportions, and the results subsequently transformed back to the original scale. Alternatively, a generalized linear mixed model (GLMM) can be fitted directly on the observed study proportions without transforming them in a separate step.^1–4 This uses the exact binomial likelihood of the observations and importantly, unlike two-step approaches, accounts for uncertainty in the within-study variances and does not require an ad hoc correction for zero counts.^2,3,5 Furthermore, Schwarzer et al.¹ and Röver and Friede⁶ highlighted bias and inconsistent results due to issues of monotonicity and invertibility that can occur when using the Freeman–Tukey double-arcsine transformation (FTT) in the meta-analysis of proportions.

GLMMs can be easily extended to the case of multivariate meta-analysis, where related proportions of multiple outcomes or treatments may be reported in each study, and these outcomes are modeled jointly.^7–9 These multivariate methods are particularly useful when one or more outcomes have missing data. In this case, they borrow information across outcomes, increasing precision and avoiding bias when the data are missing at random.^10,11 In addition to the meta-analysis of prevalence data,⁸ multivariate GLMMs have been frequently implemented in the meta-analysis of diagnostic test accuracy^12–15 as well as network meta-analysis.^16–21

GLMMs can be easily implemented using popular software packages such as R,²² SAS, and Stata. The “meta”²³ and “metafor”²⁴ packages in R can be used to fit frequentist GLMMs in the specific context of meta-analysis; however, the R package “lme4”²⁵ can also be used. In SAS software, the GLIMMIX and NLMIXED procedures can be used to fit frequentist GLMMs, while the BGLIMM procedure can be used for Bayesian GLMMs.²⁶ The “meglm” and “metan”²⁷ commands in Stata can be used for frequentist GLMMs, while the “bayes: meglm” command can be used for Bayesian GLMMs. Each of these methods is user-friendly, with little coding necessary. Empirical sandwich estimators of the variance are easily implemented in SAS software, using the EMPIRICAL option in the GLIMMIX or NLMIXED procedure, or by specifying “vce(robust)” in Stata.

A current limitation of the GLMM is that, although the sample size of each study is incorporated when constructing the likelihood, explicit weights for each study are not available as they are with the two-step approaches. Quantifying the contribution of each study is an area for future work. Another limitation is that random effects models may result in overdispersion of the data relative to the model.^28,29 One could alternatively use the generalized estimating equations (GEE) method to estimate the population-averaged proportion; this approach would be robust to misspecification of the covariances within studies.³⁰ However, the investigation of between-study heterogeneity is a central component of meta-analysis, and this type of approach would not explicitly model this heterogeneity.³¹

While GLMMs offer a flexible and accessible approach to conducting a meta-analysis of proportions, they require choosing a link function and making a parametric (typically normal) assumption about the random effects distribution. In this paper we explore the robustness of GLMMs in the meta-analysis of proportions to misspecification of the link function, and whether the AIC can be used to reliably select the best fitting link. We illustrate this model selection process using data on the prevalence of fever from 36 studies included in a recent systematic review and meta-analysis of clinical characteristics and laboratory findings in children with COVID-19.³²

Generalized linear mixed model (GLMM)

We begin by reviewing the formulation of the GLMM for univariate proportion data. Let $y_{i}$ be the observed count, $π_{i}$ be the study-specific underlying proportion, and $n_{i}$ be the sample size for study $i \in \{1, \dots, N\}$. Then, the GLMM specifies that:

$$y_{i} \sim \mathrm{Binomial}(n_{i}, π_{i}),$$

$$g(π_{i}) = μ + θ_{i},$$

$$θ_{i} \sim N(0, τ^{2}),$$

where $μ$ is the overall proportion on the transformed scale via link function $g(\cdot)$ and $τ^{2}$ is the between-study variance. Widely used link functions include the logit, log, probit, and complementary log-log (cloglog) links.² This model allows explicit estimation of both the between-study variance and the entire distribution of underlying study-specific proportions. For example, assuming a logit link function, the study proportions will follow a logit-normal distribution with location parameter $μ$ and scale parameter $τ$. The random effects framework further allows for constructing a prediction interval for the proportion in a future study, another intuitive way of describing between-study heterogeneity.^31,33 The commonly used estimate of the overall or pooled proportion $\hat{π} = g^{-1}(\hat{μ})$ can be interpreted as the median proportion across studies.² The marginal or population-averaged proportion can also be estimated from the results of the GLMM; this estimate has a closed form when using a probit link function and can be obtained using numerical methods if another link is chosen.²
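The distinction between the median and the marginal (population-averaged) proportion can be made concrete with a short Python sketch. It is an illustration only, not the code used in this paper: the function names, the grid-based trapezoid rule, and the grid settings are assumptions chosen for readability. The probit closed form used here is $Φ(μ / \sqrt{1 + τ^{2}})$, which follows from writing $Φ(μ + τZ)$ as the probability that an independent standard normal variable falls below $μ + τZ$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1

# Inverse link functions g^{-1}, mapping the linear predictor to a proportion.
INVERSE_LINKS = {
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    "probit": STD_NORMAL.cdf,
    "cloglog": lambda eta: -math.expm1(-math.exp(eta)),  # 1 - exp(-exp(eta))
}


def median_proportion(mu, link):
    """Back-transform mu: the median of the study-specific proportions."""
    return INVERSE_LINKS[link](mu)


def marginal_proportion(mu, tau, link, n_grid=2001, width=8.0):
    """Approximate E[g^{-1}(mu + tau * Z)], Z ~ N(0, 1), by the trapezoid rule.

    The grid size and width are illustrative choices, not tuned values.
    """
    inverse_link = INVERSE_LINKS[link]
    step = 2.0 * width / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        z = -width + k * step
        end_weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += end_weight * inverse_link(mu + tau * z) * STD_NORMAL.pdf(z)
    return total * step


def marginal_proportion_probit(mu, tau):
    """Closed form for the probit link: Phi(mu / sqrt(1 + tau^2))."""
    return STD_NORMAL.cdf(mu / math.sqrt(1.0 + tau ** 2))
```

For the probit link, `marginal_proportion(mu, tau, "probit")` and `marginal_proportion_probit(mu, tau)` should agree closely, which gives a simple check on the numerical integration before applying it to the logit or cloglog links.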

As previously mentioned, the GLMM is fitted directly to the data using a “one-step approach” and fully accounts for uncertainty in the observed within-study variances, whereas traditional “two-step” approaches assume these variances within studies are known.^2,3 Furthermore, the flexibility in the choice of link function allows the GLMM to accommodate different distributional shapes. For example, the cloglog and cauchit links can allow for skewed and heavy-tailed latent distributions, respectively.² One can select a link function based on model selection criteria such as the Akaike information criterion (AIC), or equivalently the Bayesian information criterion (BIC) (see Appendix), or consider model averaging in a Bayesian framework. To investigate this further, we explored the robustness of GLMMs to link function misspecification in the meta-analysis of proportions and whether the AIC can reliably select the best-fitting link function. We also assessed whether using an empirical sandwich estimator of the variance of the fixed effect could reduce undercoverage due to link function misspecification.
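To make AIC-based link selection concrete, the following Python sketch evaluates the marginal binomial-normal log-likelihood by summing over a grid for the random effect, maximizes it with a deliberately crude compass search, and compares AIC values across links. This is a minimal illustration under stated assumptions, not the GLIMMIX algorithm used in this paper: the function names, the grid settings, the search routine, and the choice of two parameters ($μ$ and $τ$) in the AIC are all assumptions of the sketch.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

INVERSE_LINKS = {
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    "probit": STD_NORMAL.cdf,
    "cloglog": lambda eta: -math.expm1(-math.exp(eta)),
}


def glmm_loglik(mu, tau, counts, sizes, link, n_grid=201, width=6.0):
    """Marginal log-likelihood, integrating the random effect on a grid."""
    inverse_link = INVERSE_LINKS[link]
    step = 2.0 * width / (n_grid - 1)
    log_weight, log_p, log_q = [], [], []
    for k in range(n_grid):
        z = -width + k * step
        eta = max(-30.0, min(30.0, mu + tau * z))  # guard against overflow
        p = min(max(inverse_link(eta), 1e-12), 1.0 - 1e-12)  # keep logs finite
        log_weight.append(math.log(STD_NORMAL.pdf(z) * step))
        log_p.append(math.log(p))
        log_q.append(math.log1p(-p))
    total = 0.0
    for y, n in zip(counts, sizes):
        log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
        terms = [w + y * a + (n - y) * b for w, a, b in zip(log_weight, log_p, log_q)]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += log_choose + peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def fit_glmm(counts, sizes, link, tolerance=1e-4):
    """Crude compass search over (mu, log tau); returns (mu, tau, loglik)."""
    def objective(params):
        return glmm_loglik(params[0], math.exp(params[1]), counts, sizes, link)

    params = [0.0, 0.0]  # arbitrary starting values: mu = 0, tau = 1
    best = objective(params)
    step = 1.0
    while step > tolerance:
        improved = False
        for axis in (0, 1):
            for direction in (1.0, -1.0):
                trial = list(params)
                trial[axis] += direction * step
                value = objective(trial)
                if value > best:
                    params, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # shrink the search pattern when no move helps
    return params[0], math.exp(params[1]), best


def aic(loglik, n_params=2):
    """AIC = 2k - 2 log L, with k = 2 (mu and tau) by default."""
    return 2.0 * n_params - 2.0 * loglik


def select_link(counts, sizes, links=("logit", "probit", "cloglog")):
    """Fit each candidate link and return (best_link, {link: AIC})."""
    aics = {link: aic(fit_glmm(counts, sizes, link)[2]) for link in links}
    return min(aics, key=aics.get), aics
```

Calling `select_link(counts, sizes)` with lists of event counts and sample sizes returns the link with the smallest AIC together with all three AIC values. Because every candidate model here has the same number of parameters, ranking by AIC is the same as ranking by maximized log-likelihood.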

Simulations

Methods of simulation

We simulated 2000 datasets within SAS On Demand for Academics under each of three true links (cloglog, logit, or probit), two true median prevalences (π = 0.05 or 0.3), and four between-study standard deviations (τ = 0.5, 1, 1.5, or 3); each simulated dataset contained 25 studies with 100 participants each. For each simulated dataset, we fitted six GLMMs using the GLIMMIX procedure, one for each combination of link function and variance estimator (model-based or “MBN” empirical sandwich), and then found the AIC, bias, and 95% coverage probability for each model.
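The data-generating step described above can be sketched in Python as follows. The simulations in this paper were run in SAS; this block is only an illustration of the generating model, and the function name, argument names, and use of Python's `random` module are assumptions of the sketch.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Link functions g and their inverses for the three links considered.
LINKS = {
    "logit": lambda p: math.log(p / (1.0 - p)),
    "probit": STD_NORMAL.inv_cdf,
    "cloglog": lambda p: math.log(-math.log1p(-p)),
}
INVERSE_LINKS = {
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    "probit": STD_NORMAL.cdf,
    "cloglog": lambda eta: -math.expm1(-math.exp(eta)),
}


def simulate_meta_analysis(median_prevalence, tau, link,
                           n_studies=25, n_per_study=100, rng=None):
    """Simulate one meta-analysis of proportions under the given true link.

    Returns (counts, sizes): event counts and sample sizes, one per study.
    """
    rng = rng or random.Random()
    mu = LINKS[link](median_prevalence)  # true median on the link scale
    counts = []
    for _ in range(n_studies):
        theta = rng.gauss(0.0, tau)  # study-specific random effect
        prevalence = INVERSE_LINKS[link](mu + theta)
        # Binomial draw as a sum of independent Bernoulli trials.
        counts.append(sum(rng.random() < prevalence for _ in range(n_per_study)))
    return counts, [n_per_study] * n_studies
```

For example, `simulate_meta_analysis(0.3, 1.5, "cloglog", rng=random.Random(2024))` generates one dataset with a true cloglog link, median prevalence 0.3, and between-study standard deviation 1.5; the seed is arbitrary and only makes the draw reproducible.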

Simulation results

The coverage using both the empirical sandwich and model-based standard errors largely remained near 95% when the link function was correctly specified (Figure 1). For cases where π = 0.3, if the data were generated from a logit link and a probit link was assumed, the coverage remained near 95% (and vice versa). However, when data were generated using a cloglog link, using a probit or logit link for the analysis resulted in coverage below 95% (and vice versa). This difference between the probit and logit links compared to the cloglog link became more apparent as the between-study heterogeneity increased. When π = 0.05, the coverage probabilities were less sensitive to the choice of link function.

Figure 1.

Coverage of the 95% confidence intervals across 2000 simulations when the median prevalence (π) was 0.3; the left side shows the results for the model-based SE estimator and the right side shows those for the sandwich estimator. Each panel represents the true link function and the x-axis represents the between-study standard deviation ($τ$).

The bias tended to be lowest when the link function was correctly specified and the between-study variance was low. The impact of model misspecification on the bias was larger when π = 0.3 than when π = 0.05 (Figure A.2). We hypothesize that this is due to the shapes of the link functions, as the functions resemble each other more closely in the lower tail (Figure A.3).

The model-based estimator tended systematically towards undercoverage, while the sandwich estimator tended towards over-coverage (Figures 1 and A.1). Coverage is driven both by the accuracy of the estimated standard error and by bias; the use of an empirical sandwich estimator will not remedy bias due to an ill-fitting link. However, undercoverage was slightly less extreme when using the sandwich estimator than the model-based estimator, as the average estimate of the standard error (MBSE) was closer to the empirical standard error (ESE) when using the sandwich estimate than when using the model-based estimate (average absolute difference across all conditions: 0.01 for the sandwich estimator and 0.03 for the model-based estimator).
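The performance measures discussed here can be computed from a set of simulation results as in the following Python sketch. It assumes Wald-type intervals (estimate plus or minus a normal quantile times the standard error) evaluated on a single scale, and it takes the empirical standard error to be the standard deviation of the estimates across simulations; the function and key names are hypothetical, and the paper's own computations may differ in detail.

```python
from statistics import NormalDist, mean, stdev


def summarize_simulation(estimates, std_errors, truth, level=0.95):
    """Summarize bias, coverage, and standard-error accuracy.

    estimates, std_errors: one value per simulated dataset, on the same scale.
    truth: the true parameter value on that scale.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    covered = [
        est - z * se <= truth <= est + z * se
        for est, se in zip(estimates, std_errors)
    ]
    return {
        "bias": mean(estimates) - truth,
        "coverage": sum(covered) / len(covered),
        "average_se": mean(std_errors),       # average estimated SE
        "empirical_se": stdev(estimates),      # SD of estimates across simulations
    }
```

Comparing `average_se` with `empirical_se` shows whether the standard error estimator is well calibrated, while `bias` captures the separate contribution of a misspecified link to poor coverage.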

While the AIC had difficulty differentiating between logit and probit links when π = 0.3, as their shapes are similar, we observed that the AIC was able to correctly differentiate between a cloglog and a logit/probit link at least 47% of the time when the between-study standard deviation was 0.5 (Figure 2). As this standard deviation increased to 3, the rate increased to over 75% for all true links. We observed the largest impact of link function misspecification on bias and coverage in scenarios with high between-study heterogeneity; thus, the AIC’s differentiation ability was the greatest in scenarios where the correct specification of the link function was most important.

Figure 2.

Each bar shows the proportion of the 2000 simulations in which each link function was chosen by AIC when the true median prevalence (π) was 0.3. Each panel represents the true link function and the x-axis represents the between-study standard deviation ( $τ$ ).

Prevalence of fever in children with COVID-19

We illustrate the model selection process and interpretability of the GLMM using an example of the prevalence of fever in pediatric cases of COVID-19.³² We fit GLMMs with logit, probit, and cloglog link functions using the GLIMMIX procedure within SAS On Demand for Academics, using both model-based and empirical sandwich (“MBN” option) estimates for the standard errors (see Appendix for SAS code). Table 1 presents the results for the estimated median prevalence for the three models, the 95% prediction intervals for a new study, and the AICs. The estimated median prevalence was similar across link functions, with the logit (0.462, 95% CI: [0.392, 0.533]) and probit (0.463, 95% CI: [0.394, 0.532]) link estimates closer to each other than to the cloglog link estimate (0.448, 95% CI: [0.383, 0.520]). Figure A.4 shows the estimated distributions of study prevalences using the three different links. The results using model-based and empirical sandwich standard errors were also similar for all three link functions; the prediction intervals were also similar across links but varied more than the point estimates. The AIC was lowest for the cloglog link (216.20) compared to the logit (216.44) and probit links (216.68), suggesting the cloglog link provided the best fit to the data, though the differences in AIC were extremely small.

Table 1.

Median prevalence (95% CI) and 95% prediction interval for a new study prevalence of fever estimated using GLMM with different link functions and model-based or empirical standard error (SE).

SE calculation	Quantity	Logit link	Probit link	Cloglog link
Model-based	Estimate (95% CI)	0.462 (0.392, 0.533)	0.463 (0.394, 0.532)	0.448 (0.383, 0.520)
Model-based	95% prediction interval	(0.158, 0.797)	(0.154, 0.797)	(0.178, 0.836)
Empirical	Estimate (95% CI)	0.462 (0.391, 0.534)	0.463 (0.396, 0.531)	0.448 (0.383, 0.520)
Empirical	95% prediction interval	(0.158, 0.797)	(0.153, 0.798)	(0.178, 0.836)
	AIC	216.44	216.68	216.20
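The size of the AIC differences in Table 1 can be checked with a few lines of Python. The AIC values are those reported in the table; the relative likelihood $\exp(-Δ/2)$ is a conventional heuristic for comparing models by AIC and is added here only as an illustration, since it was not part of the analysis in this paper.

```python
import math

# AIC values reported in Table 1.
aic_by_link = {"logit": 216.44, "probit": 216.68, "cloglog": 216.20}


def aic_differences(aic_values):
    """Difference between each model's AIC and the smallest AIC."""
    best = min(aic_values.values())
    return {link: value - best for link, value in aic_values.items()}


best_link = min(aic_by_link, key=aic_by_link.get)
delta_aic = aic_differences(aic_by_link)

# Relative likelihood of each model compared with the best one (heuristic).
relative_likelihood = {link: math.exp(-0.5 * d) for link, d in delta_aic.items()}

for link in aic_by_link:
    print(f"{link}: delta AIC = {delta_aic[link]:.2f}, "
          f"relative likelihood = {relative_likelihood[link]:.2f}")
```

All three differences are below 0.5, which is consistent with the statement that the links fit these data almost equally well.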

With the cloglog link, the estimated median prevalence of fever in children with COVID-19 was 44.8% (95% CI: [38.3%, 52.0%]). The study-level and pooled prevalence estimates are shown in Figure A.5. The 95% prediction interval indicates that we would expect 95% of future studies to have a prevalence of fever between 17.8% and 83.6%. We can plot the entire estimated distribution of study prevalences from the model, as shown in Figure A.6. This wide range of predicted values indicates a high degree of heterogeneity in the prevalence of fever across the pediatric studies included in the meta-analysis.
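A prediction interval of this kind is built on the link scale and then back-transformed. The Python sketch below shows one common construction, which combines the between-study standard deviation with the standard error of $\hat{μ}$ and uses a normal quantile. This is an assumption of the sketch: the exact formula and critical value used for Table 1 are not stated here, and the values of `se_mu` and `tau_hat` below are hypothetical placeholders rather than estimates from the fever data.

```python
import math
from statistics import NormalDist


def inverse_cloglog(eta):
    """Inverse complementary log-log link: 1 - exp(-exp(eta))."""
    return -math.expm1(-math.exp(eta))


def prediction_interval(mu_hat, se_mu, tau_hat, inverse_link, level=0.95):
    """Approximate prediction interval for the proportion in a new study.

    Built on the link scale as mu_hat +/- z * sqrt(tau_hat^2 + se_mu^2),
    then back-transformed with an increasing inverse link.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(tau_hat ** 2 + se_mu ** 2)
    return inverse_link(mu_hat - half_width), inverse_link(mu_hat + half_width)


# Reported median prevalence under the cloglog link, moved to the link scale.
mu_hat = math.log(-math.log1p(-0.448))

# Hypothetical values for illustration only (not taken from the paper).
se_mu, tau_hat = 0.1, 0.5

lower, upper = prediction_interval(mu_hat, se_mu, tau_hat, inverse_cloglog)
```

Because the interval is symmetric on the link scale, it is generally asymmetric around the median on the proportion scale, as the intervals in Table 1 are.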

Conclusions

GLMMs provide an interpretable and flexible approach to summarizing proportions in a meta-analysis of multiple related studies. However, using a GLMM requires specifying a link function, and different choices can lead to different results. In particular, while the logit and probit functions generally have similar shapes, the cloglog link differs more substantially. As misspecification of the link function can lead to bias, undercoverage due to a misspecified link function cannot necessarily be avoided by using an empirical sandwich estimate of the variance, though we observed a small improvement when using this. To address this, we found that the AIC is an effective tool for choosing the link function in cases where alternate link functions give differing results. We recommend investigating the sensitivity of results to the choice of link function when conducting meta-analyses of proportions using GLMMs and using the AIC to choose the best-fitting link function when these results differ.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the NIH National Heart, Lung, and Blood Institute (T32HL129956) and the NIH National Library of Medicine (R01LM012982).

ORCID iDs

Lianne K Siegel

Milena Silva

Lifeng Lin

Appendix

References

1. Schwarzer, Chemaitelly, Abu-Raddad, et al. Seriously misleading results using inverse of Freeman-Tukey double arcsine transformation in meta-analysis of single proportions. Res Synth Methods 2019; 10: 476–483. DOI: 10.1002/jrsm.1348.

2. Lin, Chu. Meta-analysis of proportions using generalized linear mixed models. Epidemiology 2020; 31: 713–717. DOI: 10.1097/EDE.0000000000001232.

3. Lin. Arcsine-based transformations for meta-analysis of proportions: Pros, cons, and alternatives. Health Sci Rep 2020; 3: e178. DOI: 10.1002/hsr2.178.

4. Bakbergenuly, Kulinskaya. Meta-analysis of binary outcomes via generalized linear mixed models: a simulation study. BMC Med Res Methodol 2018; 18: 70. DOI: 10.1186/s12874-018-0531-9.

5. Hamza, van Houwelingen, Stijnen. The binomial distribution of meta-analysis was preferred to model within-study variability. J Clin Epidemiol 2008; 61: 41–51. DOI: 10.1016/j.jclinepi.2007.03.016.

6. Röver, Friede. Double arcsine transform not appropriate for meta-analysis. Res Synth Methods 2022; 13: 645–648. DOI: 10.1002/jrsm.1591.

7. Trikalinos, Hoaglin, Schmid. An empirical comparison of univariate and multivariate meta-analyses for categorical outcomes: empirical comparison of univariate vs multivariate meta-analyses. Stat Med 2014; 33: 1441–1459. DOI: 10.1002/sim.6044.

8. Siegel, Rudser, Sutcliffe, et al. A Bayesian multivariate meta‐analysis of prevalence data. Stat Med 2020; 39: 3105–3119. DOI: 10.1002/sim.8593.

9. Chu, Nie, Chen, et al. Bivariate random effects models for meta-analysis of comparative studies with binary outcomes: methods for the absolute risk difference and relative risk. Stat Med 2012; 21(6): 621–633.

10. Kirkham, Riley, Williamson. A multivariate meta-analysis approach for reducing the impact of outcome reporting bias in systematic reviews. Stat Med 2012; 31: 2179–2195. DOI: 10.1002/sim.5356.

11. Jackson, Riley, White. Multivariate meta-analysis: potential and promise. Stat Med 2011; 30: 2481–2498. DOI: 10.1002/sim.4172.

12. Chu, Cole. Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach. J Clin Epidemiol 2006; 59: 1331–1332. DOI: 10.1016/j.jclinepi.2006.06.011.

13. Chu, Guo, Zhou. Bivariate random effects meta-analysis of diagnostic studies using generalized linear mixed models. Med Decis Making 2010; 30: 499–508. DOI: 10.1177/0272989X09353452.

14. Reitsma, Glas, Rutjes AWS, et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 2005; 58: 982–990. DOI: 10.1016/j.jclinepi.2005.02.022.

15. Nie, Cole, et al. Statistical methods for multivariate meta-analysis of diagnostic tests: an overview and tutorial. Stat Methods Med Res 2016; 25: 1596–1619. DOI: 10.1177/0962280213492588.

16. Zhang, Carlin, Neaton, et al. Network meta-analysis of randomized clinical trials: reporting the proper summaries. Clin Trials 2014; 11: 246–262. DOI: 10.1177/1740774513498322.

17. Zhang, Chu, Hong, et al. Bayesian hierarchical models for network meta-analysis incorporating nonignorable missingness. Stat Methods Med Res 2017; 26: 2227–2243. DOI: 10.1177/0962280215596185.

18. Hong, Chu, Zhang, et al. A Bayesian missing data framework for generalized multiple outcome mixed treatment comparisons. Res Synth Methods 2016; 7: 6–22. DOI: 10.1002/jrsm.1153.

19. Wang, Lin, Hodges, et al. The impact of covariance priors on arm-based Bayesian network meta-analyses with binary outcomes. Stat Med 2020; 39: 2883–2900. DOI: 10.1002/sim.8580.

20. White, Turner, Karahalios, et al. A comparison of arm-based and contrast-based models for network meta-analysis. Stat Med 2019; 38: 5197–5213. DOI: 10.1002/sim.8360.

21. Wang, Lin, Murray, et al. Bridging randomized controlled trials and single-arm trials using commensurate priors in arm-based network meta-analysis. Ann Appl Stat 2021; 15: 1767–1787. DOI: 10.1214/21-AOAS1469.

22. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2021.

23. Schwarzer. meta: an R package for meta-analysis. R News 2007; 7: 40–45.

24. Viechtbauer. Conducting meta-analyses in R with the metafor package. J Stat Softw 2010; 36: 1–48. DOI: 10.18637/jss.v036.i03.

25. Bates, Mächler, Bolker, et al. Fitting linear mixed-effects models using lme4. J Stat Softw 2015; 67: 1–48. DOI: 10.18637/jss.v067.i01.

26. Rott, Lin, Hodges, et al. Bayesian meta-analysis using SAS PROC BGLIMM. Res Synth Methods 2021; 12: 692–700. DOI: 10.1002/jrsm.1513.

27. Harris, Deeks, Altman, et al. metan: fixed- and random-effects meta-analysis. STATA J 2008; 8: 3–28.

28. Doi SAR, Furuya-Kanamori, Thalib, et al. Meta-analysis in evidence-based healthcare: a paradigm shift away from random effects is overdue. Int J Evid Based Healthc 2017; 15: 152–160. DOI: 10.1097/XEB.0000000000000125.

29. Doi SAR, Furuya-Kanamori. Selecting the best meta-analytic estimator for evidence-based practice: a simulation study. Int J Evid Based Healthc 2020; 18: 86–94. DOI: 10.1097/XEB.0000000000000207.

30. Preisser, Inan, Powers, et al. A population-averaged approach to diagnostic test meta-analysis. Biom J 2019; 61: 126–137. DOI: 10.1002/bimj.201700187.

31. Riley, Higgins JPT, Deeks. Interpretation of random effects meta-analyses. BMJ 2011; 342: d549. DOI: 10.1136/bmj.d549.

32. Kharoud, Asim, Siegel, et al. Review of clinical characteristics and laboratory findings of COVID-19 in children-Systematic review and Meta-analysis. medRxiv [Preprint], 2020. https://doi.org/10.1101/2020.09.23.20200410

33. Lin. Use of prediction intervals in network meta-analysis. JAMA Netw Open 2019; 2: e199735. DOI: 10.1001/jamanetworkopen.2019.9735.

Choice of link functions for generalized linear mixed models in meta-analyses of proportions
