Statistical Heterogeneity in Oral Health Meta-Analyses

Abstract

Providing the summary effect size and its uncertainty, a prediction interval, and a measure of statistical heterogeneity constitute good reporting practices in meta-analyses. Popular statistical heterogeneity measures comprise the τ² and I² statistics. However, researchers often rely unduly on the I² statistic, using naive categorizations to gauge the extent of heterogeneity, leading to misuses of the meta-analysis models, deficiencies in reporting, and misleading conclusions. The present study aimed to provide empirical evidence on the reporting and interpretation of statistical heterogeneity in systematic reviews of oral health published between 2021 and 2023 in 21 leading specialty and general dental journals. Systematic reviews with at least 1 meta-analysis on binary or continuous outcomes with the most studies were identified. Characteristics were extracted at the systematic review and meta-analysis levels. In total, 313 systematic reviews with meta-analyses were analyzed. Within this cohort of meta-analyses, the random-effects model (89%, n = 278) was frequently applied. Almost all meta-analyses (98%, n = 308) reported the I² value, and 51% (n = 160) reported the τ² value. For this sample, the median I² was 76% (range: 0%–100%), and the median τ² was 0.29 (range: 0–2,632), with 13% (n = 20/160) of these meta-analyses reporting zero τ ² . Most of the meta-analyses (96%, n = 299) based the heterogeneity interpretation on I² and only 21 (7%) on τ². Although 49% (n = 152) of the meta-analyses chose the meta-analysis model a priori, only 41% (n = 63/152) justified this choice. Furthermore, 42% (n = 131) of the 313 meta-analyses chose the meta-analysis model based on the I². Within oral health meta-analyses, there is evidence of overreliance on I² when reporting and interpreting statistical heterogeneity and selecting the meta-analysis model. The inappropriate use of I² in meta-analysis model selection and interpretation of statistical heterogeneity may have implications for the quality of conclusions delivered to the end users of systematic reviews.

Keywords

dental health uncertainty systematic reviews meta-analysis

Introduction

Pairwise meta-analysis is a long-established evidence synthesis tool with a central role in making informed decisions about the benefit-harm profile of 2 interventions. Attractive elements of pairwise meta-analysis include the increased accuracy and precision of the summary results and the greater statistical power compared with a single study, particularly when statistical heterogeneity is mitigated (Hedges and Pigott 2001).

Selecting the appropriate meta-analysis model (hereafter referred to as “model”) is pivotal for credible conclusions. Decisions regarding the model should be made during the protocol stage of the review (Lasserson et al. 2023). A random-effects model is more suitable when studies are retrieved from literature searches, as this model accounts for the inherent clinical and methodological heterogeneity across primary studies (Borenstein et al. 2010). A fixed-effect model implies one true treatment effect, and any differences in the observed results of the studies are attributed to sampling error (within-study variance) (Borenstein et al. 2010; Kanters 2022). Therefore, using a fixed-effect model is difficult to defend when the included studies come from literature searches (Borenstein et al. 2010).

The principal sources of heterogeneity in a meta-analysis are distinguished as clinical, methodological, and statistical, attributed to variance in the true treatment effects across studies (Thompson 1994; Richardson et al. 1995; Melsen et al. 2014). Statistical heterogeneity can be identified (1) visually in a forest plot by observing how much the study-specific treatment effects and their 95% confidence intervals (CIs) overlap and (2) statistically by applying the Cochran’s Q test and estimating the parameters I² and τ² (Fletcher 2007). The I² is an intuitive parameter that describes the proportion of variation in observed treatment estimates attributed to statistical heterogeneity rather than sampling error alone and comprises a relative measure of statistical heterogeneity (Rücker et al. 2008; Borenstein 2023; Deeks et al. 2023). Conversely, τ² is an absolute measure of statistical heterogeneity and quantifies the variance of the true treatment effects across the studies (Rücker et al. 2008; Borenstein 2024).

Within the health care literature, I² and τ² are consistently used in the reported meta-analysis results (Thorlund et al. 2012; Alba et al. 2016; Langan et al. 2017). However, questions have been raised regarding the appropriateness of the interpretation of I² and τ² (Rücker et al. 2008; Borenstein et al. 2017; Borenstein 2023; Borenstein 2024). Common misconceptions include using I² to gauge the variability of the effect size across studies and to classify the extent of statistical heterogeneity using arbitrary thresholds, treating I² as an absolute heterogeneity measure and resorting to I² or τ² to select the model (Rücker et al. 2008; Borenstein 2022; Borenstein 2023; Borenstein 2024). In meta-analyses with a few studies, I² may overestimate or underestimate the degree of true statistical heterogeneity (von Hippel 2015). Furthermore, the study size and event frequency (for binary outcomes) may affect the estimation of I² (Thorlund et al. 2012; von Hippel 2015). Consequently, using I² to describe the extent of statistical heterogeneity can be problematic. A proposed solution is to accompany the estimated I² with the 95% CI to communicate the uncertainty around the estimation (Ioannidis et al. 2007; Ioannidis 2008; Thorlund et al. 2012; von Hippel 2015).

Systematic reviews (SRs) with meta-analysis have been well-received by the oral health community. The surge of several bibliographic surveys and empirical studies on the reporting quality and statistical aspects of published SRs in oral health demonstrates the rising popularity of this study design in the broad oral health field. The use of SRs in clinical decisions makes it imperative that the limitations are recognized and the results of SRs are properly interpreted. To our knowledge, an empirical evaluation of the reporting and interpretation of I² and τ² in oral health SRs has not been previously undertaken. The primary aim of this study was to report the frequency of reporting I² and τ² in oral health meta-analyses. A secondary aim was to assess how the SR authors report and interpret I² and τ².

Materials and Methods

Eligibility Criteria

Twenty-one English-language oral health journals with the highest impact factor (2023) were included. This cohort consisted of the 2 highest impact factor (2023) journals from the following dental specialties: cariology, endodontology, periodontology, orthodontics, oral surgery/oral pathology, oral and maxillofacial surgery, implantology, reconstructive and pediatric dentistry and 3 general oral health journals. To be eligible, SRs had to report interventional procedures involving human participants or animals and have at least 1 meta-analysis for binary or continuous outcomes containing at least 3 primary studies. In SRs with more than 1 meta-analysis, the meta-analysis with the most primary studies was selected. Scoping reviews were excluded for not including any study syntheses. Network meta-analyses were also excluded for having additional analysis and reporting requirements around statistical heterogeneity. Where multiple versions of the same SR existed, we included the most recent.

Selection of SRs

Two authors (Z.T. and J.S.) independently searched journal websites manually to identify eligible SRs. The search date was from September 10, 2023, to September 17, 2023. SRs published between June 1, 2021, and August 31, 2023, were considered for further selection based on our inclusion and exclusion criteria. The terms “systematic review” or “meta-analysis” were screened in the title, abstract, and full article text. In case of disagreement, a third author (N.P.) was consulted to discuss the eligibility of the SR until consensus.

Data Extraction

Prior to data collection, a prepiloting process based on 20 SRs was undertaken to ensure consistency between the authors (E.K. and Z.T.) regarding the understanding of the extracted elements at the SR and meta-analysis levels. One hundred percent agreement was reached. All extracted elements were entered into a prepiloted Microsoft Excel® data collection sheet. Any disagreements relating to the extraction were resolved by discussion among the authors (Z.T., E.K., L.S., and N.P.) until consensus.

The following characteristics at the SR level were extracted: journal title, year of publication, journal impact factor, number of authors, the continent affiliated with the first author (Europe, America, Asia, or other), and PROSPERO registration (yes or no). At the meta-analysis level, the following characteristics were extracted from forest plots, tables, and text of the included SRs: the model (fixed-effect or random-effects), between-study heterogeneity estimator used, I² and τ² value with 95% CIs for the selected meta-analysis (if reported), the effect measure of the pooled estimate (e.g., mean difference, standardized mean difference, odds ratio, or risk ratio), and whether the Hartung-Knapp-Sidik-Jonkman (HKSJ) adjustment for the 95% CI of the summary effects was applied (relevant only for random-effects meta-analysis). An assessment was made on whether the model was chosen a priori or based on the I² value. If it was chosen a priori, the full text of the SR was reviewed for any justification to support this decision by the authors, and the verbatim statements were documented. Information on whether the interpretation of the heterogeneity was based on the I² or τ² value was also recorded.

Statistical Analysis

The extracted data were tabulated, and descriptive statistics were used to summarize the continuous (median, interquartile range [IQR], and range) and categorical characteristics (absolute and relative frequencies). A bubble plot was used to illustrate the frequency of SRs per journal over the study time frame. Box plots with integrated dots were created to present the distribution of the point estimate and 95% CI bounds for I² and τ². Stacked bar plots were used to illustrate the frequency of heterogeneity interpretation with I² or τ² conditionally on whether the model was chosen a priori. All analyses were conducted using R statistical software (R Core Team 2023). The ggplot2 R package was used to produce the figures (Wickham 2016). The data and the functions supporting the present study’s findings are publicly available at https://github.com/LoukiaSpin/Statistical-Heterogeneity-Reporting-Quality-Oral-health-Systematic-Reviews.git.

Results

Six hundred eight oral health SRs published between June 1, 2021, and August 31, 2023, were initially identified from 21 journals (Supplementary Table S2). Following the eligibility criteria, 313 SRs containing meta-analyses with at least 3 primary studies were deemed suitable for analysis (Fig. 1 and Appendix Table S1). One hundred percent agreement was reached regarding the final selected SRs.

Figure 1.

PRISMA flowchart of article identification and selection (N = 313).

The characteristics of the included SRs are summarized in Table 1. SRs included a median of 6 authors (IQR: 4 to 6), ranging from 2 to 16. Most SRs were published in Europe (35%, n = 111), followed by Asia (33%, n = 104) and America (25%, n = 77), and were registered with the PROSPERO database (75%, n = 235). The journals BMC Oral Health, Journal of Prosthetic Dentistry, and Journal of Dentistry published the most SRs (Supplementary Fig. S1).

Table 1.

Characteristics of included systematic reviews.

Characteristic	Summary (N = 313)
Number of authors
Median	6.00
IQR	(4.00, 6.00)
Range	(2.00, 16.00)
Continent
Europe	111 (35.0%)
Asia	104 (33.0%)
America	77 (25.0%)
Other	21 (6.7%)
PROSPERO
Yes	235 (75.0%)
No	78 (25.0%)

IQR, interquartile range.

The characteristics of the meta-analyses are summarized in Table 2. At the meta-analysis level, the random-effects model was more commonly used (89%, n = 278) than the fixed-effect model (11%, n = 35). Of the 278 meta-analyses that applied the random-effects model, only 72 (26%) reported a heterogeneity estimator. Of the 72 meta-analyses that reported a heterogeneity estimator, 42 (58%) applied DerSimonian and Laird, followed by the restricted maximum likelihood (36%, n = 26), Paule-Mandel (4%, n = 3), and DerSimonian-Kacker (1%, n = 1). Only 1 of the meta-analyses with a random-effects model reported a 95% CI for τ². All 35 meta-analyses with a fixed-effect model and almost all meta-analyses with a random-effects model (98%, n = 273) reported the I² statistic; however, only 8 (3%) meta-analyses with a random-effects model provided a 95% CI for the I² statistic, and none of the meta-analyses with a fixed-effect model did.

Table 2.

Characteristics of the Included Meta-Analyses.

Characteristic	Total (N = 313)	Meta-Analysis Model Used
Characteristic	Total (N = 313)	Fixed Effect (n = 35; 11%)	Random Effects (n = 278; 89%)
Estimator of heterogeneity parameter (τ²) applied^a
DerSimonian and Laird	42 (13.4%)	0 (0.0%)	42 (15.1%)
Restricted maximum likelihood	26 (8.3%)	0 (0.0%)	26 (9.4%)
Paule and Mandel	3 (1.0%)	0 (0.0%)	3 (1.1%)
DerSimonian and Kacker	1 (0.3%)	0 (0.0%)	1 (0.4%)
Not reported	206 (65.8%)	0 (0.0%)	206 (74.1%)
Not applicable	35 (11.2%)	35 (100.0%)	0 (0.0%)
The 95% confidence interval of τ² is reported
Yes	1 (0.3%)	0 (0.0%)	1 (0.4%)
No	277 (88.5%)	0 (0.0%)	277 (99.6%)
Not applicable	35 (11.2%)	35 (100.0%)	0 (0.0%)
The I² statistic is reported
Yes	308 (98.4%)	35 (100.0%)	273 (98.2%)
No	5 (1.6%)	0 (0.0%)	5 (1.8%)
The 95% confidence interval of I² is reported
Yes	8 (2.6%)	0 (0.0%)	8 (2.9%)
No	305 (97.4%)	35 (100.0%)	270 (97.1%)
Effect measured used
Odds ratio	53 (16.9%)	7 (20.0%)	46 (16.5%)
Risk ratio	40 (12.8%)	9 (25.7%)	31 (11.2%)
Risk difference	3 (1.0%)	0 (0.0%)	3 (1.1%)
Hazard ratio	5 (1.6%)	1 (2.9%)	4 (1.4%)
Incident ratio	1 (0.3%)	1 (2.9%)	0 (0.0%)
Event rate	1 (0.3%)	1 (2.9%)	0 (0.0%)
Mean difference	157 (50.2%)	11 (31.4%)	146 (52.5%)
Standardized mean difference	53 (16.9%)	5 (14.3%)	48 (17.3%)
Meta-analysis model chosen a priori?
Yes	152 (48.6%)	3 (8.6%)	149 (53.6%)
No	142 (45.4%)	27 (77.1%)	115 (41.4%)
Unclear	14 (4.5%)	5 (14.3%)	9 (3.2%)
Not reported	5 (1.6%)	0 (0.0%)	5 (1.8%)
A priori chosen meta-analysis model was explained^b
Yes	63 (41.4%)	1 (33.3%)	62 (41.6%)
No	89 (58.6%)	2 (66.7%)	87 (58.4%)
Choice of meta-analysis model based on I²
Yes	131 (41.9%)	26 (74.3%)	105 (37.8%)
No	166 (53.0%)	5 (14.3%)	161 (57.9%)
Unclear	11 (3.5%)	4 (11.4%)	7 (2.5%)
Not reported	5 (1.6%)	0 (0.0%)	5 (1.8%)
Heterogeneity interpretation was based on I²
Yes	299 (95.5%)	34 (97.1%)	265 (95.3%)
No	1 (0.3%)	0 (0.0%)	1 (0.4%)
Unclear	8 (2.6%)	1 (2.9%)	7 (2.5%)
Not reported	5 (1.6%)	0 (0.0%)	5 (1.8%)
Heterogeneity interpretation based on τ²^a
Yes	21 (6.7%)	0 (0.0%)	21 (7.6%)
No	245 (78.3%)	0 (0.0%)	245 (88.1%)
Unclear	7 (2.2%)	0 (0.0%)	7 (2.5%)
Not reported	5 (1.6%)	0 (0.0%)	5 (1.8%)
Not applicable	35 (11.2%)	35 (100%)	0 (0.0%)

Twenty-six meta-analyses with a random-effects model used the Mantel–Haenszel method, where τ² is estimated by incorporating the Mantel–Haenszel Q statistic (for the heterogeneity test) into the formula for the DerSimonian and Laird estimator.

The sample has been restricted to 152 meta-analyses (3 with a fixed-effect model and 149 with a random-effects model), where the meta-analysis model was chosen a priori.

Half of the meta-analyses used the mean difference (50%, n = 157), followed by the standardized mean difference and odds ratio in a tie (17%, n = 53) and the risk ratio (13%, n = 40) (Table 2). Other less frequently employed effect measures included the risk difference (1%, n = 3), hazard ratio (2%, n = 5), incidence ratio (0.3%, n = 1), and event rate (0.3%, n = 1).

Almost half of the 313 meta-analyses chose the model a priori (49%, n = 152); however, only 63 of 152 (41%) meta-analyses explicitly provided a reason for this choice (Table 2). Three (9%) meta-analyses with a fixed-effect model and 149 (54%) meta-analyses with a random-effects model chose the model a priori; only 1 of 3 (33%) and 62 of 149 (42%) of these meta-analyses, respectively, explained the reason for selecting a fixed-effect or random-effects model. Overall, 14 (4%) meta-analyses gave unclear information on whether there was an a priori model selection. Supplementary Table S3 provides the verbatim from those 14 meta-analyses with unclear information about the a priori selected model. Supplementary Table S4 lists the quotations from the 63 meta-analyses that justified the model selected a priori.

Forty-two percent (n = 131) of the meta-analyses chose the model based on the I² value (Table 2). Most of the meta-analyses with a fixed-effect model (74%, n = 26) relied on the I² value. Interestingly, most meta-analyses with a random-effects model (58%, n = 161) did not base the model selection on the I² value.

Almost all included meta-analyses (96%, n = 299) interpreted statistical heterogeneity based on the I² value, regardless of the selected model (fixed effect or random effects). Of the 278 meta-analyses with a random-effects model, only 21 (8%) interpreted statistical heterogeneity using the τ² parameter.

Of the 278 meta-analyses with a random-effects model, only 5 (2%) reported to have applied the HKSJ adjustment when calculating the 95% CI for the pooled estimate. Almost all meta-analyses (98%, n = 308) reported the I² value in their results, which ranged from 0% to 100% (IQR: 41% to 93%), with a median of 76% (Fig. 2). Of these meta-analyses, 43 (14%) provided a zero I², and only 3 meta-analyses reported a 100% I² value. Of the 278 meta-analyses with a random-effects model, 160 (58%) reported the τ² in their results: τ² had a median of 0.29, ranging from 0 to 2,632.09 (IQR: 0.04 to 1.09) (Fig. 2). Twenty (13%) of these meta-analyses reported a zero τ²; of those, 4 (20%) reported to have employed DerSimonian and Laird or the Mantel–Haenszel method, followed by 2 meta-analyses with restricted maximum likelihood and 1 with Paule-Mandel—the remaining 9 meta-analyses did not report a heterogeneity estimator.

Figure 2.

Box plots with integrated dots illustrate the estimated I² (in percentage) and τ² alongside the bounds (lower and upper) of their 95% confidence intervals extracted from the meta-analyses. The y-axis of the plot for τ² is on the logarithmic scale.

Figure 3 shows the frequency of heterogeneity interpretation using I² or τ² based on whether the model was selected a priori. Regardless of whether the model was selected a priori, almost all meta-analyses interpreted statistical heterogeneity using I² (Fig. 3A). Conversely, only 11% (n = 16) of meta-analyses with a priori model selection and 4% (n = 5) without a priori model selection interpreted statistical heterogeneity using τ² (Fig. 3B). The same pattern was observed when investigating heterogeneity interpretation (using I² or τ²) in association with whether the selected model was based on the I² value (Fig. 3C, D).

Figure 3.

Stacked bar plots on the frequency of interpreting statistical heterogeneity using I² and τ² conditionally on whether the meta-analysis model was chosen a priori (plots A and B, respectively) or based on the I² value (plots C and D, respectively). Green, red, and purple colors refer to yes, no, and other concerning interpreting statistical heterogeneity using I² and τ² (legend colors). Percentages refer to the number of meta-analyses in a particular-colored bar divided by the total number of meta-analyses in the corresponding category of the x-axis.

Discussion

The values of I² and τ² should not guide the selection of the meta-analysis model. Statistical heterogeneity is inherent when synthesizing studies with different protocols and should be expected (Higgins 2008). Therefore, testing for or quantifying statistical heterogeneity to decide on the meta-analysis model is redundant and misleading (Deeks et al. 2023). A nonzero I² or τ² indicates present statistical heterogeneity, and investigating the prespecified sources of statistical heterogeneity by employing, for instance, meta-regression or sensitivity analyses is crucial to interpreting the results (e.g., do the summary results vary in different subpopulations?) (Deeks et al. 2023). The random-effects model properly accommodates the statistical heterogeneity in the meta-analysis results and should be the preferred approach. Nevertheless, SR authors insist on selecting the model based on data-driven endeavors.

Other criteria SR authors may consider in selecting the model, beyond the statistical heterogeneity measures, deserve investigation. A survey interview of SR authors and journal editors, for instance, on their meta-analysis knowledge and involvement of a statistician, is currently missing and would be a valuable addition to the relevant literature. The role of the reviewers and journal editors in ensuring proper reporting and interpretation of the meta-analysis results is also crucial to the quality of the evidence ecosystem. For instance, updating the submission guidelines with information about the common misuses and omissions around the reporting and undertaking of meta-analysis would greatly enhance a quality peer review for submitted SRs with meta-analysis.

The present study revealed that I² was used almost exclusively (96%) to assess and interpret the level of heterogeneity. Conversely, only 21 (7%) meta-analyses used τ² for this purpose. Furthermore, 42% of SRs chose the model based on the I² values. Our findings are consistent with previous relevant literature (Page et al. 2018; Bassani et al. 2019). In the analysis of 110 SRs, more than half used I² with arbitrary thresholds to interpret heterogeneity (Page et al. 2018). In addition, 30% of SRs relied on I² or the Cochran’s Q test to decide on the model (Page et al. 2018). Similarly, in the assessment of the reporting characteristics of 495 SRs, in 115 of the 221 SRs (52%) that investigated statistical heterogeneity, the I² values inappropriately guided the model choice (Bassani et al. 2019).

Ideally, the estimated I² should always be reported with a 95% CI (Ioannidis et al. 2007; Ioannidis 2008; von Hippel 2015). The reporting of CIs provides an understanding of the plausibility of the range values, especially when few or small studies populate the meta-analysis, where substantial uncertainty is inevitable (Thorlund et al. 2012). In our study, only 8 of 313 meta-analyses provided the 95% CIs for I². However, oral health meta-analyses include generally few primary studies, which may lead to wide, uninformative CIs that cover very small and very large I² values (Koletsi et al. 2016; Koletsi et al. 2018; Botelho et al. 2022). It is unclear whether the underreporting of CIs for I² results from a lack of awareness by review authors or the presence of uninformative CIs due to the limited included studies. Moreover, the underreporting of CIs for the heterogeneity measures could be related to the available features of the software used. For example, R and Stata allow users to program necessary analysis functions, ensuring the meta-analysis features are up to date with the relevant methodological literature.

The reported findings are concerning as they suggest a lack of awareness of the appropriate use of I² and τ² by SR authors. The issue of incorrect interpretation of heterogeneity based on I² is likely to be amplified, as meta-analyses undertaken within oral health typically involve few studies (Saltaji et al. 2013; Saltaji et al. 2016; Koletsi et al. 2018; Botelho et al. 2022; Tatas et al. 2024). To ensure a more accurate gauging of statistical heterogeneity, we recommend the following: (1) τ² should be routinely reported with its 95% CI for being a proper measure of statistical heterogeneity and integral element of the prediction intervals (Rücker et al. 2008; Borenstein 2024); (2) I² has an intuitive interpretation and ideally should be reported alongside its 95% CI in a forest plot to allow a visual inspection of the degree of statistical heterogeneity; (3) the meta-analysis model should be selected during the SR protocol stage rather than based on malpractices, including after calculating τ² and I² or inspecting the forest plot; and (4) the involvement of a statistician at the protocol stage to ensure appropriate evidence synthesis methods.

This study comprised SRs published in leading dental journals over a 3-y period. Non-English studies were excluded, which could have resulted in selection bias. This could be further compounded by the fact that only oral health journals with the highest current impact factor were included. Furthermore, the median number of included studies may not represent all available meta-analyses published in the oral health literature. However, the present study aimed to establish a baseline understanding of this specific statistical issue that can affect the clinical conclusions presented in SRs. Given the overall agreement of our findings with those previously published in the literature, we do not expect the results would have differed if all available dental meta-analyses were included.

Reporting the prediction intervals alongside the 95% CI of the summary estimate constitutes good reporting practices in a meta-analysis. Prediction intervals indicate the range of values within which the effect size of a future study would be found if randomly selected from the same population of the synthesized studies (Borenstein 2024). They are pivotal in the confidence placed in the summary results: a 95% CI that excludes the value of no difference accompanied by a wide prediction interval that includes the null value indicates the presence of nonnegligible statistical heterogeneity that needs to be investigated. However, we did not assess the reporting quality of the prediction intervals since this topic has already been investigated in recent bibliographic surveys (Faggion et al. 2021; Seehra et al. 2021; Tatas et al. 2023; Tatas et al. 2024).

Conclusions

The reporting quality of the statistical heterogeneity parameters in oral health SRs with meta-analysis is suboptimal.

There is evidence of an overreliance on I² when reporting and interpreting statistical heterogeneity.

Selecting the meta-analysis model based on the I² value is common in oral health systematic reviews and should be avoided.

The parameter τ² should be routinely reported alongside the 95% CI when a random effects model is used, and the prediction interval should be provided.

Misinterpretation of statistical heterogeneity can implicate the quality of conclusions delivered to the end-users of SRs. For instance, relying on I² and using naive categorizations to gauge the magnitude of statistical heterogeneity may spuriously affect decisions to investigate sources of statistical heterogeneity in further analyses and downgrade the quality of the available evidence, with negative implications for subsequent clinical practice recommendations.

Author Contributions

Z. Tatas, contributed to data acquisition, interpretation, drafted and critically revised the manuscript; E. Kyriakou, contributed to data acquisition, drafted and critically revised the manuscript; J. Seehra, contributed to data acquisition, critically revised the manuscript; N. Pandis, contributed to conception or design, drafted and critically revised the manuscript; L.M. Spineli, contributed to conception or design, supervision, data acquisition, interpretation, drafted and critically revised the manuscript. All authors gave final approval and agree to be accountable for all aspects of the work.

Supplemental Material

sj-docx-1-jdr-10.1177_00220345251316279 – Supplemental material for Statistical Heterogeneity in Oral Health Meta-Analyses

Supplemental material, sj-docx-1-jdr-10.1177_00220345251316279 for Statistical Heterogeneity in Oral Health Meta-Analyses by Z. Tatas, E. Kyriakou, J. Seehra, N. Pandis and L.M. Spineli in Journal of Dental Research

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Data Availability

The data and the functions supporting the present study’s findings are publicly available at .

ORCID iDs

Z. Tatas

L.M. Spineli

A supplemental appendix to this article is available online.

References

Alba

Alexander

Chang

MacIsaac

DeFry

Guyatt

. 2016. High statistical heterogeneity is more frequent in meta-analysis of continuous than binary outcomes. J Clin Epidemiol. 70:129–135.

Bassani

Pereira

GKR

Page

Tricco

Moher

Sarkis-Onofre

2019. Systematic reviews in dentistry: current status, epidemiological and reporting characteristics. J Dent. 82:71–84.

Borenstein

2022. In a meta-analysis, the I-squared statistic does not tell us how much the effect size varies. J Clin Epidemiol. 152:281–284.

Borenstein

2023. How to understand and report heterogeneity in a meta-analysis: the difference between I-squared and prediction intervals. Integr Med Res. 12(4):101014. doi:10.1016/j.imr.2023.101014

Borenstein

2024. Avoiding common mistakes in meta-analysis: understanding the distinct roles of Q, I-squared, tau-squared, and the prediction interval in reporting heterogeneity. Res Synth Methods. 15(2):354–368.

Borenstein

Hedges

Higgins

Rothstein

. 2010. A basic introduction to fixed-effect and random-effects models for meta-analysis. Res Synth Methods. 1(2):97–111.

Borenstein

Higgins

Hedges

Rothstein

. 2017. Basics of meta-analysis: I² is not an absolute measure of heterogeneity. Res Synth Methods. 8(1):5–18.

Botelho

Mascarenhas

Viana

Proença

Orlandi

Leira

Chambrone

Mendes

Machado

2022. An umbrella review of the evidence linking oral health and systemic noncommunicable diseases. Nat Commun. 13(1):7614. doi:10.1038/s41467-022-35337-8

Deeks

Higgins

JPT

Altman

McKenzie

Veroniki

(editors). Chapter 10: Analysing data and undertaking meta-analyses [last updated November 2024]. In: Higgins

JPT

Thomas

Chandler

Cumpston

Page

Welch

(editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.5. Cochrane, 2024. Available from www.training.cochrane.org/handbook.

10.

Faggion

Jr Menne

Pandis

2021. Prediction intervals should be included in meta-analyses published in dentistry. Eur J Oral Sci. 129(6):e12827. doi:10.1111/eos.12827

11.

Fletcher

2007. What is heterogeneity and is it important? BMJ. 334(7584):94–96.

12.

Hedges

Pigott

. 2001. The power of statistical tests in meta-analysis. Psychol Methods. 6(3):203–217.

13.

Higgins

JP.

2008. Commentary: heterogeneity in meta-analysis should be expected and appropriately quantified. Int J Epidemiol. 37(5):1158–1160.

14.

Ioannidis

JP.

2008. Interpretation of tests of heterogeneity and bias in meta-analysis. J Eval Clin Pract. 14(5):951–957.

15.

Ioannidis

Patsopoulos

Evangelou

2007. Uncertainty in heterogeneity estimates in meta-analyses. BMJ. 335(7626):914–916.

16.

Kanters

2022. Fixed- and random-effects models. In: Evangelou

Veroniki

editors. Methods in molecular biology. Humana (NY): Springer. p. 41–65.

17.

Koletsi

Fleming

Michelaki

Pandis

2018. Heterogeneity in Cochrane and non-Cochrane meta-analyses in orthodontics. J Dent. 74:90–94.

18.

Koletsi

Valla

Fleming

Chaimani

Pandis

2016. Assessment of publication bias required improvement in oral health systematic reviews. J Clin Epidemiol. 76:118–124.

19.

Langan

Higgins

JPT

Simmonds

2017. Comparative performance of heterogeneity variance estimators in meta-analysis: a review of simulation studies. Res Synth Methods. 8(2):181–198.

20.

Lasserson

Thomas

Higgins

JPT

. Chapter 1: Starting a review [last updated August 2021]. In: Higgins

JPT

Thomas

Chandler

Cumpston

Page

Welch

(editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.5. Cochrane, 2024. Available from www.training.cochrane.org/handbook.

21.

Melsen

Bootsma

MCJ

Rovers

Bonten

MJM

. 2014. The effects of clinical and statistical heterogeneity on the predictive values of results from meta-analyses. Clin Microbiol Infect. 20(2):123–129.

22.

Page

Altman

McKenzie

Shamseer

Ahmadzai

Wolfe

Yazdi

Catalá-López

Tricco

Moher

2018. Flaws in the application and interpretation of statistical analyses in systematic reviews of therapeutic interventions were common: a cross-sectional analysis. J Clin Epidemiol. 95:7–18.

23.

R Core Team. 2023. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing. https://www.R-project.org/

24.

Richardson

Wilson

Nishikawa

Hayward

. 1995. The well-built clinical question: a key to evidence-based decisions. ACP J Club. 123(3):A12–A13.

25.

Rücker

Schwarzer

Carpenter

Schumacher

2008. Undue reliance on I(2) in assessing heterogeneity may mislead. BMC Med Res Methodol. 8(1):79. doi:10.1186/1471-2288-8-79

26.

Saltaji

Cummings

Armijo-Olivo

Major

Amin

Major

Hartling

Flores-Mir

2013. A descriptive analysis of oral health systematic reviews published 1991–2012: cross sectional study. PLoS One. 8(9):e74545. doi:10.1371/journal.pone.0074545

27.

Seehra

Stonehouse-Smith

Pandis

2021. Prediction intervals reporting in orthodontic meta-analyses. Eur J Orthod. 43(5):596–600.

28.

Tatas

Koutsiouroumpa

Seehra

Mavridis

Pandis

2023. Do pooled estimates from orthodontic meta-analyses change depending on the meta-analysis approach? A meta-epidemiological study. Eur J Orthod. 45(6):722–730.

29.

Tatas

Kyriakou

Koutsiouroumpa

Seehra

Mavridis

Pandis

2024. Most meta-analyses in oral health do not have conclusive and robust results. J Dent. 149:105309. doi:10.1016/j.jdent.2024.105309

30.

Thompson

SG.

1994. Why sources of heterogeneity in meta-analysis should be investigated. BMJ. 309(6965):1351–1355.

31.

Thorlund

Imberger

Johnston

Walsh

Awad

Thabane

Gluud

Devereaux

Wetterslev

2012. Evolution of heterogeneity (I²) estimates and their 95% confidence intervals in large meta-analyses. PLoS One. 7(7):e39471. doi:10.1371/journal.pone.0039471

32.

von Hippel

PT.

2015. The heterogeneity statistic I² can be biased in small meta-analyses. BMC Med Res Methodol. 15:35. doi:10.1186/s12874-015-0024-z

33.

Wickham

2016. ggplot2: elegant graphics for data analysis. Springer-Verlag, New York. ISBN 978-3-319-24277-4. https://ggplot2.tidyverse.org.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.20 MB