Abstract
Background:
Subgroup analyses are widely used to evaluate the heterogeneity of treatment effects in randomized clinical trials. However, the quality of prespecified and reported subgroup analyses in stroke trials has received limited investigation. This study evaluated the credibility of subgroup analyses in stroke trials.
Methods and analysis:
We searched Medline/PubMed, Embase, the Cochrane Central Register of Controlled Trials, and the Web of Science from inception to 24 March 2021. Three reviewers screened the publications and extracted and analyzed the data. Primary publications of stroke trials that reported at least one subgroup effect and had published corresponding study protocols were included. The Instrument for Assessing the Credibility of Effect Modification Analyses (ICEMAN) was used to examine the quality of the reported subgroup effects, and each subgroup effect was assigned a credibility rating ranging from very low to high. Subgroup effects with two or more “definitely no” responses received a low credibility rating. The risk of bias was assessed using the Cochrane Risk-of-Bias tool for randomized trials version 2.
Results:
Seventy-four articles met the inclusion criteria and reported a combined total of 647 subgroup effects. The median sample size was 1264 (interquartile range (IQR): 380–3876), and the median number of subgroups prespecified in the protocol was 6 (IQR: 2–10). Sixty-one (82%) studies used the univariate test of interaction. Of the 647 subgroup effects reported in these studies, 319 (49%) were from acute stroke trials, and 423 (65%) had low credibility.
Conclusion:
The quality of subgroup analysis reporting in stroke trials remains poor. More effort is needed to train trialists on the best methods for designing and performing subgroup analyses, and how to report the results.
Trial registration number:
We prospectively registered the review with the International Prospective Register of Systematic Reviews (PROSPERO; registration number: CRD42020223133).
1. What is the issue, and what do we know so far about it?
■ The credibility of subgroup effects in clinical trials remains a major challenge because of the perils of subgroup analyses, which include an increased risk of false-positive and false-negative conclusions
■ This study aims to assess the credibility of reported subgroup analyses in published stroke trials using the Instrument for Assessing the Credibility of Effect Modification Analyses (ICEMAN) criteria
2. What are the key findings from this study?
■ The credibility of reported subgroup analyses in stroke trials remained generally low.
■ Most stroke trials still do not report the justification for, or the expected direction of, effect modification in the selected subgroups
■ Most of the reviewed studies used an inappropriate statistical method (one-at-a-time univariate interaction tests) to assess subgroup effects
3. What are the implications of the findings?
■ More effort is needed to train trialists on the best methods for designing and performing subgroup analyses, and how to report the results
■ Journals need to include requirements to assess the credibility of reported subgroup effects as part of their reporting standards
1. Trialists must be better educated on the best methods for conducting subgroup analyses to avoid misleading results
2. Biological and sample size considerations should take precedence to minimize data dredging
3. Journal editors and professional societies play an important role in improving the credibility of reported subgroup analyses
4. Checklists, such as the ICEMAN checklist, should be considered necessary for the publication of trial protocols and subsequent subgroup findings
Introduction
Evaluation of treatment effect heterogeneity is an integral part of the analysis of clinical trials.1–3 Identifying subgroups with different treatment effects generates new hypotheses for future clinical trial research and enables interventions to be tailored to specific patients. Subgroup analyses are commonly performed in stroke randomized controlled trials (RCTs) to evaluate the heterogeneity of treatment effects.4,5 Reporting only the overall treatment effect without accounting for subgroup effects can be misleading when patients with different characteristics respond differently to the same intervention. Conversely, incorrectly reported subgroup analyses can lead to wrong conclusions that harm clinical policy and practice.6,7
Several studies have drawn the attention of the clinical trials community to the prevalence and impact of incorrectly reported subgroup analyses used to support claims of treatment effect heterogeneity.8–11 In response to these findings, subgroup analysis was incorporated into the Consolidated Standards of Reporting Trials (CONSORT) statement.12 Despite its inclusion in the CONSORT guidelines, selective reporting of significant subgroup analyses, a lack of prior evidence on potentially relevant subgroups, and failure to use appropriate statistical analyses remain major concerns.13,14
Many authors have developed guidelines and checklists to improve the conduct and reporting of subgroup analyses.15–19 However, these checklists vary in length and in the type of criteria included, which creates ambiguity in their application. Schandelmaier et al. 20 recently developed the Instrument for Assessing the Credibility of Effect Modification Analyses (ICEMAN), a shorter, validated five-item checklist with Likert-type response options for assessing the credibility of subgroup analyses in RCTs. The ICEMAN checklist reduces the number of recommendations and assigns each subgroup effect a credibility rating ranging from very low to high. Farrokhyar et al. 21 cited ICEMAN as a detailed checklist for systematic reviews, and Kilpeläinen et al. 22 used it to investigate the credibility of subgroup effects in urology trials. To our knowledge, the credibility of subgroup analyses reported in stroke trials has not previously been examined. This study aims to assess the credibility of reported subgroup analyses in published stroke trials.
Methods
Design and registration
This systematic review followed the guidelines outlined in the Cochrane Handbook for Systematic Reviews of Interventions.23 The review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.24 The review protocol was prospectively registered at PROSPERO (registration number: CRD42020223133). The review was conducted using the Studies, Data, Methods, and Outcomes (SDMO) framework.25 The study design (S) was restricted to RCTs; the data (D) were primary stroke trial results published in any biomedical journal; the method (M) compared the analyses prespecified in the protocols with those in the published results; and the outcome (O) was the reporting quality of subgroup analyses in stroke clinical trials. The search combined three medical subject headings (MeSH): RCTs, stroke trials, and subgroup analysis/effect modification (Tables S1–S4: Supplementary Materials).
Eligibility criteria and search strategy
Studies were eligible for inclusion if they (1) reported at least one subgroup analysis along with the primary outcome findings in the main publication, (2) had a published corresponding study protocol, and (3) were RCTs. Non-human trials, systematic reviews, literature reviews, meta-analyses, gray literature, conference abstracts, and publications in languages other than English were excluded. Studies whose protocols were not published or made available elsewhere were also excluded. Four electronic medical databases were searched for relevant publications from inception until 24 March 2021: (1) Medline/PubMed, (2) Embase, (3) the Cochrane Central Register of Controlled Trials (CENTRAL), and (4) Web of Science. The identified studies were imported into Covidence, a web-based application for conducting systematic reviews.26 Covidence was used to remove duplicate studies and to provide the independent reviewers with a customized review template.
Selection of studies and data extraction
Three reviewers (A.A., J.A., and A.O.) independently screened titles and abstracts in duplicate to identify articles that met the inclusion criteria. These reviewers completed the full-text review of each eligible study and retrieved the necessary data. Disagreements among reviewers were resolved via a consensus process involving T.S. and B.K.M. Data extracted from each included study were the first author’s name and country, year of publication, trial sample size, type of intervention, the prespecified subgroup analysis in the original protocol, and the number of subgroups analyzed in the primary publication. Other data extracted were the subgroup effect studied, the predicted direction of the effect modification, whether the authors justified the subgroup effect, and whether they used an appropriate statistical test.
Assessment of subgroup analyses’ credibility and risk of bias
The reviewers independently graded the credibility of each reported subgroup effect in duplicate using the ICEMAN criteria. The ICEMAN questions, response options, and overall ratings are listed in Table S5 and Figure S1. Each subgroup effect was assigned a credibility rating ranging from “very low” to “high” based on the responses to the questions. A subgroup effect was rated “very low” if all responses were “definitely no” or “probably no,” and “low” if there were at least two “definitely no” responses. A subgroup effect was rated “moderate” when there was only one “definitely no” or two “probably no” responses, and “high” when none of the responses were “definitely no” or “probably no.” Finally, the reviewers used the Cochrane Risk-of-Bias tool for randomized trials version 2 (RoB-2) to assess the risk of bias of each study in duplicate.27
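The decision rules above are mechanical enough to write down as a function. The following is a minimal Python sketch, assuming each subgroup effect's answers are stored as strings such as "definitely no" or "probably yes". The function name `credibility_rating` is hypothetical and is not part of ICEMAN or of the authors' analysis code. The text does not say how combinations outside the listed rules were rated (for example, a single “probably no” with no “definitely no”), so the sketch assumes that such combinations default to “moderate.”

```python
from collections import Counter

# Response options treated as "negative" answers to an ICEMAN question
NEGATIVE = {"definitely no", "probably no"}


def credibility_rating(responses: list[str]) -> str:
    """Map one subgroup effect's ICEMAN responses to an overall rating.

    Rules are checked from the lowest rating upward, as described in the text.
    Combinations the text does not cover fall through to "moderate"; that
    fallback is an assumption of this sketch.
    """
    if not responses:
        raise ValueError("at least one ICEMAN response is required")
    answers = [r.strip().lower() for r in responses]
    counts = Counter(answers)
    n_definitely_no = counts["definitely no"]
    n_probably_no = counts["probably no"]

    if all(a in NEGATIVE for a in answers):
        return "very low"
    if n_definitely_no >= 2:
        return "low"
    if n_definitely_no == 0 and n_probably_no == 0:
        return "high"
    # Remaining cases, e.g. one "definitely no" or two "probably no" (fallback assumed)
    return "moderate"


print(credibility_rating(
    ["definitely yes", "probably yes", "definitely no", "definitely no", "probably yes"]
))  # low
```

Checking the stricter ratings first means a response set that satisfies several rules (for example, all negative with two “definitely no”) gets the lowest applicable rating.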
Data analysis
After the subgroup effects were rated with ICEMAN, a descriptive analysis of the eligible subgroup effects was performed, stratified by type of stroke trial (acute stroke treatment, primary prevention, and secondary prevention), sample size, and credibility rating. A trend analysis of the extracted metadata was performed across three publication periods: before 2010, 2010 to 2014, and after 2014. These periods were chosen to reflect scientific milestones in stroke trials: the revised CONSORT statement was published in 2010, and 2015 was recognized as the year of endovascular treatment because of several landmark stroke trials.12,28 The risk of bias was summarized descriptively across the five RoB-2 domains. All analyses were conducted in R 4.2.0 and Stata 17.0.29,30
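The period assignment and stratified proportions described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R or Stata code. Each subgroup effect is assumed to be represented as a hypothetical (publication year, credibility rating) pair, and the period boundaries follow the text (before 2010, 2010 to 2014, after 2014).

```python
from collections import Counter, defaultdict

RATINGS = ("very low", "low", "moderate", "high")


def publication_period(year: int) -> str:
    """Assign a publication year to one of the three trend-analysis periods."""
    if year < 2010:
        return "before 2010"
    if year <= 2014:
        return "2010-2014"
    return "after 2014"


def credibility_by_period(effects):
    """Proportion of subgroup effects in each credibility rating, per period.

    `effects` is an iterable of (publication_year, rating) pairs, one per
    subgroup effect; this input format is hypothetical.
    """
    counts = defaultdict(Counter)
    for year, rating in effects:
        counts[publication_period(year)][rating] += 1
    table = {}
    for period, c in counts.items():
        total = sum(c.values())
        # Counter returns 0 for ratings absent in a period
        table[period] = {r: c[r] / total for r in RATINGS}
    return table


effects = [(2008, "low"), (2008, "moderate"), (2016, "moderate")]  # illustrative only
print(credibility_by_period(effects)["before 2010"]["moderate"])  # 0.5
```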
Results
A total of 9234 records were imported into Covidence from the four electronic databases. After 2889 duplicates were removed, 6345 articles were screened. The reviewers excluded 5798 publications as irrelevant at the title and abstract screening stage, which left 547 articles for full-text review. After 473 studies were excluded at the full-text review stage, 74 publications met the inclusion criteria (Table S6). Agreement among the reviewers at the title and abstract screening stage was substantial (interrater agreement κ = 0.69). The PRISMA flow diagram is shown in Figure 1, and the PRISMA checklist is shown in Table S7.
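For readers unfamiliar with the agreement statistic, the sketch below computes Cohen's kappa for two reviewers' include/exclude decisions, κ = (p_o − p_e)/(1 − p_e). Here p_o is the observed agreement, and p_e is the agreement expected by chance from each reviewer's marginal proportions. The text does not state which kappa variant was used with three reviewers screening in duplicate, so treating it as pairwise Cohen's kappa is an assumption. The decisions in the usage example are made up. By convention, values between 0.61 and 0.80 are described as substantial agreement.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: share of items where both raters chose the same label
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels
    p_expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


reviewer_1 = ["include", "include", "exclude", "exclude"]  # hypothetical decisions
reviewer_2 = ["include", "exclude", "exclude", "exclude"]
print(cohens_kappa(reviewer_1, reviewer_2))  # 0.5
```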

Figure 1. The PRISMA flow diagram of study selection.
Characteristics of included studies
Table 1 describes the characteristics of the included studies. The 74 included studies enrolled a total of 249,305 trial participants and reported 647 subgroup effects. The median study sample size was 1264 (IQR: 380–3876). Forty-two studies (57%) were acute stroke trials, and 73 (99%) were multicenter trials. Twenty-five studies (34%) used the modified Rankin Scale as the primary outcome measure, while 32 (43%) used time-to-event outcomes. Thirty studies (40%) had favorable results (i.e. a statistically significant treatment effect on the primary outcome). The median number of subgroups prespecified in the protocol was 6 (IQR: 2–10), and the median number of subgroups examined was 8 (IQR: 5–12). The univariate test of interaction was used in 61 studies (82%).
Table 1. The main characteristics of included articles (n = 74).
Q1: first quartile; Q3: third quartile.
One-at-a-time interaction.
Subgroup effect credibility and trend analysis
Tables 2 and 3 describe the characteristics and credibility of the subgroup effects reported in trial publications. Of the 647 subgroup effects, 319 (49%) were from acute stroke trials, 421 (65%) were prespecified in the protocols, 185 (29%) came from analyses that took statistical power into account, and 40 (6%) were statistically significant (Table 2). An expected direction of effect was stated for six (1%) subgroup effects, a rationale for the analysis was provided for 146 (23%), and 27 (4%) came from analyses that examined only a small number of subgroup effects (Table 3). Figure 2 shows a scatterplot of sample size against the number of subgroup effects examined. Overall, 34 (5%) subgroup effects had very low credibility, 423 (65%) low credibility, 184 (29%) moderate credibility, and 6 (1%) high credibility. Trend analysis by year of publication showed that the share of subgroup effects with a moderate credibility rating rose from 11% (12 effects) in trials published before 2010 to 35% (130 effects) in trials published after 2014 (Figure 3).
Table 2. Characteristics of subgroup effects reported in trial publications (n = 647).
Categorized the subgroup effect without providing justification.
Table 3. The credibility of subgroup effects (n = 647).
ICEMAN: Instrument for Assessing the Credibility of Effect Modification Analyses.
Rating was performed using the ICEMAN checklists.

Figure 2. Scatterplot of sample size against the number of subgroup effects analyzed.

Figure 3. Trend analysis of quality of reporting among the included studies.
Risk of bias
In three RoB-2 domains, at least 90% of studies were at low risk of bias (Figure S2). Most studies used an adequate allocation sequence, concealed treatment allocation, and had balanced baseline covariates. A few studies used the prospective randomized open blinded end-point (PROBE) design, which makes blinding of participants and trial personnel difficult. However, this lack of blinding is a design feature of such trials rather than a deviation from the intended interventions. The “measurement of the outcome” and “selection of the reported result” domains had lower proportions of studies at low risk than the other domains (74% and 55%, respectively). Use of an inappropriate outcome measure and failure to blind outcome assessors were the sources of high risk of bias in the “measurement of the outcome” domain. Discrepancies between the analyses prespecified in the protocol and those published were the source of increased risk in the “selection of the reported result” domain.
Discussion
Evidence from this review shows that the credibility of reported subgroup analyses in stroke trials remains generally poor. Despite the plethora of checklists available to guide subgroup analysis, the majority of stroke trials still do not report the rationale for, or the expected direction of, effect modification in the selected subgroups. In addition, most reviewed studies used an inappropriate statistical method (i.e. one-at-a-time univariate interaction tests), resulting in multiple statistical significance tests and an inflated overall Type I error rate.
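The inflation caused by running many one-at-a-time interaction tests can be quantified with the family-wise error rate, 1 − (1 − α)^k. This is the probability of at least one false-positive result among k tests. The sketch below assumes independent tests and no true effect modification, which is a simplification; the exact figure depends on how correlated the subgroup tests are. The value k = 8 is the median number of subgroups examined per trial in this review.

```python
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false-positive interaction test among n_tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    return 1 - (1 - alpha) ** n_tests


# Median number of subgroups examined per trial in this review was 8
print(round(family_wise_error_rate(0.05, 8), 3))  # 0.337
```

Under these assumptions, a trial testing eight subgroups at α = 0.05 has roughly a one-in-three chance of at least one spurious "significant" interaction.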
The poor credibility of reported subgroup analyses highlights the need for a multipronged approach to address this issue. Regrettably, the low credibility of published subgroup analyses observed in this review of stroke trials is consistent with findings from systematic reviews of published trials in other disciplines. Other studies that used the ICEMAN checklist have likewise found that subgroup effects had very low to low credibility ratings. Kilpeläinen et al. 22 criticized the poor conduct and reporting of subgroup analyses in urology trials. Using data from the well-known Prostate Cancer Intervention Versus Observation Trial (PIVOT),31 which has influenced clinical practice guidelines,32 they demonstrated the use of ICEMAN for assessing the credibility of findings from subgroup analyses and argued that the subgroup results of this trial had a low credibility rating. Furthermore, Saragiotto et al. 33 examined the credibility of subgroup analyses in back pain trials and concluded that the subgroup analyses in these published trials had a low credibility rating. Similarly, Wallach et al. 34 found that efforts to verify statistically significant subgroup differences claimed in many RCTs are uncommon and that, when they do occur, the claimed subgroup differences are not replicated.

Here we provide a few recommendations to address this issue. First, trialists need more education on the best methods for performing subgroup analyses to minimize misleading results. Given that many reported subgroup analyses are underpowered, biological rationale and sample size considerations should guide the selection of a small number of subgroups to avoid data dredging.35–37 Second, journal editorial boards and professional societies play an important gatekeeping role in improving the credibility of reported subgroup analyses.
We recommend that medical journals adopt editorial policies requiring authors to report subgroup analyses in accordance with an established checklist, such as the ICEMAN checklist, as part of the requirements for publishing trial protocols and subsequent findings.
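The observation that many subgroup analyses are underpowered can be illustrated with a standard back-of-the-envelope calculation. The sketch below is not from the review. It assumes a 1:1 trial with a continuous outcome and known standard deviation, a binary subgroup that splits each arm 50:50, an interaction of the same size as the main effect, and a normal-approximation power formula. Stroke trials often use ordinal or time-to-event outcomes, so the numbers are only indicative. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()


def approx_power(effect: float, se: float, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided z-test (far tail ignored)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(abs(effect) / se - z_crit)


def main_and_interaction_se(n_total: int, sd: float = 1.0):
    """Standard errors of the main effect and a treatment-by-subgroup interaction.

    Main effect: difference of two arm means, each based on n/2 patients.
    Subgroup effect: the same contrast within a subgroup of n/2 patients.
    Interaction: difference of two independent subgroup effects.
    """
    se_main = 2 * sd / sqrt(n_total)
    se_subgroup_effect = 2 * sd / sqrt(n_total / 2)
    se_interaction = sqrt(2) * se_subgroup_effect
    return se_main, se_interaction


n = 1264  # median trial size in this review
se_main, se_int = main_and_interaction_se(n)
# Same standardized magnitude (0.2 SD) for the main effect and the interaction
print(f"main effect power: {approx_power(0.2, se_main):.2f}")
print(f"interaction power: {approx_power(0.2, se_int):.2f}")
```

Under these assumptions, the interaction standard error is twice that of the main effect. Detecting an interaction of the same magnitude with the same power therefore needs roughly four times the sample size.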
Our review included many studies, which allowed a thorough examination of the credibility of subgroup effects reported in stroke trials. Rather than focusing on a single intervention, the review was designed to include all stroke trial settings (primary prevention, secondary prevention, and acute stroke treatment). A major limitation of this study is the limited generalizability of our findings. This review of published stroke trials was not exhaustive, since it was restricted to four major medical databases and excluded studies published in languages other than English; we may therefore have missed other relevant studies. Second, this review focused on results reported in the primary publication of each trial and excluded trials whose subgroup analyses were reported in detail only in a separate paper. Finally, our review excluded 88 trials without published protocols (either as a separate manuscript or as a supplementary document to the primary publication). These excluded trials are more likely to have been published in journals that do not mandate protocol publication, or before medical journals began to mandate it. Nevertheless, we believe that this review captures high-quality stroke trials and that the reported findings would likely be unchanged if the eligibility criteria were relaxed to include all stroke trials.
In conclusion, this review highlights the need to improve the credibility of subgroup analyses in stroke trials. Although there has been some trend toward improved reporting of subgroup analyses in recent years, more work is needed to achieve a reasonable standard for reporting subgroup findings. Reporting guidelines such as the ICEMAN checklist are recommended to guide the choice of subgroups to investigate for treatment effect heterogeneity and the appropriate analyses to conduct, in order to improve the credibility of reported subgroup analyses in stroke trials.
Supplemental Material
sj-docx-1-wso-10.1177_17474930231168517 – Supplemental material for The credibility of subgroup analyses reported in stroke trials is low: A systematic review
Supplemental material, sj-docx-1-wso-10.1177_17474930231168517 for The credibility of subgroup analyses reported in stroke trials is low: A systematic review by Ayoola Ademola, Lehana Thabane, Joel Adekanye, Ayooluwanimi Okikiolu, Samuel Babatunde, Mohammed A Almekhlafi, Bijoy K Menon, Michael D Hill, Kevin A Hildebrand and Tolulope T Sajobi in International Journal of Stroke
Footnotes
Authors’ contribution
AA, LT, and TTS conceptualized the study; AA drafted the initial version of the manuscript; AA, JA, AO, and SB participated in the review and data extraction. All authors read, critically revised, and approved the final manuscript.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: AA received doctoral funding from the Prevention of Post-Traumatic Contractures with Ketotifen II (PERK II) (supported by the United States Army Medical Research Acquisition Activity, United States Department of Defense). Also, AA received the Eyes High International Doctoral Scholarship and the Alberta Graduate Excellence Scholarship from the University of Calgary.
Availability of data and materials
The data obtained or analyzed during this study are included in the manuscript and its supplementary information files.
Supplemental material
Supplemental material for this article is available online.
References
