Abstract
Purpose of the Review:
Though cluster randomized trials (CRTs) provide robust evidence for intervention by controlling contamination of interventions, there could be some loss of statistical efficiency. The Consolidated Standards of Reporting Trials (CONSORT) recommends reporting intraclass correlation coefficients (ICC) to understand this phenomenon, though not many studies follow this. This meta-analysis explored the compliance of CRTs in major depression for reporting ICC besides deriving the pooled ICC and pooled mean differences of intervention outcomes.
Collection and Analysis of Data:
Thirty-four articles on CRTs in major depression were identified from PubMed, Cochrane Library, PsycINFO, and Embase, and relevant data were extracted. Only 20 studies were eligible for meta-analysis of intervention, among which 8 reported ICC. We used DerSimonian and Laird’s inverse variance method to calculate the pooled estimates. Only eight (40%) of the CRTs reported using ICC both for designing the study and examining intervention outcomes. The pooled ICC was 0.07 (95% confidence interval [CI]: 0.05, 0.09) with low heterogeneity (I² = 28%). Among the 20 studies, 65% used different psychosocial methods alone as intervention, with substantial heterogeneity. The pooled standardized mean difference of depression scores (–0.46; 95% CI: –0.79, –0.13) indicated the effectiveness of psychosocial interventions irrespective of combined pharmacotherapy (z = 2.71, p value = 0.01). Further, a subgroup analysis of intervention effects revealed that the results were significant only for the CRTs with ICC conformity.
Conclusions:
The ICCs can affect the intervention outcomes. Therefore, as indicated by this meta-analysis, CRTs must adhere to the CONSORT guidelines on reporting ICC. Future CRTs on major depression can utilize the pooled ICC estimate from this study, especially for sample size estimations.
Randomized controlled trials (RCTs) are generally preferred to examine the intervention outcomes. However, RCTs cannot effectively control the contamination of interventions in population-based studies, thus increasing the risk of type II errors.1 In this context comes the role of cluster randomized trials (CRTs), which follow a population-based approach for large-scale intervention studies. However, the homogeneity among units within clusters may increase the standard error of the outcome variable and decrease the power of the study.2–4 A more homogeneous cluster will have a small within-cluster variance, while the between-cluster variance will be large, which is captured by the intraclass correlation coefficient (ICC, ρ).5–8 The ICC estimates are used to account for clustering after calculating the sample size using conventional methods: the conventional sample size is multiplied by the design effect, 1 + (m – 1)ρ, where ρ represents the ICC and m is the average cluster size.9 The Consolidated Standards of Reporting Trials (CONSORT)10–11 recommended reporting both adjusted and unadjusted estimates to indicate the extent of clustering.7–8 Since many peer-reviewed journals require the CRT reports to adhere to the CONSORT recommendations, this article examines the relevance of ICC, taking the example of CRTs on major depression, which is the fourth leading cause of disability worldwide.12–14
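As an illustration of the design effect described above, the following minimal Python sketch inflates a conventionally calculated sample size for clustering. The function names and the input values (200 participants, 20 per cluster, ICC of 0.05) are hypothetical and chosen only for the example; the formula assumes roughly equal cluster sizes.

```python
import math


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Design effect 1 + (m - 1) * rho, assuming roughly equal cluster sizes."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def inflate_sample_size(n_individual: int, mean_cluster_size: float, icc: float) -> int:
    """Multiply a conventionally calculated sample size by the design effect."""
    inflated = n_individual * design_effect(mean_cluster_size, icc)
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(inflated, 6))


# Hypothetical inputs: 200 participants needed under individual randomization,
# 20 participants per cluster, ICC of 0.05.
deff = design_effect(20, 0.05)
n_total = inflate_sample_size(200, 20, 0.05)
print(round(deff, 2), n_total)  # prints: 1.95 390
```

Even a small ICC roughly doubles the required sample size here, because the inflation grows with the cluster size as well as with ρ.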
Major depression is associated with numerous adverse outcomes, including difficulties in role transitions, reduced role functioning, elevated risk of a wide range of secondary disorders, and increased risk of early mortality due to physical disorders and suicide.14–15 Antidepressant medications and psychotherapies are indicated to manage the condition.12,15–17 In this regard, this study aims to report the compliance of CRTs related to intervention outcomes in major depression with the CONSORT recommendations on reporting ICCs through a meta-analytic approach.
Methods
This study obtained the Institute’s Ethical Clearance before registering the protocol with the PROSPERO database [CRD42020177857]. Published articles meeting the following criteria were considered for inclusion in the study: i) CRTs on major depression wherein the study identified the CRT status in the title, abstract, or text; ii) studies conducted on humans; iii) published in the English language; and iv) published between January 1, 2004 and December 31, 2020. Accordingly, PubMed, Cochrane Library, Embase, and PsycINFO were searched. Appropriate search terms were used in the advanced search options of the databases using a combination of main keywords, Boolean operators, [tiab], and MeSH terms. The use of both MeSH and [tiab] terms enhanced the likelihood of finding all relevant articles related to the search criteria. The search terms used in PubMed included (“Cluster randomized trial”[MeSH] OR “Cluster randomized controlled trial” [tiab] OR “Cluster randomized controlled trial”[MeSH] OR “Cluster randomi*” [tiab] OR “Cluster-randomi*”[tiab]) AND (“Major depression”[MeSH] OR “Major depress*”[tiab] OR “Major depressive disorder”[MeSH] OR “Major depressive disorder”[tiab]). A similar approach was adopted for the Cochrane Library, Embase, and PsycINFO databases. The CRTs identified from these databases were imported into EndNote X9 to check for duplication. Then, two reviewers (AJ and PKM) independently screened the studies for eligibility by evaluating the title and abstract and referred to a third reviewer (BB, TK, KT, or GM) when a discrepancy was noted. Accordingly, all the relevant articles were identified and accessed. The data related to the type of intervention, duration of the study, primary and secondary outcomes, cluster details, ICC, and sample size were extracted manually.
Statistical Analysis
Preliminary Analysis on ICC
Compliance with CONSORT guidelines on reporting of ICC was noted in terms of the ICC used by the individual studies for sample size estimation and the resulting ICC for each primary outcome. Appropriate descriptive statistics were used to summarize the ICC. The ICC value used at the design stage and the resulting ICC value were compared for each study using the Wilcoxon signed rank test. Further, the statistical approach used to account for clustering in the selected CRTs was evaluated. Finally, a random-effects meta-analysis of Fisher’s z-transformed ICC was carried out using DerSimonian and Laird’s inverse variance method to calculate the pooled ICC estimate and its 95% confidence interval (CI). Eight studies were included in this meta-analysis.
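The pooling step can be sketched as follows. This is a minimal illustration rather than the exact procedure run in RevMan or Stata: it assumes the transformation z = atanh(ICC), takes the variance of each transformed ICC as a user-supplied input (the variance formula used in the review is not stated), and uses hypothetical ICC values.

```python
import math


def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects pooling of effects with known variances."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Method-of-moments estimate of the between-study variance, truncated at 0.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2, q


def pool_icc(iccs, z_variances):
    """Pool ICCs on the Fisher z scale and back-transform the estimate and 95% CI."""
    z = [math.atanh(r) for r in iccs]
    pooled_z, se, tau2, _ = dersimonian_laird(z, z_variances)
    lower, upper = pooled_z - 1.96 * se, pooled_z + 1.96 * se
    return math.tanh(pooled_z), (math.tanh(lower), math.tanh(upper)), tau2


# Hypothetical ICCs and z-scale variances, for illustration only.
pooled_icc, ci, tau2 = pool_icc([0.05, 0.07, 0.10], [0.0004, 0.0006, 0.0009])
print(round(pooled_icc, 3), tuple(round(x, 3) for x in ci), round(tau2, 5))
```

Working on the z scale keeps the back-transformed confidence limits inside the admissible range of a correlation.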
Meta-Analysis of the Effectiveness of Interventions
The studies included in the meta-analysis were assessed using the PICO (Population, Intervention, Comparison, and Outcomes) format recommended by the Cochrane Collaboration. The intervention approaches used in the CRTs were coded as per the intervention programs recommended by the American Psychological Association (2019).18 Hedges’ g standardized mean differences (SMDs) and their 95% CIs were calculated as the study outcome was continuous, wherein a negative SMD indicated that the intervention was effective. The quality of the included studies was evaluated using the Cochrane risk of bias assessment tool. We adopted a random-effects model and used the inverse-variance method based on DerSimonian and Laird’s estimation for estimating the summary measure and the heterogeneity variance (τ²).
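For a single trial, Hedges’ g can be computed from group means, standard deviations, and sample sizes as in the sketch below. The function name and the input values are hypothetical, the variance uses a common large-sample approximation, and no adjustment for clustering is applied, so this should be read as an illustration of the effect size rather than of the full analysis.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Hedges' g (bias-corrected SMD) with an approximate 95% CI."""
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    # Small-sample correction factor J.
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    g = j * d
    # Large-sample approximation to the variance of d, scaled by J^2 for g.
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2.0 * (n_t + n_c))
    se = math.sqrt(j ** 2 * var_d)
    return g, (g - 1.96 * se, g + 1.96 * se)


# Hypothetical depression scores: intervention mean 10 (SD 5, n = 50),
# control mean 12 (SD 5, n = 50). A negative g favors the intervention.
g, (g_lower, g_upper) = hedges_g(10.0, 5.0, 50, 12.0, 5.0, 50)
print(round(g, 3), round(g_lower, 3), round(g_upper, 3))
```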
The I² statistic measured the impact of heterogeneity. Egger’s test was used to evaluate the effects of small studies, and the funnel plot was used to check for publication bias. We adopted an intention-to-treat analysis, wherein the total sample size randomized to the intervention and the placebo groups at the beginning of the study was considered instead of completers. Thus, our findings minimized the effect of attrition bias. We performed two sensitivity analyses, one for change in depression scores after excluding small studies (Supplementary Figure 3) and the other after excluding trials at high risk of bias (Supplementary Figure 4), and four subgroup analyses. Statistical significance was assessed at the 5% level. All analyses were conducted using the Cochrane RevMan 5.3 and Stata 14.0 software.
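The two diagnostics mentioned above can be sketched in a few lines of Python. The sketch assumes the usual definitions, I² = max(0, (Q – df)/Q) × 100 and Egger’s regression of the standardized effect (effect/SE) on precision (1/SE), whose intercept is the bias coefficient. It returns only point estimates and does not compute the p value; the function names and data are hypothetical.

```python
def i_squared(effects, variances):
    """I^2 (%) from Cochran's Q under inverse-variance weights."""
    w = [1.0 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    if q <= 0.0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def egger_regression(effects, std_errors):
    """Ordinary least squares of effect/SE on 1/SE; returns (intercept, slope)."""
    x = [1.0 / s for s in std_errors]
    y = [e / s for e, s in zip(effects, std_errors)]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # bias coefficient
    return intercept, slope


# Hypothetical SMDs and standard errors, for illustration only.
smds = [-0.60, -0.45, -0.30, -0.10]
ses = [0.30, 0.20, 0.15, 0.10]
i2 = i_squared(smds, [s ** 2 for s in ses])
bias, slope = egger_regression(smds, ses)
print(round(i2, 1), round(bias, 2), round(slope, 2))
```

An intercept far from zero suggests that smaller, less precise studies report systematically different effects from larger ones.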
Results
The systematic search resulted in 305 records [PubMed (58 records), Cochrane Library (101 records), PsycINFO (97 records), and Embase (49 records)]. After removing the duplicates, 168 records were retained for further screening. Of these, 134 records were excluded for various reasons. Finally, 34 articles were found eligible for systematic review, among which eight were suitable for the meta-analysis of ICC. Including these eight, 20 studies were eligible for the meta-analysis of intervention outcomes (see Figure 1).
Studies Included in the Meta-Analyses.
Reporting of ICC by Individual CRTs
Among the 34 CRTs included in the systematic review, only 50% used ICC for sample size calculation; 44% reported ICC for specific outcomes only; and 32% reported both the ICC values used in the sample size estimation and the resulting ICC. However, data for the intervention effects on outcome were available only for 20 (59%) studies, and 8 (40%) of these reported both intervention effects on outcomes and ICC. The median ICC used at the design stage was 0.05 (first and third quartiles: 0.01, 0.06), whereas the median ICC reported after completion of the study was 0.06 (0.03, 0.10). However, ICC values did not significantly differ with regard to the stage of consideration (Mann-Whitney U-statistic = 45.5; p value = 0.27), indicating agreement between the two stages.
Meta-Analysis of ICC
Eight CRTs were subject to a meta-analysis of ICC values. The pooled ICC estimate was 0.07 (95% CI: 0.05, 0.09), with an I² (a measure of heterogeneity) of 28%.
Forest Plot for ICC with I² Statistic = 28%.
The funnel plot appears asymmetric, with smaller studies tending to have larger ICC values (Supplementary Figure 1), which may suggest publication bias. Consistent with this, Egger’s test for small-study effects gave an estimated bias coefficient of –2.07 (standard error = 0.69; p value = 0.02), indicating significant evidence of small-study effects.
Meta-Analysis of the Effectiveness of Interventions
Among the 20 CRTs, 13 (65%) used different psychosocial methods alone as an interventional strategy, whereas the remaining 35% used pharmacological treatment, such as antidepressants, in combination with psychosocial interventions. The psychosocial interventions included counseling, psychoeducation, supportive work, family education, cognitive-behavior therapy (CBT), and interpersonal therapy. Three studies also followed a group format for intervention.19–21 However, only five studies used evidence-based interventions such as CBT and interpersonal psychotherapy (IPT).20–24 The remaining studies used interventions that are either conditionally recommended (n = 8; 40%) or have insufficient evidence (n = 7; 35%). Half of the studies had an intervention duration of one year, and the rest had a duration of 3 to 18 months.
The majority of the studies (75%) had participants in the age range of 18–75 years, while the remainder had participants either below 18 or above 55 years of age. Only three studies were conducted exclusively among the female population.23–25 Further, 44% of studies used the Patient Health Questionnaire-9 (PHQ-9) to measure depression severity. Other studies used a variety of standardized rating scales, including the Hamilton Depression Rating Scale (HDRS/HAMD), the Beck Depression Inventory (BDI), the Edinburgh Postnatal Depression Scale (EPDS), the Center for Epidemiologic Studies Depression Scale (CES-D), and the Major Depression Inventory (MDI). A total of eight studies reported and accounted for the clustered nature of the data.24, 26–32
Based on the twenty CRTs, an attempt was made to assess the effectiveness of intervention on depression scores measured using different scales. The summary measure [SMD (95% CI): –0.46 (–0.79, –0.13)] for the depression scores measured in psychosocial studies as well as studies with psychosocial interventions combined with antidepressants was statistically significant (z = 2.71, p value = 0.01). The heterogeneity among the studies was very high (I² = 99%). Three small trials exhibited broader confidence intervals for the effect size,19,30,33 whereas the intervals were narrow in large trials34,35 (Figure 3).
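The relationship between a reported SMD, its 95% CI, and the z statistic can be checked with a short calculation. The sketch below assumes a symmetric normal-approximation interval, so that SE = (upper – lower)/(2 × 1.96) and z = estimate/SE; the function name is illustrative, and because the published values are rounded, small differences from the reported statistics are expected.

```python
import math


def z_and_p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Recover SE, z, and a two-sided p value from an estimate and its 95% CI."""
    se = (upper - lower) / (2.0 * z_crit)
    z = estimate / se
    # Two-sided p value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Pooled SMD and 95% CI as reported in the text.
se, z, p = z_and_p_from_ci(-0.46, -0.79, -0.13)
print(round(se, 3), round(abs(z), 2), round(p, 3))
```

The recovered |z| is roughly 2.7, close to the reported z = 2.71, and the p value falls below the 5% threshold.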
Forest Plot for Change in Depression Scores Based on Interventions with the Assessment of Risk of Bias.
Though we wanted to examine the ICC effects for the intervention outcomes where mental health professionals were involved, we could not do so because only one of the 20 studies eligible for the meta-analyses involved them.36
Publication Bias
Egger’s test results suggested that there were no small-study effects (p value = 0.20). However, reporting bias cannot be reliably interpreted because the symmetry of the funnel plot was unclear due to the presence of several outliers (Supplementary Figure 2).
Risk of Bias Assessments
The majority of the studies adopted a random sequence generation technique and used appropriate statistical models to handle missing data in the study outcome. The risk of bias assessment indicated that most studies carried low risks of selection and attrition bias and high risks of detection bias (Figures 3 and 4).
Sensitivity Analyses
Sensitivity analyses were done by excluding (a) small studies19, 30, 33 and (b) studies at high risk of bias.31, 33, 35, 37 On excluding the small trials (Supplementary Figure 3), the intervention effects [SMD (95% CI): –0.45 (–0.81, –0.10)] remained significant (z = 2.48, p value = 0.01). Likewise, on excluding the trials at high risk of bias (Supplementary Figure 4), the intervention effects [–0.53 (–0.97, –0.09)] remained statistically significant (z = 2.34, p value = 0.02). The impact of heterogeneity (I² = 99%) did not change in either sensitivity analysis.
Risk of Bias Graph: Review Authors’ Judgments About Each Risk of Bias Item Presented as Percentages Across All Included Studies.
Subgroup Analyses
Subgroup analyses were conducted for (a) studies that used the PHQ-9 scale as the outcome measure22, 35–37, 40–43 versus studies that used other scales19–21, 24, 25, 28, 30, 31, 33, 34, 38, 39; (b) studies wherein general practitioners were involved30, 31, 33, 34, 37, 41–43 versus studies wherein they were not involved19–25, 35, 36, 38–40; (c) studies with psychosocial intervention19–21, 23, 24, 34–36, 39–43 versus studies with psychosocial intervention and pharmacotherapy22, 25, 30, 31, 33, 37, 38; and lastly (d) studies that reported ICC22–24, 34, 35, 38, 42, 43 versus studies that did not report ICC.19–21, 25, 28, 30, 31, 33, 37, 39–41
Summary measures from Supplementary Figures 5 and 6 indicated that the intervention was significantly more efficacious than the control in studies that used the PHQ-9 scale [SMD (95% CI): –0.36 (–0.60 to –0.12); z = 3.00; p value = 0.01; I² = 94%]; however, the difference was not statistically significant in studies that used scales other than PHQ-9 [–0.50 (–1.12 to 0.11); z = 1.60; p value = 0.11; I² = 99%].
Interestingly, summary measures both from studies wherein general practitioners were involved [–0.16 (–0.30 to –0.01); z = 2.15; p value = 0.03] (Supplementary Figure 7) and from studies wherein they were not involved [–0.59 (–1.14 to –0.04); z = 2.10; p value = 0.04] (Supplementary Figure 8) indicated that the intervention was significantly more efficacious than control. However, heterogeneity in studies where general practitioners were involved was much lower (I² = 70%) than in studies where they were not involved (I² = 99%).
Further, summary measures from Supplementary Figures 9 and 10 indicated that the intervention was significantly more efficacious than control in studies with psychosocial interventions [–0.47 (–0.92 to –0.02); z = 2.07; p value = 0.04; I² = 99%]; however, the difference was not statistically significant in studies with psychosocial interventions and pharmacotherapy [–0.42 (–0.86 to 0.02); z = 1.88; p value = 0.06; I² = 99%]. In addition, heterogeneity was considerably high in both groups.
Finally, we performed a subgroup analysis of the intervention effects with reference to ICC conformity. In the studies that reported ICC (Supplementary Figure 11), the intervention was significantly more effective than control [–0.39 (–0.62 to –0.15); z = 3.26; p value = 0.01; I² = 97%], whereas in the studies that did not report ICC (Supplementary Figure 12), the effect was not statistically significant [–0.51 (–1.29 to 0.27); z = 1.28; p value = 0.20; I² = 99%].
Although the primary analysis indicated considerable heterogeneity, adherence to the protocols and involvement of general practitioners appeared to reduce it from high to moderate. However, statistical heterogeneity could not be explained by excluding small studies, studies at high risk of bias, or studies with scales other than PHQ-9. Among the 34 studies, only 12 clearly mentioned the type of intervention carried out under the CRT. With regard to the level of evidence for the intervention employed in the CRTs, 16 (47.1%) used recommended interventions, 12 (35.3%) used conditionally recommended interventions, and 6 (17.6%) used interventions that had insufficient evidence.
Discussion
The CRTs help study intervention outcomes in a large population with minimal resources compared to other allocation strategies. The scope of CRTs will be enhanced if we know the degree of correlation among units within clusters. In this regard, the ICCs provide the necessary information to be utilized in sample size estimation and statistical analyses of CRTs. However, reporting the ICCs became mandatory only in 2004, when the CONSORT guidelines were extended.7 Against this backdrop, this study explored the CRTs in major depression for reporting ICC as recommended by CONSORT, calculating the pooled ICC, and evaluating the intervention outcomes. Accordingly, we selected the studies published from 2004 onward for this research.
Meta-Analysis of ICCs
In our study, only 50% of the studies reported that ICC values were used at the design stage for sample size calculation, and only 44% reported ICC values for specific outcomes. Moreover, only about one-third of the eligible studies complied with the CONSORT recommendation on ICC. Further, ten studies (28%) did not clarify the methods adopted to account for clustering. A large number of eligible CRTs used multilevel modeling (n = 9) and linear mixed modeling (n = 8) approaches to account for the clustering effect, whereas only three studies reported the use of generalized estimating equations. Nonetheless, examination of the studies (n = 11) that reported ICC values both at the design stage and after study completion for depression-specific outcomes did not reveal any statistically significant difference in the ICC values. The meta-analysis of ICC revealed that the pooled ICC for various measures used in CRTs on major depression was 0.07, which is relatively small. This finding is in line with the magnitude of ICC suggested by Bland (2000), which is generally less than 0.1 in CRTs.44 Even though the level of clustering found in this meta-analysis is low, the statistical power of the individual CRTs would have been compromised if the ICCs had not been considered in the research design. Therefore, the reported ICC can be used in future research while planning a CRT to maintain adequate power to detect a clinically significant treatment effect.
Evidence for Intervention
Studies included in the meta-analysis to assess the intervention effects on depression scores had an enormous amount of heterogeneity, irrespective of the measures used for the study. This could be due to differences in study settings, populations, varying interventions, doses of antidepressants, duration of the study, study outcomes, assessment tools, and so on. In order to check the robustness of the conclusions from the meta-analysis of CRTs, various sensitivity analyses were carried out, and they showed a statistically significant difference with substantial heterogeneity.
This study found that the CRTs used a variety of interventions, with very few studies (n = 12) clearly identifying the evidence-based interventions such as CBT, supportive psychotherapy, or interpersonal psychotherapy that are generally recommended for the treatment of major depression.18, 45 Further analysis revealed that four more studies had employed evidence-based interventions. It was also noted that quite a few CRTs employed interventions that were conditionally recommended or had insufficient evidence. Only 5 of the 34 studies (15%) had mental health professionals such as trained psychiatrists, clinical psychologists, or psychiatric social workers involved in designing and implementing the therapies. Even then, required details such as the fidelity of the intervention and contamination of the intervention were not reported. However, CRTs generally generate evidence for a specified intervention, psychological or pharmacological, or both, in major depression. The quality of intervention can vary depending on the competence of the personnel involved in the intervention delivery. In this regard, we wanted to examine the differences in the effectiveness of the intervention, if any, depending on the involvement of mental health professionals and other personnel. However, we could not do so because only one of the 20 eligible CRTs involved mental health professionals in the intervention delivery, which ruled out a meta-analysis. The effects of the intervention varied further when ICC conformity was considered: only the studies that accommodated ICC at the design and analysis levels revealed significant intervention effects. These findings strengthen the need to comply with CONSORT guidelines when reporting ICC.
This study has certain merits and limitations, which need to be considered to understand its scope and for planning future research. One of the strengths of this meta-analysis is the use of multiple databases for literature searches. In addition, we were able to identify a considerably large number of studies for systematic review of CONSORT adherence and for meta-analyses of ICC and interventions in CRTs. Moreover, this study provided substantial evidence of heterogeneity in CRTs. The limitation was that we could include only the studies published in the English language. Despite this limitation, this study highlights the relevance of ICC and assesses the effectiveness of various interventions for the treatment of major depression using the data reported worldwide. To the best of our knowledge, no such meta-analytic approach has previously been used for a pooled ICC and change in depression scores to understand the intervention effects in CRTs.
Conclusions
Though the CONSORT recommendation focused on helping researchers evaluate the appropriateness of the sample size estimation and provide the magnitude of clustering for each outcome, many authors have yet to incorporate it in their published manuscripts. Our review highlights the relevance, use, and reporting of ICC in CRTs on major depression. Compliance with the CONSORT recommendation on reporting of ICC was found in only one-third of eligible studies, which is a serious concern. Further, our study also revealed that the ICCs can affect the intervention outcomes. Therefore, CRTs must adhere to the CONSORT guidelines on reporting ICC, as indicated by this meta-analysis. Hence, all journals need to actively encourage future CRTs to adhere to CONSORT recommendations on ICC as an essential criterion for publication. At a practical level, researchers can utilize the pooled ICC estimate from this study as a reference value for sample size estimation in future trials targeting major depression. Lastly, this study indicates the effectiveness of various interventions on depression scores.
Supplemental Material
Supplemental material for this article is available online.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Declaration Regarding the Use of Generative AI
None used.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the intramural grant of the National Institute of Mental Health and Neurosciences (NIMHANS), Bengaluru, India [Grant number: NIMH/PROJ/00578/ 2018–2019].
