Abstract

Background/Aims

Cluster randomized trials with multiple endpoints feature complex correlation structures. Estimating treatment effects in a meaningful way that respects differences in scale and clinical importance is challenging. Pairwise comparison methods address these challenges by constructing all pairs featuring one treatment and one control participant, then evaluating endpoints in hierarchical order. For cluster randomized trials featuring such a “hierarchical composite endpoint,” we develop large-sample confidence interval estimators and hypothesis tests for the nonparametric treatment effect referred to here as the “win probability.”

Methods

For each pair of participants (one treated and one control), responses on each endpoint are compared in order of descending clinical importance until it can be determined which participant responded better (“won”) or all endpoints are exhausted. Dividing the number of wins attributed to the treatment arm by the total number of “pairwise comparisons” yields a point estimate of the win probability. The win probability can be transformed into alternative effect measures, including the “win difference” and “win odds.” A two-stage procedure, or “win fraction” approach, is used to obtain variance estimators for the win probability. First, each participant’s multivariate response is transformed into a univariate “win fraction,” which quantifies the proportion of times they won when compared to all participants in the comparison arm. Second, a working linear mixed model is applied to the win fractions to obtain cluster-adjusted point estimates of the win probability and its variance. Inference proceeds by the central limit theorem. Simulation is used to assess the performance of the proposed estimators for a hierarchical composite endpoint composed of one binary component (more important) and one continuous component (less important) across a range of cluster trial designs. Performance of an empirical bootstrap estimator is also investigated. A case study using data from the REACT cluster trial demonstrates application of the methods, and corresponding SAS and R code is provided.
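The first stage described above (hierarchical pairwise comparison, then win fractions) can be illustrated with a minimal Python sketch. It is not the authors' SAS or R code. The function names are hypothetical, and the sketch assumes that higher values are better on every endpoint, that a pair tied on all endpoints earns half a win, and that under this tie convention the win difference is 2 × (win probability) − 1 and the win odds are (win probability) / (1 − win probability). The second stage, the working linear mixed model fitted to the win fractions, is not shown.

```python
from typing import List, Sequence, Tuple

# One participant's responses, ordered from most to least clinically important.
Response = Tuple[float, ...]


def compare(a: Response, b: Response) -> float:
    """Score a against b: 1.0 if a wins, 0.0 if b wins, 0.5 if tied throughout.

    Assumption: higher is better on every endpoint; half credit for ties.
    """
    for x, y in zip(a, b):
        if x > y:
            return 1.0
        if x < y:
            return 0.0
    return 0.5  # all endpoints exhausted without a winner


def win_fractions(arm: Sequence[Response], other: Sequence[Response]) -> List[float]:
    """Proportion of comparisons each participant in `arm` wins against `other`."""
    return [sum(compare(p, q) for q in other) / len(other) for p in arm]


def win_probability(treated: Sequence[Response], control: Sequence[Response]) -> float:
    """Treatment-arm wins divided by the total number of pairwise comparisons."""
    fractions = win_fractions(treated, control)
    return sum(fractions) / len(fractions)


def win_difference(wp: float) -> float:
    """Assumed transformation under the half-credit tie convention."""
    return 2.0 * wp - 1.0


def win_odds(wp: float) -> float:
    """Assumed transformation under the half-credit tie convention."""
    return wp / (1.0 - wp)


if __name__ == "__main__":
    # Hypothetical data: (binary endpoint, continuous endpoint) per participant.
    treated = [(1, 5.0), (0, 3.0), (1, 2.0)]
    control = [(1, 4.0), (0, 3.0)]
    wp = win_probability(treated, control)
    print(win_fractions(treated, control), wp, win_difference(wp), win_odds(wp))
```

Clustering is ignored here on purpose: the sketch only produces the univariate win fractions that a cluster-adjusted model would then take as its outcome.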

Results

Simulation suggests that the nominal 95% coverage probability is well maintained and type I error is controlled. Due to the large-sample nature of our method, confidence intervals may be conservative (over-coverage) for fewer than 30 clusters. In comparison, the empirical bootstrap estimator is liberal (under-coverage) for every number of randomized clusters considered (up to 50).

Conclusion

Our win fraction method uses a working linear mixed model to obtain confidence intervals and hypothesis tests that respect coverage and type I error. It is faster than the bootstrap, applies to multiple components on different scales, bypasses specification of complex correlation matrices, permits adjustment, and can be implemented in existing software.

Keywords

Cluster randomized trials, hierarchical composite endpoints, pairwise comparison, win probability, win odds, net benefit, win difference, linear mixed model, U-statistics
