Does ignoring clustering in multicenter data influence the performance of prediction models? A simulation study

Abstract

Clinical risk prediction models are increasingly developed and validated on multicenter datasets. In this article, we present a comprehensive framework for evaluating the predictive performance of prediction models at both the center level and the population level, considering population-averaged (marginal) predictions, center-specific (conditional) predictions, and predictions assuming an average random center effect. In a simulation study, we demonstrate that calibration slopes deviate from one not only because of over- or underfitting of patterns in the development dataset, but also as a result of the choice of model (standard versus mixed effects logistic regression), the type of predictions (marginal versus conditional versus assuming an average random effect), and the level of model validation (center versus population). In particular, when data are heavily clustered (intraclass correlation coefficient, ICC, of 20%), center-specific predictions offer the best predictive performance at both the population level and the center level. We recommend that models reflect the structure of the data, while the level of model validation reflects the research question.
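The abstract distinguishes three kinds of predictions from a random intercept logistic model and notes that calibration slopes can depart from one even without overfitting. The following is a minimal Python/NumPy sketch of that idea. It is not the authors' simulation code. It rests on several hypothetical assumptions: true coefficients (beta0 = -1, beta1 = 1), a single standard normal predictor, 200 centers of 100 patients each, and the latent-scale ICC definition sigma2 / (sigma2 + pi^2/3). Center-specific predictions use the true random intercepts (an oracle) rather than estimated ones. Because no model is fitted, any departure of the slope from one here reflects only the type of prediction, not overfitting.

```python
import numpy as np


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def icc_to_variance(icc):
    """Random-intercept variance for a given latent-scale ICC of a logistic model,
    assuming ICC = sigma2 / (sigma2 + pi**2 / 3)."""
    return icc / (1.0 - icc) * np.pi ** 2 / 3.0


def marginal_risk(eta, sigma2, n_nodes=40):
    """Population-averaged risk E_u[expit(eta + u)] with u ~ N(0, sigma2),
    approximated by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    u = np.sqrt(2.0 * sigma2) * nodes  # change of variables for N(0, sigma2)
    eta = np.asarray(eta, dtype=float)
    return expit(eta[..., None] + u) @ weights / np.sqrt(np.pi)


def calibration_slope(y, p, n_iter=25):
    """Slope of a logistic regression of outcome y on logit(p), fitted by Newton-Raphson."""
    lp = np.log(p / (1.0 - p))
    X = np.column_stack([np.ones_like(lp), lp])
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = expit(X @ beta)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta[1]


# Hypothetical data-generating model: logit P(y = 1 | x, u_j) = beta0 + beta1 * x + u_j
rng = np.random.default_rng(2024)
n_centers, n_per_center = 200, 100
beta0, beta1 = -1.0, 1.0
sigma2 = icc_to_variance(0.20)            # heavily clustered data: ICC of 20%
u = rng.normal(0.0, np.sqrt(sigma2), n_centers)
center = np.repeat(np.arange(n_centers), n_per_center)
x = rng.normal(size=center.size)
eta = beta0 + beta1 * x                   # linear predictor without the center effect

p_center = expit(eta + u[center])         # center-specific (conditional), true u_j used
p_avg_effect = expit(eta)                 # assumes an average random effect, u_j = 0
p_marginal = marginal_risk(eta, sigma2)   # population-averaged (marginal)
y = rng.binomial(1, p_center)

# Validation at the population level: pool all patients and ignore centers
for name, p in [("center-specific", p_center),
                ("average random effect", p_avg_effect),
                ("marginal", p_marginal)]:
    print(f"{name:>22}: calibration slope = {calibration_slope(y, p):.2f}")
```

Under these assumptions, the center-specific and marginal predictions should give population-level calibration slopes close to one. Predictions assuming an average random effect should give a slope below one, because the population-level association between x and the outcome is attenuated relative to the center-specific association. Validation at the center level, with slopes computed within each center, is not shown in this sketch and would behave differently.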

Keywords

Mixed model, logistic regression, clinical prediction model, calibration, discrimination, predictive performance, bias
