Abstract
Background:
MyTEMP was a cluster-randomized trial to assess the effect of using a personalized cooler dialysate compared to standard temperature dialysate for potential cardiovascular benefits in patients receiving maintenance hemodialysis in Ontario, Canada.
Objective:
To conduct Bayesian analyses of the MyTEMP trial, which sought to determine whether adopting a center-wide policy of personalized cooler dialysate is superior to a standard dialysate temperature of 36.5°C in reducing the risk of a composite outcome of cardiovascular-related deaths or hospitalizations.
Design:
Secondary analysis of a parallel-group cluster-randomized trial.
Setting:
In total, 84 dialysis centers in Ontario, Canada, were randomly allocated to the 2 groups.
Patients:
Adult outpatients receiving in-center maintenance hemodialysis from dialysis centers participating in the trial.
Measurements:
The primary composite outcome was cardiovascular-related death or hospital admission with myocardial infarction, ischemic stroke, or congestive heart failure during the 4-year trial period.
Methods:
MyTEMP trial data were analyzed using Bayesian cause-specific parametric Weibull methods to model the survival time with 6 pre-defined reference priors of normal distributions on the log hazard ratio for the treatment effect (strongly enthusiastic, moderately enthusiastic, non-informative, moderately skeptical, skeptical, strongly skeptical). For each analysis, we reported the posterior mean, 2nd, 50th, and 98th percentiles of the treatment effects (hazard ratios) and 96% credible interval (CrI). We also reported the estimated posterior probabilities for different magnitudes of treatment effects.
Results:
Regardless of priors, Bayesian analysis yielded consistent posterior means and a 96% CrI. The posterior distribution of the hazard ratio was concentrated between 0.95 and 1.05, indicating there was probably no substantial difference between the 2 trial arms.
Limitations:
The interpretation of Bayesian methods highly depends on the prior distributions. In our study, the prior distributions were determined by 2 experts without a formal elicitation method. A formal elicitation is encouraged in future trials to better quantify experts’ uncertainty about the treatment effect. In addition, we used cause-specific parametric Weibull methods to model survival time, as semi-parametric methods were not available in the standard Bayesian statistical software package at the time of analysis.
Conclusions:
Our Bayesian analysis indicated that implementing personalized cooler dialysate as a center-wide policy is unlikely to yield meaningful benefits in reducing the composite outcome of cardiovascular-related deaths and hospitalizations, regardless of prior expectations, whether optimistic or skeptical, about the intervention’s effectiveness.
Introduction
Randomized controlled trials are commonly designed and analyzed using a frequentist probability framework from which statistical hypothesis testing and/or confidence intervals (CIs) are established. In a frequentist framework, estimates of treatment effects on the primary outcome are based solely on trial data and are often designed with a 2-sided level of statistical significance of .05, from which P-values less than .05 are used to infer statistical significance and inform clinical decisions. However, the American Statistical Association has issued a statement arguing that the interpretation of trials (eg, the efficacy of the treatment) should not be based solely on an arbitrary cut-off, such as a P-value ≤ .05. 1
As an alternative, investigators may adopt Bayesian methods to analyze clinical trial data. Bayesian analysis is a statistical approach that combines existing knowledge with new trial data to better understand the effects of treatment. 2 Unlike traditional methods (ie, a frequentist approach), which only look at data collected during the trial, Bayesian methods allow researchers to start with an initial expectation (or assumption) about how effective a treatment might be. This initial expectation, known as a “prior belief,” might come from earlier studies, clinical experience, or expert opinion.
As new data from the trial becomes available, researchers update their initial expectations with incoming evidence. The final updated understanding, called the “posterior belief,” provides a clearer, more complete picture of how effective a treatment is, combining both past knowledge and current trial results. 3 This approach is particularly helpful when there is uncertainty about a treatment’s effectiveness because it allows researchers to formally incorporate “uncertainty” into the analysis.4-6 Rather than simply concluding whether a treatment effect is statistically significant, Bayesian analysis provides a probability statement indicating whether a treatment may or may not be effective. This means that after observing the data, researchers can conclude something like, “there is a 90% probability that the intervention group is doing better than the control group,” which may be easier for clinicians and decision-makers to understand.7,8
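As a minimal sketch of this prior-to-posterior updating, the Python snippet below applies a conjugate normal-normal update on the log hazard ratio scale. It is illustrative only: the function name `update_log_hr` and all numbers (an optimistic prior centered on a hazard ratio of 0.80 and a trial estimate of 1.00 with a standard error of 0.05) are hypothetical assumptions and are not taken from any trial discussed here.

```python
import math
from statistics import NormalDist


def update_log_hr(prior_mean, prior_sd, data_mean, data_se):
    """Combine a normal prior and a normal likelihood for the log hazard ratio.

    Returns the posterior mean and SD, both on the log hazard ratio scale.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / data_se ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * data_mean)
    return post_mean, math.sqrt(post_var)


# Hypothetical inputs, chosen only for illustration.
prior_mean, prior_sd = math.log(0.80), 0.20   # optimistic prior belief
data_mean, data_se = math.log(1.00), 0.05     # estimate from the new trial

post_mean, post_sd = update_log_hr(prior_mean, prior_sd, data_mean, data_se)
posterior = NormalDist(post_mean, post_sd)

# P(hazard ratio < 1) equals P(log hazard ratio < 0) under the posterior.
prob_benefit = posterior.cdf(0.0)

print(f"Posterior median hazard ratio: {math.exp(post_mean):.3f}")
print(f"Posterior probability of any benefit: {prob_benefit:.2f}")
```

Because the posterior precision is the sum of the prior and data precisions, a precise trial estimate pulls the posterior close to the data even when the prior is optimistic.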
This approach can also help interpret findings from trials that do not show statistically significant results. 9 A recent systematic review used Bayesian methods to re-analyze 49 trials that had non-statistically significant results.7,8 The results showed that 15 trials had a posterior probability of any benefit greater than 80%. 7 Overall, Bayesian analysis helps researchers and clinicians make clearer, more informed, and practical decisions about treatments by combining existing evidence with new clinical trial results.
The Major Outcomes With Personalized Dialysate TEMPerature (MyTEMP) trial was a pragmatic, two-arm, parallel-group, open-label, cluster-randomized trial that assessed whether adopting a center-wide policy of personalized cooler dialysate is superior to a standard dialysate temperature of 36.5°C at reducing the risk of a composite outcome of cardiovascular-related deaths or hospitalizations. 10 A total of 84 dialysis centers in Ontario, Canada, participated in the trial, and centers were randomly allocated to the 2 groups. The main analysis used an intent-to-treat frequentist approach with a patient-level model (Fine and Gray’s subdistribution hazard model) and adjusted for the correlation of outcomes within the clusters (centers) with generalized estimating equations (GEE). Additional details are available in the study protocol, statistical analysis plan, and final report.10-12
The MyTEMP trial was powered to detect a 20% reduction in the hazard rate of the primary outcome. However, the authors acknowledged that effects less than 20% could still be clinically meaningful. Bayesian analysis allows for the inference of the distribution of the estimated treatment effects, which enables us to estimate the posterior probabilities that the treatment effect exceeded a range of potential values for different magnitudes of treatment effects (ie, HR >1, HR <1, HR <0.95, HR <0.90). We consulted clinician experts to reach a mutually agreed upon set of cut-offs that reflect whether the treatment shows evidence of any harm, any benefit, minimum clinically meaningful benefit, and more than trivial benefits. Meanwhile, the authors acknowledge that clinicians may have varying degrees of clinical uncertainty about the benefit of the intervention. The Bayesian method can incorporate these uncertainties via priors and allows authors to explicitly examine how the clinician’s uncertainty about the intervention effect could impact the results. Therefore, presented here is a pre-specified secondary analysis of the MyTEMP data using a Bayesian framework.
Methods
Following the previously published statistical analysis plan, 11 secondary Bayesian analyses of the MyTEMP trial were performed using 6 pre-defined reference priors with normal distributions on the log hazard ratio (HR) for the treatment effect (non-informative, strongly enthusiastic, moderately enthusiastic, strongly skeptical, moderately skeptical, skeptical). Specified priors with rationales are described in detail in Table 1. A visualization of priors is presented in Figure 1. These priors were chosen, based on existing studies, in discussion with 2 content experts and 1 independent Bayesian statistician. 9 Separate Bayesian models were run for each prior distribution on the log HR for the primary outcome. Each model was adjusted for the covariates constrained in the randomization at the center level (the historical rate of the primary composite outcome for the center in which a patient entered the trial) and at the patient level: age, sex, rurality, race (Caucasian vs other), modified Charlson index, number of hospital admissions in the previous 12 months, number of unique hypertensive medications (at least 1 Hypertensive Rx vs others), late nephrology referral (within 3 months of initiating dialysis vs otherwise), type of hemodialysis vascular access (catheter or other), the concentration of serum albumin (<35 g/L vs ≥ 35 g/L), and a history of myocardial infarction, congestive heart failure, peripheral arterial disease, or diabetes mellitus. Unlike the frequentist analysis, we converted variables with more than 2 categories to binary variables to reduce the computational time for the adjusted analyses. We were able to compare our results with the corresponding frequentist cause-specific model. However, due to the limited availability of current methods in Bayesian analysis for clustered survival data, our Bayesian analysis differs from the primary frequentist approach in notable ways.
First, the frequentist analysis employed a semi-parametric model using both the cause-specific hazard and the subdistribution hazard. Due to the limited availability of software for Bayesian semi-parametric models that handle clustered survival outcomes, and the constraints of existing tools regarding prior specifications, we opted to model the survival time using a parametric Weibull distribution. 13 Our focus was on the cause-specific hazard. Second, the frequentist approach used a GEE approach with an independent working correlation to adjust for the clustering in dialysis centers, resulting in a population-averaged, or marginal, interpretation of the parameter effects. We added a random intercept to account for clustering effects, as GEE is unavailable under the Bayesian framework.
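To make the model structure concrete, the sketch below writes out the log-likelihood of a Weibull proportional hazards model in which each patient's linear predictor includes the treatment effect and a center-level random intercept. This is an assumption-laden illustration, not the analysis code: the function name `weibull_ph_loglik`, the parameterization (hazard = rate × shape × t^(shape − 1) × exp(linear predictor)), and the toy data are all hypothetical, and the adjustment covariates are omitted for brevity. In a Bayesian fit, the log HR would receive one of the normal reference priors and the center intercepts would share a normal distribution with an unknown variance.

```python
import math


def weibull_ph_loglik(times, events, linear_predictors, shape, rate):
    """Log-likelihood of a Weibull proportional hazards model.

    Hazard: h(t) = rate * shape * t**(shape - 1) * exp(eta).
    events[i] is 1 if the event of interest occurred at times[i] and 0 if the
    observation was censored. In a cause-specific analysis, competing events
    are treated as censored.
    """
    loglik = 0.0
    for t, d, eta in zip(times, events, linear_predictors):
        log_hazard = math.log(rate) + math.log(shape) + (shape - 1.0) * math.log(t) + eta
        cumulative_hazard = rate * t ** shape * math.exp(eta)
        loglik += d * log_hazard - cumulative_hazard
    return loglik


# Hypothetical toy data: 4 patients in 2 centers.
times = [1.2, 0.7, 3.1, 2.4]        # follow-up time in years
events = [1, 0, 1, 0]               # 1 = primary outcome, 0 = censored
treated = [1, 1, 0, 0]              # treatment is assigned at the center level
center = [0, 0, 1, 1]               # center index for each patient

log_hr = math.log(1.02)             # treatment effect on the log HR scale
center_intercepts = [0.05, -0.05]   # one random intercept per center

# Linear predictor: treatment effect plus the center's random intercept.
eta = [log_hr * x + center_intercepts[j] for x, j in zip(treated, center)]

print(weibull_ph_loglik(times, events, eta, shape=1.1, rate=0.2))
```

Unlike the marginal (population-averaged) effect from GEE, the treatment coefficient in a random-intercept model of this kind has a center-conditional interpretation.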
Table 1. Characteristics of Reference Prior Probability Distributions Representing Prior Beliefs About Primary Composite Endpoint Benefit.
HR = hazard ratio; SD = standard deviation.

Figure 1. Reference priors showing the plausible range of hazard ratios of primary composite endpoint benefit with the use of personalized, temperature-reduced dialysate protocol.
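As a rough illustration of how a normal prior on the log HR translates into a plausible range of HRs, the sketch below converts a prior mean and SD into a central 96% interval on the HR scale and a prior probability of any benefit (HR <1). The prior parameters shown are hypothetical placeholders and are not the values specified for the 6 reference priors; the helper `describe_prior` is likewise an assumed name.

```python
import math
from statistics import NormalDist


def describe_prior(mean_log_hr, sd_log_hr, level=0.96):
    """Summarize a normal prior on the log HR on the HR scale."""
    dist = NormalDist(mean_log_hr, sd_log_hr)
    tail = (1.0 - level) / 2.0
    return {
        "median_hr": math.exp(mean_log_hr),
        "lower_hr": math.exp(dist.inv_cdf(tail)),
        "upper_hr": math.exp(dist.inv_cdf(1.0 - tail)),
        "prob_benefit": dist.cdf(0.0),   # P(log HR < 0) = P(HR < 1)
    }


# Hypothetical priors as (mean, SD) on the log HR scale; NOT the trial's values.
example_priors = {
    "vague": (0.0, 10.0),
    "skeptical": (0.0, 0.10),
    "enthusiastic": (math.log(0.80), 0.10),
}

prior_summaries = {name: describe_prior(m, s) for name, (m, s) in example_priors.items()}

for name, s in prior_summaries.items():
    print(
        f"{name}: median HR {s['median_hr']:.2f}, "
        f"96% range {s['lower_hr']:.3g} to {s['upper_hr']:.3g}, "
        f"P(HR < 1) = {s['prob_benefit']:.2f}"
    )
```

A prior centered on a log HR of 0 assigns equal probability to benefit and harm, and its SD controls how strongly it resists being moved by the data.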
For each analysis, we reported the posterior mean and the 2nd, 50th, and 98th percentiles of the treatment effects (HRs). The original trial analysis allocated a significance level (alpha) of .04 to the primary analysis, so we reported 96% credible intervals (CrIs) to align with it. We also reported the estimated posterior probabilities for different magnitudes of treatment effects as specified above. For each analysis, we used Markov chain Monte Carlo (MCMC) simulation with 3 chains to obtain the posterior distribution. For each chain, 15,000 MCMC iterations were kept for inference following the first 10,000 discarded iterations (ie, burn-in iterations). The analyses were done using the rjags package under R version 3.6. 14 The main purpose of using multiple chains was to evaluate model performance, as it is essential to ensure that the posterior is sampled sufficiently and appropriately. Increasing the number of chains is computationally expensive, and the final choice often depends on the available computational resources. A common choice is between 2 and 4 chains, and we chose 3 as it is the default choice in rjags. We also chose a large number of burn-in iterations to ensure that the chain moved to the modal region of the posteriors before our sampling.15,16
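The summary steps described above can be sketched in a few lines of Python. The example is a stand-in, not the rjags workflow: the "chains" are simulated independent draws rather than real MCMC output, the location and spread of the draws are hypothetical, and the Gelman-Rubin statistic is shown only as one widely used way to compare chains (the text does not state which diagnostics were applied).

```python
import math
import random
from statistics import fmean, quantiles, variance

N_CHAINS = 3
N_BURN_IN = 10_000
N_KEPT = 15_000

random.seed(2024)


def fake_chain(n_draws):
    """Stand-in for one MCMC chain of log HR draws (NOT a real sampler)."""
    return [random.gauss(math.log(1.02), 0.022) for _ in range(n_draws)]


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    within = fmean([variance(chain) for chain in chains])
    between = n * variance([fmean(chain) for chain in chains])
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Generate each chain, then drop the burn-in iterations.
raw_chains = [fake_chain(N_BURN_IN + N_KEPT) for _ in range(N_CHAINS)]
kept_chains = [chain[N_BURN_IN:] for chain in raw_chains]

rhat = gelman_rubin(kept_chains)

# Pool the kept draws and summarize them on the HR scale.
hr_draws = [math.exp(x) for chain in kept_chains for x in chain]
cuts = quantiles(hr_draws, n=100)   # 99 cut points: 1st to 99th percentile
p2, p50, p98 = cuts[1], cuts[49], cuts[97]

print(f"R-hat: {rhat:.4f}")
print(f"Posterior mean HR: {fmean(hr_draws):.3f}")
print(f"Median HR: {p50:.3f}; 96% CrI: {p2:.3f} to {p98:.3f}")
```

The 2nd and 98th percentiles bound the central 96% of the pooled draws, which is why they serve as the limits of the 96% CrI.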
The MyTEMP trial data used to conduct this study were linked from administrative healthcare databases in Ontario, Canada, held at ICES (ices.on.ca). These datasets were linked using unique encoded identifiers and analyzed at ICES. ICES is an independent, non-profit research institute whose legal status under Ontario’s health information privacy law allows it to collect and analyze health care and demographic data, without consent, for health system evaluation and improvement. MyTEMP trial is registered at ClinicalTrials.gov, NCT02628366.
Results
Table 2 presents the findings of the Bayesian analysis: the posterior means of HRs and the 96% CrIs were consistent across all prior distributions. A visualization of the posterior distributions is presented in Figure 2. For instance, using a non-informative prior, the mean HR was 1.02 with a 96% CrI of 0.98 to 1.07. With a strongly skeptical prior, the mean HR was also 1.02, with a 96% CrI of 0.97 to 1.07. In the case of a strongly enthusiastic prior, the mean HR was slightly lower at 1.01, accompanied by a 96% CrI of 0.96 to 1.06. These results align closely with those from the frequentist analysis but are more precise. With the frequentist analysis, the HR estimate was 1.00, with a 96% CI of 0.89 to 1.11.
HR = hazard ratio; SD = standard deviation.
The original MyTEMP analysis allocated a significance level (alpha) of .04 for the primary analysis and provided 96% confidence intervals. To align with the analysis, we provide the 96% credible intervals.
For the adjusted hazard ratio, the model was adjusted for the covariates constrained in the randomization at the center level (the historical rate of the primary composite outcome for the center in which a patient entered the trial) and at the patient level (age, sex, rurality, race, modified Charlson index, number of hospital admissions in the previous 12 months, number of unique hypertensive medications (categorical: missing, 0, 1, 2, 3), first nephrologist visit <3 months before hemodialysis initiation, type of hemodialysis vascular access, concentration of serum albumin (categorical: <34, 34-38, ≥38), history of myocardial infarction, congestive heart failure, peripheral arterial disease, and diabetes mellitus).

Figure 2. Posterior probability distributions in adjusted analyses for hazard ratios for the benefit of using personalized, temperature-reduced dialysate protocol on primary composite endpoints by 6 reference priors.
The probability that the HR was above or below a certain pre-specified threshold is presented in Table 3. We found that the probability of the HR having a value less than 1 (ie, a beneficial effect of lower temperature dialysate vs a temperature of 36.5°C) was 16% with the non-informative prior, 18% with the strongly skeptical prior, and 34% with the strongly enthusiastic prior; the probability of the HR having a value greater than 1 (ie, a harmful effect of lower temperature dialysate vs a temperature of 36.5°C) was 84%, 82%, and 66%, respectively. In a post hoc calculation, the probability of an HR greater than 1.05 for the non-informative and strongly enthusiastic priors was 13.3% and 4.0%, respectively, and the probability of an HR less than 0.95 was 0.1% and 0.7%, respectively. These results show that the posterior distributions of the HR were concentrated between 0.95 and 1.05 and mirror what was observed in the frequentist analysis: there is no evidence that the personalized, temperature-reduced dialysate protocol (intervention) is superior to a dialysate temperature of 36.5°C (control).
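Threshold probabilities of this kind are simply the share of posterior draws that fall on one side of a cut-off. The sketch below shows the calculation on simulated draws; the draws are a hypothetical stand-in (a lognormal distribution with an assumed location and spread), so the printed values are not expected to reproduce the figures reported here. The helper names `prob_below` and `prob_above` are assumptions.

```python
import math
import random

random.seed(7)

# Simulated stand-in for posterior draws of the HR (hypothetical, not trial output).
hr_draws = [math.exp(random.gauss(math.log(1.02), 0.022)) for _ in range(45_000)]


def prob_below(draws, threshold):
    """Share of posterior draws with HR strictly below the threshold."""
    return sum(d < threshold for d in draws) / len(draws)


def prob_above(draws, threshold):
    """Share of posterior draws with HR strictly above the threshold."""
    return sum(d > threshold for d in draws) / len(draws)


threshold_probs = {
    "HR < 0.90": prob_below(hr_draws, 0.90),
    "HR < 0.95": prob_below(hr_draws, 0.95),
    "HR < 1": prob_below(hr_draws, 1.0),
    "HR > 1": prob_above(hr_draws, 1.0),
    "HR > 1.05": prob_above(hr_draws, 1.05),
}

for label, p in threshold_probs.items():
    print(f"P({label}) = {p:.3f}")
```

Because every threshold is evaluated on the same set of draws, additional cut-offs (such as a post hoc HR >1.05) can be added without refitting the model.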
Table 3. Posterior Probability of Treatment Effects Being More or Less Than a Pre-Specified Threshold in Adjusted Analysis.
HR = hazard ratio; SD = standard deviation.
For the adjusted hazard ratio, the model was adjusted for the covariates constrained in the randomization at the center level (the historical rate of the primary composite outcome for the center in which a patient entered the trial) and at the patient level (age, sex, rurality, race, modified Charlson index, number of hospital admissions in the previous 12 months, number of unique hypertensive medications [at least 1 Hypertensive Rx vs others], late nephrology referral [within 3 months vs unknown], type of hemodialysis vascular access [catheter or other], concentration of serum albumin [<35 g/L vs ≥ 35 g/L], history of myocardial infarction, congestive heart failure, peripheral arterial disease, and diabetes mellitus).
The HR >1.05 threshold was evaluated in a post hoc analysis to further understand the potential for harm from the intervention. All other prior beliefs were pre-specified.
Discussion
This study presents an additional Bayesian analysis of the MyTEMP trial, investigating the impact of a center-wide policy of personalized, temperature-reduced dialysate on a composite outcome of cardiovascular-related mortality or significant cardiovascular-related hospitalization. Our Bayesian analysis supplements our frequentist analysis by directly incorporating clinical uncertainties via 6 different prior beliefs, representing 6 different clinical perceptions of the potential benefits of the intervention. We found that the Bayesian analysis results are similar across different priors, which indicates that, regardless of whether clinicians strongly believe that the center-wide policy of personalized cooler dialysate will work, it will not change the trial’s conclusion. This insight cannot be offered by frequentist analysis. We also calculated the posterior probabilities to quantify the potential harm or benefits of the intervention, which are usually unavailable in frequentist analysis.
The original trial report concluded that “center-wide delivery of personalized cooler dialysate did not significantly reduce the risk of major cardiovascular events compared with standard temperature dialysate” based on an adjusted HR of 1.00 (96% CI = 0.89-1.11). 12 We would reach a similar conclusion regardless of the prior distributions in the Bayesian analysis. The posterior probability is concentrated around 1.0, with a high probability that the estimated HR lies between 0.95 and 1.05. Based on our non-informative and strongly enthusiastic priors, the probability of an HR less than 0.95 was 0.1% and 0.7%, respectively. These results suggest that it is unlikely that a lower personalized dialysate temperature will have a clinically meaningful benefit on the MyTEMP primary outcome if adopted as a center-wide policy. The estimated average treatment effects (HRs) were close to the point estimates obtained in frequentist analyses, and the 96% CrIs were close to the CIs from the frequentist analyses, though the CrIs tended to be narrower (which is to be expected as Bayesian methods usually have higher precision given the additional information from the priors). This is not surprising given the size of the trial: the data dominated the results, and the prior distributions had a relatively small impact on the posterior distributions.
In summary, the secondary Bayesian analyses of the MyTEMP trial offer alternative ways to interpret the trial results. Our Bayesian analysis directly provided 2 key pieces of useful information for clinicians: (1) the probability that 1 trial arm is better than another and (2) how likely it is that the treatment effect reaches clinically meaningful thresholds (eg, a certain percentage reduction in risk). Traditional frequentist methods typically do not offer this level of practical insight; they can only tell us whether there’s evidence of a difference or not, without indicating how likely it is that a meaningful treatment effect exists. For example, in our primary analysis of MyTEMP, we could only conclude that there was no statistically significant difference in the rate of our primary outcome between personalized cooler dialysate and the usual dialysate temperature. In contrast, the Bayesian approach allowed us to clearly state that the chance of achieving a clinically important benefit (such as a 20% reduction in risk) is effectively zero. As another example, our reanalysis allowed us to estimate a 13% probability that personalized cooler dialysate could result in potential harm (HR >1.05 for our primary outcome) compared to usual care, which was much higher than the probability of benefit (0.1% probability of HR <0.95) under the non-informative prior.
Beyond the benefits for trial interpretation described above, Bayesian methods have in recent years been used in clinical trials to improve trial efficiency. Although this was not the focus of our study, our Bayesian analysis of the MyTEMP trial can still benefit statistically from these advantages. For example, in a cluster-randomized trial using Bayesian methods, a weakly informative prior distribution could potentially increase the precision and power of the trial. 17 In our study, this is reflected in the CrIs being narrower than the CIs from the frequentist analysis. We acknowledge that the Bayesian method often involves complex methodologies and statistical programming. Our reanalysis serves as an example of how these analyses can be conducted. While we focused on the Bayesian analysis of trials, we also acknowledge that the Bayesian design of clinical trials has the potential to reduce the required sample sizes, which is particularly useful in rare kidney disease trials. Robinson et al18,19 demonstrated the use of Bayesian adaptive design for a nephrology trial, which allows investigators to stop the trial early and enroll fewer patients. From a methodological perspective, to our knowledge, there is a very limited number of studies focusing on Bayesian methods in clustered time-to-event outcomes,20-22 and cluster-randomized trials (CRTs) with survival outcomes are less common than CRTs with other types of outcomes.23-25 Bayesian hierarchical models can efficiently pool information across patient subgroups to improve the precision of treatment effect estimates.
Although our Bayesian reanalysis produced similar results compared to the frequentist analysis, we would like to emphasize that this is not always the expectation. As an example, the Bayesian reanalysis of the Frequent Hemodialysis Network (FHN) Nocturnal Trial, a randomized controlled trial comparing frequent (6-times-per-week) nocturnal home hemodialysis vs conventional 3-times-per-week dialysis in patients with end-stage renal disease, provided a different interpretation compared to the original frequentist analysis. 26 The frequentist results suggested increased mortality risk (HR = 3.88, 95% CI = 1.27-11.79) for frequent nocturnal dialysis during extended follow-up. The Bayesian analysis revisited these findings by highlighting considerable uncertainty. Specifically, Bayesian results showed that the estimated HR was considerably closer to 1 (no effect) with 95% credible intervals of 0.84 to 1.97 (conservative prior) and 0.76 to 2.32 (enthusiastic prior), suggesting that the frequentist mortality signal likely represented a chance finding from a small and underpowered trial. Thus, the Bayesian reanalysis underscores uncertainty rather than definitive evidence of harm or benefit from frequent nocturnal dialysis. We also recommend that the decision to conduct a Bayesian reanalysis should not depend on whether one expects a conclusion similar to or different from that of the frequentist analysis. The purpose of Bayesian reanalysis is not to seek the same or different conclusions but to “switch gears” and provide extra interpretation that may help clinicians interpret the results.
There are some limitations to this study. First, the validity of Bayesian methods highly depends on the prior distributions. In our study, the prior distributions were determined by a limited number of experts without a formal elicitation method. Further studies may want to involve a formal elicitation exercise to improve the reliability of the prior distributions.27-29 Also, in this study, we used cause-specific parametric Weibull methods to model the survival time. There might be better methods currently unavailable in statistical software. For additional details on the limitations of the specific trial, please refer to the primary report. 12
Conclusion
The Bayesian reanalysis showed that there was a low probability of observing any benefits with implementing personalized cooler dialysate vs standard temperature dialysate as a center-wide policy, regardless of prior enthusiasm or skepticism for the beneficial effects of the intervention. The probability of the estimated HR being below 0.95 or above 1.05 was low.
Footnotes
Acknowledgements
This study was supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health (MOH) and the Ministry of Long-Term Care (MLTC). The research was conducted by members of the ICES Kidney, Dialysis, and Transplantation team, at the ICES Western facility. This document used data adapted from the Statistics Canada Postal CodeOM Conversion File, which is based on data licensed from Canada Post Corporation, and/or data adapted from the Ontario Ministry of Health Postal Code Conversion File, which contains data copied under license from ©Canada Post Corporation and Statistics Canada. Parts of this material are based on data and/or information compiled and provided by the Canadian Institute for Health Information and the Ontario Ministry of Health. The analyses, conclusions, opinions and statements expressed herein are solely those of the authors and do not reflect those of the funding or data sources; no endorsement is intended or should be inferred. Parts of this material are based on data and information provided by Ontario Health (OH). The opinions, results, view, and conclusions reported in this paper are those of the authors and do not necessarily reflect those of OH. No endorsement by OH is intended or should be inferred.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethics Approval
This is a reanalysis of clinical trial data. No extra ethical approval is needed. The original trial is registered at ClinicalTrials.gov, NCT02628366. The use of the ICES data in this project is authorized under section 45 of Ontario’s Personal Health Information Protection Act and does not require review by a Research Ethics Board.
Consent to Participate
The Research Ethics Board approved our application with an alteration to the informed consent process as described in the protocol of the original trial.
Consent for Publication
Consent for publication was obtained from all authors.
Availability of Data and Materials
The dataset from this study is held securely in coded form at ICES. While legal data sharing agreements between ICES and data providers (e.g., healthcare organizations and government) prohibit ICES from making the dataset publicly available, access may be granted to those who meet pre-specified criteria for confidential access, available at
(email: das@ices.on.ca). The full dataset creation plan and underlying analytic code are available from the authors upon request, understanding that the computer programs may rely upon coding templates or macros that are unique to ICES and are therefore either inaccessible or may require modification.
