Abstract
Decades of animal research show therapeutic hypothermia (TH) to be potently neuroprotective after cerebral ischemic injuries. While there have been some translational successes, clinical efficacy after ischemic stroke is unclear. One potential reason for translational failures could be insufficient optimization of dosing parameters. In this study, we conducted a systematic review of the PubMed database to identify all preclinical controlled studies that compared multiple TH durations following focal ischemia, with treatment beginning at least 1 hour after ischemic onset. Six studies met our inclusion criteria. In these six studies, six of seven experiments demonstrated an increase in cerebroprotection at the longest duration tested. The average effect size (mean Cohen's d ± 95% confidence interval) at the shortest and longest durations was 0.4 ± 0.3 and 1.9 ± 1.1, respectively. At the longest durations, this corresponded to percent infarct volume reductions between 31.2% and 83.9%. Our analysis counters previous meta-analytic findings that there is no relationship, or an inverse relationship between TH duration and effect size. However, underreporting often led to high or unclear risks of bias for each study as gauged by the SYRCLE Risk of Bias tool. We also found a lack of investigations of the interactions between duration and other treatment considerations (e.g., method, delay, and ischemic severity). With consideration of methodological limitations, an understanding of the relationships between treatment parameters is necessary to determine proper “dosage” of TH, and should be further studied, considering clinical failures that contrast with strong cerebroprotective results in most animal studies.
Introduction
Preclinical research and meta-analyses demonstrate that therapeutic hypothermia (TH) is highly cerebroprotective after ischemia (Dumitrascu et al., 2016). While there have been some clinical successes, such as using mild TH for hypoxic ischemic encephalopathy in newborns (Chiang et al., 2017), efficacy following cardiac arrest in adults remains somewhat unclear (Callaway, 2023), even more so for ischemic stroke (Hong, 2019). For the latter, there is a disconnect between animal and clinical findings (Kuczynski et al., 2020). Indeed, the use of TH for ischemic stroke appears lost in the “valley of death” between basic science and clinical use (Seyhan, 2019), likely owing to difficulty inducing TH, side-effects of TH, and insufficient knowledge of dosage. There are enough data showing that the effectiveness of TH varies with a complex interplay of patient factors (age, comorbidities, degree of collateral blood supply, reperfusion status, etc.), cooling method (e.g., brain-selective vs. systemic, physical vs. pharmacological), and TH parameters (e.g., intervention delay, depth, duration).
However, this complexity has not been fully worked out, as it is virtually impossible to study all of the main effects and interactions in animal studies. A more pragmatic approach is to move forward into clinical trials with effective and clinically practical treatment protocols (dosages) based upon sufficient high-quality preclinical data for the targeted patient population(s).
One major consideration for implementing TH is the duration of cooling; brief and prolonged cooling both have their advantages and limitations. Brief cooling is highly effective during an ischemic insult, and likely also at the time of reperfusion. When delayed, however, more protracted cooling is needed. For instance, Dietrich and colleagues (1993) showed that 3 hours of immediate postischemic cooling only transiently rescued CA1 neurons after forebrain ischemia. Conversely, Colbourne and Corbett (1994, 1995) were the first to show that protracted cooling was persistently protective against forebrain ischemic injury, even after a delay. There have also been numerous studies investigating delayed cooling after onset of focal ischemia. Intraischemic cooling is highly protective, and as such, brief bouts of TH may be sufficient when initiated early, or against milder insults (Karibe et al., 1994). However, because treatment of ischemic stroke patients may be delayed by many hours, immediate and brief TH is likely impractical.
Ischemic injury progresses over hours to days, and likely requires longer cooling to sufficiently suppress delayed or protracted injurious processes, such as intracranial pressure (ICP) elevations (Hood et al., 2023). Additionally, the necessary duration of cooling likely differs with the method used (Clark et al., 2009). Clinicians must develop a strategy to quickly induce cooling, maintain it for the necessary duration, and rewarm at an appropriate rate. Discrete studies that have rigorously varied TH durations have not been undertaken; the extant literature is heterogeneous, and inferences from meta-analyses are probably untenable.
Attempts to understand the myriad factors that determine the efficacy of TH have been undertaken in both in vitro and in vivo models. On the former, Lyden et al. (2019) suggested that brief cooling is most cerebroprotective after finding that TH may impede the intrinsic protective mechanisms of astrocytes, in vitro. On the latter, the limited number of studies that varied treatment parameters tend to find increasing protection with longer duration (Clark et al., 2008). Nevertheless, meta-analyses for TH in ischemic stroke have been unable to determine a definite relationship between infarct reduction and duration of TH, finding either a slightly inverse relationship (Van Der Worp et al., 2007) or lack thereof (Dumitrascu et al., 2016). Cross-study (e.g., meta-analytic) comparisons are often inherently confounded; for instance, studies using brief TH more often use shorter survival times than those investigating longer TH treatments. Furthermore, owing to numerous limitations, in vitro models do not always reliably predict in vivo work. If these issues go unaddressed, they will further impede translation. In all, this raises the question: do we have enough data to decide upon optimal dosage, such as the duration of cooling?
Will we risk more clinical tests based upon limited animal and clinical data? For these reasons, we completed a systematic review of studies that directly compared two or more TH durations while controlling all other factors pertaining to treatment and focal ischemia. Examining within-study effects introduces a level of internal control that is not possible in between-study comparisons of TH durations. We expected to find a positive relationship between effect size and TH duration. We also anticipated concerns regarding the scientific rigor, translational relevance, quality of evidence, and risks of bias in each study. Through this analysis, we aimed to identify gaps in parametric research that should be targeted to improve the odds of translating TH as a cerebroprotectant for ischemic stroke.
Methods
Systematic search
We performed a systematic search of the PubMed database in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines using the following terms:
((stroke OR “ischemic stroke” OR “ischaemic stroke” OR “cerebral ischemia” OR “cerebral ischaemia” OR “focal ischemia” OR “focal ischaemia” OR “brain ischemia” OR “brain ischaemia” OR “middle cerebral artery occlusion”) AND (hypothermia OR cooling OR temperature OR cold)) AND (animal OR rat OR mouse OR rodent OR preclinical OR “in vivo”) NOT (neonatal[Title]) NOT (newborn[Title]) NOT (aortic[Title]) NOT (myocardial[Title]) NOT (renal[Title]) NOT (hypoxia[Title]) NOT (hypoxic[Title]) NOT (“subarachnoid hemorrhage”[Title]) NOT (“in vitro”[Title]).
On October 11, 2022, the above search criteria yielded 3605 titles and abstracts of studies to be reviewed for inclusion. To identify any articles that may have been missed by our search terms, we performed an unstructured Google Scholar search and reviewed multiple secondary sources, which presented no additional studies. To identify any articles published after October 2022, we ran our search terms again in PubMed in January of 2023, which also returned no new applicable studies. Studies included were English language, controlled, in vivo preclinical studies, which specifically tested two or more durations of delayed (i.e., greater than 1-hour delay from stroke onset) TH in focal ischemic stroke models. We excluded studies that initiated TH earlier than 1 hour after ischemic onset in this analysis given its limited translational relevance with common time-to-treatment delays and lengthy hypothermic induction durations (Huber et al., 2019).
Statistical analysis
We calculated Cohen's d for each group for comparisons of the efficacy of postischemic hypothermic duration. Cohen's d is a standardized effect size used to compare the difference between two group means; herein, we used Cohen's d to compare each hypothermic treatment group to the normothermic control group. A meta-analysis was not completed due to the limited number of studies, methodological heterogeneity, and risk of bias, as per Cochrane guidelines (McKenzie and Brennan, 2022). Data are presented as mean ± 95% confidence interval (CI).
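As a minimal sketch of this calculation, the snippet below computes Cohen's d from group summary statistics. It assumes the pooled standard deviation form of Cohen's d and a sign convention in which a positive value means a smaller infarct in the hypothermic group; the text above does not specify either choice. The function name and the infarct volumes are hypothetical and are not taken from the included studies.

```python
import math


def cohens_d(mean_control, sd_control, n_control,
             mean_treated, sd_treated, n_treated):
    """Standardized mean difference using the pooled standard deviation.

    Assumption: positive d means the treated (hypothermic) group has the
    smaller mean, e.g., a smaller infarct volume than normothermic controls.
    """
    pooled_variance = (
        (n_control - 1) * sd_control ** 2 + (n_treated - 1) * sd_treated ** 2
    ) / (n_control + n_treated - 2)
    return (mean_control - mean_treated) / math.sqrt(pooled_variance)


# Hypothetical infarct volumes in mm^3 (illustrative only, not study data).
d = cohens_d(mean_control=200.0, sd_control=50.0, n_control=12,
             mean_treated=140.0, sd_treated=50.0, n_treated=12)
print(round(d, 2))  # 1.2
```

In this made-up example, a 60 mm^3 difference against a pooled standard deviation of 50 mm^3 gives d = 1.2, which would count as a large effect under the d ≥ 0.8 convention.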
Quality of evidence
We analyzed the quality of each study based on the reporting of exclusions, sample size calculations, randomization, monitoring and control of temperature, behavioral endpoint used, and outcome blinding. We acknowledge that there is no one-size-fits-all guide for acceptable scientific reporting; as such, we cannot expect all studies to meet each criterion. This analysis is meant to provide insight into the state of our current evidence for dose–response research for TH. Thus, we selected a few important determinants of the quality and reliability of evidence based on published work from experts in preclinical experimental design (Dirnagl, 2006; Kilkenny et al., 2010; Lyden et al., 2021; Macleod et al., 2009).
Risk of bias
Studies were assessed for bias by considering each guiding question provided by the SYRCLE Risk of Bias tool, and deciding if the answer is “yes,” indicating a low risk of bias, “no,” indicating a high risk of bias, and “unclear” indicating an unclear risk of bias (Hooijmans et al., 2014). The studies were not given a total score as recommended by Hooijmans et al. (2014).
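The answer-to-risk mapping described above can be sketched as a simple lookup. This is an illustrative data structure only, not the tool's official implementation; the function name, the domain labels, and the example answers are hypothetical. In keeping with the approach above, no total score is computed.

```python
# Mapping from the answer to a guiding question to the judged risk of bias.
RISK_BY_ANSWER = {"yes": "low", "no": "high", "unclear": "unclear"}


def risk_of_bias(answers):
    """Translate per-domain answers into risk levels without summing a score."""
    return {domain: RISK_BY_ANSWER[answer.strip().lower()]
            for domain, answer in answers.items()}


# Hypothetical answers for a single study (illustrative only).
example_answers = {
    "sequence generation": "unclear",
    "blinded outcome assessment": "yes",
    "incomplete outcome data": "no",
}
print(risk_of_bias(example_answers))
```

Keeping the judgments per domain, rather than collapsing them into a number, preserves which specific sources of bias are of concern in each study.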
Results
Search
Of the 3605 studies returned in the PubMed search, 13 articles were subject to full-text review by two independent reviewers (M.E. and A.T.), as illustrated in Figure 1. After evaluation, six studies were included.

PRISMA diagram showing the selection process for studies during our systematic search. We included only English language, preclinical, controlled studies that tested two or more durations of TH for focal ischemia. We excluded any studies that began TH less than 1 hour after ischemia onset. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses; TH, therapeutic hypothermia.
Study designs
An overview of each study's design is provided in Table 1. In the six studies, there were seven total experiments in which durations were compared. All seven experiments used healthy, young male rats weighing between 270 and 400 g. The models of focal ischemic stroke used are as follows: three experiments used intraluminal occlusion of the middle cerebral artery, and four experiments performed a craniotomy and distal middle cerebral artery occlusion (MCAO). Stroke durations varied from brief (45 minutes) to permanent ischemic insults. Target TH temperatures were within 30–35°C, with five experiments within 32–33°C, one at 30°C, and one at 34–35°C (Table 1). The method of hypothermia varied among studies; four studies performed only systemic cooling, one study performed only focal brain cooling, and one study compared systemic and focal cooling (Table 1). All experiments reported temperature monitoring during surgery and in the following hypothermic or normothermic period.
Summary of Studies that Investigated Two or More Durations of Therapeutic Hypothermia in Focal Ischemia Models
Cohen's d scores and average temperatures estimated from data provided in each study.
C, control; FC, focal control; FH, focal hypothermia; MCAO, middle cerebral artery occlusion; P, permanent; SC, systemic control; SDR, Sprague-Dawley rats; SH, systemic hypothermia; TH, therapeutic hypothermia; WKY, Wistar Kyoto rats.
In terms of temperature monitoring (Table 2), five experiments directly measured brain temperature and the remainder inferred brain temperature from core (one experiment) or rectal (one experiment) temperatures. The rate of rewarming was controlled in three experiments, spontaneous in one experiment, accelerated in one experiment, and not reported in two experiments; the rewarming rates ranged from 1°C/h to 14°C/h. Calibration of temperature monitoring devices was discussed in three of seven experiments.
Temperature Data and Monitoring in Studies that Varied the Duration of Therapeutic Hypothermia in Focal Ischemia Models
Cerebroprotective efficacy
In six studies, six of seven experiments found greater infarct reduction at the longest durations tested compared with the shorter durations tested; percent reductions in infarct volume ranged from 31.2% to 83.9% after 1–3-hour delays in TH (Fig. 2), with survival times ranging from 1 to 30 days. Cohen's d was large (≥0.8) for infarct reduction at the longest durations in all seven experiments (Fig. 3). The average effect size (mean ± 95% CI) was 0.4 ± 0.3 at the shortest TH duration and 1.9 ± 1.1 at the longest.

Percent reduction in infarct volume at each duration in each study. In six of seven experiments, the longest duration tested corresponded to greater neuroprotective efficacy. A dashed line and square indicate an experiment that tested focal TH (FH); a solid line and circle indicate an experiment that tested systemic TH (SH). Clark et al. (2009) comprised two experiments, differing by use of focal or systemic TH delivery; there was a duration-dependent effect for focal but not systemic cooling in this study. FH, focal hypothermia; SH, systemic hypothermia.

Behavioral recovery as an additional endpoint was employed in four of seven experiments and compared among durations of TH in three experiments. Examples of the tests used were Neurological Deficit Scales (NDS), horizontal ladder test, staircase reaching task, and cylinder task. Of the three experiments, all found functional improvement compared with normothermic rats, but did not find statistically different results between treatment groups (i.e., across varied durations), except for stepping rate in the horizontal ladder test, which was significantly improved with 48 hours of cooling (Clark et al., 2008). Three experiments calculated mortality and found no significant difference between normothermia and TH.
Quality of evidence
The findings for quality of evidence are presented in Table 3. These observations are based on what the authors reported. Of seven experiments, four reported their exclusion criteria, and none reported an a priori sample size determination; sample sizes ranged from 6 to 27, with a median of 12 animals per group. Six experiments used randomization, seven experiments monitored temperature throughout both surgery and TH, five experiments performed behavioral testing as an additional endpoint, and three studies reported outcome assessment blinding.
Quality of Evidence for Studies that Varied the Duration of Hypothermia
Risk of bias
The results of the risk of bias analysis are presented in Figure 4. Most experiments presented an unclear risk of bias in all domains, due mostly to a lack of explicit reporting on each factor. One study reported how they generated their randomization; there was an unclear risk of selection bias in the remaining five studies. Performance bias is difficult to eliminate given the nature of hypothermia experiments; random housing and treatment blinding are largely not possible (e.g., due to equipment constraints for the delivery of cooling). As such, all studies presented high or unclear performance bias. Detection bias was accounted for by three studies that reported blinded outcome assessment; this risk was unclear in the remainder of the studies. A risk of attrition bias was present in three studies due to a lack of reporting on exclusion criteria and subsequently if any animals were excluded from the analysis.

Assessment of the risk of bias in studies that varied the duration of TH using the SYRCLE Risk of Bias Tool (Hooijmans et al., 2014).
Each study reported on all measured outcomes and endpoints, and as such, was given a low risk of reporting bias. In terms of other biases, we identified statistical concerns that posed a risk of bias or error. For example, one study did not establish noninferiority of the different durations, no studies that used analysis of variance (ANOVA) confirmed that the assumptions of ANOVA were met (e.g., normality, homogeneity of variance), and one study was unclear in its use of post hoc tests. Such concerns were detected in four studies, which were therefore given an unclear risk of other biases.
Discussion
In six of the seven experiments that directly compared two or more durations of TH in focal ischemia, cerebroprotective efficacy increased with longer durations. All included studies had a large effect at the longest duration of cooling (i.e., Cohen's d ≥ 0.8). The average effects among the longest and shortest durations of TH in each study were 1.9 and 0.28, respectively, with the longest cooling reducing infarction by >30%. This finding counters meta-analytic findings that there is no difference, or an inverse relationship between effect size and duration (Dumitrascu et al., 2016; Van Der Worp et al., 2007).
Using a between-study design (i.e., meta-regression) to investigate the impact of hypothermic duration may generate a low signal-to-noise ratio, given the stark heterogeneity in methodological design. For example, longer MCAO duration, treatment delays, and survival times would all lead a study toward weaker demonstrations of efficacy as compared with brief, immediate cooling with short survival times. Such biases would be difficult to account for in the extant literature. Comparing TH duration within a study may produce a higher signal-to-noise ratio and less systematic bias, and this type of analysis has not been previously done; this approach almost consistently demonstrates the correspondence between TH duration and cerebroprotective efficacy. However, this is across a few studies, with some quality and bias concerns.
As well, there are large gaps in dose–response research. Specifically, there are limited investigations at target temperatures >33°C, few studies explored prolonged durations (i.e., ≥24 hours), and no study established an ideal treatment dosage. Furthermore, these studies do not address differing factors related to the dosage (e.g., depth), stroke severity, or subject (e.g., comorbidities).
Following the publication of many of the studies herein, there have been numerous calls to action for more transparent reporting in animal research (Kilkenny et al., 2010; Landis et al., 2012; Percie du Sert et al., 2020). There are notable underreported aspects of the studies included in our analyses, including a priori sample size calculations, exclusions, blinding, and randomization.
Some analyses have found a correlation between a lack of reporting and overstated results; for example, studies on NXY-059 for focal ischemia that did not report blinding and randomization found significantly higher cerebroprotection than those that reported using these bias reduction measures (MacLeod et al., 2008). Similarly, among experimental stroke treatments, studies that did not blind the induction of ischemia found greater effect sizes than those that used blinding (O'Collins et al., 2006). Treatment blinding is nearly impossible with TH application; outcome blinding and blinded induction of ischemia can and should be used, but these were reported in only half of the studies. Another concern was that many authors may have exercised flexibility in “researcher degrees of freedom” (Wicherts et al., 2016), given that adequate statistical analysis reporting was lacking in most (66%) of the studies in our analysis. In all, the insufficient reporting and risk of bias in these studies raise some concerns; while their results should not be ignored, they should be interpreted cautiously.
Considering the challenges of studying dose–response effects in clinical TH trials, and the implications of further clinical failures, we strongly believe in the need for further parametric animal research, which should be guided by existing clinical findings to broadly set the limits on methods and treatment parameters to evaluate. Specifically, investigations into the optimal duration of TH in the context of varied insult severities, treatment delays, and TH modalities are desperately needed. According to our search, only seven experiments have investigated dose–response for TH duration after focal ischemia, five of which are nearly 20 years old.
Currently, it is impossible to ascertain the plateau point in efficacy or point of diminishing returns using data from the existing literature. Only one experiment compared more than two durations; the remaining six experiments compared two durations, which limits the scope of the dose–response relationship that could be interpolated. Pharmacological literature suggests that at least three doses (low, medium, high) be tested to establish a proper dose–response curve, and this design conceivably would allow TH researchers to establish TH protocols that optimize benefit while reducing risks associated with TH (Ting, 2006). Furthermore, the limited number of studies meant that more meaningful syntheses (i.e., meta-analysis) could not be performed owing to heterogeneous methods and experimental designs (e.g., endpoints, widely variable rewarming rates, ischemic durations, TH methods, etc.). Perhaps, however, we missed some articles, either because of our search strategy or because we only considered English language articles.
The STAIR and RIGOR guidelines recommend that dose–response research is undertaken for all putative cerebroprotectants to determine the maximum and minimum doses for a given therapeutic window to improve translation efforts (Lapchak et al., 2013; Lyden et al., 2021). There have been some TH investigations into alternate dose–response data such as depth after focal ischemia. For example, Kollmar and colleagues' (2007) investigation of six different temperatures at a 4-hour duration allowed them to estimate that 34°C is an optimal depth. The rate of rewarming is a key consideration, which may have been a confounding factor in the aforementioned study, and certainly in preclinical TH literature as a whole. For example, temperature–duration effects may be negated if rewarming differentially weakens the benefits of longer cooling. The recommended rate of rewarming is 0.1–0.25°C/h (e.g., to rewarm to 37°C from 34°C, a patient with high ICP should be rewarmed over 30 hours), meaning that cooling may also need to be prolonged out of necessity for controlled rewarming (Choi et al., 2012). No experiments in our analysis had controlled rewarming to this extent.
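The rewarming arithmetic above can be checked with a short calculation. This is a minimal sketch assuming a constant, linear rewarming rate; the function name is hypothetical.

```python
def rewarming_hours(start_temp_c, target_temp_c, rate_c_per_hour):
    """Hours needed to rewarm at a constant rate (simple linear assumption)."""
    if rate_c_per_hour <= 0:
        raise ValueError("rewarming rate must be positive")
    return (target_temp_c - start_temp_c) / rate_c_per_hour


# Rewarming from 34°C to 37°C at the two ends of the recommended range.
slow = rewarming_hours(34.0, 37.0, 0.1)   # slowest recommended rate
fast = rewarming_hours(34.0, 37.0, 0.25)  # fastest recommended rate
print(round(slow, 1), round(fast, 1))  # 30.0 12.0
```

Under this assumption, a 3°C rewarm takes between 12 and 30 hours, so the rewarming phase alone can be longer than the entire cooling period used in some experiments.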
Contrary to these findings, Lyden and colleagues' (2019) in vitro investigation into optimizing the depth and duration of TH for ischemia revealed more benefit from brief TH. The authors subjected primary cell cultures of Sprague-Dawley rat neurons, astrocytes, and endothelial cells to oxygen–glucose deprivation, and manipulated the temperature, delay, and duration of TH. One finding in agreement with previous literature was that, in cases of long delays to treatment, longer TH is necessary to exert cerebroprotection, which has been established in global ischemia (Callaway, 2023; Colbourne and Corbett, 1995; Colbourne et al., 1999). However, based on closer cellular analysis, Lyden et al. (2019) found that TH may interfere with innate astrocyte-mediated cerebroprotection. Thus, the authors suggest deep, stepwise hypothermia for short periods to provide protection while neither stunting astrocyte activation nor promoting deleterious side effects (e.g., pneumonia, infection).
Stepwise hypothermia has been used in global and focal ischemia models, wherein the goal was to avoid the complications of prolonged moderate cooling, and to serve as a controlled rewarming process in order to achieve persistent cerebroprotection (Colbourne et al., 2000; Colbourne et al., 1999; Nakamura et al., 1999). Nonetheless, the discrepancy between the effect of duration in vitro and in vivo merits further investigation at both the cellular and functional level of treatment.
Given the extensive literature on TH, we are in a “confirmatory” era of temperature management research, wherein all studies should have a hypothesis and a predetermined statistical analysis to prevent outcome switching, as well as an a priori sample size calculation (Dirnagl, 2020). The studies analyzed herein seem to fall into the exploratory category of research design, and as such, their results should be interpreted critically.
According to a recent meta-analysis of 60 studies, there is little doubt that TH is a potent treatment for focal ischemic stroke, at least in animal models (Dumitrascu et al., 2016). In other words, exploratory research has found a signal for the efficacy of TH in MCAO models. Now, in line with modern translational concepts, we should shift our attention to optimize methods and modalities (e.g., pharmacological vs. physical hypothermia, focal vs. systemic hypothermia, short vs. long cooling, deeper vs. shallower cooling, and attention to rewarming rates), along with investigating long-term outcomes.
In general, now that exploratory research has demonstrated cerebroprotective efficacy, we must conduct high-powered confirmatory research with rigorous experimental designs to establish the boundaries under which TH is effective. For example, based on the results of the studies presented here, detecting a 30% treatment effect with 80% power likely requires 18 rats per group in an MCAO model. Furthermore, in the same model, 164 rats would be required to see 10% differences. This size may justify multicenter preclinical randomized controlled trials (Lyden et al., 2019).
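To illustrate why smaller effects demand so many more animals, the sketch below applies the standard normal-approximation formula for a two-sided, two-group comparison of means. It is not the calculation used for the figures above, and it is not expected to reproduce them exactly. The function name and the standardized effect sizes are hypothetical, chosen only to show the scaling.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2.
    This slightly underestimates the requirement of a t-test in small groups.
    """
    if effect_size_d <= 0:
        raise ValueError("effect size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


# Hypothetical standardized effects: one large, one a third of that size.
print(n_per_group(0.9), n_per_group(0.3))  # 20 175
```

Because the required sample size scales with the inverse square of the effect size, an effect one third as large needs roughly nine times as many animals per group, which is the same order of increase as in the figures quoted above.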
In conclusion, the extant literature shows strong support for more prolonged cooling, but critical questions about dosage and concerns about quality remain unanswered. High-quality research into treatment parameters is needed in models that are representative of stroke patients (e.g., using both sexes, and with attention to age and comorbidity) with adequate statistical power to help guide TH dosage and patient characteristics for future clinical trials. In other words, future research must determine the lengths we need to go to optimize cerebroprotection and patient outcomes.
Footnotes
Authors' Contributions
F.C. and L.L. generated the concept for and supervised this project. M.E. and A.T. performed the systematic review. L.L. and F.C. resolved conflicts and approved the included studies. M.E. and A.T. wrote the article. All authors edited the article.
Author Disclosure Statement
M.A. reports being a member of the scientific advisory board of Palmera Medical, Inc., which is developing a technology for brain cooling. All other authors declare no conflicts of interest.
Funding Information
L.L. is supported by a Canadian Graduate Studies Doctoral Award from the Canadian Institutes of Health Research. Research supported by Canadian Institutes of Health Research (grant number 166087) awarded to F.C., M.A., and others.
