Abstract
Preclinical studies show that therapeutic hypothermia (TH) effectively reduces cerebral ischemic injury. In contrast, TH has not been consistently beneficial in clinical trials of stroke and cardiac arrest, perhaps from suboptimal dosing (e.g., delay, depth, and duration), among other factors. This systematic review aimed to find an optimal depth of TH from in vivo adult preclinical studies of global and focal ischemia. To study depth, without other confounds, we examined studies that compared ≥2 depths of TH versus normothermic controls. Our primary outcomes were infarct size (focal ischemia) and hippocampal cell death (global ischemia), while secondary outcomes were behavior, edema, and striatal cell death. Studies were assessed with the SYRCLE Risk of Bias tool (e.g., use of blinding) and additional indices of translational rigor (e.g., use of aged animals). Thirty studies were included from a search of the PubMed database in 2025. Many studies were rated as exhibiting a high risk of bias with low translational rigor. Overall, TH provided considerable protection on all endpoints, sometimes up to 100%, but no consistent dose–response patterns emerged, nor was an optimal depth of cooling readily evident. To explore the latter finding, specifically sampling variability, we conducted Monte Carlo simulations using the pooled standard deviation of the preclinical studies to generate three populations based upon a theoretical 5% protection per 1°C relationship (37°C vs. 32°C vs. 27°C groups run 75 times). Dose-dependent effects were statistically detectable in only 36% of comparisons, which showed comparably noisy patterns of protection. Thus, the variable dose-dependent effects in the reviewed animal studies likely arise, at least partially, from sampling error owing to using small samples from variable populations (average n = 8/group in focal ischemia). 
Overall, these findings highlight weaknesses in the extant dose–response literature that limit our ability to precisely guide clinical trials.
Introduction
Cerebral ischemia, including global and focal events such as stroke, is a leading cause of death and disability worldwide (Vos et al., 2020). Ischemia engages numerous pathological processes, such as excitotoxicity and inflammation, that ultimately culminate in brain injury evolving over minutes to days, depending upon many modifiers, such as the degree and duration of ischemia (Rehman et al., 2024). While every deleterious process is temperature sensitive, some are highly influenced by even small changes in temperature, especially when occurring during ischemia or soon thereafter. The potent protective effects of therapeutic hypothermia (TH) have been suspected for millennia and established for decades (van der Worp et al., 2007). The harmful effects of hyperthermia and fever are also well known (Subramaniam et al., 2024). Despite a plethora of animal data establishing the cytoprotective potential of TH, only a few clinical successes have emerged, such as for moderate-severity hypoxic-ischemic events in neonates (Mathew et al., 2022). Finding consistent benefit against cardiac arrest and stroke has proven to be more challenging (Kuczynski et al., 2020; Whitelaw and Thoresen, 2023). Numerous experts have identified potential reasons for clinical failure, including the use of suboptimal treatment regimens, intentionally or otherwise (e.g., delays with inefficient cooling methodology), and the presence of side effects, including pneumonia and shivering (Beekman et al., 2024; Lyden, 2021; Seyhan, 2019).
Dose–response work in animal models can guide clinical trials to use more effective and safer interventions, ultimately increasing the probability of successful translation. However, despite decades of research, the best methods and dosing parameters of TH have yet to be established, leading to considerable methodological heterogeneity among trials and inconsistent findings (Beekman et al., 2024; Kuczynski et al., 2020; Taccone et al., 2020). Establishing an optimal dose requires the consideration of intervention delay, depth and duration of cooling, rate of rewarming, and method of cooling, as well as injury type and severity that speak to the amount of salvageable tissue. Patient characteristics, including age, sex, comorbidities, cerebral perfusion, etc., also require careful consideration. This complexity makes preclinical dose–response work especially challenging. Animal studies can help set the parameters for clinical studies (e.g., range in depth and duration to be studied), provide information on guiding principles (e.g., if longer cooling is needed with greater intervention delays), and offer mechanistic insights. Given the variability among studies, systematic reviews and meta-analyses play an essential role. However, meta-analyses of animal (Dumitrascu et al., 2016; van der Worp et al., 2007) and clinical data (Kuczynski et al., 2020) have not provided clear answers. Thus, we have started to restrict our analyses to studies that specifically vary treatment parameters. For example, we systematically reviewed animal studies that directly compared two or more durations of cooling, which showed that most studies report that longer periods (e.g., 24–48 hours) of delayed cooling provide superior cytoprotection in models of focal ischemia (Eberle et al., 2024). 
Still, the optimal parameters for TH remain largely unknown, as evidenced by the high variability in protocols across trials. Complex clinical evaluations are therefore still required, such as the ongoing evaluation of TH duration in patients with cardiac arrest (Beekman et al., 2024).
Deeper cooling should hold greater potential, up to a point; however, lower temperatures often cause unacceptable side effects, such as increased risk of infection and arrhythmias. Thus, most clinical studies have used milder cooling protocols (Chiu et al., 2023; Kuczynski et al., 2020). Considering feasibility and safety for human intervention, preclinical literature has suggested depths ranging from 32°C to 36°C; indeed, clinical trials have employed target temperatures within this range, with 33°C being a common target (Bernard et al., 2002; Geurts et al., 2017; Kammersgaard et al., 2000; Lyden et al., 2014; Nielsen et al., 2013; Schwab et al., 1998; The Hypothermia after Cardiac Arrest Study Group, 2002; van der Worp et al., 2014). However, the relationship between depth and efficacy remains unclear, a point well illustrated by the Targeted Temperature Management trial, which found no difference in efficacy between patients with cardiac arrest cooled to 33°C or 36°C (Nielsen et al., 2013).
The purpose of this review is to explore the dose–response relationship between depth of TH and treatment outcomes after global and focal ischemia. To do so, we systematically reviewed in vivo adult animal studies that compared two or more depths of TH versus normothermic controls within a single experiment. Primary endpoints included hippocampal CA1 cell death (global ischemia) or infarct volume (focal ischemia). Additional secondary outcomes were striatal cell death, edema, and behavioral assessments. Depth of TH was divided into three groups: >33°C (mildest cooling), 33–30°C (moderate cooling), and <30°C (deepest cooling). We hypothesized that lower temperatures would show the best protection. We only included papers that tested two or more depths of TH to allow an examination of dose–response profiles within each study without the confounds that come with cross-study comparisons, notably the use of variable methods of cooling, durations of cooling, severity of ischemia, etc. As such, we hoped to identify emerging dose–response patterns that could be used to guide further animal and clinical study. These relationships were investigated in the context of global and focal ischemia, and whether treatment was initiated immediately during ischemia or delayed. The quality of included studies was assessed according to the SYRCLE (Systematic Review Centre for Laboratory Animal Experimentation) Risk of Bias tool (Hooijmans et al., 2014) and supplemented with a custom scale assessing translational relevance. Further, using parameters extracted from studies included in the review, a Monte Carlo simulation was run to investigate the influence of sampling error on TH dose–response work in an idealized theoretical context.
Methods
Systematic review
Systematic search
A systematic search of the PubMed database (June 2024 and January 2025) was conducted and supplemented with Google Scholar and a citation review of relevant studies. These were performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines (Page et al., 2021) utilizing the following search terms:
((stroke OR “ischemic stroke” OR “ischaemic stroke” OR “cerebral ischemia” OR “cerebral ischaemia” OR “focal ischemia” OR “focal ischaemia” OR “global ischemia” OR “global ischaemia” OR “MCAO” OR “middle cerebral artery occlusion” OR “4VO” OR “four vessel occlusion” OR “2VO” OR “two vessel occlusion” OR “CCAO” OR “common carotid artery occlusion” OR “BCCAO” OR “bilateral common carotid artery occlusion”) AND (hypothermia OR cooling OR temperature) AND (animal OR preclinical OR “in vivo”)) NOT (neonatal[Title]) NOT (newborn[Title]) NOT (“traumatic brain injury”[Title]) NOT (renal[Title]) NOT (kidney[Title]) NOT (hypoxia[Title]) NOT (hypoxic[Title]) NOT (“subarachnoid hemorrhage”[Title]) NOT (“in vitro”[Title]) NOT (“piglet”[Title])
Search results were imported into Covidence systematic review software (Veritas Health Innovation, Melbourne, Australia; available at www.covidence.org) and reviewed by two independent reviewers (M.R.P. and T.K.). In accordance with the PICO (Population, Intervention, Comparison, Outcome) framework for systematic reviews, article inclusion criteria consisted of P: preclinical, adult, in vivo models of global and focal ischemia; I: testing two or more depths of TH within one experiment; C: a normothermic control group; and O: quantification of injury, such as infarct volume or cell survival. Secondary outcomes of edema and functional outcome were additionally extracted. Exclusion criteria consisted of neonatal or newborn stroke models, clinical or in vitro studies, hemorrhagic stroke or traumatic brain injury models, and models of hypoxic injury. Only full texts available in English were included. Additional concerns within the papers, such as statistical concerns, were noted and reviewed by a third reviewer (F.C. or L.J.L.) to decide final inclusion.
Data extraction
Two reviewers (M.R.P. and T.K.) extracted study characteristics. Intervention characteristics consisted of the cerebral ischemia model used, the duration of ischemia, the time of TH onset in relation to ischemia, the method of cooling, duration of cooling (i.e., from onset to return to normothermia or death), location of temperature monitoring, and target temperatures reached during TH. Additional translational characteristics that were extracted were the animal species, age, and sex of animals used, sample sizes per group, and survival time.
Risk of bias and translational rigor
All included studies were assessed according to the SYRCLE Risk of Bias tool (Hooijmans et al., 2014) for animal experiments by two reviewers (M.R.P. and T.K.). Selection, performance, detection, attrition, and reporting bias were rated as being low risk, some concern of risk, or high risk of being present in each study. For a comprehensive explanation of each category, refer to the SYRCLE Risk of Bias tool (Hooijmans et al., 2014). Due to the nature of TH cooling methods (ice bags, cooling fans, etc.), blinded treatment may not be feasible. As such, studies were rated as having some concern of risk in this category unless the authors explicitly stated otherwise.
In addition to SYRCLE, a custom scale was employed to assess the translational rigor of included studies. The scale assessed whether studies: (1) included functional outcomes, (2) performed a priori sample size calculations, (3) stated exclusion criteria and any exclusions or mortality, (4) considered long-term outcomes, (5) used aged animals, and (6) used female animals or animals with comorbidities.
Statistical analysis
The primary outcomes in focal and global ischemia studies were percent infarct reduction and hippocampal cell protection, respectively, which were calculated for each TH depth relative to corresponding controls. Percent striatal cell protection was also calculated when given. Where possible, Cohen’s d (presented as d = effect size) was calculated to allow for a standardized comparison of effect size for edema reduction and functional improvement for each TH depth relative to corresponding controls (Goulet-Pelletier and Cousineau, 2018). Further, studies were combined according to the target temperature (>33°C, 33–30°C, and <30°C) to acquire a mean effect size across studies for each outcome, reported as Cohen’s d ± 95% confidence intervals (CI). In calculating Cohen’s d, we coded effect sizes favoring TH treatment as positive and those favoring controls as negative to ensure consistent directionality of effects.
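To make these calculations concrete, the following Python sketch computes percent protection and a signed Cohen’s d for one TH group versus its control, and assigns a target temperature to one of the three depth categories. It is an illustration only: the classic pooled-SD form of Cohen’s d is assumed, the function names (`percent_protection`, `cohens_d`, `depth_band`) are hypothetical, and the example values are invented rather than taken from any reviewed study.

```python
import math

def percent_protection(control_injury, th_injury):
    """Percent reduction in injury for a TH group relative to its control."""
    return 100.0 * (control_injury - th_injury) / control_injury

def cohens_d(mean_ctrl, sd_ctrl, n_ctrl, mean_th, sd_th, n_th, higher_is_worse=True):
    """Cohen's d using a pooled SD (assumed form), signed so positive favors TH."""
    pooled_var = ((n_ctrl - 1) * sd_ctrl ** 2 + (n_th - 1) * sd_th ** 2) / (n_ctrl + n_th - 2)
    # For injury measures (e.g., infarct volume) a lower TH mean is a benefit;
    # for scores where higher is better, flip the subtraction.
    diff = (mean_ctrl - mean_th) if higher_is_worse else (mean_th - mean_ctrl)
    return diff / math.sqrt(pooled_var)

def depth_band(target_temp_c):
    """Assign a target temperature to the review's three depth categories."""
    if target_temp_c > 33:
        return "mild (>33°C)"
    if target_temp_c >= 30:
        return "moderate (33–30°C)"
    return "deep (<30°C)"

# Invented example: control infarct 200 ± 40 (n = 8) vs. TH at 33°C 150 ± 40 (n = 8)
print(percent_protection(200, 150))          # 25.0
print(cohens_d(200, 40, 8, 150, 40, 8))      # 1.25
print(depth_band(33))                        # moderate (33–30°C)
```

Keeping the sign convention inside `cohens_d` means that edema, infarct, and behavioral measures can be pooled without re-coding each study by hand.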
If exact values were not available, values were extracted from figures using the figure calibration plugin for ImageJ (version 1.54g; Java 1.8.0_421 [64-bit]) (Schneider et al., 2012). Where possible, standard errors of the mean and interquartile ranges were converted to standard deviations, and medians were converted to means. When group sizes were given as a range, the midpoint value was used. For hippocampal cell survival, data were extracted according to the following hierarchy: (1) CA1 measurements, (2) otherwise, whole hippocampal measurements. Striatal cell survival data were extracted according to the following hierarchy: (1) whole striatum measures, (2) otherwise, dorsal striatum or combined caudate and putamen measurements. For secondary outcomes, all behavioral tests (i.e., neurological deficit scores and individual sensorimotor tasks) and all edema measures were included. Only the latest time point was extracted for both primary and secondary outcomes.
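The conversions mentioned above can be sketched in Python as follows. The exact formulas used in the review are not stated, so this is an assumption-laden illustration: the standard error conversion is the usual SD = SEM × √n, the interquartile range conversion uses the common approximation that the IQR of roughly normal data spans about 1.35 SD, and the function names are hypothetical.

```python
import math

def sem_to_sd(sem, n):
    """Standard deviation from the standard error of the mean and group size."""
    return sem * math.sqrt(n)

def iqr_to_sd(q1, q3):
    """Approximate SD from quartiles, assuming roughly normal data (IQR ~ 1.35 SD)."""
    return (q3 - q1) / 1.35

def midpoint_n(n_low, n_high):
    """Group size taken as the midpoint when only a range (e.g., 6-10) is reported."""
    return (n_low + n_high) / 2

print(sem_to_sd(5, 16))      # 20.0
print(midpoint_n(6, 10))     # 8.0
```

Under the same symmetry assumption, a reported median can be used directly as an estimate of the mean; skewed data would need a more careful method.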
All statistical analyses, including sample size calculations and figure creation, were completed using RStudio (R version 4.4.1; R Core Team, 2024). As per Cochrane guidelines, we did not conduct a meta-analysis due to methodological heterogeneity across studies and the limited number of studies fitting inclusion criteria (Cumpston et al., 2023).
Monte Carlo simulation
A Monte Carlo simulation was run using RStudio (R version 4.4.1; R Core Team, 2024) to investigate the influence of experimental design in the context of a simplified, idealized dose–response relationship between temperature and infarct volume reduction in focal ischemia. Infarct volume was chosen because it was a primary outcome and is likely representative of all endpoints to some extent. The intention of the simulation was not to estimate the true relationship between temperature and infarct volume, but instead to model the relationships among treatment groups and how they are impacted by sample size. To this end, a linear relationship between temperature and infarct volume was created based on a simplified model of brain metabolism, assuming that a 1°C decrease in temperature results in a 5% decrease in metabolic rate and infarct volume (Yenari et al., 2008). Using this simplified relationship, three populations were created with n = 10,000 each: a normothermic (37°C) ischemic control group (100% of maximal infarct volume, equivalent to normothermic stroke controls), a “32°C” ischemic TH group (75% of maximal infarct volume, equivalent to a 25% savings), and a “27°C” ischemic TH group (50% of maximal infarct volume, equivalent to a 50% savings). A pooled standard deviation of infarct volume measurements across all included in vivo focal ischemia studies was calculated and used as the standard deviation for all the simulated populations (σ = 32.76).
To mimic the experimental design of studies included in this review, samples of n = 8 (i.e., the average sample size across included focal ischemia studies) were drawn from each population, and the mean and standard deviation of each sample were calculated. This resulted in three samples of n = 8 each: one from the normothermic population, one from the 32°C TH population, and one from the 27°C TH population, mimicking a single “study.” This entire process was then repeated 15 times (i.e., the number of focal ischemia studies included in our review). This resulted in 15 “studies” (for a total of 45 individual samples), mimicking the totality of the “literature base” obtained in this review. Lastly, we repeated this process five times (i.e., obtaining a total of 75 sample means per population, or 225 total samples overall), chosen arbitrarily to allow for a more robust investigation of the variability between simulated “studies.”
After obtaining the samples, the means of all groups were compared (i.e., the 37°C samples were compared with the 32°C and 27°C samples, and the means of the 32°C and 27°C samples were compared against each other). Each run (i.e., one sample of n = 8 for each of the three populations) was treated as a “study” comparing two TH depths to a normothermic control group. The three groups were compared within “studies” using analysis of variance with post hoc Tukey comparisons (as this is a commonly used post hoc test) to obtain a liberal estimate of statistical significance. Two-tailed probability values of p < 0.05 were considered statistically significant. We then recorded when samples within a “study” were statistically distinguishable from one another (i.e., when the “study” was able to identify a dose–response relationship).
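The simulation design described above can be sketched in Python using only the standard library. This is not the authors' R code, and it departs from the described procedure in two labeled ways: samples are drawn directly from normal distributions rather than from finite populations of 10,000, and unadjusted pooled-variance t-tests with a fixed critical value stand in for the analysis of variance with Tukey comparisons. The names `differs` and `simulate` are hypothetical.

```python
import math
import random
import statistics

SIGMA = 32.76        # pooled SD reported in the text
N_PER_GROUP = 8      # average group size in the focal ischemia studies
T_CRIT_DF14 = 2.145  # two-tailed 5% critical t for df = 8 + 8 - 2 (assumed shortcut)
# Population means (% of maximal infarct) under the 5% per 1°C assumption
MEANS = {37: 100.0, 32: 75.0, 27: 50.0}

def differs(a, b):
    """Unadjusted pooled-variance t-test; a stand-in for ANOVA with Tukey tests."""
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2  # equal n
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * 2 / len(a))
    return abs(t) > T_CRIT_DF14

def simulate(n_studies=75, seed=0):
    """Return the fraction of simulated 'studies' in which each pair differs."""
    rng = random.Random(seed)
    hits = {(37, 32): 0, (37, 27): 0, (32, 27): 0}
    for _ in range(n_studies):
        # One "study": a sample of N_PER_GROUP animals per temperature group
        study = {temp: [rng.gauss(mean, SIGMA) for _ in range(N_PER_GROUP)]
                 for temp, mean in MEANS.items()}
        for pair in hits:
            hits[pair] += differs(study[pair[0]], study[pair[1]])
    return {pair: count / n_studies for pair, count in hits.items()}

for pair, rate in simulate().items():
    print(pair, round(rate, 2))
```

Because the significance test differs from the one described, the detection rates from this sketch should be read as qualitative and will not reproduce the reported percentages exactly. Increasing `n_studies` reduces run-to-run noise in the estimated rates.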
Results
Search
Our search terms yielded 3458 titles and abstracts and 67 relevant full texts. Of these, 40 full texts were excluded for one of the following reasons: not measuring infarct size or cell survival, not comparing two or more depths of TH within the same experiment or collapsing results across TH depths, not having a comparable control group, being an in vitro study, not having extractable data for any specified outcomes, or not being available in English. Finally, three additional relevant full texts were identified from Google Scholar and reference searches. This resulted in a total of 30 studies including 31 experiments in the final review (Fig. 1).

PRISMA flowchart depicting the selection of studies during our systematic search. We included full texts available in English, studying in vivo, animal models of cerebral ischemia that tested two or more TH depths. We excluded neonatal or newborn stroke models, hemorrhagic stroke models, traumatic brain injuries, or hypoxic injuries. Of the final 30 studies included, 15 induced focal ischemia and 15 induced global ischemia. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses; TH, therapeutic hypothermia.
Study characteristics and risk of bias
Fifteen studies used global ischemia models (Table 1) (Busto et al., 1987; Clifton et al., 1989; Colbourne and Corbett, 1995; Conroy et al., 2001; Hong et al., 2021; Kuluz et al., 1992; Leonov et al., 1990; Li et al., 2011; Minamisawa et al., 1990; Nakamura et al., 1999; Ooboshi et al., 2000; Weinrauch et al., 1992; Welsh and Harris, 1991; Xu et al., 2022; Yamashita et al., 1991). Rats were the most commonly used species (46.7%). The most-used global ischemia model was a bilateral common carotid artery occlusion (2-VO), and duration of occlusion ranged from 5 to 30 minutes among studies.
Descriptive Summary of Global Ischemia Studies
Summary of relevant study characteristics. Temperatures in italics are normothermia (i.e., control group temperature). Notably, Kuluz et al. (1992) had no normothermic animals survive past 24 hours. Survival times also differed in Colbourne and Corbett (1995) and Conroy et al. (2001).
BCCAo, bilateral common carotid artery occlusion; CPB, cardiopulmonary bypass; SD, Sprague–Dawley rats; SHR, spontaneously hypertensive rats; SPF, specific-pathogen free; VF, ventricular fibrillation; VO, vessel occlusion; WKY, Wistar–Kyoto rats.
Fifteen studies used models of focal ischemia (Table 2) (Barone et al., 1997; Campbell et al., 2013; Chang et al., 2015; Goto et al., 1993; Huang et al., 1999; Huh et al., 2000; Kader et al., 1992; Kim et al., 2022; Kollmar et al., 2007; Kurasako et al., 2007; Lee et al., 2018; Maier et al., 1998; Omileke et al., 2021; Wainwright et al., 2002; Wu et al., 2009). Rats were also the most commonly used species (93.3%). All but three studies used transient middle cerebral artery occlusion (tMCAo) models ranging from 70 to 240 minutes.
Descriptive Summary of Focal Ischemic Studies
Summary of relevant study characteristics. Average sample sizes were used if only a range for group size was provided (noted with *). Temperatures in italics are normothermia (i.e., control group temperature). Notably, Lee et al. (2018) used different regimens for animals undergoing behavioral testing (b) than for infarct assessment. In addition, Chang et al. (2015) reported normothermic temperatures outside of the expected range, potentially due in part to concurrent craniectomy procedures.
pMCAo, permanent middle cerebral artery occlusion; SHR, spontaneously hypertensive rats; SD, Sprague–Dawley rats; tBCCAo, transient bilateral common carotid artery occlusion; tCCAo, transient common carotid artery occlusion; tMCAo, transient middle cerebral artery occlusion; WKY, Wistar–Kyoto rats.
Overall, these studies were generally at some or high risk in most SYRCLE domains (Tables 3 and 4). For example, selection bias was concerning as 50% of studies did not state or use a method of randomization. As noted, all studies also had some risk of performance bias resulting from the lack of blinded treatment due to the nature of TH induction. Our custom translational rigor scale found that the overall quality of the included studies was poor (Tables 3 and 4). Only 46.7% of studies assessed functional outcomes, and only 40% of studies reported exclusion criteria and exclusions. Only two global ischemia studies used animals with comorbidities, with one inducing hyperglycemia in female Yorkshire pigs (Conroy et al., 2001) and one using aged female spontaneously hypertensive rats (SHRs) (Ooboshi et al., 2000) (Table 3). Only one global ischemia study assessed long-term outcomes (Colbourne and Corbett, 1995). Three global ischemia studies used female animals (Colbourne and Corbett, 1995; Conroy et al., 2001; Ooboshi et al., 2000). Four studies of focal ischemia used comorbid models (male SHRs) (Barone et al., 1997; Campbell et al., 2013; Huang et al., 1999; Kurasako et al., 2007), and no studies used aged or female animals (Table 4).
Risk of Bias and Translational Rigor of Global Ischemia Studies
SYRCLE Risk of Bias tool rating:
high risk of bias;
some risk of bias;
low risk of bias.
Translational rigor scale:
not present;
present.
Calc., calculation; Comorbid., co-morbidities; Detect., detection; Excl., exclusions; Funct., functional; L.T., long term; Out., outcomes; Perf., performance; Report., reporting; S.S., sample size; Select., selection.
Risk of Bias and Translational Rigor of Focal Ischemia Studies
SYRCLE Risk of Bias tool rating:
high risk of bias;
some risk of bias;
low risk of bias.
Translational rigor scale:
not present;
present.
Calc., calculation; Comorbid., co-morbidities; Detect., detection; Excl., exclusions; Funct., functional; L.T., long term; Out., outcomes; Perf., performance; Report., reporting; S.S., sample size; Select., selection.
TH characteristics
Temperature measurements were most commonly from the core/rectum (36.7%) or brain (30%). Almost all studies induced systemic TH (90%). Target temperatures ranged from 15°C to 36°C in global ischemia studies (Fig. 2A) and from 27°C to 36°C in focal ischemia studies (Fig. 2B). One study reported a normothermic temperature well outside of the expected range (32.7°C; Chang et al., 2015), which may have been due to simultaneous craniectomy procedures. Overall, most studies compared two (73%) or three (27%) depths of TH (vs. controls), with one study testing five depths (Fig. 2A, B). Many studies used intergroup gaps of 2°C or more, with intervals ranging from 1°C to 15°C in global ischemia studies (Fig. 2A) and 1°C to 5°C in focal ischemia studies (Fig. 2B). In global ischemia studies, onset of TH ranged from 120 minutes before ischemia to 120 minutes after reperfusion, with total cooling durations ranging from 15 to 1490 minutes (Table 1). In focal ischemia studies, onset of TH ranged from 70 minutes before ischemia to 60 minutes after ischemia onset, with total cooling durations ranging from 60 to 1580 minutes (Table 2). In global ischemia studies, 53% delayed TH onset, and in focal ischemia studies, 33% delayed TH onset. Additional details are provided in Tables 1 and 2.

Temperature groups assessed in the
Cerebroprotective efficacy in global ischemia
Hippocampal cerebroprotection (% injury reduction expressed as a % of sham controls) was the primary outcome of global ischemia studies (Fig. 3A). Six experiments assessed intra-ischemic TH and all found TH to be protective; two experiments found lower temperatures to be more protective (Kuluz et al., 1992; Ooboshi et al., 2000), while three reported little to no depth dependency (Clifton et al., 1989; Conroy et al., 2001; Yamashita et al., 1991). The other experiment displayed a nonlinear depth-dependent relationship (Kollmar et al., 2007). Seven experiments assessed delayed TH onset; two experiments found lower temperatures to be more protective (Leonov et al., 1990; Li et al., 2011), one favored higher temperatures (Welsh and Harris, 1991), one displayed a nonlinear depth-dependent relationship (Weinrauch et al., 1992), and three showed little to no depth-dependent effect (Colbourne and Corbett, 1995; Hong et al., 2021; Nakamura et al., 1999).

Impact of intra- (open symbols) and delayed TH (closed symbols) on hippocampal and striatal cell death, and functional recovery in models of global ischemia.
Five experiments assessed striatal cell survival (Fig. 3B). Three experiments found lower temperatures to be more protective (Busto et al., 1987; Kuluz et al., 1992; Leonov et al., 1990), and one displayed a nonlinear depth-dependent relationship (Minamisawa et al., 1990). The other experiment found higher temperatures to be more protective, identifying a harmful effect of hypothermia on striatal cell survival at 15°C (39.67% increase in injury) (Weinrauch et al., 1992).
Five experiments assessed functional outcomes, which are only shown as Cohen’s d values owing to the inclusion of multiple behavioral tests (Fig. 3C). Two experiments found higher temperatures to be more protective (Hong et al., 2021; Xu et al., 2022), one favored lower temperatures (Kuluz et al., 1992), one found little to no depth-dependent effect (Leonov et al., 1990), and one displayed a nonlinear depth-dependent relationship (Weinrauch et al., 1992).
Generally, hippocampal cell protection appears greatest at temperatures between 33°C and 30°C (d = 5.61 ± 4.08) when compared with temperatures >33°C (d = 2.93 ± 1.40) and <30°C (d = 2.72 ± 5.77; Fig. 3D). In experiments assessing both hippocampal and striatal cell survival, three found similar protective effects of TH across both regions (Kuluz et al., 1992; Leonov et al., 1990; Minamisawa et al., 1990), whereas one found opposite efficacy (Fig. 3) (Weinrauch et al., 1992). Generally, striatal cell protection appears greatest at temperatures >33°C and between 33°C and 30°C (d = 2.35 ± 1.39 and d = 3.11 ± 7.43, respectively; Fig. 3D). In experiments assessing both hippocampal survival and functional outcomes, two studies showed the same trend across both outcomes (Kuluz et al., 1992; Weinrauch et al., 1992) and two found a conflicting trend (Hong et al., 2021; Leonov et al., 1990). Generally, functional outcome appears greatest at temperatures >33°C and between 33°C and 30°C (d = 5.07 ± 4.92 and d = 6.07, respectively; Fig. 3D).
Cerebroprotective efficacy in focal ischemia
Percent infarct reduction was the primary outcome of focal ischemia studies. Ten experiments assessed intra-ischemic TH (Fig. 4A): six found lower temperatures to be more protective (Barone et al., 1997; Goto et al., 1993; Kurasako et al., 2007; Lee et al., 2018; Omileke et al., 2021; Wu et al., 2009), two favored higher temperatures (Kim et al., 2022; Maier et al., 1998), and two showed little to no depth-dependent effects (Kader et al., 1992; Wainwright et al., 2002). Six experiments assessed delayed TH: two slightly favored lower temperatures (Campbell et al., 2013; Kollmar et al., 2007), one favored higher temperatures (Huh et al., 2000), and two showed little to no depth-dependent effects (Chang et al., 2015; Huang et al., 1999). The other experiment displayed a nonlinear depth-dependent relationship (Kollmar et al., 2007).

Impact of intra- (open symbols) and delayed TH (closed symbols) on infarction and edema reduction, and functional recovery in models of focal ischemia.
Six experiments assessed edema, which is shown as Cohen’s d values owing to the use of different reporting methodologies (Fig. 4B). The edema results mirrored those of infarct volume, where four studies favored lower temperatures (Goto et al., 1993; Huang et al., 1999; Kurasako et al., 2007; Wu et al., 2009) and one favored higher temperature (Huh et al., 2000). The other study displayed a nonlinear depth-dependent relationship (Kollmar et al., 2007).
Eight experiments assessed functional improvements, which are shown as Cohen’s d values (Fig. 4C). Five experiments found higher temperatures to be more protective (Huh et al., 2000; Kim et al., 2022; Kollmar et al., 2007; Lee et al., 2018; Maier et al., 1998), one favored lower temperatures (Wainwright et al., 2002), one showed little to no depth-dependent effect (Campbell et al., 2013), and one displayed a nonlinear depth-dependent relationship (Kollmar et al., 2007).
Generally, infarct reduction appears greatest at temperatures <30°C (d = 2.32 ± 1.57) when compared with >33°C (d = 1.61 ± 0.39) and 33–30°C (d = 1.31 ± 0.31; Fig. 4D). Edema reduction appeared greatest at temperatures <30°C (d = 1.72 ± 2.12) and was comparable at temperatures both >33°C and between 33°C and 30°C (d = 0.71 ± 0.62 and d = 0.76 ± 1.04, respectively; Fig. 4D). Three experiments showed the same trend across infarct volume and functional outcomes (Huh et al., 2000; Kim et al., 2022; Maier et al., 1998), while five displayed inconsistent depth effects (Campbell et al., 2013; Kollmar et al., 2007; Lee et al., 2018; Wainwright et al., 2002). Functional outcomes were comparable at temperatures >33°C and between 33°C and 30°C (d = 1.28 ± 0.72 and d = 1.55 ± 2.59, respectively; Fig. 4D).
Monte Carlo
Given the wide range in absolute treatment effects and dose–response trends among studies (even at the same temperatures), we used Monte Carlo simulations to gauge how much sampling error might contribute to the in vivo findings, focusing on infarct volume in focal ischemia models. Three populations were generated (with the same variance as our reviewed focal ischemia studies) by assuming a simplified linear dose–response relationship between temperature and infarct volume, such that a 1°C drop in temperature corresponded to a 5% decrease in infarct volume relative to normothermic controls. In our simulations (Fig. 5), the 27°C group (50% protection) was reliably different from the normothermic group (95% of comparisons). In contrast, the normothermic and 32°C groups (25% protection) were only statistically distinguished from each other in 39% of comparisons, and the two TH groups were only statistically different in 36% of comparisons. Thus, despite the true population differences in our simulation (using effect sizes expected to be of high clinical significance), with sample sizes of n = 8, only 36% of “studies” were able to identify a statistically significant dose-dependent effect of TH depth on infarct volume reduction. Although these findings do not determine how much variability within the in vivo studies arises from sampling error, they do suggest that it is considerable.
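To put these detection rates in context, the following Python sketch estimates how many animals per group would be needed to detect the simulated differences in a simple two-group comparison. It is a rough guide under stated assumptions, not part of the authors' analysis: it uses the normal-approximation formula n ≈ 2(z₁₋α/₂ + z_power)²/d², assumes a target power of 80% (a conventional choice not given in the text), ignores multiple-comparison adjustment, and the function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(mean_diff, sd, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-group comparison."""
    d = mean_diff / sd                               # standardized effect size
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

# Simulated differences with the pooled SD of 32.76 from the text
print(n_per_group(25, 32.76))  # 27 per group for a 25-point difference
print(n_per_group(50, 32.76))  # 7 per group for a 50-point difference
```

Under these assumptions, n = 8 is roughly adequate for the 50% effect but falls well short of what a 25% step between adjacent depths would require, which is consistent with the pattern seen in the simulation.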

Fig. 5. A Monte Carlo simulation run using experimental parameters mimicking the in vivo focal ischemia studies included in our review. As described in the Methods, three populations were generated for which the mean and SD are shown as the first black square of each group. Group samples of eight each (denoted by circles, mean ± SD) were randomly obtained from each simulated population and then statistically compared (Tukey tests). Black shading in the “32°C” TH group denotes those statistically significant comparisons from the “normothermia” group (i.e., cytoprotection). Likewise, black shading in the “27°C” TH group denotes those that had significantly less injury than the “32°C” TH group. Gray circles in the “27°C” and “32°C” groups denote nonsignificant differences. Almost all (95%) of the “27°C” and “normothermia” comparisons were significant (not shown on the graph). SD, standard deviation; TH, therapeutic hypothermia.
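The simulation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: infarct volume is expressed as a percentage of the normothermic mean (100, 75, and 50 for the three groups), the SD of 32.76 is borrowed from the power calculation reported in the Discussion and treated as being on that same percentage scale, and the Tukey comparison uses an approximate tabled critical value (q ≈ 3.565 for three groups and 21 error degrees of freedom). All names are hypothetical. Because the scaling of the SD is an assumption, the detection rates it produces should not be expected to match the percentages reported here.

```python
import random
from statistics import fmean, variance

N_PER_GROUP = 8  # average group size in the reviewed focal ischemia studies
# Approximate studentized-range critical value for alpha = 0.05,
# 3 groups and 3 * (8 - 1) = 21 error df. Valid only for this design.
Q_CRIT = 3.565
# Assumed population means (% of normothermic infarct): 5% protection per 1 degree C.
GROUP_MEANS = {"37C": 100.0, "32C": 75.0, "27C": 50.0}
PAIRS = (("37C", "32C"), ("37C", "27C"), ("32C", "27C"))


def simulate_detection_rates(n_runs=75, sd=32.76, seed=0):
    """Return the fraction of simulated studies in which each pairwise
    Tukey HSD comparison is significant at alpha = 0.05."""
    rng = random.Random(seed)
    hits = {pair: 0 for pair in PAIRS}
    for _ in range(n_runs):
        # Draw one "study": eight animals per group from normal populations.
        samples = {
            name: [rng.gauss(mean, sd) for _ in range(N_PER_GROUP)]
            for name, mean in GROUP_MEANS.items()
        }
        means = {name: fmean(values) for name, values in samples.items()}
        # With equal group sizes, the ANOVA error mean square is the
        # average of the within-group variances.
        mse = fmean([variance(values) for values in samples.values()])
        standard_error = (mse / N_PER_GROUP) ** 0.5
        for a, b in PAIRS:
            if abs(means[a] - means[b]) / standard_error > Q_CRIT:
                hits[(a, b)] += 1
    return {f"{a} vs {b}": count / n_runs for (a, b), count in hits.items()}


if __name__ == "__main__":
    for comparison, rate in simulate_detection_rates().items():
        print(f"{comparison}: significant in {rate:.0%} of simulated studies")
```

Changing `sd` or `n_runs` shows how quickly the detection rate for the 32°C versus 27°C comparison falls as variability grows; changing the group size would also require a different critical value.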
Discussion
Our review examined the dose–response patterns within studies to ascertain whether any consistent findings emerged in global and focal ischemia models, and when comparing early versus delayed cooling. With certainty, we conclude that the overarching result of these studies is that many depths of TH are cerebroprotective to varying degrees, at least when applied during and soon after global and focal ischemia. This is consistent with numerous single-dose studies and previous meta-analyses (Dumitrascu et al., 2016; van der Worp et al., 2007). Generally, effect sizes were largest when target temperatures were between 33°C and 30°C for global ischemia and deeper than 33°C for focal ischemia. Interestingly, mild TH protocols (>33°C) were least effective. However, the certainty of this dosage evidence is weak because, while the extent of hippocampal protection and infarct size reduction was large, there were no consistent depth-dependent effects. Indeed, only 20% of global and 12.5% of focal experiments found a statistically significant depth-dependent effect on our primary outcomes. Of these five significant results, three indicated that lower temperatures were more effective (∼28–33°C) (Conroy et al., 2001; Goto et al., 1993; Yamashita et al., 1991) and two found that intermediate temperatures (33–34°C) were more protective (Kollmar et al., 2007; Minamisawa et al., 1990). These contradictory findings resemble those of previous meta-analyses, one of which identified deeper cooling to be more neuroprotective, whereas the other failed to find a depth-dependent effect (Dumitrascu et al., 2016; van der Worp et al., 2007).
Study design factors certainly impact dose–response studies, but the effects of these factors are impossible to tease apart with the limited number of studies in this review. For instance, ceiling effects (flat dose–response curve) may have occurred with mild insults where all TH protocols are maximally protective. Conversely, floor effects may have occurred with very severe insults where all TH protocols are ineffective. Visually, it appears that these effects occurred in several studies. Limited separation of treatment dosages may also make it difficult to observe important trends within and beyond selected dosages owing to insufficient statistical power (false negatives). On that issue, highly variable data (e.g., focal ischemia models) and small sample sizes are likely to lead to imprecise estimates of population parameters (e.g., means). The average sample size of the studies reviewed here matches well with what others use in neuroscience and with what has been reported in the rodent stroke literature (Schmidt-Pogoda et al., 2020). Therefore, by chance alone, one might expect considerable inaccuracies within studies and inconsistency among studies. Indeed, our Monte Carlo simulation, which was modeled on the focal ischemia literature, supports that claim. In our simulation, a statistically significant difference between the two TH depths on infarct volume reduction was identifiable in only 36% of comparisons when employing the average sample size of our reviewed literature (n = 8). In addition, power calculations based on the reviewed literature and these parameters (80% power, α = 0.05, σ = 32.76) show that sample sizes of 169, 27, and 8 are needed to detect effect sizes of 10%, 25%, and 50%, respectively. Thus, the focal ischemia literature in our review generally had the power to identify large effects (unlikely to be seen clinically), but not smaller effects that are often sought out in dose–response work.
To achieve adequate sample sizes, future dose–response studies with TH will likely require a team approach, as has been done for some candidate cerebroprotective drugs (Morais et al., 2023). In summary, a considerable amount of variability within and among studies likely arose from sampling error, which undoubtedly is further compounded by study design issues, including biases.
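The sample sizes quoted above can be approximated with the standard normal-approximation formula for comparing two means, n = 2((z₁₋α/₂ + z_power)·σ/Δ)² per group. The sketch below assumes that σ and the effect sizes are on the same scale (percent of the normothermic mean) and that the test is two-sided; neither is stated here, and the function name is hypothetical. Under those assumptions the formula returns 169 and 27 for the 10% and 25% effects, and 7 rather than the reported 8 for the 50% effect. The normal approximation tends to understate n when samples are very small.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma=32.76, alpha=0.05, power=0.80):
    """Approximate per-group sample size to detect a mean difference
    `delta` between two groups with common SD `sigma` (two-sided test)."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Effect sizes assumed to be in percentage points of the control mean.
    for effect in (10, 25, 50):
        print(f"{effect}% effect: about {n_per_group(effect)} animals per group")
```

Because the required n scales with (σ/Δ)², halving the detectable effect roughly quadruples the number of animals needed, which is why modest depth-dependent differences are so expensive to resolve.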
The dose–response studies, in general, were at some or high risk of bias. For example, 50% of studies did not perform (or report) randomization, 33% of studies did not perform (or report) blinding, and only 40% of studies reported exclusion criteria; a lack of blinding, a lack of randomization, and incomplete outcome reporting have all been associated with inflated effect sizes (Holman et al., 2016, 2015; Macleod et al., 2008; O’Collins et al., 2006; Schmidt-Pogoda et al., 2020). While we cannot discern whether and how such biases impacted the treatment patterns observed here, they are concerning. Many studies were also of low translational quality as compared with STAIR guidelines (Fisher et al., 2009; Lyden et al., 2021; Stroke Therapy Academic Industry RoundTable (STAIR), 1999). For instance, 73% of studies used young, healthy male animals. All studies also used very early intervention, with the longest delay to starting TH being 60 minutes. Such designs lead to inflated effect size estimates (Lourbopoulos et al., 2021; O’Collins et al., 2006; Pulvers and Watson, 2017; Schmidt-Pogoda et al., 2020) that do not reflect the clinical situation well, thereby impeding translation (Hirst et al., 2014; McCann and Lawrence, 2020). Lastly, only 43% of studies assessed functional outcomes, and only one study assessed long-term outcomes. Functional outcomes are the primary endpoints in clinical studies, and since histological measures alone are not always well correlated with functional recovery, the use of neurobehavioral assessments in translational research is key (Fisher et al., 2009; Narayan et al., 2021). Indeed, 32% of experiments found directly opposing trends on measures of histological injury and functional outcome, which might reflect a true discrepancy or have arisen by chance or from another factor.
Technical and safety considerations limit the depths of TH that can be used in patients. However, advances in technology, such as with focal cooling methods, may allow one to cool to greater depths in some settings (Liddle et al., 2022). Thus, it makes sense to at least consider a wide range of TH depths in animal work, which was done in some experiments (TH <30°C was tested in 35% of studies). Most studies (74%), however, only evaluated mild cooling (>33°C), potentially neglecting more efficacious treatment protocols. Another important consideration is whether studies compared a sufficient number of dosages (i.e., TH depths) to provide reasonable guidance for future research. In our review, 73% of experiments tested only two depths of TH, which makes it very difficult to ascertain an optimal TH target. Another significant concern is the interdependency of TH depth with other treatment factors (e.g., duration of cooling) and situational factors (e.g., insult severity), which none of the studies evaluated. We observed the same situation when we reviewed studies examining the duration of cooling (Eberle et al., 2024). While these single-variable designs were likely employed for feasibility, their use means that the true depth dependency of TH has been only dimly illuminated.
The present study is not without limitations. First, as we only included in vivo studies available in English, relevant studies in other languages may have been missed, as were in vitro experiments. Second, we extracted the values for the normothermic and cooled groups as stated by the authors, which may have impacted group comparisons, as true normothermia depends on several factors such as temperature measurement location, time of day, etc. Generally, our impression was that temperature reporting in the reviewed literature was often quite limited, as noted in other reviews (Klahr et al., 2017). Third, the reviewed studies used small sample sizes and reported unequal variability across groups; thus, our effect sizes calculated using Cohen’s d may be overly liberal (Goulet-Pelletier and Cousineau, 2018; Marfo and Okyere, 2019). Fourth, the lack of null and negative findings, as well as the very large effect sizes in our dataset, strongly suggests the presence of publication bias (Mlinarić et al., 2017), which we could not formally quantify here in a systematic review. Accordingly, there may be some error in our estimates of overall efficacy, dose–response relationships, and even power calculations (e.g., if more variable studies were not published). Fifth, we extracted only the latest time point for the primary and secondary outcomes; thus, earlier time points may have shown different results. Sixth, we did not conduct Monte Carlo simulations for the global ischemia literature, which may have yielded somewhat different results. Seventh, our simulation was designed with a simple linear difference among groups that is likely much simpler than reality; thus, it is reasonable to assume that in actual experimental contexts, TH would likely provide less benefit with perhaps smaller gradations among temperatures. If so, even larger sample sizes would be required than what we estimated.
Finally, while conducting a meta-analysis may have allowed us to provide an estimated optimal TH depth, we intentionally did not statistically combine studies. Primarily, we had an insufficient number of studies to meaningfully investigate potential mediating factors (e.g., insult severity, method of cooling, onset of TH), and meta-analyzing our results in light of vast differences in study design is likely to have provided misleading results (Hooijmans et al., 2022). Further, the quality of included studies was low, meaning our findings would have been prone to Type I error, with low certainty at best. Thus, we determined that our results would be more appropriately conveyed in a systematic review format.
Conclusions
Clinically, TH has proven benefit for hypoxic-ischemic events in newborns (Mathew et al., 2022). Hypothermia has also shown promising results as a treatment for cardiac arrest in adults (though not without controversy; Chiu et al., 2023; Rout et al., 2020), and in adult ischemic stroke patients, cooling has been tested repeatedly, with conflicting and lackluster findings (Kuczynski et al., 2020). While animal studies have been instrumental in supporting and guiding these clinical trials, the decision on treatment parameters often seems to be driven as much or more by concerns about practicality and safety, which are essential considerations. Preclinical studies have the advantage of being able to systematically assess a wider range of treatment parameters to optimize treatment regimens when safety and practicality are less concerning, which can be especially relevant as technology improves. Owing to the importance of preclinical research in identifying treatment parameters, there have been numerous experiments, expert reviews, systematic reviews, and meta-analyses covering dosage issues, with contradictory results (Dumitrascu et al., 2016; van der Worp et al., 2007). In this systematic review, we focused only on animal studies that conducted dose (depth)–response experiments. Nonetheless, despite identifying 31 relevant experiments from 30 studies, we were not able to confidently select the best depth(s) of TH to guide future clinical studies, although our analysis is encouraging in that many different depths of TH were found to be beneficial. Notably, the current dose–response literature is highly variable in both model and treatment factors (insult severity, duration of cooling, etc.), making cross-study comparisons difficult. To truly improve our understanding of TH efficacy, research simultaneously co-varying multiple parameters of TH (e.g., depth and duration) is needed.
Further complicating the issue, patient characteristics (e.g., stroke severity and extent of salvageable tissue, sex, and comorbid status) are likely to influence optimal TH parameters as well and may require further investigation. To help ensure translational success, future experiments must use sufficient sample sizes with translationally relevant time frames and populations known to affect treatment efficacy (Fisher et al., 2009; Lyden et al., 2021).
Authors’ Contributions
F.C. conceptualized and supervised the study. A.B.T., F.C., L.J.L., and M.R.P. planned the study. Screening and extraction were done by M.R.P. and T.K. while M.R.P. and T.F.C.K. conducted all formal statistical analysis. M.R.P. and F.C. wrote the original draft which was reviewed, edited, and approved by all authors. A.C.K., T.F.C.K., and L.J.L. provided additional expert review of the article.
Footnotes
Acknowledgments
The authors would like to thank Dr. Pete Hurd for his help and advice with statistical analysis.
Author Disclosure Statement
All authors declare no conflicts of interest.
Funding Information
L.J.L. and T.F.C.K. were supported by a Canadian Graduate Studies Doctoral Award from the Canadian Institutes of Health Research. The research was supported by the Canadian Institutes of Health Research (grant number 166087) awarded to F.C.
