Abstract
The likelihood of translating therapeutic interventions for stroke rests on the quality of preclinical science. Given the limited success of putative treatments for ischemic stroke and the reasons put forth to explain it, we sought to determine whether such problems hamper progress for intracerebral hemorrhage (ICH). Approximately 10% to 20% of strokes are caused by ICH, which results in considerable disability and high mortality. Several animal models reproduce ICH and its underlying pathophysiology, and these models have been widely used to evaluate treatments. As yet, however, none has successfully translated. In this review, we focus on rodent models of ICH, highlighting differences among them (e.g., pathophysiology), issues with experimental design and analysis, and choice of end points. A PubMed search for experimental ICH (years: 2007 to 31 July 2011) found 121 papers. Of these, 84% tested neuroprotectants, 11% tested stem cell therapies, and 5% tested rehabilitation therapies. We reviewed these to examine study quality (e.g., use of blinding procedures) and choice of end points (e.g., behavioral testing). Not surprisingly, the problems that have plagued the ischemia field are also prevalent in ICH literature. Based on these data, several recommendations are put forth to facilitate progress in identifying effective treatments for ICH.
Introduction
Intracerebral hemorrhage (ICH) is a devastating stroke that leads to significant disability in survivors and a high mortality rate (Sacco et al, 2009). Spontaneous nontraumatic ICH usually stems from vessel damage owing to hypertension and amyloid angiopathy (Adeoye et al, 2011; Broderick, 1994; Kirkman et al, 2011; Pezzini et al, 2009), among other causes. The underlying pathology determines the ICH location. For example, brainstem and subcortical structures (e.g., basal ganglia) are commonly affected in hypertensive ICH. Some treatments, including antihypertensive medications, significantly reduce the risk of ICH, but there are no proven therapies for treating an ICH once it has occurred. Nonetheless, aggressive medical management improves outlook (Morgenstern et al, 2010), and there is considerable preclinical evidence to suggest that neuroprotective interventions may one day lessen the burden of this disease.
Those in the field of cerebral ischemia realize all too well that neuroprotection is not a concept easily translated to patient care. Despite identifying many hundreds of putative neuroprotective agents in animal models, none of the dozens tested has statistically improved survival or reduced disability after ischemic stroke in patients. Many authors have identified serious concerns with preclinical and clinical research, which at least partially explain the dismal clinical findings with neuroprotectants (Dirnagl, 2006; O'Collins et al, 2006; Stroke Therapy Academic Industry Roundtable (STAIR), 1999). For instance, O'Collins et al (2006) showed that the treatments taken to clinical trial were not necessarily the best candidates among those evaluated in preclinical studies. With regard to these preclinical studies, there are numerous potential problems, which we categorize into: (1) animal modeling, (2) study design, and (3) choice of end points. An example of the first issue is the oft-cited problem of solely relying on healthy, young, male rats without use of females and older animals, especially those with appropriate comorbidities (e.g., hypertension, amyloid angiopathy). A recent study of meta-analyses showed that the use of healthy animals overestimated effect size (infarct volume reduction after focal ischemia) by 11.5% (Crossley et al, 2008). The lack of appropriate physiological controls is another modeling issue that has plagued the ischemia literature. Indeed, the fact that many ‘neuroprotective’ drugs affect temperature, often causing hypothermia, has been an especially problematic issue in the ischemic stroke field. Yet, many investigators have ignored the problem despite compelling preclinical and human data showing that hypothermia is neuroprotective in cardiac arrest (Bernard et al, 2002; The Hypothermia After Cardiac Arrest Study Group, 2002) and after hypoxic-ischemic injury in neonates (Shankaran et al, 2005).
With regard to study design, it is clear that there are fundamental design weaknesses such as lack of randomization and treatment blinding. Indeed, Crossley et al (2008) demonstrated that experiments that did not mask ischemia treatment identity reported effect sizes 13.1% greater than those that used blinding procedures. Interestingly, they found that lack of randomization and blinding for measuring infarct size did not bias results. This may not be the case for other end points such as behavior. Statistical problems also seem common in experimental studies (e.g., insufficient statistical power). Finally, a common end point problem is the over-reliance on histological measures of neuroprotection without evaluating behavioral outlook, which is not always equivalent to measures of cell death or lesion size (Corbett and Nurse, 1998). The use of short survival times is another serious concern as neuroprotectants may simply delay instead of preventing cell death (Colbourne et al, 1999; Dietrich et al, 1993; Valtysson et al, 1994).
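The randomization and blinding steps discussed above can be illustrated with a minimal sketch. It assumes a simple balanced allocation in which a third party keeps the key linking opaque group codes to treatments; the function name `allocate_blinded`, the code labels (`G1`, `G2`), and the example treatment names are hypothetical and do not come from any of the cited studies.

```python
import random


def allocate_blinded(animal_ids, treatments, seed):
    """Randomly assign animals to treatments and hide group identity.

    Returns (blinded, key):
      blinded maps each animal ID to an opaque group code (for the experimenter);
      key maps each treatment to its code (held by a third party until unblinding).
    """
    rng = random.Random(seed)  # a seeded generator makes the allocation reproducible
    ids = list(animal_ids)
    rng.shuffle(ids)  # random order of animals

    # Balanced allocation: cycle through the treatments over the shuffled animals.
    allocation = {a: treatments[i % len(treatments)] for i, a in enumerate(ids)}

    # Opaque codes, shuffled so that the code number does not reveal the treatment.
    codes = [f"G{i + 1}" for i in range(len(treatments))]
    rng.shuffle(codes)
    key = dict(zip(treatments, codes))

    blinded = {a: key[t] for a, t in allocation.items()}
    return blinded, key


# Hypothetical example: 12 rats, two arms.
blinded, key = allocate_blinded(range(1, 13), ["vehicle", "drug"], seed=42)
```

Group-level codes are a simplification; coding each syringe or animal individually would prevent an observer from inferring which animals share a treatment.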
Despite the evidence for these particular factors contributing to the failure to translate putative antiischemia neuroprotective drugs, it should not be assumed that similar problems exist in the ICH field; nor should one assume that each of these potential problems actually affects ICH models. Recommendations have been made to improve the quality of preclinical ICH studies and increase the likelihood of developing effective therapies (NINDS ICH Workshop, 2005). These recommendations include development of large animal models that better mimic the pathological processes of spontaneous ICH, the occurrence of rebleeding (Andaluz et al, 2002), and hemorrhagic transformation (NINDS ICH Workshop, 2005). Additionally, researchers should use models that involve gray and white matter injury, and characterize how the latter contributes to ICH outcome. The degree to which inflammation contributes to ICH injury must also be defined (NINDS ICH Workshop, 2005). As in experimental ischemia models, the potential for stem cells to improve outcome and their mechanisms should be evaluated for ICH. Finally, it was recommended that knockout or transgenic models be used to explore the effects of cellular inflammation, cytokines, and hemostasis in ICH-induced injury (NINDS ICH Workshop, 2005). Unfortunately, while some issues raised in the STAIR report (Stroke Therapy Academic Industry Roundtable (STAIR), 1999) and NINDS guidelines (NINDS ICH Workshop, 2005) have been studied (e.g., age), many factors remain understudied in the ICH field. Similarly, and as noted above, other therapeutic interventions, such as rehabilitation and stem cell transplantation, have received much less attention in ICH than ischemia. Despite their promise, there is also concern that studies assessing these treatments in ICH are plagued by methodological flaws. Therefore, the purpose of this article is to examine the experimental approach in preclinical ICH studies. Specifically, we have three goals.
First, we summarize the major rodent models of ICH, including a description of the methods, a comparison of models, and a discussion of the approach to evaluating treatments (e.g., end points used). Second, we objectively characterize whether potential problems identified in the STAIR report and ICH guidelines, such as lack of randomization, have been addressed in recent publications in this area; for this, we conducted a PubMed literature search to identify experimental ICH studies, which we then summarized. Lastly, we make several recommendations for future preclinical studies in this area.
Rodent models of intracerebral hemorrhage
While several species have been used to study ICH, we focus on rodents in this review as they are most commonly used now and in the foreseeable future. Some have already questioned whether current models, especially those in rodents, adequately reflect human pathology (Adeoye et al, 2011; Andaluz et al, 2002; James et al, 2008; Kirkman et al, 2011; NINDS ICH Workshop, 2005). The relative paucity of white matter in rodents compared with humans undoubtedly weakens the predictive value of these models. Similarly, the artificial nature of creating the ICH in rodents, which often involves direct intraparenchymal injections of blood or some compound, has limited face validity compared with patients. Unfortunately, the spontaneous ICH models, such as use of stroke-prone spontaneously hypertensive rats (Iida et al, 2005; Lee et al, 2007) and mice (Wakisaka et al, 2010b), which have better face validity, are hampered by practical problems (e.g., high cost; variability in timing, location and severity of the ICH). Thus, they have rarely been used to evaluate putative therapies, but use of such clinically relevant models is expected to increase with further refinements and new developments. For now, the most widely used rodent models of ICH for neuroprotection, rehabilitation, and stem cell studies are the direct infusion of whole blood into the brain (Bullock et al, 1984) and the injection of bacterial collagenase (Rosenberg et al, 1990), an enzyme that damages the basal lamina (extracellular matrix) thereby causing bleeding. In both cases, some investigators also add heparin (Belayev et al, 2003; Del Bigio et al, 1996), but this is not typical and is a potential confounding factor (Wang et al, 2008b), albeit with some clinical relevance. Common also are simplified models that involve injecting blood components, such as thrombin and iron, into the brain (Andaluz et al, 2002). 
While all of these models have been indispensable in advancing our understanding of pathophysiology (Xi et al, 2006), they certainly do not replace the need for more clinically relevant models.
Genetically engineered animals, usually mice, are thought to be valuable in determining the mechanisms of injury following ICH. Several groups have recently used knockout or transgenic mice to study the role of factors that may contribute to ICH-induced injury and to identify therapeutic targets. For example, studies have identified deleterious roles of superoxide (Wakisaka et al, 2010a), heme oxygenase 1 (Wang and Dore, 2007), Toll-like receptor 4 (Sansing et al, 2011), signal transduction molecules such as Caveolin-1 (Chang et al, 2011), and heme toxicity in hemopexin knockout mice (Chen et al, 2011a) in spontaneous ICH. Furthermore, protective effects of heme oxygenase 2 (Wang and Dore, 2008), aquaporin-4 expression (Tang et al, 2010b), the hemoglobin-binding protein haptoglobin (Zhao et al, 2011), and elements of the complement system (Nakamura et al, 2004b) may represent novel targets for ICH treatment. The surge of studies using genetically engineered mice in the past several years has added to our understanding of the mechanisms of cell death after ICH, and the complex interactions between inflammation, oxidative stress, and edema in ICH-induced injury. Furthermore, they can be used to confirm and further study the gene expression profile in peri-hematomal tissue after ICH in humans (Carmichael et al, 2008). Although some have cautioned that a particular molecule may have beneficial or detrimental effects depending on the concentration (Nakamura et al, 2004b), it is hoped that these mouse models will help to uncover novel protective strategies for ICH.
Surgical Approach
For each of these models (infusing blood, its components, or collagenase), the animals are anesthetized and subjected to stereotaxic surgery, which involves drilling a cranial burr hole and inserting a cannula (needle) into the brain to inject autologous blood or some solution (collagenase, thrombin, iron, etc.). Several anesthetics appear to be commonly used (e.g., isoflurane, barbiturates), but few studies have evaluated whether and how anesthetics influence outcome after ICH (Khatibi et al, 2011; Ma et al, 2009), which has been demonstrated in ischemia models (Schifilliti et al, 2010). For example, Khatibi et al (2011) demonstrated that 1 hour of isoflurane following ICH in mice reduced edema, apoptosis, and behavioral deficits. It is likely that all anesthetics influence outcome in some way after an ICH such as by affecting temperature, blood pressure (BP), cerebral blood flow, metabolism, inflammation, etc. Given the prolonged effects of certain anesthetics (e.g., pentobarbital) and potential confounds such as drug-induced hypothermia, we are especially concerned about their use for ICH surgery in rodents without further study or proper control of physiological variables. At this time, there is no way to identify the preferred anesthetic or to reject certain ones. Therefore, proper control groups are essential.
Most studies target the striatum but some have used the cerebellum (Lekic et al, 2008, 2011), motor cortex (Belayev et al, 2005; Xue and Del Bigio, 2000), or hippocampus (Song et al, 2007, 2008). The striatum is most often used, in part because it is a common site of ICH in patients, but also for convenience. It is a large structure capable of containing hematomas (e.g., 50 to 100 μL) relatively equivalent to a large hematoma occurring in a patient, but with a low mortality rate in rats and a low risk of blood extending into the ventricle or subarachnoid space. Nonetheless, it is not uncommon for blood to back up the needle tract, resulting in variably sized and shaped lesions that can extend to the corpus callosum and cerebral cortex. Larger hematomas, whether from injected blood or collagenase infusion, can also damage thalamus and globus pallidus. Backflow can be reduced with the ‘double-injection’ method (Belayev et al, 2003; Deinsberger et al, 1996) where the bulk of the blood is infused after an initial bolus is allowed to clot. Due to these potential problems, we strongly recommend publishing photomicrographs showing brain injury, a practice that is not routinely done in the ICH field.
As with the choice of anesthetic, there is a paucity of data to guide one on the use of postoperative analgesics in these models (e.g., opiates, nonsteroidal antiinflammatory drugs), which is a treatment increasingly being required by animal welfare committees. Given that they have not been carefully evaluated and are likely to influence outcome, we routinely use and recommend local anesthesia such as Marcaine (bupivacaine hydrochloride) to mitigate early postoperative pain in rats.
Physiological Variables
Given the need for anesthesia, there is some concern that variability in physiological factors could alter or confound outcome after ICH. Notably, blood gases, glucose, BP, and temperature influence ischemic injury and thus one might assume that they affect ICH. Unfortunately, there is considerably less experimental data on these topics in the ICH field. Hypothermia is an established neuroprotectant in cerebral ischemia, but data in ICH models are inconclusive (MacLellan et al, 2009a, 2010). Interestingly, hyperthermia protocols that aggravate ischemic brain injury (e.g., elevating temperature to 39°C for 3 hours) cause no harmful effects in ICH models (MacLellan and Colbourne, 2005; Penner et al, 2011), while clinical data on post-ICH fever are inconclusive (Leira et al, 2004; Schwarz et al, 2000; Szczudlik et al, 2002; Wang et al, 2000, 2008a). At least for striatal ICH, both collagenase and whole blood models do not appear to spontaneously cause postoperative changes in temperature when measured via telemetry probes (MacLellan et al, 2004, 2006b), but different results may occur in mice or in animals with more severe insults, drug treatments, or different surgical techniques (e.g., nonsterile surgery) than we used. As another example, BP clearly affects ischemic injury in animal models (Zhu and Auer, 1995) and is thought to affect outcome in ICH patients (Morgenstern et al, 2010). There are also some data that higher BP worsens outcome in rodents subjected to an ICH (Wu et al, 2011a), but a complete dose-response effect has not been established, nor is it clear whether this differs among models. We have measured BP following collagenase-induced ICH in normotensive rats (MacLellan et al, 2004). The rats showed only a transient lowering of BP as they were recovering from isoflurane anesthesia.
Interestingly, induced cooling significantly and persistently increased BP by ~20 mmHg and this was associated with significantly greater bleeding in the collagenase model, but bleeding does not appear to be affected in the whole blood model (MacLellan et al, 2006b). Hemorrhage volume was also increased following needle biopsy in rats with acutely high BP (Benveniste et al, 2000). Thus, variability in physiological factors (temperature, BP, clotting factors, etc.), including that caused by neuroprotective treatments, might be confounding ICH studies and hampering progress in this area. We strongly recommend measuring these physiological variables at least during surgery and beyond this when using anesthetics known to have more prolonged effects and when using experimental drug treatments.
Impact of Comorbidities and Gender
A comprehensive discussion of risk factors and comorbidities is beyond the scope of this article. Nonetheless, it is clear that experimental stroke studies must consider these issues. For instance, advancing age is a key risk factor for ICH, which has also been shown to affect outcome in the collagenase and whole blood models (Gong et al, 2004, 2008; Wasserman and Schlichter, 2008; Wasserman et al, 2008). As in the ischemia field, it makes sense to test putative neuroprotectants in aged animals. Similarly, the use of other comorbidities, such as hypertensive animals (Wu et al, 2011a), is warranted. Preexisting hypertension is obviously an important consideration in ICH studies, but the impact of post-ICH alterations in BP must also be considered. Finally, gender and hormones (e.g., estrogen) have been shown to affect outcome after ICH in the collagenase and whole blood models as well as after intraparenchymal iron injections when given before or soon after the insult (Auriat et al, 2005; Chen et al, 2011b; Gu et al, 2010; Nakamura et al, 2005), but for estrogen, it does not appear to affect recovery when given later (Nguyen et al, 2008a). In summary, preexisting comorbidities and the possible effects of experimental treatments on these factors should be considered in experimental ICH studies that test neuroprotective, rehabilitative, or stem cell therapies. Given that studies have not directly compared models on these factors, one should not assume that they are equally sensitive (e.g., to elevated BP).
Outcome End Points
Bleeding Profile: The location and size of the hematoma are the primary determinants of outcome in patients (Broderick et al, 1993; Castellanos et al, 2005). As well, the occurrence of hematoma growth, which occurs in up to one-third of all ICH patients, is another predictor of poor outcome (Fujii et al, 1994; Ji et al, 2009). On-going bleeding appears to be minimal in the autologous blood infusion model (MacLellan et al, 2008) whereas collagenase infusion causes bleeding that starts within minutes to last for several hours (MacLellan et al, 2008; Rosenberg et al, 1990). This critical difference means that treatments that influence bleeding via clotting factors or BP changes, for instance, may more prominently affect the collagenase model, as discussed earlier with therapeutic hypothermia. Accordingly, experimental neuroprotection studies that administer treatments before or within the first few hours after collagenase-induced ICH should determine whether bleeding is affected, as we found to occur with estrogen pretreatment (Auriat et al, 2005) and delayed hypothermia administration (MacLellan et al, 2004).
Hematoma volume can be estimated with imaging techniques (e.g., magnetic resonance imaging) (Belayev et al, 2007; MacLellan et al, 2008), image analysis software using sections of brain tissue (Tang et al, 2010a; Wasserman et al, 2008) and spectrophotometry (Choudhri et al, 1997; MacLellan et al, 2004). While the latter technique is accurate at early survival times, we caution against relying on it at late survival times (e.g., 3 to 7 days) as conclusions about bleeding rate or amount are then confounded by factors such as the speed of clot dissolution and removal. Spectrophotometry data from our laboratory suggest that brain hemoglobin content returns to normal by a few days after untreated collagenase-induced ICH (Wowk and Colbourne, unpublished data). Thus, erythrocyte rupture and hemoglobin degradation appear complete by this time, at least in this model. Of course, other blood breakdown products such as heme and iron remain longer. Furthermore, measures of hematoma size, hemoglobin content, and lesion size do not necessarily concur. Thus, we do not recommend relying solely on hematoma size or hemoglobin content, however determined, to gauge final parenchymal damage.
Cerebral Edema and Blood-Brain Barrier Dysfunction: Cerebral edema and the mass effect of the hematoma compress tissue and raise intracranial pressure (ICP), thereby risking death in some patients, although edema may not notably affect outcome in many patients (Arima et al, 2009). Edema does not normally cause death in rodents, in part because investigators avoid such severe lesions that lead to death by massive edema (animal welfare concerns), but also due to other factors (e.g., skull shape). Mild-to-moderate levels of edema have not been unequivocally linked to neuronal death. Regardless, edema routinely occurs in rodents subjected to ICH (Xi et al, 2006) and it is one of the most common end points used in these studies. Edema, which resolves over several days, is preceded by BBB (blood-brain barrier) disruption (Fingas et al, 2007; Yang et al, 1994). Thrombin is generated quickly in both models and causes rapid BBB disruption and edema when directly injected into the brain. Similarly, iron, which is released from degrading erythrocytes in the days following ICH, causes prominent BBB disruption, edema, and cell death (Huang et al, 2002; Nakamura et al, 2006). The time course of edema is similar in the collagenase and blood infusion models. Edema peaks at ~3 days (earlier after injection of blood components such as thrombin) and gradually resolves over a week (Xi et al, 2004). However, in ICH patients, edema increases most rapidly in the first 2 to 3 days and peaks at ~14 days (Inaji et al, 2003; Staykov et al, 2011; Venkatasubramanian et al, 2011). Differences in the time course of edema may be at least partly explained by the paucity of white matter in rodents as compared with humans, which is a serious limitation of rodent ICH models (Adeoye et al, 2011).
The integrity of the BBB is usually measured with histological, imaging, and spectrophotometry assays. Edema can also be observed with histology, imaging, or other methods (e.g., wet-dry weight technique). Changes in ICP, however, are not typically evaluated in rodent models, but it is possible even in nonanesthetized animals (Silasi et al, 2009). Furthermore, the transient ICP changes in rodent ICH (Nath et al, 1986) may not reflect life-threatening ICP elevations that occur in some ICH patients. While these end points have good face validity, it is important to note that they do not replace the need for more definitive evidence of neuroprotection: a persistent reduction in lesion size and improved functional outcome.
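As an illustration of the wet-dry weight technique mentioned above, brain water content is conventionally expressed as the percentage of wet tissue weight lost on drying. The sketch below assumes that convention; the function name `brain_water_content` and the sample weights are hypothetical values chosen only to show the arithmetic.

```python
def brain_water_content(wet_weight_mg, dry_weight_mg):
    """Percent water content from wet and dry tissue weights.

    Assumes the usual convention: (wet - dry) / wet * 100.
    """
    if wet_weight_mg <= 0:
        raise ValueError("wet weight must be positive")
    if not 0 <= dry_weight_mg <= wet_weight_mg:
        raise ValueError("dry weight must lie between 0 and the wet weight")
    return 100.0 * (wet_weight_mg - dry_weight_mg) / wet_weight_mg


# Hypothetical sample: 100 mg of wet tissue that weighs 21 mg after drying.
water_percent = brain_water_content(100.0, 21.0)  # 79.0
```

Because the value is a ratio to wet weight, small absolute differences in percent water content can correspond to sizeable changes in tissue volume, which is one reason to pair this measure with lesion volume and behavioral end points.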
Inflammation: Much progress has been made in increasing our understanding of both the potentially beneficial and harmful effects of inflammation after ICH. As in humans, ICH in rodents causes a strong inflammatory response such as an infiltration of neutrophils and macrophages and activation of microglia (Del Bigio et al, 1996; Wasserman and Schlichter, 2007), which has been comprehensively reviewed recently (Wang, 2011). Additionally, the detrimental role of matrix metalloproteinases in mediating ICH injury has been well documented (Tejima et al, 2007; Xue et al, 2006, 2009), and inhibition of matrix metalloproteinases may be a promising target for neuroprotection (Wang and Tsirka, 2005). It appears that there is a greater inflammatory response in rodents subjected to collagenase-induced ICH than humans and the whole blood model (Wang, 2011). Although collagenase is not directly toxic to brain parenchyma, excessive inflammation is one common criticism of this model, but this has been challenged (Kirkman et al, 2011; Kleinig et al, 2009; Matsushita et al, 2000; Wang et al, 2003). Findings may be confounded by the greater lesion size created in the collagenase model compared with blood injection (MacLellan et al, 2008). Regardless, these differences indicate that neuroprotectants, especially those targeting inflammation, should be evaluated in several models.
The extent of inflammation is often measured at specified times after ICH by cell counts (e.g., in defined peri-hematoma regions). The relationship between measures of inflammation and behavioral recovery, the end point of greatest importance and highest face validity for gauging treatment efficacy, is not entirely clear. Measurements at one time and in one coronal plane, commonly done, may also not accurately represent the inflammatory response over time and around the hematoma. Regardless of the measure of inflammation (e.g., cell count, cytokine level), the survival time or location of assessment, one should not use these measures to infer a neuroprotective treatment effect given the beneficial and harmful roles played by inflammatory cells. For instance, we showed that hypothermia markedly reduced the number of neutrophils and macrophages after whole blood-induced ICH; yet there was little functional benefit and no evidence of reduced tissue damage (MacLellan et al, 2006b).
Histological Profile: One advantage of targeting the striatum in neuroprotection studies is the considerable data on the nature and pattern of cell death after ICH in this region. A significant amount of injury occurs quickly as blood rapidly dissects through the tissue. This is followed by delayed neuronal injury in the peri-hematoma region (Del Bigio et al, 1996; Felberg et al, 2002), which is sometimes quite limited. Several studies have shown cell loss in distal, interconnected regions such as the substantia nigra (Felberg et al, 2002; MacLellan et al, 2008). It is clear that progressive atrophy occurs after ICH (Nguyen et al, 2008b). Studies with simplified models have identified numerous blood components that contribute to this cell death and atrophy, such as iron (Xi et al, 2006). For instance, we found that intrastriatal infusions of iron alone lead to extensive neuronal death and tissue loss over months with considerable dendritic atrophy of peri-lesion neurons (Caliaperumal, Ma, and Colbourne, unpublished data). Such data emphasize the importance of using long survival times. Unfortunately, there is considerably less data on ICH and related models in other brain regions.
We were interested in whether the whole blood and collagenase models differed with respect to extent and progression of injury when the models were matched for initial hematoma size (MacLellan et al, 2008). We found with magnetic resonance imaging that the striatal lesion size continued to mature over weeks in the collagenase, but not the whole blood model. These findings have been confirmed with time course histology studies in the collagenase model (Auriat et al, 2012; Nguyen et al, 2008b). Additionally, we observed that injury was considerably larger in the collagenase model, despite similar initial hematoma volumes. These findings also emphasize the need for using long-term assessment.
As histology is widely used to gauge neuroprotection, it is important to consider the histological techniques used (e.g., counts of dead cells, lesion volume measures) and the timing of assessment (days versus weeks). Several neuroprotective treatments have been shown to transiently reduce ischemic brain injury (Colbourne et al, 1999; Dietrich et al, 1993; Valtysson et al, 1994); thus, there is a need to use extended survival times. With regard to method, many evaluate lesion volume or total tissue lost, which is an accepted method in the stroke field. However, it is not uncommon for studies to use simpler methods that have not been as well validated (e.g., area of injury in one coronal section may not correlate well with total lesion volume). We are also concerned that measures of cell death (e.g., TUNEL (terminal deoxynucleotidyl transferase-mediated 2′-deoxyuridine 5′-triphosphate-biotin nick end labeling)-positive cells) at early survival times are relied on to gauge treatment efficacy, but these may not accurately predict the extent of final tissue loss. Others have also emphasized the need to gauge injury beyond the hematoma region (Felberg et al, 2002), and to evaluate white as well as gray matter damage. Several studies have shown that white matter injury can be quantified following ICH in rodents (MacLellan et al, 2008; Nguyen et al, 2008b; Wasserman and Schlichter, 2008), and that putative neuroprotectants may attenuate this loss (Wu et al, 2012). We believe the gold standard efficacy end point should be the total volume of tissue loss at a late survival time.
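To make the contrast between a single-section area and a volumetric measure concrete, the following sketch estimates tissue loss from serial coronal sections. It assumes one common approach, in which the remaining tissue area of each hemisphere is summed across evenly spaced sections, multiplied by the section spacing, and the injured hemisphere is subtracted from the intact one. The function names and the area values are hypothetical, and individual laboratories may correct for ventricles or section thickness differently.

```python
def hemisphere_volume(areas_mm2, spacing_mm):
    """Approximate volume (mm^3) from tissue areas on evenly spaced sections."""
    if spacing_mm <= 0:
        raise ValueError("section spacing must be positive")
    return sum(areas_mm2) * spacing_mm


def tissue_lost(contralateral_areas_mm2, ipsilateral_areas_mm2, spacing_mm):
    """Volume of tissue lost = intact hemisphere volume - injured hemisphere volume."""
    if len(contralateral_areas_mm2) != len(ipsilateral_areas_mm2):
        raise ValueError("both hemispheres must be measured on the same sections")
    return (hemisphere_volume(contralateral_areas_mm2, spacing_mm)
            - hemisphere_volume(ipsilateral_areas_mm2, spacing_mm))


# Hypothetical areas (mm^2) of remaining tissue on three sections 0.5 mm apart.
contralateral = [50.0, 52.0, 51.0]
ipsilateral = [40.0, 38.0, 42.0]
loss_mm3 = tissue_lost(contralateral, ipsilateral, spacing_mm=0.5)  # 16.5
```

In this example the per-section difference ranges from 9 to 14 mm^2, so an estimate based on any one section would depend heavily on which plane happened to be chosen.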
Behavioral Outlook: Many ICH survivors are left with permanent disabilities. Indeed, only about 20% regain functional independence (Broderick, 1994; Sacco et al, 2009). The nature and severity of their behavioral problems are complex, and not simply related to hematoma size and location despite the importance of these factors to outcome. Similarly, the behavioral consequences of ICH in rodents depend on the extent and location of injury, but the correlations between lesion size and functional impairment are often weak. While some interpret this as support for relying on histology or some other end point to gauge treatment efficacy, we interpret it as a reason to do more comprehensive behavioral testing. We caution against relying on a single end point whether it is lesion size, edema or a single behavioral test. Similarly, measures of neuroplasticity (e.g., dendritic growth, synaptogenesis, and neurogenesis) should not replace careful behavioral assessment.
The behavioral method most commonly used after striatal ICH in rats (MacLellan et al, 2006a) and mice (Wu et al, 2012) involves examining neurological function with a battery of subtests including, for example, spontaneous rotation in home cage, grip strength, paw placement, beam balance, and so on. Each behavior is rated, typically on a three-point ordinal scale (e.g., 0: behavior absent, 1: behavior present but abnormal, and 2: normal behavior), and then summated to give one neurological deficit score (NDS). Virtually all studies using the collagenase and whole blood models along with simpler models have shown that initial NDS scores improve markedly in the days and weeks following injury (MacLellan et al, 2006a). Interestingly, when we compared NDS scores between collagenase and blood-infused rats, we observed greater, more persistent deficits in the collagenase model where lesion volume was also significantly larger despite equivalent initial hematoma volumes (MacLellan et al, 2008). In this particular situation, the NDS scores returned to normal by 3 weeks after whole blood-induced ICH, but remained significantly elevated by 4 weeks in the collagenase model.
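The scoring scheme described above can be sketched as a simple sum of ordinal subtest ratings. The subtest names below are taken from the examples in the text, but the specific battery, the ratings, and the function name `neurological_deficit_score` are hypothetical; the direction of the scale (whether higher totals mean better or worse function) varies between laboratories and is not assumed here.

```python
def neurological_deficit_score(ratings, max_rating=2):
    """Sum ordinal subtest ratings (each an integer from 0 to max_rating)."""
    for subtest, rating in ratings.items():
        if not isinstance(rating, int) or not 0 <= rating <= max_rating:
            raise ValueError(f"invalid rating for {subtest}: {rating!r}")
    return sum(ratings.values())


# Hypothetical ratings for one animal on a four-item battery.
ratings = {
    "spontaneous_rotation": 2,
    "grip_strength": 1,
    "paw_placement": 0,
    "beam_balance": 2,
}
score = neurological_deficit_score(ratings)  # 5 out of a possible 8
```

Note that the total remains ordinal: a difference of one point does not represent the same change in function at every part of the scale.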
It is not uncommon to see considerable behavioral recovery in these rodent models, especially as edema resolves (Hua et al, 2002; MacLellan et al, 2006a; Masuda et al, 2010). This has led one reviewer to conclude that ‘animals that survive the initial ICH recover almost completely' (Adeoye et al, 2011). However, numerous studies show that many behavioral tests (staircase test, corner turn test, etc.) reveal long-term impairments (versus pre-ICH or control animals) in rat (Beray-Berthat et al, 2010; Hua et al, 2002; MacLellan et al, 2006a) and mouse (Nakamura et al, 2004b) ICH models. The fact that there is often partial or even complete recovery on some simple tasks, which mirrors that which occurs in humans, raises concern about choice of test. Thus, we compared several behavioral tests in the rat collagenase model where we varied lesion size among four groups: sham-operated (no injury), mild (30 mm3 lesion size), moderate (45 mm3), and severe injury (60 mm3). Our goals were to identify behavioral tests that were persistently sensitive to ICH-induced striatal injury and to see which tests were able to differentiate among ICH groups with varying lesion sizes. Comparing mild-to-moderate or moderate-to-severe was equivalent to a moderate neuroprotective effect whereas comparing severe to mild groups equates to a substantial neuroprotective effect. Two key findings emerged. First, some of the tests revealed transient deficits whereas others showed more enduring impairments after ICH, as expected. Second, while each behavioral test could distinguish the ICH from normal rats, at least initially, many (e.g., ladder walking task) were unable to reliably and statistically distinguish among ICH groups with moderate or even substantial differences in lesion size. 
More recently, we came to the same conclusion when we compared three infusion doses of intrastriatal iron that produced a range in lesion volume from ~25 to 60 mm3 in rats, but with similar behavioral scores (NDS, corner turn test, cylinder task; Caliaperumal, Ma, and Colbourne, unpublished data). It is optimistic to envision a true neuroprotective treatment effect equivalent to our ‘moderate’ effect, and probably unrealistic to expect a ‘substantial’ reduction in injury. Therefore, we question whether the behavioral tests used in some studies could truly detect more realistic neuroprotective effects, especially with the group sizes commonly used. However, the literature is replete with such positive behavioral findings. This apparent contradiction might be partly explained by the fact that many studies examine behavior in the early poststroke period and their positive findings could be due to a reduction in edema that may not relate to reductions in cell death or long-term functional benefit. Of course, some positive findings are simply due to chance, the likelihood of which increases with the number of tests used and time points evaluated. More rigorous and appropriate statistical testing would certainly reduce these type I errors. For instance, a common problem we have noticed is the use of parametric statistics (e.g., analysis of variance) on small-scale ordinal data (e.g., NDS).
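The way chance findings accumulate with the number of tests and time points can be illustrated with a minimal sketch. It assumes that every comparison is independent and tested at an alpha of 0.05, which is a simplification because repeated behavioral measures on the same animals are correlated. The function names and the example design (three tests at four time points) are hypothetical and are not taken from any reviewed study.

```python
def familywise_error_rate(n_comparisons: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


def bonferroni_alpha(n_comparisons: int, alpha: float = 0.05) -> float:
    """Per-comparison alpha that keeps the familywise rate at or below alpha."""
    return alpha / n_comparisons


if __name__ == "__main__":
    # Hypothetical design: 3 behavioral tests, each analyzed at 4 time points.
    n = 3 * 4
    print(f"{n} comparisons, familywise error rate: {familywise_error_rate(n):.2f}")
    print(f"Bonferroni-corrected per-test alpha: {bonferroni_alpha(n):.4f}")
```

Under these assumptions, 12 uncorrected comparisons give roughly a 46% chance of at least one spurious 'positive' result. The Bonferroni correction shown here is only one conservative option among several.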
No study has systematically evaluated behavioral tests after varying lesion size in the blood injection model. Our experience with this model suggests that there is more complete functional recovery (MacLellan et al, 2008). While many tests detect impairments in the first few days or weeks after ICH (e.g., with the corner turn test or NDS), deficits are much smaller at protracted survival times. However, impairments in the forelimb placing (Hua et al, 2002), cylinder (limb use asymmetry), and ladder tests (Fingas et al, 2007) have been found 1 month following ICH in rats (Hua et al, 2002; MacLellan et al, 2006a) and mice (Nakamura et al, 2004b). Despite this, detecting treatment effects may be difficult in this model, especially at late time points. The single pellet skilled reaching task is exquisitely sensitive to both reaching impairments after ICH and rehabilitative treatments (MacLellan et al, 2006c, 2011) out to 6 weeks. Although time consuming, skilled reaching tasks (for food reward pellets) are highly sensitive and are therefore recommended for ICH studies, including mice (Baird et al, 2001; Farr and Whishaw, 2002).
Cognitive function is rarely assessed in ICH studies, despite evidence that all cognitive domains may be affected in ICH patients (Bhatia and Marsden, 1994; Su et al, 2007). This may be due in part to the worry that sensorimotor deficits may confound cognitive testing. MacLellan et al (2009b) used a battery of tests (T-maze, radial arm maze, and the Morris water maze over 7 months) to assess cognitive function after ICH. Testing began after gross sensorimotor deficits had resolved (~8 weeks after ICH) in an attempt to avoid confounding cognitive performance with motor deficits. Cognitive deficits were not detected on any of these tests, possibly due to the late assessment times. Indeed, Hartman et al (2009) found that ICH caused learning impairments, but only in the first 8 weeks after ICH. Clearly, additional studies are needed to determine the time course of cognitive impairments after ICH in rats, and to identify tests that are sensitive to both deficits and treatment effects.
In summary, these and other findings call into question whether appropriate behavioral testing practices are commonly used in ICH studies. Generally, we are concerned with: (1) the lack of behavioral testing (e.g., relying on histology or edema alone), (2) use of short-term testing alone, (3) use of insensitive tests, and (4) the inappropriate analysis and interpretation of data (e.g., analysis of ordinal data with analysis of variance).
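As an illustration of point (4), ordinal scores such as the NDS can be compared between two groups with a rank-based test instead of analysis of variance. The sketch below is one possible approach, not the analysis used in any cited study: it computes an exact two-sided permutation p-value for the rank-sum statistic, which is practical only for small groups. The function names and the example scores are hypothetical, and a higher score is assumed to mean a greater deficit.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def exact_rank_sum_test(group_a, group_b):
    """Return (rank sum of group_a, exact two-sided permutation p-value)."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    observed = sum(ranks[:n_a])
    expected = n_a * (len(ranks) + 1) / 2  # rank sum expected under the null
    observed_dev = abs(observed - expected)
    extreme = total = 0
    # Enumerate every way of assigning n_a of the pooled ranks to group A.
    for idx in combinations(range(len(ranks)), n_a):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed_dev - 1e-9:
            extreme += 1
    return observed, extreme / total


if __name__ == "__main__":
    # Hypothetical summed deficit scores, 6 animals per group.
    control = [6, 7, 8, 8, 9, 10]
    treated = [3, 4, 5, 5, 6, 7]
    rank_sum, p_value = exact_rank_sum_test(control, treated)
    print(f"rank sum (control) = {rank_sum}, exact p = {p_value:.4f}")
```

Exhaustive enumeration grows quickly with group size (12,870 assignments for 8 animals per group), so larger studies would normally rely on an established statistics package.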
Recent practices
PubMed Search Criteria
We conducted a literature review to describe current practices in rodent ICH studies that evaluate neuroprotectants, stem cell, and rehabilitation therapies. We searched the PubMed database from January 2007 through 31 July 2011, inclusive. Search terms included intracerebral hemorrhage, hemorrhagic stroke, intrastriatal hemorrhage, and ICH. Articles were then obtained and manually searched to select those original articles (not reviews) that described the results of an experimental study evaluating a neuroprotective, rehabilitative, or stem cell treatment in an ICH model. Studies that examined mechanisms of injury, but did not evaluate a putative therapy, were excluded. For each included article, we read the paper to determine a number of variables such as species, age, etc.
Results
We identified 5,363 papers in our PubMed search, of which 1,124 were directly on the topic of ICH (Figure 1). Of these, 991 were clinical studies, while 121 were deemed to be either a typical neuroprotection study, including drug and gene manipulations, or a study of a treatment meant to facilitate functional recovery with possible neuroprotective qualities (e.g., rehabilitation and stem cell treatments). Of the studies evaluated, 84% tested neuroprotectants, 11% tested stem cell therapies, and 5% tested rehabilitation therapies. Nearly 87% of ICH studies reported some positive effect(s) of the treatment being tested, 12.4% reported completely negative results (no difference between experimental and control groups on any outcome measure), and one study reported harmful effects of a treatment.

Summary of search strategy and selection of intracerebral hemorrhage (ICH) studies.
Model Characteristics: The collagenase model of ICH was most commonly used, followed closely by the autologous whole blood model (Figure 2A). The use of multiple models or spontaneous ICH models was rare. Almost all studies used male animals (Figure 2B). Rats were most widely used, followed by mice (Figure 2C). Only a few studies used another species (pig). Only 59.5% of studies reported the age of their animals, and of these 47% were vague (e.g., ‘young adult’). Only 2.5% used older animals (>6 months). None of the selected studies used hypertensive animals and only one considered diabetes as a comorbidity.

Summary of recent practices in experimental intracerebral hemorrhage (ICH) studies. The collagenase and blood infusion models are most frequently used.
Physiological Variables and Anesthesia: We recorded whether blood gases (pO2, etc.), glucose levels, temperature, and BP were measured during or after surgery (Figure 2D). A minority of studies reported data measured during surgery, and fewer examined postoperative changes. Sampling methods and rates, as well as control procedures, varied considerably, making it impossible to easily summarize these methods.
Choice of anesthetic varied widely with the four most commonly used anesthetics being: pentobarbital (25.6%), isoflurane (23.1%), ketamine (22.3%), and chloral hydrate (9.9%). Ketamine was often but not always given with xylazine. Surprisingly, 11.6% of papers did not state the anesthetic they used.
Study Design and Analysis: As illustrated in Figure 3, the majority of studies did not state whether animals were randomly assigned to groups and almost half did not state that blinding procedures were used in any study aspect. Mortality rates were rarely documented. Similarly, few studies reported on whether animals were excluded (for whatever reason). Group sizes were determined from the 75.2% of studies that reported it. The average group size was determined for each study and then averaged across studies (mean: 8.3 ± 4.2 s.d./group). However, average group sizes among studies ranged from 3.8 to 21.0 animals per group with slightly more used in studies using mice (average of 9.3 mice/group) compared with rats (7.8 rats/group). Standard deviation (52.1%) was more commonly reported than standard error of the mean (38.0%).

Analysis of intracerebral hemorrhage (ICH) study design. Many studies failed to report characteristics such as mortality rate, the number of and reason for excluded animals, blinding, or random assignment. None of the studies conducted a power analysis.
Statistical power was never reported. However, it is possible that many researchers rely on experience and simply do not report power analyses.
End Point Measures: Behavioral testing and measurement of cerebral edema were the two most frequently used end points (Figure 4). Surprisingly, these were more common than estimates of lesion size and cell death. Only 5.8% of papers specifically evaluated white matter injury, such as corpus callosum area/volume. Of the behavioral tests, the NDS was most widely used, followed by forelimb placing (Figure 5A). The last time of testing was also quantified (Figure 5B). Most studies limited testing to the acute post-ICH period and only 10% used protracted testing (i.e., 2 months). The average of the latest behavioral test time was 20.7 ± 19.1 days, similar to the latest assessment time of histology (26.7 ± 21.3 days).

Frequency of end points used in intracerebral hemorrhage (ICH) studies. The most common end points were behavioral outcome and edema.

Behavioral tests used to assess functional outcome following intracerebral hemorrhage (ICH). The neurological deficit score (NDS) was used most frequently, whereas tests of skilled reaching and cognition were rarely used.
Recommendations and general discussion
Before discussing the findings of our literature search, it is important to acknowledge limitations with our approach. First, we did not evaluate all published papers in the field or on a given topic. This was because our goal was not to evaluate any particular treatment, but instead to provide a general summary of recent approaches and methods used in the rodent ICH literature. We sought to determine whether the recommendations made for preclinical ICH studies (NINDS ICH Workshop, 2005) have been addressed in subsequent experiments. Second, we did not formally (statistically) evaluate study quality as some have done (Frantzias et al, 2011). We avoided this approach because of the difficulty with creating and validating an ordinal quality scale and then assigning scores for each paper in light of the frequent failure to report such details. Specifically, some factors have been shown to influence effect sizes (e.g., blinding; Crossley et al, 2008), but many others have not been evaluated or are inconclusive (e.g., temperature). As well, we feel that it is problematic to assign equal weighting to each potential factor (blinding, randomization, use of aged animals, etc.) given that their impact likely differs considerably. Furthermore, for each factor deemed important, there is often no simple method to judge quality. For example, simply assigning a score for having ‘measured temperature’ ignores the complexity of the topic (intra- versus postsurgical monitoring, method and frequency of measurement, method of control, etc.). Third, since we did not evaluate any one treatment, problems identified in the literature survey may not necessarily apply to all treatments. No doubt some treatments have been evaluated more thoroughly than others.
A clear example of this is deferoxamine, a Fe3+ chelator, which has been studied in multiple models, in young and old rats, in several species, and with many end points (Auriat et al, 2012; Hua et al, 2006; Huang et al, 2002; Nakamura et al, 2004a; Okauchi et al, 2010; Warkentin et al, 2009; Wu et al, 2011b), but interestingly not all studies come to the same conclusions. As well, ‘weaknesses’ in any given paper, such as using only young animals or not measuring temperature, may be addressed in other studies, as with deferoxamine noted above, and thus we may somewhat overestimate such problems. Finally, we include rehabilitation and stem cell treatments in our evaluation because these treatments often have positive behavioral effects and may also be neuroprotective. Increased testing of stem cell strategies was recommended by the NINDS workshop. Unfortunately, a relatively small proportion of studies evaluated rehabilitation or stem cell treatments. More data would be needed to accurately compare among treatment types, which was not the purpose of this survey.
Our review of the literature reveals several general findings. First, most studies (~87%) report positive treatment effects on at least one outcome measure, whereas only 12% of studies reported no significant treatment effect on any measure. The small number of ‘negative’ studies may reflect a publication bias, as has been recently demonstrated in animal models of ischemic stroke (Sena et al, 2010). Such bias likely overestimates effect size by approximately one-third. Experiments with negative results, or those that contradict previous literature, may not be published (Dwan et al, 2008), or may be published in journals with lower impact factors (Littner et al, 2005). The unfortunate result is that these studies may not contribute to our understanding of ICH pathology, or to identifying truly effective treatments for ICH. Several journals, including the Journal of Cerebral Blood Flow and Metabolism, have taken steps to address publication bias by implementing a section for negative results (Dirnagl and Lauritzen, 2010). Having negative data available allows one to make more informed choices to ensure that only the best treatments will be selected for clinical trials.
Second, many studies fail to adequately report methodological details (e.g., whether animals were randomized, their age, etc.). Authors, reviewers, and journals must strive to ensure that certain details are always provided (e.g., subjects' age, weight, group sizes, and exclusions), as some journals are now doing. However, descriptions of experimental methods and techniques are often limited by strict word counts imposed on most manuscripts. This makes it more difficult for investigators to follow another laboratory's methods and to test whether its data replicate. Better transparency, sharing of standardized procedures, and multicenter animal studies are clearly needed (Kirkman et al, 2011; Macleod et al, 2009).
The experimental design flaws and weaknesses documented in the ischemia literature also appear to occur commonly in the ICH field. For instance, the low reporting and control of physiological variables is a notable concern. Similarly, the significant number of studies that did not report use of blinding and random assignment is concerning. However, it is possible that many of these studies used these procedures but failed to document them. Variance should be reported as 95% confidence intervals or standard deviation (Macleod et al, 2009), which was done in only approximately half of the studies. Another concern was the complete lack of reporting of statistical power coupled with small group sizes overall (N = 8) and very small group sizes (e.g., N = 4) in some studies where efficacy was being determined. We suspect that this is due to an under-appreciation of the need for adequate statistical power, and the fact that many simply rely on past experiences. Acknowledging a problem with low statistical power may also require significantly more work to generate adequate group sizes necessary for publication. It should also be noted that power likely varies considerably within a study. For example, behavioral measures often appear considerably more variable than edema data. Adding to this problem is the fact that effect sizes vary considerably among end points. Again, edema can be mostly or entirely eliminated, but it is quite unlikely to see such a behavioral effect. Thus, behavioral experiments are more likely to be underpowered overall.
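The relationship between group size, effect size, and power discussed above can be sketched with a simple calculation. The example uses the normal approximation for a two-sided, two-group comparison of means, which slightly underestimates the group sizes needed for a t-test with small samples and does not apply directly to ordinal end points. The function names and the standardized effect sizes are hypothetical; only the group size of 8 echoes the survey average.

```python
from math import ceil
from statistics import NormalDist


def animals_per_group(effect_size_d: float, alpha: float = 0.05,
                      power: float = 0.80) -> int:
    """Approximate n per group to detect a standardized difference d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size_d) ** 2)


def approximate_power(effect_size_d: float, n_per_group: int,
                      alpha: float = 0.05) -> float:
    """Approximate power for a given n per group (far tail ignored)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size_d * (n_per_group / 2) ** 0.5 - z_alpha)


if __name__ == "__main__":
    # Hypothetical standardized effects: a large edema effect versus
    # progressively smaller behavioral effects.
    for d in (1.5, 1.0, 0.5):
        print(f"d = {d}: about {animals_per_group(d)} animals per group, "
              f"power with 8 per group = {approximate_power(d, 8):.2f}")
```

Under these assumptions, a standardized effect of 1.0 needs about 16 animals per group for 80% power, and a group size of 8 gives power of roughly 0.5. Smaller, more realistic behavioral effects would require considerably larger groups.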
While there were encouraging signs, most studies failed to use models with better face validity (e.g., aged rodents, hypertension-induced ICH, etc.) including factors known to affect efficacy, as suggested (NINDS ICH Workshop, 2005). Likewise, few studies directly compare treatment efficacy in multiple models of ICH or used nonrodent models. Given the aforementioned differences among models and in comparisons to patients, we feel that this may seriously hamper progress in translating neuroprotective treatments. For instance, the relative paucity of white matter in rodents (versus patients or pig models), and the fact that few studies assess it, is a serious concern.
We were struck by the diversity among studies (e.g., lesion severity), including in choice of primary end points and actual assessment techniques (e.g., type and extent of behavioral testing). While there are advantages to this diversity, it makes comparisons among studies difficult. With regard to choice of end point, we were encouraged to see that the majority of studies used some form of behavioral evaluation, which was most commonly an NDS or forelimb-placing task, of which the latter is sometimes included in NDS testing. Unfortunately, more demanding and sensitive tests, such as tests of skilled reaching (Beray-Berthat et al, 2010; MacLellan et al, 2006a), were infrequently used. Another problem was that some studies completed testing only in the acute period (first 6 days) when edema potentially confounds the interpretation of behavioral data. For instance, any treatment that lessens edema would likely result in significantly better behavioral scores compared with controls experiencing greater swelling. However, controls will eventually have similar scores once edema resolves. One must therefore consider the possibility that lasting behavioral benefit may not occur. Unfortunately, only about 10% of studies assessed behavior at a chronic survival time. As well, the use of behavioral testing during edema may confound later testing. For instance, the mere act of behaviorally testing animals can help rehabilitate animals after ICH and those with less edema may simply benefit more than those with edema (e.g., more likely to participate in testing).
It was surprising that only a minority of studies directly measured or estimated lesion size and few evaluated white matter specifically. Instead, many used peri-hematoma cell counts that provide only a limited estimate of neuroprotective treatment efficacy compared to determining the total loss of brain tissue encompassing local and distal cell death and atrophy. Importantly, many histological studies did use longer survival times to gauge efficacy. As discussed, edema is not a well-validated estimate of neuroprotection; yet, edema was the most common end point by which investigators decided whether a treatment was neuroprotective. Accordingly, these studies may have overestimated the extent of their treatment's potential functional benefit to patients.
Finally, the extent of monitoring and controlling physiological variables was of key concern before completing the literature review. Our findings reinforce the concern. For instance, while body temperature was commonly controlled during surgery, far fewer studies evaluated whether their treatment affected postoperative temperature. This is especially concerning when one considers that commonly used anesthetics, such as pentobarbital, often affect temperature for hours after surgery is complete. Numerous ‘neuroprotective’ drugs affect temperature in ischemia models (e.g., NBQX; Nurse and Corbett, 1996) so similar effects after ICH would not be surprising. While studies that induced hypothermia or hyperthermia after ICH in rodents have not revealed the dramatic effects seen in ischemia, we nonetheless recommend that temperature be measured at least during surgery and postoperatively to rule out confounding effects. The various other potential effects of commonly used anesthetics on outcome after ICH have been recently reviewed (Kirkman et al, 2011). Only a minority of studies examined other physiological variables either during or after ICH surgery. We speculate that researchers rarely measure physiological variables because they are thought to be unimportant (e.g., temperature), and because doing so significantly increases the cost, time, and technical demands (e.g., measuring BP or ICP) of ICH studies. This leaves the possibility that many putative neuroprotectants may be affecting outcome by unknown influences on temperature, BP, clotting, etc. Further study is needed to determine the extent to which these factors affect ICH outcome, including interactions with model, age, and others. Without those data, investigators cannot be sure which factors are actually important. Thus, further study in this area may lead to greater measurement and reporting of physiological variables.
An ICH is a devastating stroke for which there are no perfect animal models. The most widely used models are the collagenase and whole blood models in rodents. Each has advantages and limitations, hence the need to use both, along with models of spontaneous ICH and other species. As noted in the ischemia field by numerous reviewers, there are indeed several serious problems, which also occur in the ICH field. We wish to emphasize that such concerns also apply to our own studies included in this review. In spite of these issues with animal modeling, study design, and choice of end points, there are many positive findings in our survey that bode well for identifying and translating effective neuroprotective, rehabilitative, and stem cell treatments, as recommended (NINDS ICH Workshop, 2005). For instance, we observed that many investigators were using behavioral testing and longer survival times. Novel stem cell and rehabilitation treatments, which benefit ischemic stroke, are currently being tested in ICH. Finally, considerable progress has been made in understanding the role of inflammation after ICH, and genetically engineered mice are now commonly used to study pathophysiology of ICH and identify putative therapeutic targets. Enhancing and translating this knowledge will depend on further improvements to our research approach.
Footnotes
The authors declare no conflict of interest.
