Abstract
Successful clinical translation of prospective cytoprotectants will likely occur only with treatments that improve functional recovery in preclinical (rodent) studies. Despite this assumption, many studies rely solely on histopathologic end points or on one or two simple behavioral tests. Here, we used a battery of tests to gauge recovery after a unilateral intracerebral hemorrhagic stroke (ICH) targeting the striatum. In total, 60 rats (n = 15 per group) were stereotaxically infused with 0 (SHAM), 0.06 (MILD lesion), 0.12 (MODERATE lesion), or 0.18 U (SEVERE lesion) of bacterial collagenase. This created a range of injury akin to moderate cytoprotection (∼30% reduction in lesion size from SEVERE to MODERATE or from MODERATE to MILD) and substantial cytoprotection (51% reduction from SEVERE to MILD). Post-ICH functional testing occurred over 30 days. Tests included the horizontal ladder and elevated beam tests, swimming, the limb-use asymmetry (cylinder) test, a Neurologic Deficit Scale, an adhesive tape removal test of sensory neglect, and the staircase and single pellet tests of skilled reaching. Most tests detected significant impairments (versus SHAM), but only a few (e.g., staircase) frequently distinguished among ICH groups, and none consistently differentiated among all ICH groups. However, a battery of tests behaviorally distinguished all groups. Thus, preclinical testing would benefit from a battery of behavioral tests, as anything less may miss treatment effects. The choice of tests must be based on factors including the type of lesion, the postoperative delay, and the time required to complete testing.
Introduction
Ischemic and hemorrhagic stroke are among the leading causes of death and disability. Accordingly, there is a great need for stroke models in which to assess putative cytoprotectants (i.e., neuroprotectants). Unfortunately, despite much progress in elucidating the pathophysiology of stroke, there has been little translation from experiments to patient treatment. These failures have been attributed to limitations of both basic and clinical studies (Hunter et al, 1995; DeGraba and Pettigrew, 2000; Drummond et al, 2000; Grotta, 2002). For instance, many clinical trials had long delays between stroke onset and treatment. Of the several major concerns raised about experimental studies (Stroke Therapy Academic Industry Roundtable (STAIR), 1999), the frequent failure to gauge functional therapeutic efficacy stands out (Corbett and Nurse, 1998; Hunter et al, 1998; Hudzik et al, 2000; DeBow et al, 2003a). Functional recovery is the more clinically important end point, yet investigators often quantify cell death as their sole index of treatment efficacy (DeBow et al, 2003a).
While a reduction in cell death will likely improve functional recovery, there are several reasons why this is not always true. For instance, tissue may remain viable but function abnormally (Squire and Zola, 1996). Furthermore, damage to regions distal to an infarct, such as axonal degeneration and atrophy, may go undetected yet contribute to impairment (Corbett and Nurse, 1998). Likewise, spontaneous recovery and compensation further obscure the relationship between behavior and cell death. Thus, effective assessment of any therapy requires a rigorous examination of behavioral recovery in addition to histologic measurements of injury, because only cytoprotectants that improve both in animal models are likely to improve outcome in humans.
Given the need for functional assessment, it is not surprising that many tests sensitive to functional deficits after ischemic stroke have been identified for rodents (Hunter et al, 2000; DeVries et al, 2001; Roof et al, 2001). While most strokes are ischemic, approximately 15% are hemorrhagic, and many ischemic strokes undergo hemorrhagic transformation (Lyden and Zivin, 1993). Accordingly, studies have identified tests that are sensitive to striatal intracerebral hemorrhage (ICH) in rodents (Chesney et al, 1995; Hua et al, 2002). These studies, however, have not thoroughly assessed whether such tests are appropriate for gauging variations in the volume of injury (e.g., cytoprotection). Functional tests must not only act as lesion detectors; they must also distinguish among gradations in injury, and do so over time, to track recovery.
This study examined whether a large battery of tests (e.g., skilled reaching, walking; see Materials and methods), which are each sensitive to ischemic and traumatic damage to the motor system, could predict histologic outcome after ICH in rats. We created a range of lesion sizes by varying the dose of collagenase used to produce the ICH, and then assessed performance over 1 month. The gradations in lesion size from large to medium to small represent sizeable and statistically significant reductions in damage that would presumably have great clinical benefit. In this regard, we aimed to evaluate whether marked differences (reductions) in brain injury, such as those afforded by cytoprotective agents, could be detected with a single test or combination of tests.
Materials and methods
Animals
In total, 60 male Sprague–Dawley rats weighing approximately 250 g (∼10 weeks old; obtained locally) were used in this study. All procedures were in accordance with the Canadian Council on Animal Care guidelines and were approved by the Biological Sciences Animal Policy and Welfare Committee at the University of Alberta.
Training
Baseline performance for the horizontal ladder, forelimb use asymmetry (cylinder), adhesive tape removal, and beam walking tests was measured the day before surgery. Neurologic deficits were also assessed. Other tests required more extensive training as detailed below.
Staircase test: Rats were food deprived to 90% of their free-feeding weight over 3 days and then trained in the staircase test of independent forelimb reaching ability (Montoya et al, 1991). Food restriction took into account the natural gain in body weight during this period. For the test, each rat was placed into a Plexiglas box (length: 30 cm, width: 6.8 cm, height: 12 cm), where it rested on an elevated platform with seven stairs descending on each side. Each stair has a food well baited with three food pellets (45 mg each, Bio-Serv, Frenchtown, NJ, USA). Pellets on the left stairs can be retrieved only with the left paw, and those on the right stairs only with the right paw. The number of pellets retrieved with each forelimb therefore indexes skilled reaching ability. Rats were trained twice daily, 5 days a week, for 3 weeks, and were excluded from this test if they did not retrieve an average of at least eight pellets per side (out of a possible 21) on 3 consecutive days.
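The training criterion above can be expressed as a simple check. This is a minimal sketch, assuming each training day has already been reduced to one pellet count per side (for example, the mean of that day's two sessions) and that the criterion is met by any run of three consecutive days; the function name `meets_staircase_criterion` is hypothetical.

```python
from statistics import mean


def meets_staircase_criterion(left, right, threshold=8.0, window=3):
    """Return True if, over some run of `window` consecutive training days,
    the rat averaged at least `threshold` pellets (of 21) with EACH paw.

    left, right: one pellet count per training day for each paw
    (assumed to be pre-averaged across that day's sessions).
    """
    if len(left) != len(right):
        raise ValueError("left and right must cover the same training days")
    for start in range(len(left) - window + 1):
        left_avg = mean(left[start:start + window])
        right_avg = mean(right[start:start + window])
        if left_avg >= threshold and right_avg >= threshold:
            return True
    # Rats that never reach the criterion would be excluded from this test.
    return False
```

Whether the study required the criterion on the final three training days rather than on any three consecutive days is not stated; changing the loop to test only the last window would implement the stricter reading.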
Single pellet test: Rats were maintained at 90% of free-feeding weight and trained in the single pellet test 5 days per week for 3 weeks, with 20 reaches per day. Briefly, rats were placed in a 14 × 60 cm box and trained to reach through a 1-cm-wide opening to retrieve a food pellet (45 mg each, Bio-Serv, Frenchtown, NJ, USA) placed on the ledge in front of the opening (Whishaw, 2000). A reach was considered a success if the rat grasped the pellet, brought it inside the box using its paw, and placed the pellet into its mouth. A reach on which an animal advanced the paw through the slot but missed the pellet or knocked it off the ledge was considered a failure. Performance on the last day of training was videotaped for kinematic analysis of reaching movements. The first three successful reaches of each rat were analyzed qualitatively (Metz and Whishaw, 2000). Briefly, each reaching movement was rated on 11 movement components: (1) orient to pellet, (2) limb lift, (3) digits close, (4) aim, (5) advance, (6) digits open, (7) pronation, (8) grasp, (9) supination I, (10) supination II, and (11) release. These components were further broken down into a total of 35 subcategories, each graded 0 (loss of normal movement), 0.5 (impaired movement pattern), or 1 (normal movement). A score of 35 indicated a perfect reach.
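The qualitative reach score described above is a sum of 35 subcategory grades. The sketch below assumes each reach is stored as a list of those 35 grades and that a rat's score is the mean over its analyzed reaches; the text does not say how the three reaches were combined, and the helpers `reach_score` and `rat_reach_score` are hypothetical.

```python
from statistics import mean

N_SUBCATEGORIES = 35             # subcategories across the 11 movement components
ALLOWED_GRADES = {0.0, 0.5, 1.0}  # loss / impaired / normal movement


def reach_score(grades):
    """Sum the subcategory grades for one reach (35 = perfect reach)."""
    if len(grades) != N_SUBCATEGORIES:
        raise ValueError(f"expected {N_SUBCATEGORIES} grades, got {len(grades)}")
    if any(g not in ALLOWED_GRADES for g in grades):
        raise ValueError("grades must be 0, 0.5, or 1")
    return sum(grades)


def rat_reach_score(reaches):
    """Mean score over a rat's analyzed successful reaches (assumed averaging)."""
    return mean(reach_score(r) for r in reaches)
```

Keeping the per-subcategory grades, rather than only the total, also allows component-level comparisons such as the aim and grasp deficits reported in the Results.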
Forelimb inhibition test (swimming): Rats were trained over 2 days (10 trials/day) to swim to a visible platform, located above the water at one end of an aquarium (length: 123 cm, width: 46 cm, height: 57 cm). Three baseline trials were videotaped on the third day. Rats normally inhibit the forelimbs as they swim, propelling themselves with only their hind limbs (Whishaw et al, 1981; Gonzalez and Kolb, 2003). A striatal ICH causes asymmetric forelimb inhibition, with use of the impaired forelimb during swimming (a positive sign; MacLellan and Colbourne, 2005). A rat was excluded from analysis if, after repeated trials, it failed to swim directly to the platform without paddling along the walls of the tank (e.g., escape or circling behavior). The number of strokes made with each forelimb was recorded for three trials, and an asymmetry score for forelimb inhibition was calculated as
Surgery and Experimental Groups
Rats were anesthetized with isoflurane (induction 4%, maintenance 1.5% to 2% in 70% N2O and 30% O2) and placed in a stereotaxic frame. Body temperature was maintained near normothermia (37°C) throughout surgery using a rectal probe and heating blanket. Under aseptic conditions, a midline scalp incision was made and a small hole was drilled 3.0 mm lateral to Bregma in the hemisphere contralateral to the preferred paw, as determined during single pellet training. We created a range of insult severities using the well-characterized bacterial collagenase model of ICH (Rosenberg et al, 1990; Del Bigio et al, 1996; DeBow et al, 2003b; MacLellan et al, 2004). Briefly, a 26-gauge Hamilton needle (Hamilton, Reno, NV, USA) was lowered 6.0 mm ventral to the surface of the skull, and 1.0 μL of sterile saline containing 0.06 (MILD; n = 15), 0.12 (MODERATE; n = 15), or 0.18 U (SEVERE; n = 15) of bacterial collagenase (Type IV-S, Sigma Chemical Co., Oakville, Ontario, Canada) was infused into the striatum. A control group (SHAM; n = 15) received a saline infusion only. A metal screw (Model MX-080-2; Small Parts, Miami Lakes, FL, USA) sealed the hole, and Marcaine (Sanofi Canada, Markham, Ontario, Canada) was infiltrated into the area. The wound was closed with staples and treated with antibiotic ointment. Anesthesia lasted ∼30 min.
Testing
On days 1, 3, 5, 7, 14, 21, and 28 after surgery, we assessed all rats on the horizontal ladder, forelimb use asymmetry (cylinder), adhesive tape removal, beam walking, and forelimb inhibition (swim) tests (Figure 1). Neurologic deficits were evaluated on a Neurologic Deficit Scale (NDS) at these times and on days 2, 4, and 6 postsurgery. Skilled reaching was assessed in the staircase and single pellet tests on days 7 to 10 and 21 to 24.

Behavioral testing schedule (days relative to surgery). All rats were assessed on each of the behavioral tests. See Materials and methods for a description of the tests. The timeline for behavioral training (pre-ICH) is not shown.
Horizontal ladder walking test: Rats were videotaped crossing the middle 0.5-m segment of a 1-m-long horizontal ladder with rungs variably spaced 3 to 5 cm apart. The total number of steps and the number of slips made with each limb were recorded for four trials per test day. A detailed analysis of stepping was performed at baseline, day 7, and day 28 (Metz and Whishaw, 2002). Briefly, for each limb, each step was rated on a 7-point foot fault scale: (0) total miss, (1) deep slip, (2) slight slip, (3) replacement, (4) correction, (5) partial placement, and (6) correct placement.
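The two ladder measures, the percentage of slips and the distribution of step ratings on the foot fault scale, can be sketched as below. This assumes slips and steps are pooled across the four trials of a test day before the percentage is computed; the helper names `percent_slips` and `foot_fault_profile` are illustrative, not taken from the original analysis.

```python
from collections import Counter

# 7-point foot fault scale (Metz and Whishaw, 2002), as listed in the text
FOOT_FAULT_SCALE = {
    0: "total miss",
    1: "deep slip",
    2: "slight slip",
    3: "replacement",
    4: "correction",
    5: "partial placement",
    6: "correct placement",
}


def percent_slips(slips_per_trial, steps_per_trial):
    """Error rate (% slips) for one limb, pooling all trials of a test day."""
    total_steps = sum(steps_per_trial)
    if total_steps == 0:
        raise ValueError("no steps were scored")
    return 100.0 * sum(slips_per_trial) / total_steps


def foot_fault_profile(step_scores):
    """Count how many steps received each rating on the foot fault scale."""
    if any(s not in FOOT_FAULT_SCALE for s in step_scores):
        raise ValueError("step scores must be integers from 0 to 6")
    counts = Counter(step_scores)
    return {label: counts[score] for score, label in FOOT_FAULT_SCALE.items()}
```

The profile keeps every category, including those with zero counts, so that groups can be compared category by category (e.g., slight slips or deep slips).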
Forelimb use asymmetry test (cylinder): Rats were placed in a transparent cylinder (20 cm diameter, 45 cm high) for 10 min and videotaped from below. Spontaneous forelimb use during rearing movements, wall exploration, and landings was analyzed. Briefly, a push-off is the independent use of either forelimb, or the simultaneous use of both, when rearing. Wall exploration is the initial placement of a forelimb on the wall and contact during lateral movements. A landing is the use of either limb (or both) to land after rearing. Rats that made fewer than six independent wall touches were excluded from analysis, as this was considered too few to be a reliable measure of movement frequency. Independent forelimb use was expressed as (Schallert and Woodlee, 2005)
$$\text{contralateral forelimb use (\%)} = \frac{C + \tfrac{1}{2}B}{I + C + B} \times 100,$$
where C, I, and B are the numbers of contralateral, ipsilateral, and simultaneous (both) forelimb contacts, respectively.
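The cylinder score counts simultaneous (both-limb) contacts as half contralateral and divides by all contacts, following the Figure 4 caption. The sketch below assumes that "independent wall touches" in the exclusion rule means single-limb contacts (contralateral plus ipsilateral); that reading, like the function names, is an assumption.

```python
def contralateral_use_percent(contra, ipsi, both):
    """Percentage of wall contacts made with the contralateral forelimb,
    counting simultaneous (both-limb) contacts as half contralateral."""
    total = contra + ipsi + both
    if total == 0:
        raise ValueError("no wall contacts were scored")
    return 100.0 * (contra + 0.5 * both) / total


def is_scorable(contra, ipsi, both, minimum=6):
    """Hypothetical exclusion check: at least `minimum` independent
    (single-limb) wall touches; `both` is deliberately not counted."""
    return contra + ipsi >= minimum
```

A score of 50% indicates symmetric use; lower values indicate reduced use of the contralateral (impaired) forelimb.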
Adhesive tape removal test: Adhesive dots (0.64 cm diameter, Avery; Pickering, Ontario, Canada) were placed on the medial aspect of the rat's forepaws. The order of placement (e.g., left then right) was randomized, and the paws were touched simultaneously before the rat was returned to its cage. The time taken to remove the adhesive dot from each paw was recorded on three trials. An asymmetry score (Schallert et al, 1982) for this sensory or attentional impairment (neglect) was calculated as
$$\text{difference score (s)} = t_{\text{contra}} - t_{\text{ipsi}},$$
where $t_{\text{contra}}$ and $t_{\text{ipsi}}$ are the times taken to remove the dot from the contralateral and ipsilateral forepaws, respectively.
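Following the Figure 5 caption (contralateral minus ipsilateral removal time), a per-rat difference score might be computed as below. This is a minimal sketch that assumes the per-trial differences are averaged over the three trials; the text does not state how trials were combined, and `tape_difference_score` is a hypothetical name.

```python
from statistics import mean


def tape_difference_score(contra_times, ipsi_times):
    """Mean per-trial difference in removal time (s), contralateral minus
    ipsilateral. Positive values mean the contralateral dot took longer."""
    if len(contra_times) != len(ipsi_times) or not contra_times:
        raise ValueError("need matched, non-empty lists of trial times")
    return mean(c - i for c, i in zip(contra_times, ipsi_times))
```

Because trials are paired, averaging the differences gives the same result as subtracting the mean ipsilateral time from the mean contralateral time.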
Beam walking test: Rats were videotaped crossing the beam (1.10 m long; 3.20 cm wide), and hind limb use was analyzed according to Feeney et al (1982). Briefly, performance was graded as 0 (rat fell off the beam within 10 s), 1 (rat remained on the beam for more than 10 s but could not place the affected limb on the beam), 2 (rat was unable to cross but could place the affected limb on the beam and maintain balance), 3 (rat traversed the beam while dragging the affected limb), 4 (rat crossed the beam and placed the affected limb on the beam at least once), 5 (rat crossed with more than 50% foot slips with the affected limb), 6 (rat crossed with fewer than 50% foot slips with the affected limb), or 7 (rat crossed with 2 or fewer foot slips). Performance on each test day was expressed as the median score of three trials.
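Because the beam score is ordinal (0 to 7), a rat's daily performance is summarized by the median of its trials rather than the mean, and Figure 6 plots group medians. A minimal sketch, with hypothetical function names:

```python
from statistics import median

BEAM_SCORES = range(0, 8)  # 0 (fell within 10 s) to 7 (2 or fewer foot slips)


def beam_day_score(trial_scores):
    """Median of one rat's trial scores on a test day (three trials in the study)."""
    if not trial_scores:
        raise ValueError("at least one trial is required")
    if any(s not in BEAM_SCORES for s in trial_scores):
        raise ValueError("beam scores must be integers from 0 to 7")
    return median(trial_scores)


def group_median(day_scores):
    """Median of the per-rat day scores within a group."""
    return median(day_scores)
```

With an even number of rats, `statistics.median` averages the two middle values, so a group median can fall between two scale points (e.g., 5.5).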
Neurologic Deficit Scale: Neurologic deficits were repeatedly measured (Peeling et al, 2001). Tests included: (1) spontaneous circling, graded from 0 for no circling to 3 for continuous circling; (2) hind limb retraction, graded from 0 for immediate replacement to 3 for no retraction after the limb was displaced laterally; (3) bilateral forepaw grasp, graded from 0 for normal grasping to 3 for a rat unable to grasp the bar at all; (4) contralateral forelimb flexion, graded from 0 for uniform extension of the forelimbs to 2 for full wrist flexion and shoulder adduction when the rat was lifted by the base of the tail; and (5) beam walking ability, graded from 0 for a rat that readily crossed the beam to 3 for a rat unable to stay on the beam for more than 10 s. Scores for each component were summed, for a maximum of 14 (greatest impairment).
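The NDS total is the sum of five bounded item scores, and the item maxima (3 + 3 + 3 + 2 + 3) add up to the stated maximum of 14. The sketch below validates each item against its range before summing; the dictionary keys are hypothetical labels for the five items listed above.

```python
# Maximum score for each NDS item, as described in the text (keys are illustrative)
NDS_MAXIMA = {
    "spontaneous_circling": 3,
    "hind_limb_retraction": 3,
    "bilateral_forepaw_grasp": 3,
    "contralateral_forelimb_flexion": 2,
    "beam_walking": 3,
}


def nds_total(scores):
    """Sum the five NDS items (0 = normal, 14 = greatest impairment)."""
    missing = NDS_MAXIMA.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing NDS items: {sorted(missing)}")
    total = 0
    for item, maximum in NDS_MAXIMA.items():
        value = scores[item]
        if not 0 <= value <= maximum:
            raise ValueError(f"{item} must be between 0 and {maximum}")
        total += value
    return total


assert sum(NDS_MAXIMA.values()) == 14  # matches the stated maximum
```

Keeping the item scores separate, rather than storing only the total, makes it possible to check whether a group difference in the NDS is driven by a single subcomponent.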
Staircase test: Rats were food deprived to 90% of their free-feeding weight starting 4 days before testing in the skilled reaching tests. Rats received two 15-min trials separated by ∼4 h on each of the 4 days of testing. We analyzed the number of pellets successfully retrieved out of a maximum of 21 per side.
Single pellet test: The number of pellets successfully retrieved (out of 20) was recorded on each test day. Performance was videotaped on days 10 and 24, and the movement components of three successful reaches for each rat were analyzed as described earlier.
Forelimb inhibition test (swim): Three trials were videotaped as each rat swam directly to the platform. Rats that swam along the wall or those that did not reach the platform after a maximum of 10 trials were excluded from analysis. The number of strokes made with each forelimb was counted and expressed as an asymmetry score for forelimb inhibition.
Histology
Thirty days after surgery, rats were euthanized with an overdose of sodium pentobarbital (80 mg/kg) and transcardially perfused with 0.9% saline followed by 10% neutral buffered formalin. Sections (40 μm) were taken every 200 μm, starting at +1.7 mm and ending at −4.8 mm relative to Bregma. Sections were then stained with cresyl violet. The volume of lesion (e.g., cavity, cellular debris) plus atrophy (e.g., ventriculomegaly) was calculated manually using Scion Image J 4.0 (Scion Corporation, Frederick, MD, USA) as follows and as routinely performed (DeBow et al, 2003b; MacLellan et al, 2004):
Statistics
All behavioral and histologic analyses were performed by experimenters blind to group identity. Most data were analyzed by ANOVA followed by group comparisons (usually Fisher LSD tests). When Levene's test was significant (i.e., heterogeneous variance), we used independent samples t-tests (equal variances not assumed). For nonparametric data (e.g., rating scales such as the NDS), we used the Kruskal–Wallis test followed by Mann–Whitney U comparisons. χ2 tests were used to assess dropout rates. Regression analyses determined which combination of behavioral tests best predicted lesion volume. For all ICH rats, we ranked lesion volume and performance on each test (from best to worst) and used Spearman's rank correlation to determine how each test independently related to histology. Lesion volume was assessed by ANOVA and conservative Scheffé post hoc tests to help ensure that lesion volumes were truly different. However, we used the LSD test for multiple comparisons of the behavioral data to maximize our chances of detecting significant functional differences among groups. For the same reason, we used relatively large group sizes and repeated testing (i.e., to improve statistical power). In all cases, P < 0.05 was considered statistically significant.
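Spearman's rho is the Pearson correlation computed on ranks, with tied values sharing their average rank. The standard-library sketch below illustrates that technique; it is not the statistics package used in the study, and the function names are hypothetical. Ranking raw values from smallest to largest gives the same magnitude of rho as ranking "best to worst"; only the sign depends on whether a high score means better or worse performance.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j -> ranks i+1..j+1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In practice, a library routine such as SciPy's `scipy.stats.spearmanr` would also report a P value, which this sketch does not compute.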
Results
Lesion Volume
No mortality occurred in this study. One SHAM rat had damage in addition to the needle tract and was thus excluded from analysis. This was likely due to a needle-induced hemorrhage. Otherwise, the lesions occurred as expected in the ICH groups, with the striatum primarily affected but with damage sometimes extending to the globus pallidus, corpus callosum, and thalamus. More extensive injury, including other structures and greater rostrocaudal involvement of the striatum, occurred more commonly in the SEVERE group. A one-way ANOVA (P < 0.001) followed by Scheffé tests showed significant differences in lesion volume among all groups (P < 0.044; Figure 2), with the greatest injury in the SEVERE group, followed by the MODERATE and MILD groups.

Volume of tissue lost (mean ± s.e.m.) at 30 days after ICH/SHAM surgery (A). All groups were significantly different from each other. Photomicrographs show a typical lesion (e.g., cavity, cellular debris, and ventriculomegaly) in the MILD (B), MODERATE (C), and SEVERE (D) groups. The hematoma was nearly completely resorbed by 30 days, making the demarcation of injury easy.
Behavioral Assessment
Baseline performance in each test was similar among all groups (data not shown). The time needed to conduct training, testing and analysis for each behavioral test is given in Table 1.
Estimated time needed to conduct training, testing, and analysis for each behavioral test
Numbers listed represent the time required for one assessment per rat, except for the staircase and single pellet tests, which require multiple sessions. The number in parentheses denotes time needed for kinematic analysis of movements. Estimates do not include time needed for weighing and feeding rats during period of food deprivation, data entry, statistical analyses, etc. In the staircase test, multiple rats can be trained and tested concurrently.
Horizontal ladder walking test: Many ICH rats failed to cross the ladder on days 1 (60.0%; P = 0.023) and 3 (35.6%; P = 0.001), and thus data for these days were not analyzed further. One-way ANOVAs revealed significant effects of GROUP on all test days for the contralateral forelimb (P ≤ 0.007; Figure 3). The percentage of slips was greater in the ICH groups than in the SHAM group at all times (P ≤ 0.026), except for MILD on day 14 (P = 0.177) and MODERATE on day 7 (P = 0.058). Large differences in error rates were sometimes detected among ICH groups (e.g., between MILD and SEVERE groups on days 14 to 28; P ≤ 0.036); however, smaller effects were not significant (e.g., between MILD and MODERATE groups; P ≥ 0.135). Analysis of the contralateral hind limb revealed significant GROUP effects on each test day (P ≤ 0.018). The MILD group made significantly more errors than the SHAM group on days 21 (P = 0.019) and 28 (P = 0.008) only, whereas the MODERATE and SEVERE groups consistently made more errors than SHAM (P ≤ 0.031). Differences in error rates were not always detected between the MILD and MODERATE or the MODERATE and SEVERE groups; however, the SEVERE group made significantly more errors than the MILD group on days 5, 7, and 21 (P ≤ 0.017). Ipsilateral forelimb and hind limb error rates were not significantly different among groups (P ≥ 0.113).

Contralateral forelimb error rate (% slips through bars) in the horizontal ladder-walking test from baseline (BL) to 28 days post-ICH/SHAM operation (A). Data for days 1 and 3 are not shown due to a significant dropout rate difference among the groups (some ICH rats would not cross the ladder). Differences among ICH groups were frequently detected. The relationship between ranked slip rate and ranked lesion size for ICH rats was statistically significant (B; Table 2).
A more detailed analysis of each step (graded from 0 for a total miss to 6 for correct placement) on days 7 and 28 revealed that the ICH groups made more slight slips with the contralateral forelimb than the SHAM group on day 7 and day 28 (P ≤ 0.018 and P < 0.024, respectively; data not shown). Furthermore, on day 7 the MILD group made more replacements (P = 0.023), and the SEVERE group made more deep slips (P = 0.041) and fewer correct steps (P = 0.034), than the SHAM group. There were no differences among the ICH groups on day 7 (P ≥ 0.082), but the SEVERE group made more slight slips than the MILD (P < 0.001) and MODERATE groups (P = 0.035) on day 28. Data for the contralateral hind limb were similar (not shown). Ipsilateral forelimb and hind limb steps were similar among groups (P ≥ 0.061; not shown).
Forelimb use asymmetry (cylinder) test: Data from 15 test sessions (out of 472) were excluded because rats made fewer than six independent wall touches during the videotaped session. The dropout rate did not differ significantly among groups (P = 0.090). For push-off and landing, differences in independent contralateral forelimb use were rarely detected (data not shown). Analysis of contralateral forelimb use during wall exploration revealed a significant effect of GROUP at all times (P ≤ 0.014; Figure 4). All ICH groups used their contralateral forelimb less than SHAM (P ≤ 0.003), except for the MILD group on day 3 (P = 0.111) and the SEVERE group on days 1 (P = 0.778) and 3 (P = 0.184). Interestingly, the SEVERE group used its contralateral forelimb significantly more than the MILD (P = 0.028) and MODERATE groups (P = 0.008) on day 1, and more than the MODERATE group on day 3 (P = 0.040). This was likely due to bilateral deficits in the SEVERE group. Otherwise, significant differences among the ICH groups did not occur (P ≥ 0.071).

Spontaneous contralateral forelimb use during exploration of walls in the cylinder test, calculated as $100 \times (C + \tfrac{1}{2}B)/(I + C + B)$, where C, I, and B are the numbers of contacts with the contralateral limb, the ipsilateral limb, and both limbs simultaneously.
Adhesive tape removal test: There were significant GROUP main effects on all days (P ≤ 0.025; Figure 5). This test detected differing rates of recovery. For example, the MILD group was impaired versus SHAM only on days 1 and 3 (P ≤ 0.019), whereas impairments were detected in the MODERATE group until day 21 (P ≤ 0.041 versus SHAM). Compared with the SHAM or MILD groups, the SEVERE group took longer to remove the dot from the contralateral forelimb on all days (P ≤ 0.030) except day 1 (P ≥ 0.065). Large differences among the ICH groups (e.g., between the MILD and SEVERE groups) were statistically significant on all days (P ≤ 0.030) except day 1 (P = 0.065). Smaller effects were rarely detected. For instance, the MODERATE and SEVERE groups differed only on days 14 and 21 (P ≤ 0.041) and not on other days (P ≥ 0.110).

Difference score (± s.e.m.; time to remove the dot from the contralateral forelimb minus time to remove the dot from the ipsilateral forelimb) in the adhesive tape removal test from baseline (BL) to 28 days after ICH/SHAM operation
Beam walking test: There was a significant effect of GROUP on all test days (P ≤ 0.018) except day 28 (P = 0.072; Figure 6). Differences in contralateral hind limb scores were usually not detected among ICH groups, with the exception of the MILD and SEVERE groups on days 5 and 7 (P ≤ 0.023). Interestingly, the rate of recovery varied among the ICH groups. For example, the SEVERE and MODERATE groups remained impaired versus SHAM until day 21 (P ≤ 0.014) and day 28 (P ≤ 0.033), respectively, whereas the MILD group had fully recovered by day 7 (P ≥ 0.134 versus SHAM).

Contralateral hind limb deficit score (median group score) in the Beam-Walking Test. Differing rates of recovery were detected.
Neurologic Deficit Scale: Each ICH group had significant neurologic impairments (higher NDS) from 1 to 28 days after ICH (versus SHAM; P ≤ 0.001; Figure 7). Small group differences were rarely detected. For instance, the MODERATE group had significantly greater deficits than the MILD group only on day 5 (P = 0.037), and differences between the SEVERE and MODERATE group were detected only on days 5 and 7 (P ≤ 0.029). However, the MILD and SEVERE groups were different from each other on days 1, 2, 3, 5, 6, 7, and 21 (P ≤ 0.024).

Staircase test: Two rats were excluded from this test because they failed to meet the criterion during training. A repeated measures ANOVA for each test session revealed significant GROUP (P ≤ 0.001) and DAY (P ≤ 0.005) main effects. All groups retrieved more pellets over days. Each ICH group retrieved significantly fewer pellets than SHAM animals on days 7 to 10 (P ≤ 0.001; Figure 8), and ICH groups were significantly different from each other (P ≤ 0.040). Data were similar for days 21 to 24; however, the MILD and MODERATE groups could not be distinguished from each other (P = 0.058). Significant GROUP main effects (P < 0.001) were also detected for the number of pellets retrieved with the ipsilateral forelimb. On days 7 to 10, all groups were significantly different from each other (P ≤ 0.035) with the exception of MILD and SHAM groups (P = 0.502). However, by days 21 to 24, only the SEVERE group was impaired (P < 0.001 versus SHAM). The SEVERE group was also significantly different from the other ICH groups at this time (P ≤ 0.025).

Single pellet test: Fifty-three percent of the MODERATE group and 67.7% of the SEVERE group did not reach with the contralateral forelimb in the single pellet test on days 10 and 28 (P = 0.005). Instead, they reached with their initially nondominant limb (i.e., they switched limb preference) or did not reach at all. Thus, only the MILD and SHAM groups were analyzed for reaching success and quality of reaching. A three-way ANOVA revealed significant GROUP (P < 0.005) and DAY (P = 0.029) main effects. The MILD group retrieved fewer pellets than the SHAM group, and both groups retrieved more pellets on successive days of each test session. Qualitative rating of successful reaches showed that both aim (P = 0.023) and grasping (P = 0.021) were impaired in the MILD group (versus SHAM) on day 10. However, only the grasping deficit persisted until day 24 (P = 0.008; data not shown).
Swim test: Data from 50 test sessions (out of 472) were excluded because some rats failed to swim directly to the platform (e.g., swam along the walls or did not reach the platform). Most exclusions occurred on days 1 and 3; however, dropout rates did not differ significantly among groups (P ≥ 0.062). There was a significant effect of GROUP on all days (P ≤ 0.027) except day 14 (P = 0.168). The ICH groups used their contralateral forelimb more frequently than the SHAM rats on most days (i.e., had significantly higher difference scores; P ≤ 0.029). However, impairments (versus SHAM) were not detected in the MILD group on day 28 (P = 0.102), the MODERATE group on days 14 and 21 (P ≥ 0.078), or the SEVERE group on day 14 (P = 0.051). No significant differences occurred among the ICH groups at any time (P ≥ 0.086; Figure 9).

Composite Behavioral Analyses
To assess overall performance, rats were ranked from best to worst on their average or median performance across test days for each test (e.g., mean pellets retrieved with the contralateral forelimb in the staircase test). These ranks were averaged, and an ANOVA of the mean composite scores revealed a GROUP main effect (P < 0.001); post hoc analysis showed significant differences among all groups (P ≤ 0.032; Figure 10A). In general, rats with smaller lesions ranked better than those with larger lesions. The overall ranked performance across all tests correlated well with the ranked volume of injury (r = 0.747, P ≤ 0.001; Figure 10B; see also Table 2 for r-values for each test). Similar results were obtained for correlations between actual lesion volume and performance on individual tests or the composite score (i.e., Pearson r-values; data not shown). Regression analysis revealed that the combination of behavioral tests that best predicted histology depended on the time of assessment. For example, at day 7, the adhesive tape removal, horizontal ladder, and cylinder tests were most strongly related to lesion volume. At day 28, the staircase, cylinder, and adhesive tape removal tests and the NDS were strongly related to histology. When overall performance was considered, only the adhesive tape removal test and the NDS significantly predicted lesion volume in the multiple regression analysis.
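The composite score described above ranks rats within each test (1 = best) and then averages each rat's ranks across tests. The sketch below is one way to do this under stated assumptions: each test has already been reduced to one summary value per rat, each test is flagged as higher-is-better or lower-is-better, and ties share their average rank. The test names and values in the check are made up for illustration only.

```python
from statistics import mean


def rank_best_to_worst(values, higher_is_better):
    """Rank 1 = best performer; tied values share their average rank."""
    keyed = [-v if higher_is_better else v for v in values]
    order = sorted(range(len(keyed)), key=lambda idx: keyed[idx])
    ranks = [0.0] * len(keyed)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and keyed[order[j + 1]] == keyed[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def composite_scores(tests):
    """tests: {test name: (per-rat summary values, higher_is_better)}.
    Returns each rat's mean rank across tests (lower = better overall)."""
    if not tests:
        raise ValueError("at least one test is required")
    per_test = [rank_best_to_worst(vals, hib) for vals, hib in tests.values()]
    n_rats = len(per_test[0])
    if any(len(r) != n_rats for r in per_test):
        raise ValueError("every test must include the same rats")
    return [mean(r[i] for r in per_test) for i in range(n_rats)]
```

The resulting composite scores could then be compared among groups by ANOVA or correlated with ranked lesion volume, as described in the text.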
Relationship (Spearman's rho) between ranked lesion volume (at 30 days) and early (day 7), late (day 28), or overall ranked performance on each behavioral test for all ICH rats

Composite behavioral score (mean rank ± s.e.m.) for overall performance in all tests
Discussion
Before clinical investigation, prospective cytoprotectants should be shown to improve functional outcome in animal stroke models (e.g., rat). Accordingly, there is a need for tests that detect stroke damage as well as a cytoprotective effect. Presently, we used a range of functional tests (e.g., walking, skilled reaching) to determine whether they could detect a subcortical ICH lesion, and if testing could distinguish among gradations in lesion size akin to significant cytoprotective effects. Our results show that while behavioral testing easily detected ICH-induced subcortical injury, only a few tests frequently distinguished among groups and strongly correlated to lesion size, and of these, none consistently differentiated among all ICH groups. Therefore, we urge investigators to use a battery of tests, especially if moderate treatment effects are expected.
There are several issues to consider when selecting behavioral tests for rodent stroke studies. First and foremost, deficits depend on the location of injury, and thus the tests should be chosen with this in mind. For instance, damage to the dorsomedial striatum disrupts locomotor activity, whereas a more lateral lesion affects skilled motor control (Kirik et al, 1998; Pisa and Schranz, 1988). Accordingly, if a certain behavior were controlled by only one subsection of a damaged site, a larger lesion affecting other portions of that structure would not be expected to produce greater impairment. An ICH, however, often crosses functional boundaries as presently seen (e.g., damage to the corpus callosum, internal capsule, and striatum). Thus, a broad range of deficits (e.g., in skilled reaching, forelimb inhibition, and walking) such as occurs after an ICH is more likely to be detected using a battery of tests, rather than one or two tests. Additionally, behavioral deficits might only occur when injury has reached a critical threshold and then remain unchanged. In our study, several tests (e.g., cylinder and swim tests) were effective ‘lesion detectors,’ but did not distinguish among gradations in injury. Therefore, these tests are not recommended for evaluating modest cytoprotective effects after ICH, but they are useful for detecting the presence of mild injury (versus normal animals). The utility of testing also depends on timing and insult severity. For instance, ICH rats showed maximal deficits soon after ICH in the beam test only to later show an apparently complete recovery in all groups. Thus, this test is not recommended for assessing long-term outcome. In addition, many rats did not cross the horizontal ladder during the first week after ICH and many MODERATE and SEVERE rats refused to reach with their impaired limb in the single pellet task, even weeks after the ICH. 
Finally, cytoprotection studies must take into account whether food deprivation is needed (e.g., for skilled reaching tests), as it may be contraindicated if the stroke or treatment produces lasting reductions in body weight and appetite.
Given these concerns, it makes sense to test behavior broadly after ICH. However, during the first week after ICH, we recommend the NDS because it distinguished among groups reasonably well across a range of injuries and because both testing and data analysis are simple and quick. If long-term deficits are sought, then the staircase test is highly recommended. Although its data are easy to analyze, the test is time consuming (e.g., training is recommended) and requires food deprivation. Neither the NDS nor the staircase test, however, consistently distinguished among all groups. Therefore, other tests such as the tape test should also be used. Note that the time required for each of these tests can be reduced simply by using fewer testing sessions than the large number we used. Detailed analyses of skilled reaching and walking did not distinguish groups any better than end point measures (e.g., reaching success). Given the time these analyses require, we do not recommend them for routine cytoprotection studies, which commonly use large numbers of animals. However, they are appropriate for studies aimed at determining whether rats truly recover or merely compensate, a question that at some point becomes important in cytoprotection studies. Finally, groups were clearly distinguished more easily by a battery of tests than by any single test. Thus, multiple tests should be used in cytoprotection studies.
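The advantage of a battery over any single test can be made concrete with a composite score. The sketch below is one possible approach, not the method used in this study: each continuous measure is standardized against the SHAM group, signed so that positive values mean impairment, and averaged across tests. The function names, test names, and all numbers are hypothetical.

```python
from statistics import mean, stdev


def z_vs_sham(value, sham_values, higher_is_worse):
    """Standardize one animal's score against SHAM, signed so + means impaired."""
    sd = stdev(sham_values)
    if sd == 0:
        raise ValueError("SHAM scores have zero spread; z-score is undefined")
    z = (value - mean(sham_values)) / sd
    return z if higher_is_worse else -z


def battery_composite(animal, sham, higher_is_worse):
    """Mean signed z-score across every test in the battery."""
    return mean(
        z_vs_sham(animal[test], sham[test], higher_is_worse[test])
        for test in higher_is_worse
    )


# Hypothetical SHAM reference values (not data from this study).
sham = {
    "staircase_pellets": [18, 20, 22],  # pellets retrieved, higher = better
    "tape_removal_s": [8, 10, 12],      # removal latency, higher = worse
    "ladder_faults_pct": [2, 4, 6],     # foot faults, higher = worse
}
higher_is_worse = {
    "staircase_pellets": False,
    "tape_removal_s": True,
    "ladder_faults_pct": True,
}
rat = {"staircase_pellets": 14, "tape_removal_s": 16, "ladder_faults_pct": 10}
print(battery_composite(rat, sham, higher_is_worse))  # prints 3.0
```

Averaging signed z-scores lets tests that are individually weak still contribute to separating groups. Ordinal scales such as the NDS are poorly suited to z-scoring and are better handled with rank-based methods.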
Our findings suggest that cytoprotection studies that relied on a single test might have missed important treatment effects, especially if those effects were relatively small. Nonetheless, some studies using tests such as the NDS after ICH have found functional improvements with experimental cytoprotectants. Interestingly, in several cases, behavioral improvements occurred despite the absence of discernible histologic protection (Peeling et al, 2001; Belayev et al, 2005). We suspect that such treatments act, at least in part, on residual tissue (e.g., dendritic branching and number of synapses), and thus do not act solely to reduce tissue loss by attenuating cell death. Indeed, such strategies may at present be a better approach to improving functional outcome than simply reducing lesion size, as recently found in a study examining hypothermia and rehabilitation treatments after ICH in rats (MacLellan et al, 2005). Alternatively, beneficial effects on one or two tests may not generalize to other tests. Likewise, a significant treatment effect on an NDS may be driven by one subcomponent and not others.
There are several limitations to this study. First, we did not assess performance on cognitive tests or on all of the sensory and motor tests (or their variations) known to be sensitive to striatal injury (e.g., the Corner Turn Test (Hua et al, 2002) and rotarod (Chesney et al, 1995)). Such tests might be better at discerning gradations in injury. However, lasting sensorimotor impairments would likely confound cognitive tests, and we had to limit the already extensive sensorimotor testing. Second, we assessed performance repeatedly on all tests, and this may have inadvertently acted as rehabilitation, enhancing ‘spontaneous’ recovery and thereby lessening group differences. The fact that group trends were largely the same throughout the testing periods, however, argues against this explanation. Third, bilateral deficits (e.g., as seen with the staircase test) may have affected our results with the cylinder, swimming, and adhesive tape tests. For example, the SEVERE group was initially better than the MILD and MODERATE groups in the cylinder test. Given the severity of the insult, the ipsilateral limb of SEVERE rats was likely also affected, so deficits may have been masked by bilateral impairments. Accordingly, such difference scores must be interpreted cautiously. Fourth, we did not determine the utility of these tests in the autologous whole blood model of ICH. Further study is clearly needed to determine whether treatment effects can be reasonably detected with these tests in that model. Likewise, behavioral testing in any stroke model with greater variability in the size and location of injury should be even more problematic than found here, which argues for using multiple tests. The inherent variability of human ICH will likely make the demonstration of functional benefit exceedingly difficult.
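The masking problem with difference scores can be illustrated with the cylinder test. This is a minimal sketch assuming one common formulation of the limb-use asymmetry score (ipsilateral-only minus contralateral-only wall contacts, divided by all contacts); it is not necessarily the exact formula used in this study, and the function name and contact counts are hypothetical.

```python
def limb_use_asymmetry(ipsi_only, contra_only, both):
    """Difference score: (ipsilateral-only - contralateral-only) / all wall contacts.

    0 means symmetric use; positive values mean reliance on the ipsilateral
    (less impaired) forelimb. This formulation is an assumption for illustration.
    """
    total = ipsi_only + contra_only + both
    if total == 0:
        raise ValueError("no wall contacts recorded")
    return (ipsi_only - contra_only) / total


# Hypothetical contact counts (not data from this study).
normal_rat = limb_use_asymmetry(ipsi_only=5, contra_only=5, both=10)
unilateral_rat = limb_use_asymmetry(ipsi_only=12, contra_only=2, both=6)
bilateral_rat = limb_use_asymmetry(ipsi_only=4, contra_only=3, both=3)

for label, score in [("normal", normal_rat),
                     ("unilateral deficit", unilateral_rat),
                     ("bilateral deficit", bilateral_rat)]:
    print(f"{label}: asymmetry = {score:.2f}")
```

Here the hypothetical bilaterally impaired rat makes fewer contacts overall yet scores close to the normal rat, because impairment of both limbs reduces the difference between them. Reporting total contacts alongside the asymmetry score may help flag such cases.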
Fifth, we did not relate behavior to other important end points (e.g., edema) that may influence performance (Hua et al, 2002). Sixth, the choice of statistical analysis clearly influences outcome. We used a stringent post hoc test to show that lesion volume differed significantly among ICH groups, whereas no corrections for multiple comparisons were made for the behavioral tests. Significant differences in functional outcome occurred infrequently between the smaller gradations in injury (e.g., MILD versus MODERATE), and more stringent post hoc testing would only exacerbate this. We have also noticed that many stroke investigators inappropriately analyze ordinal data (e.g., NDS) with ANOVA and present these data as means with standard deviations or standard errors. Finally, larger group sizes may improve statistical power enough to detect group differences. However, we used approximately 15 rats per group, which is more than is typical in this field, where group sizes are often fewer than 10. Furthermore, the overlap in lesion sizes among the ICH groups realistically reflects treatment effects seen in the literature. Thus, cytoprotection studies, as currently performed, are unlikely to consistently detect small functional differences.
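For ordinal scales such as the NDS, a common alternative to ANOVA with means and standard deviations is to report medians with interquartile ranges and to compare groups with the rank-based Kruskal-Wallis test. The sketch below uses only the Python standard library; the function names and the NDS scores are hypothetical. In practice, a maintained implementation such as `scipy.stats.kruskal` would normally be preferred because it also returns a p-value.

```python
from collections import Counter
from statistics import median, quantiles


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def median_iqr(scores):
    """Median with first and third quartiles, a suitable summary for ordinal data."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all scores are identical; H is undefined")
    return h / correction


# Upper 5% points of the chi-square distribution (df = number of groups - 1).
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}

# Hypothetical NDS scores (not data from this study).
nds = {
    "SHAM": [0, 0, 1, 0, 0],
    "MILD": [2, 3, 2, 4, 3],
    "MODERATE": [3, 4, 4, 5, 3],
    "SEVERE": [5, 6, 4, 6, 5],
}
for name, scores in nds.items():
    print(name, "median, Q1, Q3 =", median_iqr(scores))
h = kruskal_wallis_h(list(nds.values()))
print(f"H = {h:.2f}; exceeds 5% critical value: {h > CHI2_CRIT_05[len(nds) - 1]}")
```

The chi-square reference values are only an approximation and are unreliable for very small groups. A significant omnibus result would still need follow-up pairwise rank tests (e.g., Mann-Whitney) with an appropriate correction for multiple comparisons.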
Effective preclinical testing of putative cytoprotective agents for brain injury requires appropriate functional evaluation in addition to quantification of brain injury (Corbett and Nurse, 1998; Hunter et al, 1998; Stroke Therapy Academic Industry Roundtable (STAIR), 1999; Hudzik et al, 2000; DeBow et al, 2003a), especially because the latter is necessarily incomplete. Behavioral tests should be validated for the type of insult produced, and must be sensitive to injury as well as to the effects of treatments. Most experiments test whether a behavioral test is sensitive to injury, but fail to examine whether it can distinguish among lesion sizes that occur with a cytoprotective intervention. Indeed, many of the tests we used detected a lesion but did not consistently distinguish among sizeable gradations in injury. Therefore, we recommend using a battery of tests (e.g., staircase, NDS, and tape tests) sensitive to a range of deficits over several testing sessions (e.g., within the first week and near 1 month). Furthermore, we strongly encourage investigators to evaluate potential tests across a range of insult severities in their particular model, because other factors will affect the utility of testing (e.g., strain, age, gender) (Roof and Hall, 2000; Wahlsten, 2001; Takaba et al, 2004). Also, when designing an appropriate test battery for cytoprotection studies, researchers should consider the type of injury, the timing of testing, the time required to complete training and testing, the limitations of tests, and the sample size needed to detect differences in outcome. Only with such a battery might we truly identify effective cytoprotectants in rodent models that will then, hopefully, withstand clinical scrutiny.
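The sample size needed to detect a difference in outcome can be estimated before a study begins. This is a minimal sketch using the standard normal-approximation formula for a two-sided, two-group comparison of means, n per group = 2((z_{1-α/2} + z_{1-β}) / d)², where d is the standardized effect size (difference in means divided by the common standard deviation). The function name is hypothetical, and exact t-based calculations give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Animals per group for a two-sided, two-sample comparison (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


for d in (1.0, 0.8, 0.5):
    print(f"d = {d}: {n_per_group(d)} animals per group")
```

This prints 16, 25, and 63 animals per group. Under these assumptions, a group size of about 15 gives roughly 80% power only when the treatment shifts the outcome by about one standard deviation. Lowering alpha to correct for multiple comparisons (e.g., `alpha=0.05 / 6` for all pairwise comparisons among four groups) increases the required group size further.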
Acknowledgements
The authors gratefully acknowledge technical assistance from D Clark, S Kirkland, B Murdoch, A Nguyen, and L Smith. Research was supported by grants (to FC) from the Heart and Stroke Foundation of Alberta, NWT and Nunavut, the Canadian Institutes of Health Research, and the Natural Sciences and Engineering Research Council of Canada (NSERC). FC is supported by an Alberta Heritage Foundation for Medical Research Medical Scholar Award, and CM is supported by an NSERC Canadian Graduate Doctoral Scholarship.
