Abstract
Cognitive decline is a significant long-term consequence of stroke with no available treatments. To aid in therapy development, we sought to achieve robust detection of cognitive performance after stroke in a multi-site design. Ischemic stroke was induced in adult and middle-aged male C57BL/6 mice utilizing permanent distal middle cerebral artery occlusion (dMCAO), dMCAO with hypoxia or transient MCAO by filament insertion. Cognitive outcomes were assessed by Novel Object Recognition (NOR) and Barnes Maze (BM) tests prior to surgery, and during sub-acute (1–2 weeks) and chronic (8 weeks) phases post-stroke. Histology and immunostaining were used to assess infarct size, tissue damage and neuronal loss, and plasma neurofilament light chain levels were quantified. We did not detect a cognitive deficit using NOR, but detected significantly worse performance in dMCAO mice with the BM. Multi-site BM validation will be a key next step. Overall, our study highlights common challenges in detecting post-stroke cognitive impairment within the pre-clinical stroke community, and underscores the complexities of designing and executing these studies, particularly as applied to a multi-site structure. We provide recommendations and suggest important considerations for future stroke cognition studies, whether operating as an individual lab or a multi-site group.
Keywords
Introduction
Stroke represents a significant global health burden, contributing to high rates of mortality and disability. Post-stroke cognitive impairment (PSCI) has been identified by survivors as one of the most distressing consequences, affecting up to one-third of stroke survivors within 5 years, and greatly diminishing quality of life.1–3 This is underscored in surveys conducted among patients and healthcare professionals, which identified the foremost priority in the field to be research aimed at improving cognitive symptoms such as impairments in memory and concentration.4–6 Notably, stroke doubles the risk of subsequent cognitive impairment, irrespective of established vascular risk factors such as hypertension and obesity, or the prevention of additional infarcts. 2 PSCI represents a critical unmet need due to the absence of available treatments and lack of mechanistic understanding.2,6,7
One-way to address this need is through the development and characterization of pre-clinical stroke models that generate robust cognitive deficits. Such models would create opportunities for pre-clinical screening of candidate interventions and improve prospects for translation into clinical therapies.8,9 Previous networks have conducted therapeutic testing in multi-center, randomized pre-clinical trials utilizing stroke size as a primary endpoint.10–12 The Stroke Preclinical Assessment Network (SPAN) has made important advances by establishing a multi-center trial protocol with sensorimotor function changes as the primary readout.13,14 Nonetheless, a need still remains to consider other functional outcomes relevant to stroke survivors, including cognitive decline. Researchers could then validate potential therapeutics using robust outcome measures with high clinical relevance.15–18
Rodents have a remarkable ability for spontaneous recovery after ischemic stroke, making long-term behavioral testing challenging. 15 Cognitive testing is time intensive, can be hard to interpret and is difficult to perform reproducibly. Additionally, inter-lab, inter-investigator and inter-animal variability are major challenges for pre-clinical multicenter trials. Due to these barriers which are well-recognized within the field, there is likely an inherent publication bias against negative or neutral results from these studies, limiting the ability of the field to identify best practices.19,20
It is also important to consider that cognitive decline after stroke likely arises from multiple etiologies. Acute cognitive decline is directly related to lesion size and location.6,21 While these direct effects may persist, additional mechanisms such as neurodegeneration and/or unresolved harmful inflammation may also underlie new diagnoses of PSCI in the chronic phase.22,23 Altered brain-wide connectivity is also likely a factor in longer-term cognitive changes after focal stroke. 23 Thus, it is important that cognitive tests can reliably detect a deficit at multiple timepoints after stroke. Additionally, human ischemic stroke is a heterogeneous and complex disease, due to high inter-patient variation, which cannot be fully recapitulated using any one model of pre-clinical stroke.9,24 Thus, assessing cognition in multiple stroke models that generate different amounts and anatomical patterns of damage will increase the sensitivity for detection of PSCI.
The Stroke-IMPaCT Network, a multi-site network comprising six labs across Europe and North America, was established to understand immune mechanisms contributing to PSCI, and from this, to identify and test novel immunomodulatory interventions. An enabling primary aim was to identify protocols that robustly detect cognitive impairment after experimental stroke in multiple labs. Our multi-site project design tested cognitive behavioral outcomes during sub-acute (1–2 weeks) and chronic (8 weeks) phases of ischemic stroke. We included three stroke models, and tested two different ages with two cognitive assays to more accurately reflect the complexity of the human stroke population. While we did not identify a combination of stroke model and cognitive test for robust detection of PSCI, we expect that our findings will aid the field (1) in managing expectations for detection of pre-clinical PSCI using commonly-applied approaches, and (2) as a basis for refining methodology to reproducibly detect cognitive impairment in pre-clinical models of ischemic stroke in future trials.
Materials and methods
Animals
Animal protocols were conducted and reported in accordance with the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines. Experimental procedures were approved by the appropriate animal care and use committees at each institution (CNIC: Animal Welfare Committee and Ministry of Environment and Territorial Planning of the Community of Madrid [RD 53/2013; PROEX 047/16]; CUB: Landesamt für Gesundheit und Soziales, Berlin, [G0312/16 and G0167/20]; UoE: Bioresearch and Veterinary Services Committee; UoA/SU/WC: Institutional Animal Care and Use Committees) and were performed under relevant national and institutional rules (European Community Council Directive 86/609/EEC; 2010/63/EU; Animal Scientific Procedures Act 1986; UK-Home Office; United States National Institutes of Health Guidelines; United States Public Health Service’s Policy on Human Care and Use of Laboratory Animals). A total of n = 150 adult (12–15 week old) and n = 77 middle-aged (10–12 month old) male C57BL/6J mice were used (Figure 1). Sham and naïve littermates were used as controls, and all mice were randomized among experimental cages. Exclusion criteria for behavioral testing analysis due to mortality, inability to perform cognitive tasks or tissue pathology are outlined in Figure 1(b). Additional details are available in Supplemental Material.

Study design. (a) From left to right, visual representation of points of permanent (dMCAO, DH) or transient (tMCAO) middle cerebral artery occlusion, representing the three pre-clinical models of stroke used. Study design with respective behavioral testing timepoints and times of sacrifice. This panel was created with BioRender. (b) Visual representation of the total number of animals used in two behavioral testing cohorts in Figures 3 and 5. Numbers represent the animals used in each surgical (stroke or sham) or naïve group in adult (12–15 weeks) and middle-aged (10–12 months) cohorts of male C57BL/6 mice. Animals were excluded due to surgical mortality, lack of novel object preference at baseline (discrimination index <0.1), or irregular pathology. Irregular pathology is defined as lack of stroke pathology in MCAO animals, or aberrant pathology present in sham or naïve animals. Red numbers represent the number of animals excluded from all analyses for both pathology and task performance. Figure 1A created in BioRender. Zera, K. (2025).
Stroke models
Three pre-clinical stroke models were used, as depicted in Figure 1(a): proximal transient (filament) middle cerebral artery occlusion (tMCAO) was performed at Weill Cornell Medical College (WC; 25 min occlusion) and Charité Universitätsmedizin Berlin (CUB; 60 min occlusion), permanent distal middle cerebral artery occlusion (dMCAO) was performed at Stanford University (SU), University of Edinburgh (UoE) and Centro Nacional de Investigaciones Cardiovasculares (CNIC), and dMCAO + Hypoxia (DH) surgery was performed at University of Arizona (UoA; 40 min hypoxia) and SU (60 min hypoxia), as described previously.22,25–31 Site specific surgical procedures are outlined in Supplemental Material.
Behavioral testing
Novel Object Recognition
Novel Object Recognition (NOR) was performed on adult and middle-aged cohorts, as previously described.32–34 Site-specific differences in NOR protocols used for the adult (12–15 week old) cohort are available in Table 1, while a single unified NOR protocol was used by all sites for the middle-aged (10–12 month old) cohort. Briefly, the unified NOR task consisted of two phases: a training phase (Phase I) and testing phase (Phase II). In Phase I, a single animal was placed in the arena with two identical objects and allowed to explore freely for 5 min. Phase II started 3 h after the completion of Phase I. Here, mice were exposed to one familiar and one novel object for 5 min, and their interactions with the objects were recorded. Objects were counterbalanced between animals, to avoid any innate object preference. Time spent interacting with each object was manually recorded by a blinded experimenter. To assess task performance, a discrimination index (DI) score was calculated with: (tnovel − tfamiliar)/(tnovel + tfamiliar). A score closer to −1 indicates a preference for the familiar object, a score closer to +1 indicates a preference for the novel object, and a score close to 0 demonstrates no object preference.
Laboratory specific differences in NOR protocols.
Adult (12–14 week old) male C57BL/6J mice were tested on the Novel Object Recognition (NOR) protocol prior to, and 1 and 8 weeks after sham or stroke surgery. Each site had minor variations in NOR testing protocol.
Barnes maze
A modified Barnes maze (BM) protocol was utilized as previously described,32,33 with minor modifications for middle-aged mice. Briefly, the maze consisted of a large, circular platform with 16 holes on the outer edge, which was positioned approximately 90 cm off the floor in the center of the room. All maze holes were left open to the floor, except for one which contained an escape hole. The escape hole was aligned with one of four visual cues, which were equally spaced around the testing room. Mice performed four trials per day for four consecutive days at each testing timepoint. The escape hole position was fixed for the entirety of the task, while the starting position of each trial within a day was altered relative to the escape hole position. Each trial had a maximum duration of 90 s. If a mouse failed to find the escape hole within 90 s, it was gently guided towards the escape hole by tapping its tail and hind legs. The time (in seconds) to identify the escape hole (primary latency), defined as a mouse entering the hole opening with at least their head, was manually recorded.
Tissue collection and processing
Mice were euthanized at 3 or 56 days following stroke by exsanguination and intracardiac perfusion with 0.9% saline (without anticoagulant) under isoflurane (UoE, UoA), sevoflurane (CNIC), sodium pentobarbital (WC), or xylazine/ketamine (SU, CUB). Brain tissues and blood were collected. Brains were extracted and post-fixed in 4% paraformaldehyde in phosphate buffer for 24 h, then transferred to 30% sucrose solution (with 0.1% sodium azide) until sectioning. All brain tissue was sectioned at 40 μm thickness with a freezing microtome (UoA, SU) or cryostat (UoE), and sequentially collected into 16 tubes with cryoprotective medium (30% glycerin, 30% ethylene glycol, 40% 0.5 M sodium phosphate buffer). All sections were stored at −20°C in cryoprotective media.
Blood was drawn from the vena cava or left ventricle using an EDTA-impregnated syringe. Microvette 500 K3EDTA tubes (Sarstedt #201341) containing whole blood were centrifuged at 4°C for 10–15 min at 1000 g. Subsequently, a second centrifugation for 15 min at 2000 g was performed to deplete platelets. The resulting supernatant was designated as plasma and stored at −80°C until use.
Histopathology and immunostaining
Hematoxylin & eosin (H&E) staining
All H&E and immunofluorescence staining was performed at UoE. H&E reagents were purchased from Epredia (UK, Instant Hematoxylin #12687926) and Cell Path (UK, Eosin Y #RBC-0201-00A). Sections were washed in PBS at room temperature (RT) to remove cryoprotectant. Sections were then mounted, washed in dH2O, and dried overnight prior to H&E staining, using a standard protocol. 35 Briefly, sections were dipped 10 times in 95% EtOH and rinsed in dH2O for 30 s. Slides were incubated in hematoxylin for 1 min 30 s and then rinsed in dH2O until the water ran clear. Slides were then dipped in acid alcohol for 30 s and rinsed in dH2O for 30 s. Next, slides were dipped into Scott’s tap water for 1 min and rinsed in dH2O. Then, slides were dipped once in eosin and washed in dH2O. Finally, slides were dehydrated, cleared in xylene and mounted with DPX.
H&E staining was performed on two sections per animal: one rostral (Bregma −0.70 mm to −0.82 mm) and one caudal (Bregma −1.70 mm to −2.18 mm). Percent infarct area was calculated at 3 and 56 days using the Swanson method36,37 on the rostral brain section only. This brain level most consistently captured the stroke lesion across models, and was deemed most appropriate to compute a measure reflective of infarct size for relative comparison between models. The regional distribution of gross pathology was quantified on both rostral and caudal sections (ROIs: rostral cortex, caudal cortex, striatum, thalamus, and hippocampus). Pathological evidence of stroke was defined on H&E as presence of vacuolation, pallor, and/or neuronal perikaryal shrinkage within each ROI (Figure 2(c), Supplemental Figure 4). We also quantified ventricular enlargement (a proxy for brain atrophy) and displacement of the corpus callosum. If pathology was identified within a defined ROI, the animal was given a score of 1 for that metric, and if no damage was identified the score was 0. We then calculated sums to identify the number of animals per model that exhibited damage in each region.

Acute (3 days) histopathology assessment in three pre-clinical stroke models. (a) H&E representative images from each stroke model (dMCAO, DH, tMCAO) at rostral (Bregma −0.70 mm to −0.82 mm) and caudal (Bregma −1.70 mm to −2.18 mm) brain levels. The infarct is outlined with a solid black line. Scale bar = 2.5 mm (b) Quantification of infarct size at 3 days after dMCAO, DH, or tMCAO across the six sites. (c) Anatomical locations for assessment of frequency of damage distribution in dMCAO, DH, and tMCAO stroke models. Assessment of damage was conducted on H&E staining and scored categorically as absent (score of 0) or present (score of 1) in each anatomical location. A frequency score of 1 indicates that all animals within that group and at that anatomical location showed pathological damage. (d) Plasma NfL concentration in each stroke model 3 days post ischemia, and respective sham controls. Naïve mice were analyzed from SU only. (e) Plasma NfL concentration correlation with percent infarct area in a Spearman correlation analysis (Naïve: n = 9; dMCAO: n = 20 per grp; DH: n = 17–18 per grp; tMCAO: n = 10–17 per grp). Data are presented as mean ± SD and statistics represent a two-way ANOVA (b, d) with Tukey’s post-hoc test.
NeuN immunofluorescence
Immunofluorescent staining followed a standard protocol. Briefly, sections at Bregma level −1.34 to −2.06 were washed in PBS at RT. Sections were then blocked for 2 h (10% Donkey serum in PBS-T), before overnight primary antibody incubation with NeuN (Abcam, ab177487, 1:1000; 10% Donkey serum in PBS-T) at 4°C. Next, sections were incubated with a fluorophore-conjugated secondary antibody for 2 h at RT, counterstained with DAPI and mounted in Prolong Glass Anti-Fade (Thermofisher, #P36984). Percent coverage of NeuN was calculated by thresholding hippocampal ROIs in QuPath (v0.5.1).
Evaluation of plasma neurofilament light chain (NfL) levels
NfL levels were quantified using the Simoa® NF-Light v2 Advantage Assay (Quanterix®, Cat. No. 104073) according to manufacturer instructions. Plasma samples, controls, and calibrators were measured in duplicate. The two-step immunoassay uses paramagnetic beads coated with anti-NfL antibodies to capture target molecules, which are then detected by biotinylated antibodies and labeled with streptavidin-ß-galactosidase. Signal detection was performed using the Simoa® optical system after adding substrate solution and transferring the beads to the Simoa® disc.
Statistical analysis
Experimenters were blind to surgery status throughout data collection and analysis. Statistical testing was performed in R (version 4.4.2) or Graphpad Prism 10. Normality was assessed with a Shapiro-Wilk test. Histological and immunofluorescence data were analyzed with one-way ANOVA with Tukey’s post-hoc test. Baseline behavioral data were analyzed with one-way ANOVA with Tukey’s post-hoc test, and behavioral performance over time was analyzed with a two-way repeated measures ANOVA with Tukey’s post-hoc test. Statistical variance was assessed with f-test for equal variance, manual and automated NOR scoring was compared with the Bland-Altman method comparison. Infarct size was correlated to plasma NfL concentration with a Spearman correlation analysis, and inter-rater variability was assessed with the intraclass correlation coefficient. For all statistical testing, p < 0.05 was considered statistically significant.
Results
Study design and exclusions
Size and neuroanatomical distribution of the primary damage (infarct and associated neuronal death) may impact cognition in both acute and chronic phases. Furthermore, our network laboratories had individually detected cognitive deficits when testing the same models used here at similar timepoints.22,23 We hypothesized that by incorporating stroke models which produce differing size/distributions of pathology together, this would reveal whether certain models and associated pathological patterns may be more strongly associated with PSCI. First, we analyzed acute pathology at 3 days to characterize the pattern of ischemic damage in our three models. Here, animals did not undergo behavioral testing, and exclusions were based solely on pathological findings. A total of n = 6 mice were excluded from the tMCAO group (mice with no stroke pathology), and n = 4 from the dMCAO group (aberrant pathology in sham mice). Next, to assess longitudinal cognitive changes after stroke in adult mice, we first utilized NOR in another cohort of mice. NOR is one of the most common cognitive tests in rodents, has low time commitments and was readily implemented at all sites at low cost.34,38 We tested cognitive outcomes in both sub-acute (1–2 weeks) and chronic (8 weeks) phases of ischemic stroke in adult (12–15 week old) and middle-aged (10–12 month old) mice. The BM test was introduced in the middle-aged cohort to assess spatial working and reference learning/memory. Overall study design is summarized in Figure 1(a). Figure 1(b) outlines the reasons for exclusion from behavioral analyses in Figures 3 and 5. In the adult cohort, 45 mice were excluded due to surgical mortality, no preference for the novel object at baseline (discrimination index, DI < 0.1) or lack of stroke pathology on histopathological examination. Similarly, 33 animals were excluded from the middle-aged cohort. Surgical mortality primarily affected mice assigned to the tMCAO stroke group, and a lack of novel object preference at baseline was evident across all sites and ages.

Adult mouse performance on Novel Object Recognition (NOR) task. (a) Schematic of the NOR test; 12–15 week old male C57BL/6J mice are presented with two identical objects in Phase 1, and after 3 h (inter-trial interval; ITI) one familiar object was swapped with a novel object (Phase 2). Interaction time for each object was recorded to calculate a discrimination index (DI). (b) Average DI at individual sites, categorized by the amount of handling (b) or habituation (c) (See Table 1 for details) performed at baseline (WC, n = 24; UoE, n = 26; UoA, n = 23; SU, n = 29; CUB, n = 6). (d) DI of all mice assigned to each stroke model (dMCAO, DH, or tMCAO) prior to surgical intervention (baseline performance). A DI < 0.1 merited exclusion for lack of novel object preference. Animals excluded from behavioral analyses are plotted in red. The dotted line represents DI = 0, or no object preference. (e) DI of all naïve mice assigned to a surgical model at baseline, week 1 and 8 as a measure of performance after repeated testing (dMCAO, n = 7; DH, n = 12; tMCAO, n = 6). (f) DI score in the NOR task of sham and stroke mice for each surgical model (dMCAO, n = 9–10 per grp; DH, n = 16–17 per grp; tMCAO, n = 12–16 per grp). All data are presented as mean ± SD and analyzed by one (b–d) or two-way (e, f) repeated measures ANOVA with Tukey’s post-hoc test.
Stroke model comparative pathology
We first assessed the distribution of acute stroke pathology to investigate model consistency between sites, and to establish patterns of pathological damage across models. A final cohort of n = 40 dMCAO, n = 44 DH, and n = 32 tMCAO adult (12–15 week old) male C57BL/6 mice compiled from all six sites was euthanized 3 days after surgery. Edema-corrected infarct measurements 3 days after stroke revealed that the dMCAO and DH models result in consistent infarct sizes (~12% of hemisphere) between sites (Figure 2(a) and (b)). In contrast, infarct size at sites using the tMCAO model was more variable. Infarct size at WC was 45% of the hemisphere, while at CUB it was 15% (Figure 2(b)). Infarct size in the tMCAO model was unrelated to the duration of occlusion, likely reflecting the model’s inherent variability. Infarcts generated by this model are impacted by blood flow through the Circle of Willis, which is anatomically variable in C57BL/6 mice.38,39
We next determined the neuroanatomical distribution of primary stroke pathology in rostral and caudal sections (Figure 2(c)). All three models had significant cortical damage in both rostral and caudal sections, with minimal thalamic damage. In contrast, the tMCAO model had an elevated frequency of striatal damage and hippocampal death compared to either the dMCAO or DH models.
We next quantified plasma NfL concentration, an emerging biomarker for neurodegeneration in aging and Alzheimer’s disease. We hypothesized NfL concentration may serve as a biomarker of neurodegeneration after stroke that could be quantified longitudinally to complement cognitive testing. First, we assessed the relationship of NfL levels with primary stroke damage.40–42 In all three models, plasma NfL concentration was significantly elevated 3 days after stroke compared to sham animals. There were no significant differences in plasma NfL levels in naïve mice compared to dMCAO or DH sham animals, while the tMCAO-sham animals had moderately elevated plasma NfL levels compared to naïve animals. Consistent with a larger infarct size and a higher frequency of damage in the tMCAO stroke model, plasma NfL concentration was significantly elevated compared to the dMCAO and DH models (Figure 2(d)). Spearman correlation analysis between plasma NfL levels and stroke size (Figure 2(e)) show that plasma NfL levels correlate moderately with infarct size at 3 days after stroke (Spearman rho = 0.2905; p = 0.0367).
We next assessed chronic pathological damage at 56 days post-stroke given the importance of chronic neurodegeneration to cognitive impairment. Infarct size in all models was largely similar to the 3 day timepoint (Figure 4(a) and (b)), although the histopathological appearance of the infarcts was substantially different (e.g., signs of dense inflammatory cell infiltrates). Marked hemispheric atrophy of the ipsilateral hemisphere was common, as reflected by a high frequency of ventricular enlargement (Figure 4(c)). The distribution of overt pathology at 56 days (Figure 4(c)) was also broadly similar to 3 days, with a minor reduction in the frequency of caudal cortical damage in the dMCAO model, likely reflecting infarct resorption, and more frequent striatal and thalamic pathology in the tMCAO model, potentially due to secondary remote degeneration. We also assessed NeuN coverage as a marker for neuronal death in the hippocampus, but saw no change in coverage in any model after stroke (Figure 4(d) and (e)). In contrast to the acute timepoint, at 56 days there was no difference in plasma NfL levels in dMCAO or DH stroke mice compared to their respective shams (Figure 4(f)). We did, however, observe elevated plasma NfL concentrations in tMCAO stroke mice compared to shams. Overall, plasma NfL levels were markedly lower at 56 days after stroke compared to the 3 day timepoint in all groups, and plasma NfL levels were no longer correlated with stroke size at this chronic timepoint (Spearman rho = 0.1289; p = 0.3220). Together these data capture the pathological characteristics associated with each of the models, revealing both inter-site/model consistencies and inter-model heterogeneity in infarct size/distribution, as well as acute-to-chronic evolution for evaluation of cognitive changes.

Histopathology assessment at 56 days after stroke in adult mice. (a) H&E representative images from each stroke model (dMCAO, DH, tMCAO) at rostral (Bregma −0.70 mm to −0.82 mm) and caudal (Bregma −1.70 mm to −2.18 mm) brain levels. Scale bar = 2.5 mm. Infarcted tissue is outlined with a solid black line. (b) Quantification of infarct size at 56 days post-surgery. Red circles represent mice that have been used for histopathological assessment but were not included in NOR analysis (dMCAO, n = 18; DH, n = 25; tMCAO, n = 9). (c) Anatomical locations for assessment of frequency of damage distribution in dMCAO, DH and tMCAO stroke models on H&E stained sections. (d) Representative images of NeuN staining at 3 or 56 days after sham or stroke surgery. Scale bar = 1 mm. (e) Quantification of NeuN percentage coverage in the hippocampus (dMCAO, n = 5–10 mice/group; DH, n = 5–8 mice/group; tMCAO, n = 5–10 mice/group). (f) Quantification of plasma NfL concentration in each stroke model 56 days post ischemia and respective sham controls. Naïve mice (n = 13–19/model) were pooled (dMCAO, n = 18–20 per grp; DH, n = 25–26 per grp; tMCAO, n = 9–15 per grp). Data are presented as mean ± SD and analyzed by one (b) or two-way (e, f) ANOVA with Tukey’s post-hoc test.
Subacute and chronic cognitive assessments in adult mice
We first determined whether inter-rater variability in the NOR task was sufficiently low to compare between sites, by randomly selecting two videos from each site (1 sham and 1 stroke) for manual scoring by one assessor per site (five assessments per video). Here, the intraclass correlation coefficient was 0.6991 (95% CI = 0.4548–0.894; Supplemental Figure 1). To confirm that manual assessment of the NOR recordings was accurate compared to automated AnyMaze analysis, one site (UoA) utilized both analysis methods on the same set of videos. Both manual and automatic scoring methods resulted in consistent outcomes, and the differences between methods were not dependent on the final DI value (i.e., neither method was better at detecting preference for a given object; Supplemental Figure 2). Given these results and the cost of purchasing AnyMaze software across all sites (an important consideration for utility within the wider research community), each site manually analyzed their own videos while blinded to experimental status.
We first analyzed baseline performance at individual sites to ensure all sites had a working task at baseline (Supplemental Figure 3). This revealed that half of the sites had a high percentage of mice with no novel object preference (DI < 0.1). We therefore evaluated site-specific environment and protocol differences. While certain differences that impact performance, such as anxiety and noise, cannot be quantified, we documented quantifiable differences in Table 1. We hypothesized that handling and habituation time may impact NOR performance, so we categorized sites based on none (WC), low (UoE, UoA), or high (SU, CUB) handling, and none (UoA), low (SU, CUB), or high (UoE, WC) habituation times (Figure 3(b) and (c)). The mean baseline DI was unaffected by handling time (Figure 3(b)). In contrast, the mean DI was significantly higher at sites with high habituation compared to those with low or no habituation (Figure 3(c)). We next analyzed the variance in DI across sites to determine whether these differences impact task variability and subsequent statistical power. While certain sites had significantly lower variability in DI (SU, WC) compared to others (UoA, UoE, CUB), there was no statistical relationship between the amount of handling or habituation time and the variability in DI outcomes (Supplemental Figure 3). We therefore established a uniform cutoff for exclusion across sites: a baseline DI < 0.1, which is permissive to inclusion while maintaining novel object preference (>10% more time). This led to the exclusion of 32 tested mice (Figure 3(d)).
We next asked whether re-testing affected longitudinal performance on the NOR task, as a potential confounding factor (familiarity yielding a lower DI score) for identifying cognitive impairment in stroked animals. Indeed, despite new object pairs being used at each new timepoint, naïve mice at all sites had a significantly lower DI at the 8-week timepoint compared to their respective baseline performance (Figure 3(e)).
We next determined whether any of the models caused a stroke-specific change in NOR performance during acute and chronic phases. We observed no difference in NOR performance between stroke and sham mice from any model, at any timepoint after surgery (Figure 3(f)).
Subacute and chronic cognitive assessments in middle-aged mice
Given our inability to detect cognitive impairment after stroke in adult mice, we altered our study design to increase sensitivity. We hypothesized that cognitive testing may be more sensitive in older mice, as they have exacerbated cognitive problems after stroke. 21 We therefore used middle-aged (10–12 month old) mice, and added the BM as a second cognitive test. Although BM testing requires considerably more time and experimenter skill, we hypothesized that it may be more sensitive to post-stroke deficits by probing additional cognitive domains, and it reduces exclusions due to poor baseline performance.32,33 In addition, we unified our NOR protocol to eliminate site-specific differences.
Here, we found that at baseline 37% of mice had a DI < 0.1 on the NOR test, and thus excluded these mice from subsequent NOR analyses (Figure 5(a)). We next assessed performance in stroke animals, compared to their respective sham controls at baseline, 2 and 8 weeks after surgery. At the 2-week timepoint, there was a trend (0.05 < p < 0.15) towards worse performance in the tMCAO stroke mice compared to the respective shams (p = 0.078), however this trend did not extend to the 8-week timepoint (p = 0.9998). There was no timepoint after stroke where a cognitive deficit was detected in dMCAO or DH mice with the NOR task (Figure 5(b)).

Middle-aged mouse cognitive behavior performance. (a) DI of all middle-aged (10–12 months) mice, designated by assigned surgical group (dMCAO, DH, or tMCAO). A discrimination index (DI) ⩽0.1 at baseline merited exclusion from subsequent NOR analyses, which is plotted in red. (b) DI of sham and stroke mice separated by surgical model at baseline and weeks 2 and 8 after surgery (dMCAO, n = 6–7 per grp; DH, n = 3–4 per grp; tMCAO, n = 3–5 per grp).(c) Schematic of the Barnes maze (BM) test. (d) BM primary latency(s) of naïve mice (n = 5/group). (e) Primary latency from Barnes maze testing of sham and stroke mice for each surgical model (dMCAO, n = 10/group; DH, n = 8/group; tMCAO, n = 3–10/group). (f) Primary latency (s) of individual mice on the fourth trial of day 4 of testing, during the week 8 testing timepoint. Data are presented as mean ± SD. Statistics represent results of a two-way repeated measures ANOVA with Tukey’s post-hoc test (b–e) or Student’s t test (f) within each surgical model.
In contrast to the NOR, middle-aged mice identified the escape hole in the BM at all timepoints with no exclusions (Figure 5(d)). Here, exclusion criteria was met if a mouse never identified or entered the escape hole after 4 days of training. Indeed, over the course of training, the primary latency of naïve mice significantly decreased, representing improved task performance. By the 8-week timepoint, naïve mice from all sites identified the escape hole in under 50 s. Like NOR, differences in performance of naïve animals likely represent site-specific differences in the testing environment or experimenter experience.
We next assessed whether the BM could detect a cognitive deficit after stroke. At 8 weeks after stroke, the tMCAO stroke mice performed significantly worse on the first day of testing compared to their respective shams (p = 0.0443), but performance recovered by the second day of testing (p = 0.9249) and the mice maintained equal performance for the duration of the task. In contrast, in both the dMCAO and DH stroke models, mice performed similarly on days 1–3 of testing, but stroke mice trended towards worse performance on day 4 (dMCAO p = 0.2313; DH p = 0.1145; Figure 5(e)). Given this, we analyzed the performance of individual mice on the final trial of the final testing day, where we anticipated the greatest differences in performance between sham and stroke animals (Figure 5(f)). There was no difference in the DH stroke model (p = 0.1812), and mice that underwent dMCAO performed significantly worse (p = 0.009).
Chronic pathological assessments in middle-aged mice
We next assessed infarct size in each stroke model at 56 days after stroke to determine whether aging altered the extent or distribution of ischemic damage. Lesion sizes in middle-aged mice were similar to those in adults (Supplemental Figure 5(a) and (b)). Also like adults, all dMCAO and DH stroke mice had rostral cortical damage, however most dMCAO mice and fewer tMCAO mice had caudal cortical damage. The presence of striatal and thalamic damage was also specific to tMCAO mice, consistent with the adult cohort, while corpus callosum disruption occurred in fewer middle-aged dMCAO mice compared to adult mice. Additionally, there was a higher frequency of hippocampal damage in both the DH and tMCAO models in middle-aged mice, but none in dMCAO mice (Supplemental Figure 5(c)). Similar to adult mice, there was no difference in hippocampal NeuN coverage in any model (Supplemental Figure 5(d) and (e)).
Finally, we analyzed plasma NfL levels to assess whether advanced age altered concentrations of plasma biomarkers of neurodegeneration. There was no significant difference in plasma NfL levels in naïve mice from each site, and we therefore pooled them into a single group (p = 0.112). Similarly, there was no significant difference in plasma NfL concentration between sham and stroke animals in any model at 56 days after surgery.
Discussion
Identification of pre-clinical stroke models with robust and reproducible cognitive deficits is urgently needed to further understand mechanisms responsible for PSCI, and to test novel candidate interventions. Here, we capitalized on a multi-site infrastructure embedded within the Stroke-IMPaCT network to incorporate three stroke models, three testing timepoints and two ages to assess cognition after stroke, initially using NOR. We found that low baseline DI (i.e., failure to show a preference for the novel object) led to a high number of exclusions in adults, and this was exacerbated by age and less habituation time. Protocol and site-specific differences also contributed to high variability at some sites. Overall, NOR did not detect stroke-specific changes in cognitive performance in any model at any site at either age. We therefore implemented the BM as a secondary measure of cognitive function, which achieved fewer exclusions and demonstrated stroke-induced cognitive changes in middle-aged dMCAO mice, although at a single site only. Overall, our study highlights common challenges in detecting PSCI within the pre-clinical stroke community, and the complexities of designing and executing pre-clinical stroke cognition studies, particularly within a multi-site structure. We provide recommendations and suggest important aspects of stroke cognition studies to consider in future work whether operating as an individual lab or a multi-site group.
Our multi-site structure enabled us to incorporate a greater range of stroke models than would be possible within an individual lab, while also allowing for inter-site comparisons within models. We did not attempt to standardize surgical or husbandry protocols due to regulatory and operational differences across sites, and this allowed us to assess recent observations suggesting productive heterogeneity can be an asset in multi-center studies to increase generalizability of findings. Previous multi-center pre-clinical stroke studies have had a differing purpose (i.e., to test candidate stroke therapeutics) to our current study and have adopted differing degrees of protocol harmonization across participating sites.10–14
We used three established stroke models with well-characterized patterns of pathology, but considered it important to confirm this, to assist with interpretation of behavioral outcomes. As expected, the distal MCAO models produced infarcts largely confined to the cortex with greater involvement of subcortical structures in the tMCAO model. The dMCAO and DH stroke models were consistent in infarct size and location across sites and surgeons, an important finding as it eliminates these factors as a source of variability. Both models are therefore good candidates for multi-center trials due to this consistency and because of low surgical time, fast recovery, and low mortality. 25 In contrast, higher variability in the tMCAO model was observed both within and between sites. This likely reflects both anatomical and systematic surgical differences in the tMCAO procedure. The stroke lesion in the tMCAO model is influenced by anatomical differences in the Circle of Willis which is highly variable in C57BL/6 mice. 25 The lesion size in this model is also highly dependent on filament diameter (which varied between sites) and extent and timing of reperfusion which can vary due to the “no-reflow phenomenon”.43,44 This suggests that the tMCAO model may require additional inclusion/exclusion criteria and standardization of surgical protocols to reach a desired balance between model heterogeneity and low variability. Chronic pathology in stroke models is less well documented and more likely to be associated with cognitive function, given the potential for secondary/remote brain injury to evolve over time. The distribution of gross stroke damage remained largely consistent between acute and chronic timepoints, however more frequent pathology in thalamic and hippocampal brain regions and substantial ipsilateral brain atrophy was evident 56 days after stroke. In future studies, these structural and gross pathological changes may be better captured by longitudinal MRI. We anticipated that ongoing chronic neurodegeneration could be detected by longitudinal assessment of blood NfL levels, so we measured it in both acute and chronic phases. Plasma NfL concentration was elevated in all stroke models at 3 days and correlated with 3 day infarct sizes, consistent with clinical data.42,45 However by 56 days, plasma NfL levels were moderately elevated only in the tMCAO model. Thus, while plasma NfL concentration is a sensitive biomarker for acute neurological damage after stroke (noting that tMCAO produces the largest strokes), it is uncertain whether it can reflect ongoing/delayed degeneration and/or if greater assay sensitivity may be needed to detect later degeneration, particularly in the distal MCAO models.
As in other multi-site studies, there is a balance to strike in choosing methods with potential for widespread implementation with varying resources versus assay applicability and sensitivity. We did not detect a cognitive deficit after stroke with any variation of the NOR task. This was surprising as multiple network members have successfully used this task in the past to detect a deficit in single-site experiments.22,46,47 While uniformity often improves the systematic basis of the study, it can also impose unfamiliar approaches on some partners within a network structure, and paradoxically has the potential to increase heterogeneity. Indeed, we chose to incorporate site-specific protocol differences into the experimental design for the adult cohort, with the expectation that as each protocol was optimized for each site’s testing environment, it would minimize exclusions and allow us to detect a deficit after stroke. Despite this, we still excluded 21% of adult animals for lack of novel object preference at baseline and observed no NOR cognitive deficit within any model or site. We did find that with predefined object interaction criteria, NOR can be scored consistently between observers, and does not require automated testing software. Additionally, our data suggest that higher habituation increased DI score, while any amount of handling or habituation reduced the variability in DI scores. This implies that animal anxiety may play a role in outcome variability, although we can’t exclude that environmental differences at each site may also impact these observations. Finally, our results indicate that NOR is prone to re-testing effects, and thus, may not be suitable for longitudinal studies. Overall, while NOR may be useful to implement on an individual lab basis where protocols can be tailored to maximize performance, it is not robust for use in multi-site studies in our experience.
We observed multiple patterns of damage affecting cortical, thalamic and hippocampal areas in our three pre-clinical models. Notably, executive function and processing speed are more likely to be affected in human stroke survivors and in vascular cognitive impairment and dementia generally, suggesting that cognitive tests of memory such as NOR may not be the most sensitive for detection of PSCI. In contrast, despite low sample size across groups, we saw a promising signal from the BM which primarily targets spatial working memory and processing speed. Of note, we did find evidence of hippocampal neuronal perikaryal damage, suggesting that neuronal function may be impacted after stroke, despite no overt differences in NeuN coverage. This has been observed in the DH model, where hippocampal LTP is normal at 1 week after stroke, then progressively worsens up to 12 weeks after stroke, and is associated with a cognitive deficit. 22 However, there was no relationship between the presence of hippocampal neuronal injury or changes in thalamic NeuN density and NOR performance in this study. It is possible that there may be more subtle cognitive changes in the chronic phase after stroke, which require more precise cognitive testing to capture. To this effect, future multi-site studies should test additional cognitive tasks. Where feasible, the incorporation of additional modalities such as electrophysiology, functional-MRI and calcium imaging may improve detection of changes in functional neural substrates/surrogates underpinning cognitive processes after stroke in pre-clinical models.
Experimenter experience also likely contributed to high variability across sites for all cognitive testing, which likely impacted our ability to detect cognitive changes after stroke. Anecdotal observations across network labs have supported the idea that the ability to detect a cognitive deficit after stroke can vary by experimenter, even when other variables such as surgeon, testing environment and testing protocol are consistent. This may be due to sex differences in the experimenters, or additional intangible differences in experience and demeanor which impact the variability in outcomes of cognitive testing experiments.48–50 To overcome these confounding factors when testing at multiple sites, automated tasks that reduce experimenter input such as touch-screen chambers may improve detection of pre-clinical PSCI. Nonetheless, in the specific case of NOR testing here it is unlikely experimenter differences were consequential as there was a consistent lack of PSCI detected across all sites and models.
To our knowledge, we are the first network to assess cognition after stroke with a multi-site design. We were limited, however, in our ability to incorporate both sexes and additional comorbidities such as hypertension and obesity which are common in human stroke survivors.51,52 Mouse strain and substrain have effects on behavior 53 as well as on ischemic susceptibility. Although we are unable to quantify differences in cognitive outcomes between substrains in our current study as we did not standardize across the six participating sites, this remains an important consideration for future multi-site studies, as incorporating additional (sub)strains may improve translatability.54–57 Finally, anesthetics varied between sites and likely contributed to increased variability between sites, as they may impact cognitive outcomes.58,59 Despite this, we successfully demonstrated that it is possible to reproduce consistent infarct sizes and pathological damage at multiple sites utilizing the same stroke models. Another major strength of our study is the use of two behavioral tasks that assess complementary cognitive domains across pre-stroke, sub-acute and chronic phases. Together, our results suggest that a multi-site design of this kind enables the incorporation of a greater range of stroke models and increases statistical power, generating useful heterogeneity which would be unachievable within an individual lab.
Our findings highlight a need to manage expectations for detection of pre-clinical PSCI using commonly applied behavioral paradigms, which should be carefully considered based on the study’s primary aims. We have provided a ranked list (Supplemental Table 2) of readily used cognitive tasks and compiled our recommendations into a framework for future testing of PSCI (Figure 6). This framework was designed to ensure rigor and reproducibility with a goal of improving translatability. We have prioritized tasks that mice are able to perform at baseline, prior to any surgical intervention, to reduce the required animal number for the experiment and maximize statistical power. We next consider whether the task can detect a cognitive deficit after stroke at a single site, and then whether it can be replicated at an additional site with the same effect size. Finally, our framework incorporates additional refinements to reduce experimenter time and improve statistical power such as automating the task and reducing inter-rater variability. In addition, we recommend that (a) all sites incorporate handling and habituation, (b) repeat testing is minimized, (c) exclusion criteria are defined a priori, (d) power calculations are performed after baseline testing, prior to surgical or pharmacological intervention, and (e) multiple tests are incorporated to better evaluate different cognitive domains in sub-acute and chronic phases of stroke recovery. Finally, we advise against the use of NOR in a multi-site design, particularly in aged mice, due to the high number of exclusions we observed. We instead encourage future testing of the BM, as animals successfully perform the task prior to stroke, and it detected a cognitive deficit at one tested site. Barnes maze can also be successfully undertaken in older mice and is not affected by repeated testing. Therefore, the next step in our framework is to successfully validate this task at multiple sites with a similar effect size. Other tasks listed in Figure 6 and Supplemental Table 2 should also be tested to see if they meet these same criteria.

Framework for iterative testing of candidate cognitive behavioral tasks in future testing of post-stroke cognitive impairment. To ensure robust, rigorous detection of cognitive decline after stroke, we have developed a framework for future multi-center trials.
In conclusion, we have demonstrated the inherent challenges in assessing PSCI in pre-clinical stroke using community-established models and cognitive tasks. The NOR task is highly variable, poor performance prior to any surgical interventions leads to a high number of exclusions (which is exacerbated in older mice) and did not detect PSCI in any model at any site. In contrast, the BM is less variable and exclusions due to inability to perform the task prior to stroke are negligible. Given the promising single-site data from the dMCAO model, the BM warrants testing more broadly. Our findings are informative for the development of multi-site trials to test potential therapeutic interventions and in providing individual labs with important questions to consider when choosing which cognitive tasks may be most suitable in pre-clinical stroke studies. We expect this work will enable the field to move towards well-designed pre-clinical studies that are rigorous and reproducible and lead to critically needed therapies to prevent PSCI.
Supplemental Material
sj-docx-1-jcb-10.1177_0271678X251400241 – Supplemental material for Assessing post-stroke cognition in pre-clinical models: Lessons and recommendations from a multi-center study
Supplemental material, sj-docx-1-jcb-10.1177_0271678X251400241 for Assessing post-stroke cognition in pre-clinical models: Lessons and recommendations from a multi-center study by Gaia Brezzo, Kristy A Zera, Dana E Straus, Jennifer E Goertz, Sanna H Loppi, Rachel R Crumpacker, Jennifer B Frye, Danielle A Becktel, Maria Isabel Cuartero, Alicia García Culebras, Claudia Dames, Daniel Berchtold, Jill H Fowler, Andreas Meisel, Josef Anrather, Maria Angeles Moro, Stuart M Allan, Kristian P Doyle, Marion S Buckwalter and Barry W McColl in Journal of Cerebral Blood Flow & Metabolism
Supplemental Material
sj-docx-2-jcb-10.1177_0271678X251400241 – Supplemental material for Assessing post-stroke cognition in pre-clinical models: Lessons and recommendations from a multi-center study
Supplemental material, sj-docx-2-jcb-10.1177_0271678X251400241 for Assessing post-stroke cognition in pre-clinical models: Lessons and recommendations from a multi-center study by Gaia Brezzo, Kristy A Zera, Dana E Straus, Jennifer E Goertz, Sanna H Loppi, Rachel R Crumpacker, Jennifer B Frye, Danielle A Becktel, Maria Isabel Cuartero, Alicia García Culebras, Claudia Dames, Daniel Berchtold, Jill H Fowler, Andreas Meisel, Josef Anrather, Maria Angeles Moro, Stuart M Allan, Kristian P Doyle, Marion S Buckwalter and Barry W McColl in Journal of Cerebral Blood Flow & Metabolism
Footnotes
Acknowledgements
We would like to thank Lizi Hegarty, Michael J D Daniels, Tawaun Lucas, Elizabeth Mayne and Li Zhu for their assistance in sample collection. We would also like to thank Amy Bradley for her assistance with network administration, data organization and storage. Experimental schemes were created with Biorender.com.
Author contributions
GB and KAZ designed and performed experiments, collected and analyzed data, and wrote and edited the manuscript with comments from all authors. DS, JEG, SHL, RC, JBF, DAB, MIC, AGC, CD, DB, and JHF performed experiments, data collection and analysis. AM, JA, MAM, SMA, KPD, MSB, and BWM secured funding and established the Stroke-IMPaCT network, designed experiments and edited the manuscript. All authors approved the final manuscript.
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Dr. Buckwalter has worked as a paid consultant for Roche, Genentech, and EMD Serrano. Otherwise, the author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The Stroke-IMPaCT network was generously supported by the funding of a Leducq Transatlantic Network of Excellence award to AM, JA, MAM, SMA, KPD, MSB, and BWM (19CVD01). This work was also supported by a Knight Brain Resilience award (MSB), the American Heart Association/Allen Frontiers Group Brain Health Award (MSB), Alzheimer’s Association Research Fellowship (KAZ), Grant PID2022-140616OB-I00 funded by MICIU/AEI/10.13039/501100011033 and by ERDF/EU (MAM). GB, DS, BWM are supported by the UK Dementia Research Institute through UK DRI Ltd., principally funded by the UK Medical Research Council. Funding was also provided by the National Institute of Neurological Disorders and Stroke grant RF1NS131110 (KPD) and NS132493 (JA). This work was also supported (AM, DB, CD) by the German Research Foundation (ME1562/4-1; SFB/TRR167: NeuroMac, project B12; Clinical Research Unit KFO 5023: BeCAUSE-Y, project B2). JHF was supported by an Alzheimer’s Research UK senior fellowship (ARUK-SRF2018B-005).
Ethical considerations
All animal protocols were performed in conformity with the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines. Experimental procedures were approved by the appropriate animal care and use committees at each respective institution, and were performed under relevant personal and project licenses.
ORCID iDs
Data availability statement
All data is available upon request from the corresponding authors.
Supplemental material
Supplemental Material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
