Abstract
Animal models in psychiatric research are indispensable for insights into mechanisms of behaviour and mental disorders. Distress is an important aetiological factor in psychiatric diseases, especially depression, and is often used to mimic the human condition. Modern bioethics requires balancing scientific progress with animal welfare concerns. Therefore, scientifically based severity assessment of procedures is a prerequisite for choosing the least compromising paradigm according to the 3Rs principle. Evidence-based severity assessment in psychiatric animal models is scarce, particularly in depression research. Here, we assessed severity in a cognitive depression model by analysing indicators of stress and well-being, including physiological (body weight and corticosterone metabolite concentrations) and behavioural (nesting and burrowing behaviour) parameters. Additionally, a novel approach for objective individualised severity grading was employed using clustering of voluntary wheel running (VWR) behaviour. Exposure to the paradigm evoked a transient elevation of corticosterone but affected neither body weight nor nesting or burrowing behaviour. However, the performance in VWR was impaired after recurrent stress exposure, and the individual severity level increased, indicating that this method is more sensitive in detecting compromised welfare. Interestingly, the direct comparison to a somatic, chemically induced colitis model indicates less distress in the depression model. Further objective severity assessment studies are needed to classify the severity of psychiatric animal models in order to balance validity and welfare, reduce the stress load and thus promote refinement.
Animal models in psychiatry are necessary to gain insight into the molecular and cellular mechanisms of behaviour and psychiatric disorders. Since distress is a common aetiological factor in psychiatric disorders, many animal models, in particular for depression, are based on physical, environmental or social stress. This raises the ethical dilemma of balancing scientific progress with animal welfare concerns. Therefore, it is important to grade and compare severity in scientific procedures in order to be able to choose the least discomforting alternative while maintaining efficacy. The principle to reduce, refine and replace (the 3Rs) 1 animal experiments whenever possible has been integrated into legislation in EU Directive 2010/63 and into good scientific practice (e.g. the ARRIVE guidelines). 2 Within the EU, the severity of every planned procedure needs to be classified in the project authorisation process. However, the current classification of severity levels into ‘mild’, ‘moderate’ and ‘severe’ is only partly based on scientific knowledge, since systematic attempts to assess severity are scarce, particularly for psychiatric animal models.
Therefore, we performed a multimodal severity assessment study on the stress-induced cognitive depression model of learned helplessness (LH), a widely used rodent model for depression with excellent face, construct and predictive validity. 3 It is based on exposure to inescapable electric foot shocks, which are classified with the highest severity score according to EU directive 2010/63. The reasons for this rating, however, are unclear; comparisons to somatic models are missing.
Hence, we analysed established parameters of impaired well-being in mice to detect the compromised welfare induced by the LH procedure, including physiological measures such as body weight loss and increased corticosterone release,4,5 and behavioural parameters such as nesting6–8 and burrowing.6,9 Additionally, we used the newly developed unsupervised k-means algorithm-based cluster analysis of body weight and voluntary wheel running (VWR) 10 to grade the level of severity of mice receiving foot shocks individually, and we compared these results with a somatic animal model of colitis.
Material and methods
Animals and housing
Eight-week-old male C57BL/6N mice (Charles River Laboratories, Sulzfeld, Germany) were single housed in conventional macrolon cages (type II) with bedding, nesting material, tap water and food ad libitum (see Supplemental Material). The animals were pseudo-randomly assigned to the respective experimental group. Locomotion and pain thresholds were used to stratify mice into groups and to avoid confounding effects. All experiments had been approved by German animal welfare authorities (Regierungspräsidium Karlsruhe; 35-9185-81-G-199-17).
General behavioural assessment
All experiments were conducted within the first three hours of the dark phase – the active phase of the animals – unless specified otherwise. Locomotion was assessed using the open field test for 10 minutes in a 50 cm × 50 cm arena, and the pain threshold was assessed according to the latency to react on a hot plate (see Supplemental Material).
Mice were assigned to the following groups: (a) home cage controls (C), which remained in the colony room throughout the testing phase; (b) handling controls (H), which entered the shock boxes but never received a shock; (c) non-trained controls (N), which were exposed to avoidable foot shocks only during the shuttle box task on day 3; and (d) trained (T) mice, which received unavoidable foot shocks during the first two days of LH training and on day 3 in the shuttle box (Supplemental Table S1). In experiment 3 (see below), we omitted the non-trained controls but introduced a chemically induced colitis group (DSS) as somatic controls. 10
Cognitive depression model
The LH paradigm was conducted as previously described. 11 Briefly, mice were transferred into chambers with stainless-steel grid floors. Trained subjects received 360 unpredictable and inescapable foot shocks (0.150 mA) on two consecutive days. On day 3, trained and non-trained control subjects were analysed for helpless behaviour in shuttle boxes. Each chamber contained a signalling light, which announced foot shocks (0.150 mA) for five seconds in one of the two compartments. The escape performance was analysed during 30 trials.
In three independent experiments, we assessed corticosterone release and typical behavioural indicators of well-being (e.g. nesting and burrowing), or we used VWR to classify severity in the different cohorts (an overview of time lines and groups in the respective experiments can be found in Supplemental Figure S1 and Table S2).
Experiment 1: faecal corticosterone metabolites
Forty-eight mice were assigned to the respective groups (see Supplemental Material). We sampled faeces to determine faecal corticosterone metabolite (FCM) concentrations before the onset (pre), during each training session (training days 1 and 2), during the shuttle box test (day 3) and one week after the LH (post LH). For each day, two samples were collected, representing (a) acute corticosterone levels during the foot shock exposure or sham treatment and (b) a delayed idle period to detect persistent effects. Samples were collected in a secondary home cage (see Supplemental Material) and were processed as described. 12 Briefly, an extract of dried and homogenised faeces was produced with 80% methanol, and an aliquot was analysed in a well-established and validated 5α-pregnane-3β,11β,21-triol-20-one enzyme immunoassay (EIA).4,5
Experiment 2: typical indicators of well-being
Twenty-eight mice were assigned to the respective groups. The nest test was performed before and three weeks after the LH procedure, and the time to integrate novel material into the nest (TINT) was measured 1 and 20 days afterwards. Burrowing was analysed 2, 8 and 21 days after the LH test.
Nest building was evaluated according to a rating scale based on cohesion and shape (see Supplemental Table S2). Additionally, we assessed the nest quality daily at 10:00 am to track this parameter throughout the experiment. In the TINT, sizzle material was introduced in the diagonally opposing corner of the nest site, and latency to integrate was measured for a maximum of 10 minutes. 6
We placed bottles (14 cm long × 5.5 cm Ø) filled with food pellets at the rear of the home cage one hour before the dark phase and observed the amount that was burrowed out of the bottle (% of total weight) after 6 and 24 hours. All mice were accustomed to the procedure one week before the LH procedure on four consecutive days. 6
Experiment 3: VWR
Forty-eight mice were assigned to the respective groups (see Supplemental Material). VWR was recorded daily at the beginning of the dark phase. To determine the steady-state running performance, an adaptation phase of 16 days was chosen before experimental onset (see Supplemental Figure S2). The LH procedure was performed on days 2–4. Due to malfunctions in the running wheel systems and the consequential imprecision, we decided to exclude unreliable results. Two cages from each group were affected.
The colitis group received 1% dextran sulphate sodium (DSS; mol wt 36,000–50,000; MP Biomedicals, Eschwege, Germany) for five consecutive days (days 1–5) and remained in the colony room.
Statistical analyses
Statistical analyses were carried out using IBM SPSS Statistics for Windows v24 (IBM Corp., Armonk, NY). The experimental unit was the single animal. Differences were considered to be significant at p ≤ 0.05. For more information, see Supplemental Material.
Results
FCMs reveal a short-lasting increase due to foot shocks
FCM samples as a physiological marker for stress varied significantly between different treatment groups (Figure 1). Acute FCM concentrations were elevated in trained mice on all three days of LH, while the merged handling group showed an exclusive effect on day 2 (acute: LH day 1, H(2) = 7.373, p = 0.025, post hoc trained vs. home cage p = 0.021; LH day 2, H(2) = 7.373, p = 0.025, post hoc trained vs. home cage p = 0.001 and merged handling vs. home cage p = 0.046; shuttle box, H(3) = 10.291, p = 0.016, post hoc trained vs. home cage p = 0.016). Overall, the FCM concentration of trained mice showed a prominent change over time in the acute sample (Friedman: N = 13, χ2(4) = 13.846, p = 0.008) with a peak on LH day 2 and normalisation after the procedure, while FCM concentrations of home cage controls remained unchanged. The persistence of the stress response was measured in the delayed samples and became significant exclusively on LH day 2 (Figure 2; H(2) = 8.482, p = 0.014, post hoc trained vs. home cage p = 0.012 and merged handling vs. home cage p = 0.066).
Figure 1. Concentrations of faecal corticosterone metabolites (FCMs) in the acute sample transiently increased after exposure to inescapable stress (trained), but also after repetitive handling alone. Handling and non-trained animals were handled identically until the shuttle box test and are therefore merged until then. Data are given as boxplot diagrams showing the median (line within the box), 25% and 75% quartiles (boxes) and 10% and 90% ranges (whiskers). *p < 0.05; **p < 0.01.
Figure 2. In the delayed sample, concentrations of FCMs were transiently increased after repetitive exposure to inescapable stress, while handling and escapable shock did not cause any effects. Handling and non-trained animals were handled identically until the shuttle box test and are therefore merged until then. Data are given as boxplot diagrams showing the median (line within the box), 25% and 75% quartiles (boxes), 10% and 90% ranges (whiskers) and outliers (dots: 1.5- to 3-fold interquartile range; stars: >3-fold interquartile range). t: p = 0.06; *p < 0.05.
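The H(df) values reported in this section have the form of a Kruskal-Wallis rank test, although the test is not named here and the details are in the Supplemental Material. As an illustration only, the following minimal Python sketch computes a tie-corrected H statistic for any number of groups. The function name `kruskal_wallis_h` and the example values are hypothetical and are not the study data; the sketch assumes at least two distinct values across all groups.

```python
def kruskal_wallis_h(*groups):
    """Return the tie-corrected Kruskal-Wallis H statistic and its degrees of freedom.

    Illustrative sketch only; assumes at least two distinct values overall.
    """
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    # Average rank for each distinct value (ties share the mean of their ranks).
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        t = j - i                              # size of this tie group
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j

    # H = 12 / (N (N + 1)) * sum(R_k^2 / n_k) - 3 (N + 1)
    rank_sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_sum_term - 3 * (n_total + 1)

    # Tie correction; the factor equals 1 when all values are distinct.
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    return h / correction, len(groups) - 1


# Hypothetical concentrations for three groups (not the study data).
h_stat, df = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

The resulting H would then be compared against a chi-squared distribution with `df` degrees of freedom to obtain a p value, followed by post hoc pairwise comparisons as reported above.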

Foot shocks do not affect typical well-being parameters
We analysed nesting quality, integration of nesting material, burrowing behaviour and body weight before, during and after the LH procedure (Figure 3). Repeated-measures analysis of variance did not reveal significant differences between the treatment groups or interactions over time, though some time effects became apparent for burrowing (time: F(3, 69) = 4.523, p = 0.006) and body weight (time: F(4, 92) = 125.561, p < 0.001).
Figure 3. Exposure to shocks did not alter behavioural parameters or body weight. Days of shocks are indicated in red on the time line. (a) Neither the nest tests one week before stress and three weeks after nor daily nesting scores during exposure revealed differences. (b) The time to integrate material into the nest (TINT) was similar in all treatment groups, as was (c) burrowing. (d) Body weight was not affected. In (a), graphs represent the median and 95% confidence interval (CI). In (b)–(d), graphs represent the mean ± standard deviation (SD).
Severity evaluation using VWR behaviour-based k-means clustering does not indicate severe consequences of foot shocks
Daily body weight assessment and VWR performance were used to determine the current status of each mouse according to k-means clustering. 10
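To make the clustering step concrete, the following is a minimal sketch of plain k-means (Lloyd's algorithm) on two-dimensional points, assuming each point is a pair of body weight and wheel running, both expressed as % of baseline. It is not the published implementation from reference 10. The function name `kmeans`, the start centroids and all data values are hypothetical and chosen only for illustration.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's k-means on 2-D points with caller-supplied start centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    (sum(m[0] for m in members) / len(members),
                     sum(m[1] for m in members) / len(members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical (body weight %, wheel running %) pairs, NOT the study data.
points = [(100, 98), (99, 95), (101, 102),
          (98, 70), (99, 65), (97, 72),
          (92, 30), (90, 35), (93, 28)]
# Hypothetical start centroids, ordered from least to most affected.
start = [(100, 100), (100, 70), (100, 40)]

labels, centroids = kmeans(points, start)
```

Because the start centroids are ordered from least to most affected, cluster indices 0, 1 and 2 can be read as severity levels in this toy example. With real data, the initialisation and the number of clusters would need to follow the referenced method.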
Mice in the DSS group mildly but significantly lost weight compared to the other groups from day 7 onwards (Figure 4). The effect was largest on day 9 (M = 6.8% (SD = 6.1%), F(3, 39) = 12.776, p < 0.001, post hoc DSS vs. home cage p < 0.001, vs. handling p < 0.001, vs. trained p < 0.001). Overall, the body weight changed during the experiment (time: F(13, 468) = 33.979, p < 0.001) and was influenced by DSS treatment (treatment × time: F(39, 468) = 5.932, p < 0.001; treatment: F(3, 36) = 4.930, p = 0.006, post hoc home cage vs. DSS p = 0.003). Foot shocks or handling did not affect body weight.
Figure 4. Body weight development (% change from baseline) during and after the respective treatments. Data are given as the mean ± SD.
VWR behaviour was sensitive to both the foot shock exposure and DSS treatment (Figure 5). During the three days of the LH procedure, the performance changed (time: F(2, 70) = 8.842, p < 0.001), and a general treatment effect was observed (treatment: F(3, 35) = 5.367, p = 0.004). The trained group differed significantly from the home cage (p = 0.008) and DSS (p = 0.014) groups but not from the handling controls. However, the DSS effects were more distinct, with running reduced to a mean of 40.7% (SD = 23.1%) of baseline on day 8 compared to a mean of 64.0% (SD = 9.8%) following foot shocks on day 4. The DSS-induced reduction was significant throughout the procedure (time: F(13, 455) = 21.475, p < 0.001; treatment × time: F(39, 455) = 6.232, p < 0.001; treatment: F(3, 35) = 4.310, p = 0.011). In the pairwise comparison, a significant difference was only detected between the home cage and the DSS group (p = 0.008).
Figure 5. Voluntary running development (% change from baseline) during and after the respective treatments. Data are given as the mean ± SD.
The severity score determined by the k-means cluster analysis revealed that the majority of subjects only displayed level 0 or level 1 severity during the three days of the LH procedure (Figure 6). Only 1/12 trained mice reached severity level 2 after each training session. On the other hand, the DSS treatment led to a shift to level 2 severity for up to 8/10 mice (day 8). Most mice recovered by day 12.
Figure 6. Voluntary wheel running plotted against body weight in k-means cluster analysis with cluster borders (dashed lines) at different dates of the experiment. Cluster borders separate regions of the graphs into severity level 0 (>87.37), level 1 (between 50.16 and 87.37) and level 2 (<50.16).
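Once the cluster borders are fixed, grading an individual animal reduces to a threshold lookup. The sketch below applies the two borders stated in the legend above. It assumes that the borders refer to wheel running expressed as % of baseline, which the legend does not state explicitly, and the names `severity_level` and `count_levels` are hypothetical.

```python
# Cluster borders as reported in the figure legend.
LOWER_BORDER = 50.16  # below this value: severity level 2
UPPER_BORDER = 87.37  # above this value: severity level 0


def severity_level(score: float) -> int:
    """Map a score to severity level 0, 1 or 2 using the reported borders.

    Assumption: `score` is on the axis the borders refer to
    (taken here to be wheel running as % of baseline).
    """
    if score > UPPER_BORDER:
        return 0
    if score < LOWER_BORDER:
        return 2
    return 1


def count_levels(scores):
    """Count how many animals fall into each severity level."""
    counts = {0: 0, 1: 0, 2: 0}
    for s in scores:
        counts[severity_level(s)] += 1
    return counts
```

Under this reading, a score of 64.0 would fall into level 1 and a score of 40.7 into level 2. Calling `count_levels` on one day's scores for a group gives the per-level fractions of the kind reported in the text.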
Discussion
The objective of this study was to perform an evidence-based evaluation of severity in the LH cognitive depression model. For that purpose, we used an established battery of physiological and behavioural tests to detect the magnitude of evoked stress and discomfort.
Our results demonstrate a significant increase in FCMs in the acute samples after exposure to escapable and inescapable shocks, but also a comparable elevation of FCM concentrations in non-shocked handling controls. Apparently, it was not only the foot shocks that triggered the stress response. Other influencing factors might be the transport to the experimental room or, potentially even more important, the introduction to the chambers per se. The chamber includes a metal grid floor, no bedding material and no possibility to build a nest, eat or drink. Although the animals are not physically restrained in the chambers and there is no illumination, we consider the chamber to be a stressful environment compared to a home cage setting. The surroundings somewhat resemble metabolic cages. The 2010/63/EU directive categorises short-term (<24 hours) exposure to metabolic cages as mild in severity. On day 2, repeated exposure to the stressful condition could have triggered a profound stress response due to the previously established association.
However, only the recurrent exposure to inescapable shocks led to a significant increase in the delayed samples. FCMs have been found to be a sensitive parameter of adrenocortical activity, 13 and the lack of LH-induced effects might reflect that there was no difference in the magnitude of the acute stress response and that the treatments could be considered equal in this respect. However, caution is advised, as not every type of stressor may be reflected in measured glucocorticoids.14,15 Therefore, several physiological and behavioural parameters (i.e. nesting, burrowing or body weight) are typically measured to evaluate if well-being is compromised in stressed animals. Yet, none of the above-mentioned parameters was altered by avoidable or unavoidable foot shocks, although this procedure is typically considered a potent stressor and expected to induce key depressive-like behavioural features. Only when analysing VWR performance was a change in the trained group visible. Analysing this parameter on an individual level, two individuals reached severity level 2 according to the k-means cluster analysis during the LH procedure, although their body weight remained unaffected. Thus, the VWR-based assessment was more sensitive to compromised welfare than typical home cage observations.
Experiment 3 was designed to uncover and grade the impact of the LH model compared to the DSS-induced mild colitis model using the unbiased individual severity grading by k-means cluster analysis. 10 We chose the colitis model because this method was initially implemented in this model and allowed a direct comparison by replicating the previous study. Additionally, it offers a thoroughly described and evidence-based severity assessment8,10,16 and an approved severity classification according to the 2010/63/EU directive dependent on the dose and duration of the treatment. Here, we decided to use the moderate severity because it was sufficient to induce detectable impairments in the wheel running analysis and hence serves as a suitable positive control, although the inflicted pain is of a different modality in the respective models. While the pain is temporally limited in the depression model, it is chronic in colitis. This might affect the running outcome, which could be a reason for the stronger impairment in colitis. The psychological load of depression could in general contribute less to the running performance than pain does. Further studies on running in mice with different emotional states would be necessary to clarify this question.
Here, we observed a larger fraction of subjects in severity level 2 in the colitis model compared to the LH depression model. The foot shock–induced effects rather resembled the milder effects of facial vein phlebotomy. 10 Hence, colitis comprises the highest strain and stress, while this LH protocol and blood sampling similarly comprise low levels of distress induction.
Assessing severity requires detailed consideration of potential stressors, which operate as confounding factors. Ideally, the stressed group of interest would be compared to a non-stressed control group. Avoiding stress in animal maintenance and handling is very challenging, especially in an experimental context. Circumstantial parameters (e.g. housing conditions) can be associated with a stress response or even impairments of well-being. 17 Single housing is often considered harmful to mice. 18 Yet, we decided to house our mice individually, since in our set-up male mice showed less burden. 11 Other factors needed to be taken into account. To ensure that the result of the LH was not confounded by altered pain threshold or locomotor differences, we needed to perform the respective experiments first. The brief exposure to a painful stimulus and the illumination during the dark phase in the tests can induce stress. Consequently, a ceiling effect might mask the differences between the treatment groups. We tried to avoid this by temporally separating the preceding tests from the LH paradigm.
According to EU directive 2010/63, exposure to an inescapable electric shock is considered severe, while DSS-induced colitis is classified as mild to moderate at the dose applied here, and facial-vein phlebotomy is classified as mild. This constitutes a dilemma, since the evidence from the cluster analysis showed larger fractions of DSS-treated mice with a level 2 burden than of mice subjected to LH treatment. In relative terms, foot-shocked mice were less affected. An absolute assessment is, however, more difficult. Assuming that level 2 is equal to moderate severity, the fact that most shocked mice remained at level 1 during the LH procedure, which could be considered mild, indicates a mismatch with the current legal regulations, which are supported by anthropomorphic concepts rather than by evidence-based severity assessment. Therefore, more comparative studies are needed to assess and classify the severity of the LH model and other psychiatric animal models objectively.
Supplemental Material
Supplemental material, LAN874831 Supplemental Material for Systematic analysis of severity in a widely used cognitive depression model for mice by Anne S Mallien, Christine Häger, Rupert Palme, Steven R Talbot, Miriam A Vogt, Natascha Pfeiffer, Christiane Brandwein, Birgitta Struve, Dragos Inta, Sabine Chourbaji, Rainer Hellweg, Barbara Vollmayr, Andre Bleich and Peter Gass in Laboratory Animals
Footnotes
Acknowledgements
We thank Katja Lankisch for her excellent technical support, Lydia Bussemer for her support in sample collection and Edith Klobetz-Rassam for EIA analysis.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: this work was supported by grants from the Deutsche Forschungsgemeinschaft (Forschergruppe 2591 ‘Severity assessment in animal based research’, project P05) to P.G., as well as the Ingeborg Ständer Foundation and the Research Fund of the UPK Basel to D.I.
Supplemental material
Supplemental material for this article is available online.
