Abstract
Evidence-based severity assessment is essential as a basis for ethical evaluation in animal experimentation to ensure animal welfare, legal compliance and scientific quality. To fulfil these tasks, scientists, animal care and veterinary personnel need assessment tools that provide species-relevant measurements of the animals' physical and affective state. In a three-centre study, the inter-laboratory robustness of body weight monitoring, the mouse grimace scale (MGS) and the burrowing test was evaluated. The parameters were assessed in naïve and tramadol-treated female C57BL/6J mice. In all laboratories, body weight decreased during tramadol treatment and increased again once treatment was terminated. Tramadol treatment did not affect the MGS or burrowing performance. Results were qualitatively comparable between the laboratories, but quantitatively significantly different (inter-laboratory analysis). Burrowing behaviour seems to be highly sensitive to inter-laboratory differences in testing protocol. All locations obtained comparable information regarding the qualitative effect of tramadol treatment in C57BL/6J mice; however, datasets differed as a result of differences in test and housing conditions.
In conclusion, our study confirms that results of behavioural testing can be affected by many factors and may differ between laboratories. Nevertheless, the evaluated parameters appeared relatively robust even when conditions were not harmonized extensively and present useful tools for severity assessment. However, analgesia-related side effects on parameters have to be considered carefully.
Behavioural assessment parameters should ideally be valid, specific and sensitive. Especially if they are applied for the statutory reporting of severity grades, as necessary in Switzerland and the EU, intra- and inter-laboratory robustness and comparability of assessment methods are of utmost importance. Small differences between laboratories should not affect test read-outs strongly. Otherwise, general statements on the severity of specific procedures will hardly be possible. Nevertheless, recent reports indicate problems with reproducibility and replicability of behavioural tests. Several multi-centre studies that assessed laboratory mouse behaviour using standardized test protocols and comparable test conditions conclude that, while several behaviours seem to be robustly exhibited by mice in different laboratories, for example home cage activity, 1 results of certain behavioural tests differed distinctly between laboratories.2–4 However, some authors conclude that even though the absolute test results varied between laboratories, detected qualitative differences between treatment groups were consistent in different laboratories. These authors concluded that behavioural testing remains reliable if appropriate measures for standardization are applied and suitable controls are included. 2 Differences in test conditions (for example, testing order, time of test, type of test apparatus, testing room, handling methods or experimenter sex), environmental conditions (e.g. season, lighting, humidity levels) and animal characteristics (such as diet and microbiome or weaning age and litter size) have been identified as underlying causes of the observed discrepancies in behavioural read-outs between laboratories.1,3–6
In the presented study, we tested four simple indicators, used increasingly in rodent severity assessment, for their robustness. Mouse grimace scale (MGS), burrowing performance, water intake as well as body weight progression were assessed in naïve and analgesia-treated mice in three different laboratories (Hannover (H), Rostock (RO), Zurich (Z)). The MGS is a popular and validated tool to assess pain in mice based on the changes of facial expression. 7 Based on the outcome of the initial validation, it is applied mainly in acute pain states such as the post-surgical phase. Burrowing is a more complex, spontaneous behaviour that has been shown to be an easy to apply tool to assess brain damage, neurodegenerative diseases or to monitor sickness but is also known to be affected in several stressful or painful conditions in rats and mice.8–10 The burrowing test is based on the species-specific behaviour of burrow digging rodents to displace items from tube like structures. 8 The monitoring of body weight and water intake changes is a classical tool to assess health status and overall wellbeing. Predefined reduction in weight is often used for the determination of humane endpoints (i.e. criteria of experiment termination).11,12
Side effects of analgesia on behaviour and body weight of rodents have been described repeatedly. 13 In the present study, we applied the opioid analgesic tramadol, which is administered for the treatment of moderate to severe pain, both acute and chronic, in various species, including humans and rodents. 14 Even though tramadol has low bioavailability, plasma levels of tramadol and its active metabolite mono-Ο-desmethyltramadol (M1) are stable during oral self-consumption in mice.15,16 Tramadol is, therefore, of interest for the use in stress-free, oral analgesia protocols.
To conduct the study, all three laboratories agreed on a common experimental schedule, common standard operating procedures (SOPs) for behavioural measurements and drug delivery, and the same sex and strain of the animal subjects. All experiments were conducted between July and September 2018. No other standardization measures were implemented. The project design therefore resembled a real-life scenario with several independent laboratories following published SOPs, rather than a systematic multi-centre approach with major efforts for prior standardization and harmonization.
We hypothesized that results of all tested severity assessment measurements are comparable between laboratories and that opioid analgesia affects test results.
Ethics statement
All experiments were in accordance with the European Directive 2010/63/EU of the European Parliament and of the Council on the Protection of Animals used for Scientific Purposes. The studies were conducted in accordance with the Swiss and German law for animal protection.
Animals and methods
A detailed description of animals and methods is given in the supplemental material.
Animals
Adult female C57BL/6J mice were used for all experiments.
Standard housing conditions
During habituation, mice were housed in groups of four. Environmental conditions differed in the laboratories (see supplemental material).
Experimental protocol
Experimental schedule after habituation. Mouse grimace scale (MGS), body weight, water intake and burrowing behaviour were assessed daily.
Data acquisition
Animals were housed in groups but separated overnight for each burrowing test during baseline and experimental measurements.
For the burrowing test each animal was provided with a tube-like apparatus filled with pre-weighed food pellets (200 ± 10 g) 2–3 h before the beginning of the dark phase. The filled tube was weighed after 2 h and at the end of the dark phase (12 h) to assess the amount of removed food pellets. 8
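The burrowing read-out described above can be sketched as a simple calculation. This is a minimal illustration, not the laboratories' actual procedure: the function name, the `tare_g` parameter (weight of the empty tube) and all weights below are hypothetical and are not data from the study.

```python
def percent_burrowed(filled_start_g: float, filled_now_g: float, tare_g: float = 0.0) -> float:
    """Percentage of pellet mass removed from the burrowing tube.

    filled_start_g: weight of the filled tube at the start of the test
    filled_now_g:   weight of the tube at the read-out (e.g. 2 h or 12 h)
    tare_g:         weight of the empty tube (assumed to be known)
    """
    pellets_start = filled_start_g - tare_g
    if pellets_start <= 0:
        raise ValueError("initial pellet mass must be positive")
    removed = filled_start_g - filled_now_g
    if removed < 0 or removed > pellets_start:
        raise ValueError("read-out weight is inconsistent with the start weight")
    return 100.0 * removed / pellets_start


# Hypothetical readings for one animal: 200 g of pellets in a 60 g tube.
two_hour = percent_burrowed(260.0, 200.0, tare_g=60.0)   # 60 g of 200 g removed
overnight = percent_burrowed(260.0, 80.0, tare_g=60.0)   # 180 g of 200 g removed
print(two_hour, overnight)
```

Subtracting the tare keeps the percentage tied to the pellet mass alone, so tubes of different weight would not bias the read-out.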
Body weights were assessed daily with a precision scale. Water intake was measured in group-housed as well as single-housed animals by weighing the drinking bottles on a precision scale.
For assessment of the MGS, mice were filmed in polycarbonate boxes. The mice were allowed to acclimatize in the boxes for 2 min and were then filmed for 5 (Z), 12 (H) or 30 min (RO). Pictures were captured as screenshots by automatic frame production and selection software, which produced at least 5 (max. 8) clear pictures per animal per time point; the acquired frames were automatically randomized.17,18 Pictures were scored according to the scoring scheme described by Langford et al., and MGS means were calculated per animal. 7
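The per-animal MGS mean mentioned above can be illustrated as follows. This sketch assumes the commonly described form of the scoring scheme (five facial action units, each scored 0, 1 or 2, averaged per picture); the unit names, function names and example scores are illustrative and are not taken from the study.

```python
from statistics import mean

# Assumed facial action units of the scoring scheme (labels are illustrative).
ACTION_UNITS = (
    "orbital_tightening",
    "nose_bulge",
    "cheek_bulge",
    "ear_position",
    "whisker_change",
)


def frame_score(frame: dict) -> float:
    """Mean score of one picture across all action units (each scored 0, 1 or 2)."""
    for unit in ACTION_UNITS:
        if frame[unit] not in (0, 1, 2):
            raise ValueError(f"{unit} must be scored 0, 1 or 2")
    return float(mean([frame[unit] for unit in ACTION_UNITS]))


def animal_mgs(frames: list) -> float:
    """Mean MGS of one animal at one time point across its scored pictures."""
    return float(mean([frame_score(f) for f in frames]))


# Hypothetical scores for two pictures of one animal.
frames = [
    dict.fromkeys(ACTION_UNITS, 0),
    {"orbital_tightening": 1, "nose_bulge": 0, "cheek_bulge": 0,
     "ear_position": 1, "whisker_change": 0},
]
mgs = animal_mgs(frames)
print(mgs)
```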
Statistical analyses
Intra-laboratory results
Intra-laboratory data were analysed using a linear mixed-effects regression model with experimental regimes (days) as fixed effects to control for repeated measures. General p values were obtained by ANOVA and post-hoc analysis. Further details on mathematical proceedings are shown in the supplemental material.
Inter-laboratory comparisons
Inter-laboratory data were analysed by group-wise comparisons of measured values. Each time frame was analysed using ANOVA and subsequent post-hoc tests. 95% confidence intervals (CIs) were plotted to show evidence-based differences in inter-laboratory findings. Further details on mathematical proceedings are shown in the supplemental material.
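The idea of judging pairwise laboratory differences by whether a 95% CI includes 0 can be sketched as below. The exact procedure used in the study is described in the supplemental material; this example swaps in a percentile bootstrap of the difference in medians purely for illustration, and the function names and all values are hypothetical.

```python
import random
from statistics import median


def bootstrap_ci_median_diff(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for median(a) - median(b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        resample_a = [rng.choice(a) for _ in a]
        resample_b = [rng.choice(b) for _ in b]
        diffs.append(median(resample_a) - median(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def labs_differ(ci) -> bool:
    """Two laboratories are taken to differ only if the CI excludes 0."""
    lower, upper = ci
    return not (lower <= 0.0 <= upper)


# Hypothetical percentage body weight changes in two laboratories.
lab_a = [-4.1, -5.0, -4.7, -5.3, -4.4, -4.9]
lab_b = [-2.0, -2.9, -2.4, -3.1, -2.6, -2.2]

ci_ab = bootstrap_ci_median_diff(lab_a, lab_b)
ci_aa = bootstrap_ci_median_diff(lab_a, lab_a)
print(ci_ab, labs_differ(ci_ab))
print(ci_aa, labs_differ(ci_aa))
```

A CI that lies entirely below or above 0 indicates a difference between the two laboratories; a CI that spans 0 does not.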
Results
Intra-laboratory results
Body weight
In Z, a significant loss in body weight compared to baseline (bsl) was observed on day 4, after the start of tramadol administration (2.6%; p < 0.01), followed by an increase to bsl level by day 6 (Figure 1(a)). In RO, animals also showed significant weight reductions beginning on day 4 (4.7%; p < 0.001), followed by an increase and a return to bsl on day 7 (Figure 1(b)). In H, mice showed a significant drop in body weight beginning on day 5 (2.3%; p < 0.01), which decreased further until day 6 (3.2%; p < 0.001) compared to bsl. After tramadol treatment was stopped, body weights reached bsl levels on day 7 (Figure 1(c)).
Percentage changes in individual body weights during tramadol administration compared to baseline (bsl) in (a) Z, (b) RO and (c) H. Grey background: sucrose in drinking water; white background: tramadol and sucrose in drinking water. Significance: p ≤ 0.05 (*), p ≤ 0.01 (**) and p ≤ 0.001 (***).
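The percentage change relative to baseline shown in Figure 1 can be computed along the following lines. This is a sketch under the assumption that each animal's baseline is the mean of its baseline-day weights; the function name and all weights are hypothetical.

```python
from statistics import mean


def percent_change_from_baseline(baseline_weights_g, weight_g):
    """Percentage change of a body weight relative to the animal's baseline mean."""
    baseline = mean(baseline_weights_g)
    if baseline <= 0:
        raise ValueError("baseline weight must be positive")
    return 100.0 * (weight_g - baseline) / baseline


# Hypothetical weights (g) for one mouse: two baseline days, then days 4 to 7.
baseline_days = [20.1, 19.9]
daily_weights = {4: 19.5, 5: 19.4, 6: 19.8, 7: 20.0}

changes = {
    day: round(percent_change_from_baseline(baseline_days, w), 2)
    for day, w in daily_weights.items()
}
print(changes)
```

Negative values indicate weight loss relative to baseline; a value of 0 corresponds to a return to the baseline weight.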
MGS
MGS values did not differ significantly between bsl and tramadol treatment in all three groups (Figure 2).
Mouse grimace scales during tramadol administration compared to baseline (bsl) in (a) Z, (b) RO and (c) H. Black dots represent outliers. Grey background: sucrose in drinking water; white background tramadol and sucrose in drinking water. No significance was found.
Burrowing behaviour
The percentage of pellets removed after 2 h was low and variable during baseline in all three groups. During the adaptation phase, mice in Z removed only 34% of pellets within 2 h (Figure 3(a)) and 95% within 12 h (Figure 3(b)). With the beginning of tramadol treatment on day 3, 2 h burrowing behaviour significantly increased up to 80% (p < 0.001) and remained increased compared to bsl until day 6 (63%) of the experiment. Burrowing behaviour on days 4 (p < 0.05) and 6 (p < 0.01) was significantly lower than on day 3 (Figure 3(a)). Burrowing behaviour overnight (12 h) showed a significant reduction on day 7 (82%) of the experiment when compared to bsl (p < 0.05), day 4 (p < 0.01) and day 6 (p < 0.05) (Figure 3(b)). In RO, burrowing activity during the 2 h period was low (Figure 3(c)). The percentage of pellets removed overnight (12 h) did not exceed 50%. Although these animals generally burrowed less, overnight burrowing was significantly decreased on days 4, 5, 6 and 7 (p < 0.001) when compared to bsl (Figure 3(d)). During bsl, mice in H removed 31% in the 2 h period. Similar to the results from Z, there was a significant increase in burrowing on days 3, 4, 5 and 6 (p < 0.05 to p < 0.01) when compared to bsl (Figure 3(e)). The overnight measurements showed 100% burrowed pellets with no significant differences from baseline (Figure 3(f)).
Percentage of pellets removed from burrowing apparatus during tramadol administration compared to baseline (bsl) after 2 h in (a) Z, (c) RO and (e) H and overnight (approx. 12 h) in (b) Z, (d) RO and (f) H. Grey background: sucrose in drinking water; white background tramadol and sucrose in drinking water. Significance: p ≤ 0.05 (*), p ≤ 0.01 (**) and p ≤ 0.001 (***).
Drinking water intake
The individual water intake per hour (overnight) was 0.73 ± 0.22 ml (Z), 0.4 ± 0.05 ml (H) and 0.81 ± 0.08 ml (RO) at baseline (days 1–2) and 0.66 ± 0.1 ml (Z), 0.4 ± 0.07 ml (H) and 0.68 ± 0.06 ml (RO) during tramadol treatment (days 3–6). No statistical analyses were performed as bottle weights were affected by technical problems in some laboratories.
Inter-laboratory comparisons
Percentage change in body weight was significantly different between the laboratories from day 4 to 6 (p < 0.05; p < 0.001) but not significantly different on day 7. Pairwise differences between individual laboratories were considered absent when the CI of the mean/median difference crossed 0; accordingly, Z and H did not differ on day 4, RO and H as well as Z and H did not differ on day 5, and RO and H did not differ on day 6 (Figure 4(a)). Although MGS did not change within the laboratories, medians differed significantly between laboratories at bsl (p < 0.001) and on days 5 (p < 0.001) and 6 (p < 0.05) (Figure 4(b)). At all time points in comparison of all laboratories 2 h and 12 h measurements of burrowing testing were significantly different (p < 0.001). However, when comparing each laboratory against the other, differences were present between Z and H on days 3 and 4 for the 2 h test (Figure 4(c)) and at all other time points for the 12 h test (Figure 4(d)).
Inter-laboratory comparisons of the 95% CIs of (a) body weight (BWC), (b) median scores of the MGS, (c) burrowing 2 h and (d) burrowing 12 h. Grey: significantly different; white: not significantly different. bsl: baseline. Significance: p ≤ 0.05 (*), p ≤ 0.01 (**) and p ≤ 0.001 (***).
Discussion
Assessment methods used for the statutory reporting of severity grades should ideally deliver robust and comparable results in and between laboratories. Multi-centre approaches may provide robust validation and evidence of reproducibility of behavioural measurements as, for example, shown for the use of burrowing behaviour as a pain indicator in rats. 10 Here, we present results of a three-centre study resembling a real-life scenario with several independent laboratories following published test SOPs, rather than a systematic multi-centre approach with major efforts for prior standardization and harmonization.
As one of the most frequently used and objective clinical parameters, body weights were monitored throughout the experiments and decreased in all three laboratories when tramadol was administered. However, body weights recovered immediately in all laboratories once tramadol was no longer administered. The largest decrease in body weight, 4.7%, was observed in RO and the smallest, 2.6%, in Z. Given that percentage body weight decreases are important evaluation criteria in experimental score sheets and frequently used as termination criteria or humane endpoints,19 it is advisable to consider this adverse effect of analgesia treatment when planning experiments, designing score sheets and performing severity assessment. Nausea and emetic effects constitute well-known adverse effects of tramadol in clinical use. 20 The associated discomfort and lack of appetite can result in indirect effects of tramadol administration on food intake and body-weight development, as also described for other opioids in mice. 13 While we cannot rule out that the repeated testing of our mice induced a stress response and consequently reduced body weight, the rapid recovery after termination of tramadol administration renders an adverse effect of the analgesic drug more likely. Moreover, tramadol has a bitter taste, and body-weight reduction can be a result of decreased water intake. Water intake was difficult to analyse due to technical problems in some laboratories (i.e. water loss during manipulation of the cages). Nevertheless, the water intake of 4.8–8.16 ml per night suggests therapeutic dosing of tramadol, sufficient to reach an estimated serum concentration that provides pain relief. 15
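The nightly intake range quoted above is consistent with the hourly means reported in the Results for the tramadol phase, assuming a 12 h night. The short check below illustrates this arithmetic; the 12 h duration is an assumption based on the stated dark phase, and the variable names are illustrative.

```python
HOURS_PER_NIGHT = 12  # assumption: overnight period equals the 12 h dark phase

# Mean individual water intake per hour during tramadol treatment (ml),
# as reported in the Results for each laboratory.
hourly_ml = {"Z": 0.66, "H": 0.40, "RO": 0.68}

# Convert hourly means to nightly intake, rounded to two decimals.
nightly_ml = {lab: round(rate * HOURS_PER_NIGHT, 2) for lab, rate in hourly_ml.items()}

lowest = min(nightly_ml.values())
highest = max(nightly_ml.values())
print(nightly_ml, lowest, highest)
```

The lowest value corresponds to H and the highest to RO, which brackets the reported range.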
Analgesia is one of many experimental interventions applied to laboratory mice and one should be aware of its potential side effects. In addition to its analgesic potency, tramadol exerts antidepressant-like effects that have been attributed to the effects on monoamine uptake. Respective effects have been reported based on studies evaluating the impact of tramadol on behavioural patterns in different paradigms, for example, forced swimming test.21,22 Thus, it was of particular interest to study the impact of tramadol on behavioural parameters. The pain-indicating MGS was analysed in order to assess a respective impact of tramadol on pain assessment parameters. In all laboratories MGS was not affected by tramadol or by repeated testing procedures, which is comparable to the results of a study investigating buprenorphine. 23 In contrast, in another study, also using tramadol via drinking water after an initial injection of tramadol, a slight increase of MGS was observed. 16
Assessment of burrowing behaviour, another common pain indicator, showed comparable results in Z and H but not in RO. In naïve animals, burrowing activity is expected to be high and test bottles are normally empty or nearly empty after a 12 h test period, as observed in Z and H, while animals that suffer from pain are expected to leave more material in the bottle after 12 h. 8 Animals in RO showed surprisingly little burrowing behaviour throughout the first 2 h of testing and displayed significantly lower performance in overnight testing than animals at the other locations. A factor that might contribute to this discrepancy is the burrowing apparatus. Whereas Z and H used a bottle with a volume of 250 ml, RO used a bottle with a volume of 900 ml. In the underlying SOP, the length of the bottle and the diameter of the opening were fixed, but not the volume or diameter of the bottle itself. Given that the diameter of the burrowing apparatus has been confirmed as a critical factor in previous studies, 8 the difference in test performance can probably be attributed to this factor. Another confounding factor arises from the low and variable burrowing behaviour shown by mice in Zurich and Hannover during the days intended for bsl acquisition. Mice were not habituated to single housing prior to baseline measurements. Therefore, the adaptation phase was probably not long enough and single housing may have influenced burrowing behaviour. Adequate adaptation time, considering all aspects of handling and housing, as well as adequate assessment time points seem to be crucial when using the burrowing test.
Overall, mean differences of parameters were significantly different in the inter-laboratory comparison. Whereas the smallest differences were detected for the CIs of body weight and median MGS scores, the most pronounced deviations were found for the CIs of the burrowing test.
However, when comparing only two out of three laboratories, there are also similarities between the datasets from Z and H and Z and RO regarding body weight and MGS. Looking at the burrowing datasets there were only similarities between Z and H.
Some quantitative variation across laboratories seems to be detected in most multi-centre studies, which does not necessarily compromise overall differences and qualitative conclusions. 2 Additionally, some behaviours seem to be less affected by environmental factors than others.1,24 Several authors highlight the impact of even minor changes in environmental factors on the outcome of animal-based experimental research.25,26 Nevertheless, many sources of laboratory-related variability remain unidentified and the relative impact of factors is still unclear. 5 Some authors,27,28 therefore, argue for more standardization in behavioural studies. In this context, it also needs to be considered that highly standardized experiments may represent ‘local truths’ with little external validity. 3 The negative implications of this ‘standardization fallacy’ problem have been intensely discussed. 29 As we have not harmonized factors like animal breeder, housing or handling between the three laboratories, we cannot draw conclusions on the potential impact of these factors on our results.
In conclusion, our study confirms that results of behavioural testing can be affected by many factors and may differ between laboratories. Nevertheless, the evaluated parameters appeared relatively robust even when conditions were not harmonized extensively. One can assume that, when these tests are used for the evaluation of pain or stressful experimental procedures, effect sizes are increased and inter-laboratory differences become less prominent. Therefore, these tests present useful tools for severity assessment. Furthermore, analgesia-related side effects on parameters have to be considered carefully.
Supplemental Material
LAN881481 Supplemetal Material - Supplemental material for A safe bet? Inter-laboratory variability in behaviour-based severity assessment
Supplemental material, LAN881481 Supplemetal Material for A safe bet? Inter-laboratory variability in behaviour-based severity assessment by Paulin Jirkof, Ahmed Abdelrahman, André Bleich, Mattea Durst, Lydia Keubler, Heidrun Potschka, Birgitta Struve, Steven R Talbot, Brigitte Vollmar, Dietmar Zechner and Christine Häger in Laboratory Animals
Footnotes
Acknowledgement
The authors would like to thank Margarete Arras for her valuable support.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This study was supported by the Deutsche Forschungsgemeinschaft (DFG research group FOR 2591, grant number: JI 276/1-1, ZE 712/1-1, VO 450/15-1 and BL 953/10-1).
Supplemental Material
Supplemental material for this article is available online.
References
