Abstract

Assessing laboratory animals’ welfare – their current and/or past subjective affective states – is essential for ethical and regulatory reasons (and central to biomedical research into, for example, pain, nausea or anxiety). But this is challenging; and in the quest for quantification (and perhaps simplicity), it can be tempting to overlook construct validity. Nevertheless, it is essential that our indicators have good construct validity – that is, that they accurately reflect the construct or concept of interest. This is true whether we are interested in short-term emotions like fear, longer-term mood-like states such as malaise, or markers of cumulative stress over a project or even a lifespan. Without it, welfare assessments risk being incorrect: inaccurate and unhelpful for the animals they aim to evaluate and assist. Here (summarising text from a forthcoming edited book), I introduce five validatory tests, as well as highlighting the importance of considering indicators’ responsiveness/sensitivity and selectivity/specificity. I also outline how these principles could help improve the construct validation of both humane endpoints and retrospective severity assessments. Careful construct validation can never fully solve the ‘Other Minds’ problem: that animals’ subjective experiences are private (such that we can never measure them, only infer them). However, done well, construct validation would add logical rigour to laboratory animal welfare assessment, increase its accuracy, and make benchmarking (e.g. severity classification) more transparent.

Keywords

Animal use, ethics and welfare, distress, laboratory animal welfare, public policy, refinement

Assessing laboratory animals’ welfare – their current and/or past subjective affective states (Table 1) – is essential for ethical and regulatory reasons (and central to biomedical research into, for example, pain, nausea or anxiety).^1–3 This is challenging, however. Animals cannot tell us how they are feeling, and relevant indicators – measurable variables from which to infer non-measurable feelings – can vary across states (Table 1), as well as across species, strains, age classes and welfare challenges (e.g. specific diseases being modelled).^4-7 Consequently, any method yielding seemingly objective values with apparent ease can be appealing (be this quick cage-side checks, simple behavioural tests, automated readouts from sensors, or impressive-looking ‘composite scores’). But in the quest for quantification (and perhaps simplicity of assessment), construct validity must not be overlooked.

Table 1.

Negative affective states and related constructs relevant to laboratory animal welfare.

Negative affective state (or related construct)	Definition⁸	Examples⁸ (as reported by humans, and inferred in other species)	Potential indicators of presence/intensity⁸
Emotions (including homeostatic/primordial emotions induced by homeostatic needs, sickness or injury)	Acute states closely tied to specific rewarding (preferred) or punishing (aversive) external or internal events	• Fear • Pain • Hunger • Nausea • Thirst	• Avoidance/escape • Changes in heart rate • Vocalisations • Facial expressions • Postures
Moods (including long-term states induced by sickness or injury)	Longer-term states that, for human subjects, may last from hours to weeks, feel as if they have no obvious immediate cause, and change the readiness with which subjects experience positive or negative emotions	• Malaise • Anxious moods • Depressed moods	• Judgement biases • Propensity to freeze/startle • Reactivity to noxious stimuli • Apathy • Changes in coat/body condition • Susceptibility to disease, both infectious and non-infectious
Affective disorders	Prolonged, disproportionately negative affective states that are hard to reverse	• Generalized anxiety disorder • Post-traumatic stress disorder • Major depressive disorder	• As for moods, but more severe • Anhedonia • Abnormal repetitive behaviours
Temperament (affective personalities)	Biological predispositions to exhibit particular affective states: traits, potentially stable over the lifetime	• Genetic, developmental or lesion-induced models of anxiety, depression or hyperalgesia	• As for moods or affective disorders, but stable over lifespans
Cumulative affective experience, or cumulative adversity	The summed experience of emotions, moods and affective disorders across a prolonged period, e.g. a lifespan	• Poor/good quality of life • A life (not) worth living • Cumulative suffering or severity	• As for affective disorders • Hippocampal volume loss • Premature senescence • Physiological markers of cumulative ‘wear and tear’ including high allostatic load (a composite of stress-/ageing-sensitive measures) and shortened telomeres

Construct validity means how accurately a metric reflects a construct or concept of interest (such as how well a cognitive test score reflects intelligence, for instance).^9,10 When assessing animal welfare – regardless of whether our construct of interest is, say, fear, contentment, or quality of life – the indicators we use must thus reflect ‘“ground-truth” – the state that the animal is actually in’,¹¹ yielding ‘numbers that really do reflect the welfare as experienced by the animals’.¹² Significant consequences can result if they do not. Using affect indicators of questionable validity may, for instance, help explain why few new therapies for humans have emerged from animal-based biomedical research on pain^13–15 and depression.^16–18 Using affect indicators of questionable validity also risks welfare assessments being incorrect and unhelpful for animals.^10,19 Construct validation is thus essential for ensuring that metrics mean what we hope they do. Happily, validatory methods are described in affect-focused biomedical studies, animal welfare research, veterinary research, and psychological research on human and animal emotions.¹⁰ Five key methods emerge in these literatures: validation tests that potential indicators should pass (cf. the negative state indicators in Table 2).

Table 2.

Five tests for the construct validation of indicators of negative affect (poor welfare), with some laboratory animal-relevant examples. Each test relies on different assumptions, and so indicators can be used with greater confidence the more tests they pass.

Test	Methodology	Key underlying assumptions	Relevant examples
1. Using humans as models for other species	Assess measurable signs in humans self-reporting the negative state of interest	Biological homology between humans and other species	Validating some indicators of canine nausea (e.g. hyper-salivation, groaning, high plasma levels of Substance P), by studying correlates of human nausea²⁰
2. Exposure to known aversive stimuli	Assess measurable signs in animals exposed to stimuli or contexts they avoid if given a choice	Avoidance (e.g. struggling, flight, learned aversions, learned escape responses) reflects negative affect	Validating immobility and thigmotaxis as negative affect indicators in zebrafish, by increases in these behaviours in tank designs this species finds aversive²¹
3. Exposure to fitness-threatening stimuli	Assess measurable signs in animals exposed to circumstances that would threaten their (ancestors’) fitness in the wild	Animals evolved to feel negative affect in response to threatened fitness; our subjects’ brains retain this legacy today	Validating stereotypic bar-mouthing as a welfare indicator in mice, by its elevation after premature loss of the mother²²
4. Pharmacological validation	Assess measurable signs in animals given affect-modulating drugs	The drugs influence affective states in the ways that we think they do	Validating Elevated Plus Maze behaviour (time in the closed arms) as evidence of anxiety, by its reduction by anxiolytic drugs²³
5. Validation by other indicators	Assess whether novel measurable signs covary with indicators of affect already validated in the species under study	The indicators relied upon are validated	Validating ‘grimace’ scales for scoring pain, by their correlation with prevalidated indicators (e.g. tooth-grinding, lameness, inactive hunched postures)^24,25

Along with passing validatory tests, ideal indicators should be highly responsive, sensitively reacting to all relevant changes in affect (even subtle ones) in an incremental manner.⁹ They should also be highly selective, only reflecting the specific affective states we wish to assess. Sadly, perfectly ideal indicators do not exist. But understanding the properties of those we have can help identify the best metric (or combination) for a given task. As Dawkins²⁶ put it, this is like ‘Be[ing] aware of the limitations of your materials before you start building a house’. To illustrate, weight loss can be validly used to infer suffering in clinical models, but only if we appreciate a priori when it can be insensitive (e.g. in acute conditions where animals rapidly become moribund,²⁷ or where effects are masked by reduced activity or even changes that increase bodyweight like elevated corticosteroid levels²⁸ or ascites²⁹; see also Talbot et al.⁵). Such understanding can in turn reduce errors: failing to detect changes in affective states that are present (a.k.a. false negatives/false null conclusions), or mistakenly inferring changes in affective states that are not present (a.k.a. false positives/false leads).
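The two error types above can be put into numbers. The following is a minimal sketch that borrows the diagnostic-testing definitions of sensitivity and specificity, and assumes a trusted reference classification of each animal's state is available (something that, as argued throughout, cannot be taken for granted). The function name and all counts are hypothetical and purely illustrative.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity: share of animals truly in the negative state that the
                 indicator flags (low values mean many false negatives).
    specificity: share of animals not in the state that the indicator
                 leaves unflagged (low values mean many false positives).
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical counts (illustrative only, not data from any study):
# 40 animals in the negative state, of which the indicator flags 30;
# 60 animals not in the state, of which the indicator wrongly flags 6.
sens, spec = sensitivity_specificity(true_pos=30, false_neg=10,
                                     true_neg=54, false_pos=6)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# prints "sensitivity=0.75, specificity=0.90"
```

In this made-up case, a quarter of affected animals would be missed (false nulls), while one in ten unaffected animals would be wrongly flagged (false leads).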

Formal principles of construct validation could help improve humane endpoints in terms of both their validity and their humaneness. To serve as accurate proxies (e.g. triggering a study’s end or a subject’s removal), humane endpoints should statistically predict severe suffering. Studies developing these essentially use Test 5 (see Table 2 and Figure 1): data collected from animals subsequently assessed for clinical scores warranting euthanasia are retrospectively analysed to identify which potential indicators differentiate between subjects who will live or die (and perhaps also between experimental animals and healthy controls).^4–6,27,30 However, such studies are rare; they assume that clinical scores are valid; and, furthermore, the resulting ‘humane’ endpoints may still involve much suffering^6,30 (see also Figure 1). Together, this makes further research into humane endpoints essential, and meeting this need should arguably involve new, complementary validatory tests. For example, for animal models of disease, this could involve liaising with patient groups, doctors and human clinical researchers to identify measurable signs that precede severe suffering – even desires for medically assisted dying – in relevant human patients (cf. Test 1 in Table 2).
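The retrospective logic described above, in which an early indicator is checked against a later clinical outcome, might be sketched as follows. This is an illustration under stated assumptions rather than the method of any cited study: the indicator (percentage body-weight loss on an early study day), the candidate thresholds and every record are hypothetical, and the later clinical score is simply taken as valid, which is exactly the assumption flagged above.

```python
# Hypothetical records (illustrative only): each tuple is
# (percent body-weight loss on an early study day,
#  whether the animal later reached a clinical score warranting euthanasia).
records = [
    (2.0, False), (4.5, False), (6.0, False), (9.0, False),
    (7.5, True), (11.0, True), (13.5, True), (16.0, True),
]


def evaluate_threshold(records, threshold):
    """Treat 'indicator >= threshold' as a positive flag.

    Returns (sensitivity, specificity) of that flag for predicting
    the later severe outcome.
    """
    tp = sum(1 for value, severe in records if severe and value >= threshold)
    fn = sum(1 for value, severe in records if severe and value < threshold)
    tn = sum(1 for value, severe in records if not severe and value < threshold)
    fp = sum(1 for value, severe in records if not severe and value >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Compare a few candidate endpoint thresholds (hypothetical values).
for threshold in (5.0, 10.0, 15.0):
    sens, spec = evaluate_threshold(records, threshold)
    print(f"threshold={threshold:>4.1f}%  "
          f"sensitivity={sens:.2f}  specificity={spec:.2f}")
```

With these invented numbers, the lowest threshold flags every animal that later deteriorates but also half of those that do not, whereas the highest threshold misses most animals that go on to suffer severely. Choosing between such thresholds is therefore a welfare judgement as well as a statistical one.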

Figure 1.

Two (imaginary) indicators that could be used to identify future suffering and consequent endpoints.

Such principles should also inform severity assessment: an even more challenging task because indicators must reflect not just relatively more or less suffering (as in Figure 1), but particular levels (and ideally even their boundaries: see Figure 2). Directive 2010/63/EU,³¹ for example, defines ‘mild’ as causing only short-term mild pain, suffering or distress (i.e. mild negative emotions cf. Table 1); ‘moderate’ as causing moderately negative emotions and/or longer-term negative states (e.g. negative moods, Table 1) that are only mild or only moderately impair overall condition (presumably via their cumulative impact); and ‘severe’ as causing severely negative emotions, and/or negative moods that are moderately to highly negative or severely impair overall condition. The Directive also lists types of procedure judged to fall within each category. However, it does not supply evidence for these judgements; for instance, it assumes that conventional caging is neutral, when instead this causes cumulative stress³²; and any such procedure-based approach risks overlooking practices that modify severity for individual animals (e.g. refinements in technique or analgesic use; animals’ temperaments; how handling styles affect fear of humans; how social buffering and housing quality affect resilience). Furthermore, even texts that include the welcome addition of animal-based welfare indicators (e.g. De Vleeschauwer et al.³³) still generally restrict these to clinical signs only (ignoring cognitive, physiological, immunological and behavioural signs of negative affect), as well as leaving opaque what makes something a sign of mild versus moderate versus severe impact. Thus as Reiber et al.⁷ summarise, the central problem is ‘we do not yet have . . . a gold standard combination of severity assessment parameters that reflects the actual truth about severity’.

Figure 2.

Two (imaginary) indicators useful for assessing severity in different ways.

Thinking formally about construct validation could again help by laying out logical, explicit frameworks to advance progress. For example, Test 1 again highlights the value of using data from relevant human patients, here for identifying measurable signs that reflect self-reported mild, moderate or severe reductions in quality of life. Tests 2 and 3 suggest merit in seeking indicators that differentiate between animals exposed to situations ranging from mildly to intensely aversive (Test 2), or from subtly to devastatingly harmful to fitness (Test 3). And Test 4’s pharmacological approach indicates another route: identifying whether affect-rectifying drugs influence indicators even at very low doses (as expected for indicators of mildly negative states) or only when doses are maximally high (as expected for indicators of very severe states).

Even the most careful construct validation will never fully solve the ‘Other Minds’ problem: that subjective experiences are private. However, done well it would add more logical rigour to laboratory animal welfare assessment by grounding this in sound biological principles, and by encouraging underlying assumptions to be made explicit. In turn this should reduce risks of false leads or false null errors, and make benchmarking (e.g. severity classification) more transparent and defensible. It could even increase the translatability of biomedical research to human patients (cf. Krock et al.³⁴, Gorman and Davies³⁵).

Footnotes

Acknowledgements

I would like to thank Mike Mendl, with whom many of these ideas were thrashed out over years (no, wait: decades); Gail Davies, Jess Cait and Aileen MacLellan for discussions on patient-centred approaches; and Anna Olsson for inviting this commentary and critiquing earlier drafts. The work was conducted on the ancestral lands of the Mississaugas of the Credit First Nation.

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author thanks NSERC for programme funding.

ORCID iD

Georgia Mason

Data availability and ethics approval

This commentary did not involve primary data (however, while still in press, copies of cited chapters can be provided on request); nor did it involve any animal use requiring ethical approval.

References

1. Duncan IJ. Animal rights–animal welfare: a scientist’s assessment. Poult Sci 1981; 60: 489–499.

2. Mason, Mendl MT. Why is there no simple way of measuring animal welfare? Anim Welf 1993; 2: 301–319.

3. Browning. If I could talk to the animals: measuring subjective animal welfare. Doctoral Dissertation, The Australian National University, Australia, 2020.

4. Morton DB. A systematic approach for establishing humane endpoints. ILAR J 2000; 41: 80–86.

5. Talbot, Biernot, Bleich, et al. Defining body-weight reduction as a humane endpoint: a critical appraisal. Lab Anim 2020; 54: 99–110.

6. Franco, Correia-Neves, Olsson IA. How “humane” is your endpoint? – Refining the science-driven approach for termination of animal studies of chronic infection. PLoS Pathog 2012; 8: e1002399.

7. Reiber, von Schumann, Buchecker, et al. Evidence-based comparative severity assessment in young and adult mice. PLoS One 2023; 18: e0285429.

8. Mendl, Paul, Mason. Animal welfare, affective states and the Other Minds problem. In: Mason GJ, Nielsen, Mendl (eds) Assessment of animal welfare – a guide to the valid use of indicators of affective states. UFAW Animal Welfare Series. Oxford: John Wiley & Sons Ltd, In press, 2025.

9. Cronbach, Meehl PE. Construct validity in psychological tests. Psychol Bull 1955; 52: 281.

10. Mason, Mendl. Measuring the unmeasurable: the construct validation of affective state indicators. In: Mason GJ, Nielsen, Mendl (eds) Assessment of animal welfare – a guide to the valid use of indicators of affective states. UFAW Animal Welfare Series. Oxford: John Wiley & Sons Ltd, In press, 2025.

11. Mendl, Neville, Paul ES. Bridging the gap: human emotions and animal emotions. Affect Sci 2022; 3: 703–712.

12. Browning. Assessing measures of animal welfare. Biol Philos 2022; 37: 36.

13. Cobos, Portillo-Salido. “Bedside-to-Bench” behavioral outcomes in animal models of pain: beyond the evaluation of reflexes. Curr Neuropharmacol 2013; 11: 560–591.

14. Newell, Chitty, Henson FM. “Patient reported outcomes” following experimental surgery – using telemetry to assess movement in experimental ovine models. J Orthop Res 2018; 36: 1498–1507.

15. Eisenach, Rice AS. Improving preclinical development of novel interventions to treat pain: insanity is doing the same thing over and over and expecting different results. Anesth Analg 2022; 135: 1128–1136.

16. Harro. Animal models of depression: pros and cons. Cell Tissue Res 2019; 377: 5–20.

17. Planchez, Surget, Belzung. Animal models of major depression: drawbacks and challenges. J Neural Transm 2019; 126: 1383–1408.

18. Gencturk, Unal. Rodent tests of depression and anxiety: construct validity and translational relevance. Cogn Affect Behav Neurosci 2024; 24: 191–224.

19. Watters, Krebs, Eschmann CL. Assessing animal welfare with behavior: onward with caution. J Zool Bot Gard 2021; 2: 75–87.

20. Kenward, Pelligand, Savary-Bataille, et al. Nausea: current knowledge of mechanisms, measurement and clinical impact. Vet J 2015; 203: 36–43.

21. Blaser, Rosemberg DB. Measures of anxiety in zebrafish (Danio rerio): dissociation of black/white preference and novel tank test. PLoS One 2012; 7: e36931.

22. Würbel, Stauffacher. Age and weight at weaning affect corticosterone level and development of stereotypies in ICR-mice. Anim Behav 1997; 53: 891–900.

23. Pellow, Chopin, File, et al. Validation of open:closed arm entries in an elevated plus-maze as a measure of anxiety in the rat. J Neurosci Methods 1985; 14: 149–167.

24. Häger, Biernot, Buettner, et al. The Sheep Grimace Scale as an indicator of post-operative distress and pain in laboratory sheep. PLoS One 2017; 12: e0175839.

25. Paterson, O’Malley, Moody, et al. Development and validation of a cynomolgus macaque grimace scale for acute pain assessment. Sci Rep 2023; 13: 3209.

26. Dawkins MS. Animal suffering. The science of animal welfare. London: Chapman and Hall, 1980.

27. Nemzek, Xiao, Minard, et al. Humane endpoints in shock research. Shock 2004; 21: 17–25.

28. Mallien, Pfeiffer, Brandwein, et al. Comparative severity assessment of genetic, stress-based, and pharmacological mouse models of depression. Front Behav Neurosci 2022; 16: 908366.

29. Morton DB. A model framework for the estimation of animal ‘suffering’: its use in predicting and retrospectively assessing the impact of experiments on animals. Animals 2023; 13: 800.

30. Littin, Acevedo, Browne, et al. Towards humane end points: behavioural changes precede clinical signs of disease in a Huntington’s disease model. Proc Biol Sci 2008; 275: 1865–1874.

31. European Union. Directive 2010/63/EU of the European Parliament and of the Council of 22 September 2010 on the Protection of Animals Used for Scientific Purposes. Council of Europe, Strasbourg. 2010.

32. Cait, Scott, et al. Conventional laboratory housing increases morbidity and mortality in research rodents: results of a meta-analysis. BMC Biol 2022; 20: 15.

33. De Vleeschauwer, Lambaerts, Hernot, et al. Severity classification of laboratory animal procedures in two Belgian academic institutions. Animals 2023; 13: 2581.

34. Krock, Jurczak, Svensson CI. Pain pathogenesis in rheumatoid arthritis – what have we learned from animal models? Pain 2018; 159: S98–S109.

35. Gorman, Davies. Patient and public involvement and engagement (PPIE) with animal research. Open Research Exeter, University of Exeter, http://hdl.handle.net/10871/132516 (2019, accessed 1 June 2025).

Assessing laboratory animal welfare: the crucial importance of construct validity
