Abstract
Single outcome measures often fall short of the sensitivity and objectivity expected under European Directive 2010/63, particularly in fast-progressing disease models. To address this gap, the German Research Foundation consortium FOR2591 collated data from 55 routinely used models in six species and narrowed more than 50 candidate readouts down to a 15-parameter core panel spanning behaviour, physiology, biochemistry and imaging. Quantitative tools such as the Composite Measure Schemes, endpointR and the Relative Severity Assessment algorithm fuse these multidimensional streams into more objective severity scores that outperform single readouts such as body weight change and clinical scoring in detecting early distress, and that refine humane endpoint decisions. More than 10 external laboratories have already integrated the open-source toolbox, illustrating practical scalability. A consortium survey showed that no single metric is both widely applied and consistently valued, underscoring the need for multi-parameter monitoring. Ongoing Phase III analyses extract model-specific digital fingerprints that trigger real-time risk alerts in home-cage systems. By providing a shared yet adaptable framework, FOR2591 charts a feasible path toward Directive 2010/63 compliance, improved model validity and individualized animal care, and thus toward more evidence-based severity assessment.
The assessment of laboratory animal well-being is undergoing a fundamental paradigm shift as new technologies challenge the suitability of traditional single-parameter monitoring techniques. 1 Current approaches often fall short of capturing the multifaceted nature of animal welfare, especially in rapidly evolving disease models where traditional markers may not be able to detect subtle welfare deterioration or predict critical endpoints as required by the European Union’s Directive 2010/63. 2
Recognizing these regulatory and scientific imperatives, the German Research Foundation established Research Unit FOR2591 – Severity Assessment in Animal-Based Research as a comprehensive eight-year initiative uniting up to 18 research groups across eight German and Swiss institutions (Figure 1(a)). Coordinated by Hannover Medical School and RWTH Aachen University, this national consortium was explicitly designed to develop quantitative, model-driven frameworks for converting heterogeneous welfare readouts into actionable severity assessments. The consortium’s systematic methodology spanned three funding phases, the third of which is ongoing. In Phase I, 55 animal models with over 50 parameters across six species (mice, rats, pigs, sheep, non-human primates, and African clawed frogs) were screened, with a particular focus on neurological and inflammation/cancer/surgery disease models. Phase II refined this extensive dataset to 32 models, incorporating 15 key measures that integrated behavioural indicators (wheel-running, burrowing and nesting), physiological metrics (heart rate, temperature and activity), biochemical markers (corticosterone) and imaging parameters (computed tomography, magnetic resonance imaging and ultrasound) through sophisticated algorithms (Figure 1(b)). This refinement involved several iterative internal selection rounds and stakeholder discussions, and was further shaped through many peer-reviewed publications.

The consortium’s findings provide compelling evidence for the necessity of multi-parameter monitoring. In stark contrast to single-parameter measurements, multi-parameter integration demonstrates superior capability for capturing subtle welfare changes through sophisticated temporal dynamics analysis. Some parameters effectively flag early distress signals while others chart progression trajectories; leveraging their covariance structure significantly enhances humane endpoint prediction accuracy. FOR2591 has developed quantitative tools, loosely integrated in a Severity Toolbox (Figure 1(c)), to assist with such analyses. Tools such as the Composite Measure Schemes 3 for identifying and clustering severity-related parameters, endpointR 4 for predicting humane endpoints in time-series data, and the Relative Severity Assessment (RELSA) score 5 for high-dimensional data fusion provide flexible support for severity assessment and can be implemented by scientists in their daily work. More than 10 workgroups within and outside of FOR2591 have already adopted these tools, which gives them an edge in understanding how welfare impairments are structured in their experiments.
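The idea of fusing several readouts into one severity value can be illustrated with a minimal sketch. It is not the published RELSA implementation: it simply assumes that each parameter is expressed as a deviation from the animal's baseline, scaled by the largest deviation seen in a reference group, and that the scaled deviations are combined as a root mean square. All function names, parameter names and numbers below are hypothetical and chosen for illustration only.

```python
from math import sqrt


def relative_deviation(value, baseline, reference_extreme):
    """Scale a reading's deviation from baseline by a reference extreme.

    0.0 means "at baseline"; 1.0 means "as far from baseline as the most
    extreme value in the (assumed) reference group".
    """
    span = reference_extreme - baseline
    if span == 0:
        raise ValueError("reference_extreme must differ from baseline")
    # Only deviations towards the reference extreme count as impairment.
    return max(0.0, (value - baseline) / span)


def composite_score(readings, baselines, reference_extremes):
    """Combine several parameters via the root mean square of their deviations."""
    deviations = [
        relative_deviation(readings[name], baselines[name], reference_extremes[name])
        for name in readings
    ]
    return sqrt(sum(d * d for d in deviations) / len(deviations))


# Hypothetical values for one mouse on one day (illustrative only).
baselines = {"body_weight_pct": 100.0, "burrowing_g": 180.0, "heart_rate_bpm": 550.0}
reference_extremes = {"body_weight_pct": 80.0, "burrowing_g": 0.0, "heart_rate_bpm": 750.0}
readings = {"body_weight_pct": 95.0, "burrowing_g": 90.0, "heart_rate_bpm": 600.0}

score = composite_score(readings, baselines, reference_extremes)
print(f"composite score: {score:.3f}")
```

In this toy example the burrowing deficit contributes more to the combined value than the modest weight loss, which is the kind of early signal a weight-only readout would miss.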
From these experiences, FOR2591 was able to determine that animal-model specific parameter patterns characterize welfare impairments. Such digital fingerprints can be used to monitor, classify and predict time-series data in animals. This technology is still part of the ongoing research in Phase III, but already carries the promise of advancing automated home-cage monitoring to a new level. Currently, the RELSA framework serves to enhance individual animal care, rather than reducing time, costs or compliance obligations. The multi-parameter approach enables adaptive, individualized monitoring protocols that respond dynamically to each animal’s specific physiological and behavioural needs, directly supporting the 3Rs principle of refinement while maintaining rigorous scientific standards. The consortium showed that the framework is instrumental in identifying individual animals at risk. 6,7
In addition to the scientific advancements, a qualitative survey across FOR2591 research groups revealed that no single parameter achieved both high effectiveness and widespread applicability across diverse experimental models. Parameters were first evaluated experimentally, and the diagnostic performance was quantified, for example, with receiver operating characteristic analyses. The principal investigators then converted these results into a score of perceived effectiveness on a 0–4 scale. When these scores were plotted against how often the parameters were used, a conspicuous empty ‘sweet spot’ emerged (Figure 1(d)), highlighting the limitations of relying on single-parameter monitoring. Body weight exemplifies this paradox: while it proves sufficient for predicting humane endpoints in specific contexts, such as intracranial glioma models, weight changes often lag significantly behind actual welfare deterioration in rapidly progressing conditions, such as sepsis. 8 Similarly, clinical scoring systems, despite providing broader assessment coverage, introduce problematic subjective bias and require extensive training of observers to ensure consistency across facilities and personnel. Voluntary wheel-running was rated among the highest in both neuroscience and models of inflammation and gastroenterology. Interestingly, it remains underutilized despite demonstrated effectiveness, 9 indicating substantial untapped potential. Parameters occupying ‘middle effectiveness’ positions include burrowing, nesting and grimace scaling. Such parameters offer practical and effective enhancements to assessment when properly integrated into monitoring protocols.
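As an illustration of how the diagnostic performance of a single parameter might be quantified with a receiver operating characteristic analysis, the following sketch computes the area under the curve (AUC) directly from its rank interpretation: the probability that a randomly chosen impaired animal shows a higher value than a randomly chosen unimpaired one. The function name and the weight-loss figures are hypothetical and do not come from the consortium's data.

```python
def roc_auc(scores_impaired, scores_unimpaired):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    wins = 0.0
    for s_pos in scores_impaired:
        for s_neg in scores_unimpaired:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_impaired) * len(scores_unimpaired))


# Hypothetical percentage body weight loss in two groups (illustrative only).
weight_loss_impaired = [8.0, 12.0, 5.0, 15.0]
weight_loss_unimpaired = [2.0, 6.0, 1.0, 4.0]

auc = roc_auc(weight_loss_impaired, weight_loss_unimpaired)
print(f"AUC: {auc}")  # 15 of 16 pairs are ordered correctly
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so the value gives a model-specific, comparable measure of how well a parameter separates the groups.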
Thus, the consortium identified a practical core of broadly informative parameters that answers the question of what to measure in welfare assessment and covers a wide range of severity-related factors (Figure 1(d)). These parameters optimize analytical depth while remaining feasible for routine implementation. Their real potential will unfold, however, when they are combined to monitor the severity of procedures and potential welfare impairments on a more holistic level.
Still, three primary obstacles significantly limit the widespread adoption of multi-parameter systems. Economic barriers represent the most immediate challenge: while simple behavioural assays, such as burrowing and nesting, remain inexpensive, telemetry, advanced imaging, and home-cage technologies require substantial capital investments and specialized training. Data complexity constitutes the second major hurdle, as these systems generate high-dimensional data streams that surpass traditional statistical approaches, necessitating expertise in data management, advanced analytics and machine learning. Infrastructure standardization presents the third challenge, with significant variation across commercial home-cage systems, species-specific requirements, and housing designs, effectively precluding the development of universal assessment protocols. These challenges create a concerning implementation gap between objective, data-driven assessment capabilities and the intuitive methods currently employed for severity scoring.
The integration of artificial intelligence (AI) represents the most promising avenue for realizing the full potential of multi-parameter monitoring. Machine learning algorithms can process continuous data streams in real-time, enabling the detection of immediate welfare concerns and automated intervention alerts, while predicting humane endpoints with unprecedented temporal precision. However, AI implementation also faces substantial technical and regulatory hurdles: algorithms require extensive training data that may not exist for many experimental models. Furthermore, the so-called ‘black box’ problem of AI poses transparency challenges that complicate regulatory approval processes, and systems require continual validation across diverse experimental contexts and institutional environments. 10
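The principle of an automated alert on a continuous data stream can be sketched without any machine learning at all. The example below is a deliberately simple statistical stand-in, not a method described by the consortium: it flags readings that deviate from the animal's own baseline period by more than a chosen number of standard deviations. The function name, the threshold and the activity counts are assumptions made for illustration.

```python
from statistics import mean, stdev


def alert_indices(series, baseline_len, z_threshold=3.0):
    """Return indices of readings far from the animal's own baseline.

    The first `baseline_len` readings define the baseline mean and standard
    deviation; later readings are flagged when their absolute z-score
    exceeds `z_threshold`.
    """
    baseline = series[:baseline_len]
    mu = mean(baseline)
    sd = stdev(baseline)
    if sd == 0:
        raise ValueError("baseline readings show no variation")
    return [
        i
        for i in range(baseline_len, len(series))
        if abs(series[i] - mu) / sd > z_threshold
    ]


# Hypothetical hourly home-cage activity counts for one animal (illustrative only).
activity = [100, 104, 98, 102, 96, 101, 99, 70, 65]

alerts = alert_indices(activity, baseline_len=6)
print(alerts)  # indices of the two sharply reduced readings
```

A learned model would replace the fixed threshold with a decision rule trained on data, which is where the need for extensive training data and ongoing validation arises.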
Successful AI implementation requires a carefully structured, phased approach prioritizing animal welfare enhancement above technological sophistication. Short-term objectives can focus on deploying FOR2591’s existing quantitative tools that effectively bridge advanced sensor platforms with familiar scoring methodologies. Within the next few years, the field is expected to adopt shared data standards across institutions, upskill animal researchers in data science and deploy semi-automated assessment tools that enhance objectivity while maintaining human oversight. Looking even further ahead, AI-driven home-cage monitoring across species and sites, backed by modern regulations that incorporate objective, multidimensional data-based criteria, will become the new gold standard for welfare assessment. The path forward requires the collaborative integration of technological advancements with practical accessibility, ensuring that enhanced monitoring capabilities consistently serve their primary purpose of optimizing individual animal welfare while advancing scientific understanding through more precise, objective and humane research practices.
Footnotes
Acknowledgements
We extend our gratitude to the members of FOR2591 who responded to the questionnaire leading to Figure 1(d): C Häger, L Ernst, H Potschka, P Gass, AS Mallien, B Vollmar, D Zechner, F Kiessling, M Bankstahl, K Schwabe and U Lindauer.
Data availability
This work has no data to share.
Declaration of conflicting interests
The authors have no conflicts of interest to declare.
Ethical considerations
Our study did not require ethical board approval because it did not contain human or animal trials.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the German Research Foundation (DFG) (grant numbers TA2072/1-1, TO542/5-2, TO542/6-2, TO542/9-1, BL953/10-1, 10-2 and 10-3, BL953/11-1 and 11-2).
