Abstract
The EU Directive 2010/63 on the protection of animals used for scientific purposes, as well as the Swiss Animal Welfare Legislation, demand monitoring and documentation of specific aspects of an animal experiment, including welfare-related issues and the (retrospective) assessment of the severity of the procedures that the animals underwent. A score sheet is an efficient tool for the evaluation of the burden of an animal during an experiment and, if properly designed and used, helps adhere to the 3Rs principle. It must be adapted to the specifics of each experiment and explicitly conceived for it. Several score sheet examples have been published; however, some contain fundamental flaws or are designed for specific settings only, requiring modifications to fit other experimental designs. This paper suggests an eight-step procedure to design a score sheet that can be adapted to any animal species and experimental conditions.
Introduction
Animal welfare legislation, such as the EU Directive 2010/63 1 and the Swiss Animal Welfare Law, 2 requires monitoring, documentation and retrospective severity assessment in animal experiments. A key tool supporting these requirements is the score sheet, a structured form for consistent observation, standardized interventions, and structured evaluation of animal well-being.3–5 It records clinical signs, guides decisions, and defines criteria for humane endpoints, helping to ensure both ethical compliance and scientific quality. Score sheets are based on the guidelines by Morton and Griffiths 6 and were further developed by others.4,7–9
A score sheet must be tailored to the experimental setup to ensure trained personnel can perform a reproducible and standardized assessment. This enables timely interventions, reduces constraints, and ensures adherence to humane endpoints. Well-designed score sheets document the animal’s burden and interventions to reduce the constraint. They also improve scientific validity, reliability, and data traceability.4,10,11 Score sheets must match the species, interventions, and model used.
A literature evaluation revealed that no publication explains how to design a score sheet (reviewed in Bugnon et al. 2016 4 ). General guidance on score sheet use is available,7,8,12–17 and score sheets have recently been published for several species and specific research fields and models.5,10,18–23 A collection of score sheets is accessible at https://journals.sagepub.com/page/lan/collections/score-sheets. Nevertheless, a previous analysis 4 showed that many published score sheets contain flaws that could lead to inadequate interventions and negatively affect animal welfare. Examples include an incorrect sequence of symptom monitoring that impairs proper assessment (e.g. weighing animals before observing undisturbed behaviour) and failure to cover certain ranges of body weight loss.12,16,24–26 Regular reviews and updates to the score sheet are necessary, especially if novel findings emerge during monitoring. This prevents important symptoms that are not listed on the score sheet from being missed and avoids diluting total scores with parameters never observed in the model.4,7,12,14,20,25 A thorough revision is also necessary when refinements relevant to the study are published.
Without clear instructions, it is challenging to create a score sheet that detects relevant clinical symptoms, is easy to use, and leads to meaningful interventions and humane endpoints. We developed an interactive training to establish a score sheet scaffold adjustable for all animal experiments and species (https://vsfltkreg.uzh.ch/course/m-11/en). As a result, we devised an eight-step approach (Figure 1) to help design an efficient score sheet. Since May 2015, we have successfully demonstrated this method in our interactive continuing education courses.

Figure 1. Flowchart outlining the steps to create a score sheet for animal experimentation.
Eight steps to create a score sheet for animal experiments
Step 1. General information
The score sheet should include key study information such as the study name, licence number, and the name and telephone number of the person to be contacted for questions or emergencies. It should also state the author and version number to ensure that the latest version is used. Basic details include the animal identification number and group designation (e.g., control, treatment, or group code). Each score sheet must have space to record the date and, if relevant, the time (e.g., for multiple observations on the same day). At the bottom, space should be provided for the observer’s name and signature. This general information ensures traceability: it allows the score sheet data to be clearly linked to a specific experiment and its corresponding animal or group when outcomes are interpreted, the experiment is reported, or a manuscript is prepared.
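For groups that keep score sheets electronically, the general information listed in Step 1 could be captured in a small record type. The following is only a minimal sketch: the class name, field names, and the `missing_fields` helper are illustrative assumptions and not part of any published template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreSheetHeader:
    """General information from Step 1 (field names are hypothetical)."""
    study_name: str
    licence_number: str
    contact_name: str
    contact_phone: str
    author: str
    version: str
    animal_id: str
    group: str  # e.g. "control", "treatment", or a group code


def missing_fields(header: ScoreSheetHeader) -> list:
    """Return the names of header fields that were left blank."""
    return [name for name, value in vars(header).items()
            if not str(value).strip()]


header = ScoreSheetHeader(
    study_name="Example study", licence_number="", contact_name="A. Person",
    contact_phone="000 000 00 00", author="A. Person", version="1.0",
    animal_id="M-001", group="control",
)
incomplete = missing_fields(header)  # lists every blank field by name
```

Checking for blank fields before a sheet is used is one simple way to make sure that every record can later be traced to its experiment, animal, and version.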
Step 2. Problem list: which deviations from the animal’s normal state are expected in the experimental setting?
This step is the most crucial part of creating the score sheet (Figure 1); however, it is often neglected when designing score sheets. This step aims to develop an extensive list of possible problems and deviations from the normal state, based on the planned experimental interventions. It is necessary to consider general as well as experiment- or model-specific deviations from the animal’s normal state.6,27,28 The problem list should comprise behavioural and physiological deviations 14 as listed in Table 1. Familiarity with the experimental model, the species in question, and the behavioural repertoire of the species is a prerequisite. 29 Furthermore, a thorough literature search is required to gain important information regarding what has been reported in similar experimental conditions. The experimenters should also consider other parameters that are not directly related to the experimental conditions. The animal’s intrinsic factors, such as sex, age, strain, or genotype, can affect its health and well-being.30–33 This further emphasizes the importance of having deep knowledge regarding the animals and ensures that the list of problems corresponds to the experimental setting. The expected deviations/problems recognized in this step are not the parameters to be assessed and listed on the score sheet, but they will serve to prepare the ground for these parameters.
Table 1. Examples of deviations from normal state (Step 2) and possible questions that will help to define parameters to assess these deviations (Step 3). Note that this list is non-exhaustive, and not all parameters are suited for all species and all experimental settings.
Step 3. Parameter determination: which indicators help to detect deviations from the normal state?
Building on the list created during Step 2, it is helpful to formulate a series of questions that will support the determination of the parameters to detect deviations from the normal state (Table 1). Ideally, the parameters are robust and deliver comparable results between different raters or laboratories. For every problem detected in the previous step, researchers should consider all possible options to identify and subsequently evaluate the problem. 14 It is crucial at this stage that experimenters determine which parameters are most appropriate for their expertise and possibilities (Figure 2). Similarly, all chosen parameters should be sensitive and specific, facilitating the detection of the intended deviation with a high success rate. 34 The chosen parameters must be suited to the animal species, the experimental conditions, and the experience of the people responsible for monitoring the animals and using the score sheet. Parameter choice can also be guided by published sources. Recently, there have been some efforts to publish composite scores for specific experimental conditions.22,23

Figure 2. Flowchart showing the progression from Step 2 to Step 3. Once the putative problem list has been created, coming up with several questions per problem will help in a later step to decide which parameters can be used to monitor the individual animal. Each colour represents a different problem. Note that there is variability in the number of questions and parameters per problem.
Some deviations are easier to assess than others. For example, signs of pain and early infection signs are not always easy to recognize. Thus, experimenters must carefully consider which indicators might be helpful for monitoring. Furthermore, some parameters are only useful under specific experimental conditions. For example, reduced water and food intake can be easily identified when animals are single-housed; however, the interpretation of this parameter in group-housed animals is limited since such values give no information on an individual animal. 35 Similarly, some score sheets suggest using heart or breathing rate as parameters. These indicators are very helpful for assessing the animal’s state but are difficult or impossible to measure in small species, such as rodents or aquatic species.
For this step, knowing the expected behaviour and physiology of the species and the individual animal is a prerequisite.29,36 When prey animals are in pain, they may exhibit ‘normal’ behaviour as an innate response, since this may protect them from predators; this is known as ‘displacement behaviour’. 35 Furthermore, stress due to manipulation can affect the animal’s behaviour (see Step 4). For example, it is difficult to observe small rodents in their cages without moving or opening the cages; therefore, observed behaviours will be altered. 35 Similarly, if an animal has undergone surgery and cannot move properly owing to the procedure, this must be considered when assessing the animal. The same applies when analgesia is given; depending on the drug, behavioural changes such as apathy, reduced food intake, or increased activity can be observed. 37 It should further be noted that the observer can affect the behaviour of the observed animals. Specifically, it has been shown that rodents respond differently to male and female observers and experimenters, and handling by male personnel induces increased stress. 38 It has been proposed that the stress response related to male observers can suppress pain behaviours. 39 Similarly, it has been demonstrated that rodents respond to the affective state of the handler. 40 Therefore, factors related to the observer should be considered when interpreting the score sheet data. Finally, the time of day at which monitoring is done must be carefully considered, since it might influence the parameters to be observed. In particular, the light or dark phase is directly related to the activity levels of the animals; if observations are done during the inactive phase, behaviours will likely be missed (see Step 5).
Step 4. Parameter sequence
All manipulations to the cage or the animal itself may change the animal’s behaviour or stress level, especially when working with prey animals. 35 The order in which parameters are assessed is, therefore, of great importance, 7 and this should be reflected in the design of the score sheet. The score sheet is intended to be read from top to bottom, like a text or list. If the parameter order is not adapted to this ‘reading scheme’, it might interfere with the assessment of symptoms, make it impossible to observe certain parameters or behaviours, or bias the result of the measurements and observations. Therefore, in this step, the list of parameters from Step 3 is ordered from pure observation without any stimulus to observations and tests that would, if performed first, interfere with subsequent measurements. Some parameters are influenced by stress (e.g. heart rate), whereas others remain unchanged (e.g. body mass). It is therefore best to start with visual observations that do not require any manipulation (e.g. assessment of social behaviour or posture), continue with responses to stimuli (e.g. provoked behaviour), and finish with parameters that involve manipulation (e.g. heart rate, body temperature measurement, body weight) (Figure 3).

Figure 3. Parameter sequence. Once all parameters have been decided, they should be ordered in a logical sequence. Parameters that require only observation are assessed first, response to stimulus is tested afterwards and the final round includes recording parameters that require manipulation.
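The ordering rule of Step 4 can be expressed as a simple sort. The sketch below is illustrative only: the three disturbance levels mirror the sequence described above (observation, response to stimulus, manipulation), while the names `Disturbance` and `order_parameters` and the example parameter list are assumptions made for this example.

```python
from enum import IntEnum


class Disturbance(IntEnum):
    """How strongly assessing a parameter disturbs the animal (assumed levels)."""
    OBSERVATION = 0   # no stimulus, cage left untouched
    STIMULUS = 1      # response to a stimulus is provoked
    MANIPULATION = 2  # animal is handled or measured


def order_parameters(parameters):
    """Sort (name, Disturbance) pairs from least to most disturbing.

    sorted() is stable, so parameters on the same level keep the
    order in which they were listed.
    """
    return sorted(parameters, key=lambda item: item[1])


draft = [
    ("body weight", Disturbance.MANIPULATION),
    ("posture", Disturbance.OBSERVATION),
    ("provoked behaviour", Disturbance.STIMULUS),
    ("social behaviour", Disturbance.OBSERVATION),
]
ordered_names = [name for name, _ in order_parameters(draft)]
```

Printing the score sheet rows in this order means that reading the sheet from top to bottom automatically follows the least disturbing sequence.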
Step 5. Monitoring frequency
The frequency of observations should be adapted to the experimental setting. It must be frequent enough to allow for the timely detection of changes in the animals’ well-being. 7 It is important to adjust the evaluation frequency to the different experimental phases, as the monitoring requirements may vary considerably. For example, in a cancer study involving subcutaneous tumours, the animal’s state will not be impacted at the start of the experiment when the tumour is small and not metastasized. In this early phase, the monitoring frequency may be lower, to not burden the animal with unnecessary scoring stress. However, as the tumour grows or metastasizes, the frequency must be increased, as negative effects on the animal are to be expected. The opposite applies to the monitoring frequency after surgery. In this case, more frequent observation is required at the beginning (immediately after the procedure), as pain and stress occur immediately after surgery and, therefore, require close observation to assess the efficacy of the analgesic treatment. However, as adverse effects subside over time and analgesia is administered, longer intervals between observations are possible if no adverse events are observed. The researchers should reflect on the experimental design and determine when to adjust the observation frequency. For animal welfare, it is best to determine the observation frequency based on the cumulative score of the animals, increasing the frequency of monitoring if welfare is impaired.
Additionally, as discussed in Step 3, the effect of time of day on observations should be considered. 14 It is advisable to observe the animals during their active phase, that is, for most rodents, during the dark phase, to ensure accurate monitoring of all parameters and to minimize the likelihood of missing important indicators of declining well-being. If monitoring during the animals’ active phase is not possible, the choice of parameters should reflect the fact that the animals are checked during their non-active phase. Furthermore, to increase the reliability of the score sheet, the observations should be done at the same time every day.
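As an illustration of score-dependent monitoring, the interval until the next observation could be derived from the cumulative score. This is a sketch under stated assumptions: the threshold values and intervals below are hypothetical placeholders and must be defined for each experiment; they are not recommendations.

```python
# Hypothetical thresholds: (minimum cumulative score, hours until next check).
# Entries must be listed in ascending order of the minimum score.
EXAMPLE_THRESHOLDS = ((0, 24), (3, 12), (6, 4))


def monitoring_interval_hours(cumulative_score, thresholds=EXAMPLE_THRESHOLDS):
    """Return the interval of the highest threshold the score has reached."""
    interval = thresholds[0][1]
    for minimum_score, hours in thresholds:
        if cumulative_score >= minimum_score:
            interval = hours
    return interval


next_check = monitoring_interval_hours(4)  # score 4 falls in the 3-5 band
```

A phase-dependent schedule (e.g. closer monitoring directly after surgery) could be handled in the same way by passing a different threshold table for each experimental phase.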
Step 6. Scoring system and grading the severity of the signs
To develop a system to grade the severity of the observed changes, the parameters defined in Step 3 are used (Figure 1 and Table 2). This allows for a systematic assessment of the animal’s state, as well as continuous monitoring of its welfare, since scores can be compared between successive monitoring points. 14
Table 2. Example of binary and numerical systems within a score sheet.
The scoring system can be either a simple binary system (yes/no, present/absent), or a numerical score that weighs/quantifies the symptoms and can account for cumulative harm by summing the individual scores.14,41 A combination of both systems is also possible. Each system, whether binary or numerical, has relative strengths and weaknesses; some publications provide tables to guide the researchers in choosing the best system (or system combination) in a specific context.14,15,42,43 A binary score can be used for parameters whose intensity is difficult or impossible to assess; only presence or absence of the parameter is noted (Table 2). Binary scores are also useful for parameters that lead to immediate decisions, such as termination criteria (humane endpoints). Numerical scores aim at quantifying the severity of signs or parameters. A score of ‘0’ means normal animal state, and increasing values indicate increasing severity and impairment of the animal’s state.14,20,25 Additionally, the use of a non-regular scoring system (e.g. 0–1–5–6) is possible, if necessary, as each score given to a situation reflects the quantification of the constraint (Table 2). If the researcher believes that one of the parameters has a greater impact on the animal’s health and well-being, a higher score could be assigned, accounting for the greater weight of a certain symptom. 14
Regardless of the scoring system, calculating a cumulative score by summing the individual parameter scores should be considered.13,14 This cumulative score reflects the overall burden on the animal at a given time point and supports decisions regarding observation frequency and predefined termination criteria. However, interventions should be based on single parameters, addressing each problem individually. Thresholds should be defined for the individual parameters and for the cumulative score at which humane interventions and humane endpoints, respectively, are triggered (see Step 7). 9 Furthermore, existing quantitative measurements, such as body condition score,44,45 can be included as scoring systems for specific parameters.
This step is completed by adding a designated space to record the actual severity degree based on the cumulative scores of all parameters assessed. If the severity degree of the experimental animals needs to be reported to the authorities, this information can then be easily collected and provided based on the data of the individual animals. 8
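The combination of binary and numerical scoring can be sketched as follows. The parameter names, the permitted score values (including a non-regular 0–1–5–6 scale), and the decision to list binary findings separately rather than add them to the sum are assumptions made for illustration; each experiment must define its own.

```python
# Hypothetical numerical parameters and the scores each one may take.
NUMERICAL_SCALES = {
    "body_weight_loss": (0, 1, 2, 3),
    "wound_state": (0, 1, 5, 6),  # non-regular scale weighting severe signs
}
# Hypothetical binary parameters: only presence or absence is recorded.
BINARY_PARAMETERS = {"self_mutilation"}


def evaluate(observation):
    """Return (cumulative numerical score, list of binary signs present)."""
    total = 0
    present = []
    for name, value in observation.items():
        if name in BINARY_PARAMETERS:
            if value:
                present.append(name)
        elif name in NUMERICAL_SCALES:
            if value not in NUMERICAL_SCALES[name]:
                raise ValueError(f"{value!r} is not a defined score for {name}")
            total += value
        else:
            raise KeyError(f"{name} is not a parameter on this score sheet")
    return total, present


result = evaluate(
    {"body_weight_loss": 2, "wound_state": 5, "self_mutilation": False}
)
```

Rejecting values that are not on the defined scale keeps entries comparable between observers, and keeping binary signs apart from the sum allows them to trigger immediate decisions independently of the cumulative score.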
Step 6.1. Unexpected events or nothing abnormal detected
The score sheet should be simple to complete when an animal’s well-being appears normal. Including a checkbox such as ‘No abnormality detected’ can streamline documentation. This feature facilitates efficient use of the score sheet while still ensuring that both the monitoring process and the animal’s normal state are properly recorded. Conversely, unexpected symptoms might arise in an animal experiment. Therefore, space to record detailed information on such unexpected events is needed. Recording this information is important, as it may impact the experiment and, consequently, the interpretation of the results. If an unexpected finding is recorded repeatedly, it should be added to the score sheet as a parameter.
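Where unexpected events are transcribed from the free-text field, recurring findings can be flagged with a simple tally. This is a minimal sketch: the function name, the normalization (trimming and lower-casing), and the default limit of three occurrences are illustrative assumptions, not a validated criterion.

```python
from collections import Counter


def recurring_findings(notes, min_count=3):
    """Return findings noted at least min_count times, sorted by name.

    Notes are compared after trimming whitespace and lower-casing;
    empty entries are ignored.
    """
    counts = Counter(
        note.strip().lower() for note in notes if note.strip()
    )
    return sorted(name for name, n in counts.items() if n >= min_count)


notes = ["Piloerection", "piloerection ", "diarrhoea", "piloerection", ""]
candidates = recurring_findings(notes)
```

Findings returned by such a check are candidates for review; whether they become parameters on the next version of the score sheet remains a judgement for the research team.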
Step 7. Definition of the interventions
The experimenter should always strive to minimize the animal’s burden as much as possible within the context of the study. In this step, the researcher determines when actions or humane interventions should be taken and when humane endpoints are reached (according to Step 6). 46 When a deviation from an animal’s normal state is identified, predefined humane intervention points must be established to prevent further deterioration of the animal’s state. 9 For each scenario outlined in Step 3, suitable interventions should be defined, with clearly described procedures to ensure that appropriate measures can be implemented promptly and effectively. Such interventions include, but are not limited to, fluid replacement for dehydration, analgesic treatment for pain, and providing easily accessible food for animals with mobility impairments or weakness. 47 The score sheet must also include re-evaluations to assess the success of the defined interventions (e.g., to assess the effectiveness of painkillers). If interventions were not successful, further actions must be predefined, such as additional interventions or experiment termination. 9 The humane interventions or termination procedures are to be included on the score sheet.
In some experimental settings, such as infection models, deviations are expected and, for the sake of the experiment, do not warrant interventions; however, for animal welfare and for scientific reasons, monitoring should always be done, and all deviations noted.
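The link between scores and predefined actions can be sketched as a lookup. The interventions below are the examples named in this step; the parameter names, the intervention thresholds, and the cumulative endpoint value are hypothetical and would have to be defined for each experiment.

```python
# Hypothetical intervention points: parameter -> (minimum score, action).
INTERVENTION_POINTS = {
    "dehydration": (1, "fluid replacement"),
    "pain": (1, "analgesic treatment"),
    "impaired_mobility": (1, "easily accessible food"),
}
HUMANE_ENDPOINT_TOTAL = 10  # hypothetical cumulative termination criterion


def plan_actions(scores):
    """Return the predefined actions for one observation.

    Interventions are triggered per parameter; the humane endpoint is
    checked against the cumulative score and overrides everything else.
    """
    if sum(scores.values()) >= HUMANE_ENDPOINT_TOTAL:
        return ["humane endpoint reached"]
    actions = []
    for name, score in scores.items():
        point = INTERVENTION_POINTS.get(name)
        if point is not None and score >= point[0]:
            actions.append(f"{point[1]}; re-evaluate")
    return actions


actions = plan_actions({"dehydration": 2, "pain": 0, "impaired_mobility": 1})
```

Each returned action carries a re-evaluation reminder, reflecting the requirement that the success of an intervention be assessed and that further actions be predefined if it fails.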
Step 8. Testing the score sheet (‘play the game’)
The objective of this step is to identify any potential issues that may have been missed in previous steps and to make adjustments before the experimental phase begins. At this stage, all personnel involved should participate in the evaluation, ideally by rehearsing scenarios representing different levels of change in the animals’ state. This step also ensures that everyone understands how to use the score sheet. In countries where the score sheet is a mandatory part of the experimentation licence, it helps reduce the need for amendments that might arise if issues are noticed later.
We recommend reviewing each step carefully so that all deviations from the normal state are considered and few modifications are needed during the study. Nevertheless, changes to the score sheet should be made whenever necessary, and the version updated accordingly. If the score sheet is part of a licence, any new version must be approved by the authorities.
Bonus Step. Score sheet annex
As a useful addition, we recommend creating an annex to the score sheet. This annex contains supporting documents that are not required for daily evaluations but help ensure quick and accurate responses when intervention is needed. For example, in experiments where skin injuries or tumours are expected, a diagram of the animal (dorsal, ventral, lateral views) allows for drawing the location and size of lesions, making it easier to monitor progress. The annex can also include detailed procedures for interventions (Step 7), humane euthanasia protocols, and instructions for sample collection and processing in the case of premature termination. These documents ensure that important samples are not lost and support reproducibility. Any other relevant materials may also be included. The annex should be stored with the score sheet for easy access in the event of unexpected situations.
Discussion
The development and implementation of score sheets in animal experimentation are key to refining procedures, enhancing animal welfare, and ensuring scientific validity. Legal frameworks in Europe and Switzerland require monitoring and retrospective severity assessment, but do not prescribe specific methods. Other monitoring options exist, including endpointR 48 and RELSA. 49 However, software such as endpointR only accounts for one variable, whereas deterioration of well-being is multifactorial, so such an approach may lead to under- or over-interpretation of the animal’s state. A composite measures scale like RELSA is more comprehensive, but it lacks validation and does not include behavioural parameters. Therefore, until other monitoring tools are developed further, score sheets are the most suitable tool to meet the legal requirements.
Our eight-step approach offers a structured, broadly applicable framework that enhances ethical responsibility and experimental rigour.
One size does not fit all in monitoring animal well-being. Species, interventions, and housing conditions vary, requiring individualized tools. Literature provides many examples, but few offer systematic guidance on designing score sheets effectively. Existing sheets often lack transparency in parameter selection, order, and scoring logic, leading to compromised assessments and suboptimal care. Our method emphasizes proactive planning and adaptation, starting with analysis of deviations from normal behaviour and physiology.
Our framework promotes flexibility and regular review of parameters and thresholds based on new data or unforeseen events. Score sheets are living documents evolving through observation. Including steps such as testing and maintaining an annex ensures that both routine and exceptional situations are addressed efficiently. Involving all personnel in score sheet development and rehearsal fosters shared understanding, consistent observation and quicker responses. This supports broader efforts in lab animal science to improve training and accountability.
A well-designed score sheet enables the early detection of adverse effects, allowing for intervention before serious harm occurs. It supports the 3Rs principle, particularly refinement, by encouraging evidence-based and timely interventions. This strengthens both animal welfare and the reliability of the outcomes, which is crucial for scientific integrity and publication.
While grounded in practical experience and scientific review, our eight-step guide remains adaptable. New technologies, such as automated monitoring or AI-based behaviour analysis, might enhance the monitoring methodology. Berce (2024) 5 evaluated AI-generated score sheets and concluded that thorough review and editing are necessary. Future research should explore how to integrate such tools without compromising core ethical and scientific standards.
In conclusion, our framework enables researchers to develop tailored and scientifically robust score sheets, bridging regulations, ethics and precision to improve both animal care and research quality.
Conclusion
A carefully constructed score sheet is key to promoting refinement in animal experimentation, contributing to improved animal welfare and enhanced scientific quality. As a living document, it should be reviewed and adjusted during the study in response to observed signs and unexpected findings. Systematic documentation of clinical observations and interventions ensures compliance with regulatory and ethical standards and supports more robust data interpretation and reproducibility. The eight-step framework presented here offers a flexible, scientifically grounded method for developing score sheets tailored to specific experimental models and species, providing researchers with a reliable means to detect relevant changes in well-being and implement timely, appropriate interventions.
Footnotes
Acknowledgements
We are grateful to Catalina Martinez-Guerra for her valuable assistance in designing all figures included in this manuscript. We also thank Prof. Dr Thorsten Buch and Prof. Dr Christopher Pryce for their thoughtful revisions of the text.
Data availability statement
This study did not involve the collection, generation or analysis of any quantitative or qualitative datasets. The research is based entirely on conceptual, theoretical or methodological work, and does not rely on empirical data. Therefore, there are no data available for sharing or archiving. As a result, no data repository links or contact information for data access are provided.
Declaration of conflicting interests
The authors have no conflicts of interest to declare.
Ethics statement
This project did not require an ethical board approval because it did not involve human or animal trials.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
