Abstract
Objectives
The aim of the study was to determine the effects of training on inter-rater reliability and agreement of Feline Grimace Scale (FGS) scoring by small animal practitioners.
Methods
Seven small animal veterinarians were asked to score a total of 50 images of cats in varying degrees of pain before and after training in FGS scoring. Participant scores were compared with those of an expert rater. Inter-rater reliability was analyzed using the intraclass correlation coefficient (ICC) before and after training (ICC <0.50 = poor reliability, 0.50–0.75 = moderate reliability, 0.76–0.90 = good reliability and >0.90 = excellent reliability). The Bland–Altman method was used to analyze the limits of agreement (LoAs) and bias between participants and the expert rater.
Results
After training, the ICC classification improved for each action unit (ear position, orbital tightening, muzzle tension, whiskers change and head position). The inter-rater reliability for the total FGS ratio scores before and after the FGS training session was moderate (ICC = 0.75; 95% confidence interval [CI] 0.66–0.83) and good (ICC = 0.80; 95% CI 0.73–0.87), respectively. Before training, LoAs were −0.277 to 0.310 with a bias of 0.016. After training, LoAs were −0.237 to 0.255 with a bias of 0.008. The bias was low (<0.1) both before and after training and LoAs did not span the FGS analgesic threshold (0.39).
Conclusions and relevance
Among seven small animal veterinarians, training in FGS scoring improved inter-rater reliability, agreement with an expert rater and skills in pain assessment.
Introduction
Oligoanalgesia is the failure to recognize pain and provide analgesia and it can occur in small animal practice.1,2 The purposeful or inadvertent withholding of analgesic medications in patients with painful conditions, such as those of medical, surgical or traumatic origin, can lead to disastrous consequences, including the development of chronic, neuropathic or maladaptive pain states with a negative impact on quality of life.1,3–5 The literature suggests that cats experience oligoanalgesia more commonly than dogs.1,6 This may be due in part to their unique behavioral expressions of fear, anxiety, stress and pain in clinical settings and their reduced capacity for inter- and intraspecies social communication.7–9 Therefore, the approach to pain recognition in cats involves cat-friendly interactive techniques and validated pain assessment scales.3,4,6,10–12
Appropriate pain management is not possible without pain recognition.3,4,6,13,14 Despite the recent development and validation of species-specific pain assessment scales, the routine use of these tools is not common in small animal practice.13,15,16 Reasons for the poor implementation of these pain assessment tools in clinical practice include the absence of formal training and knowledge gaps related to pain assessment scales.13,15 It is apparent that increased education on pain assessment and management is necessary in the veterinary medical profession. It is equally important to determine how pain assessment tools can be efficiently implemented into clinical practice. This may help to raise awareness of pain as the ‘fourth vital sign’ and inspire a paradigm shift in which all veterinary patients are entitled to the rapid identification and alleviation of pain, thereby minimizing or eliminating oligoanalgesia.17
Grimace scales have been developed for numerous laboratory, production and companion animal species. They are easy and rapid to use and provide information about complex emotive states, making them invaluable in pain and welfare assessment.14 The Feline Grimace Scale (FGS) consists of five action units (AUs) (ear position, orbital tightening, muzzle tension, whiskers change and head position) that are individually scored (0 = AU is absent; 1 = moderate appearance of the AU or uncertainty over its presence or absence; and 2 = obvious appearance of the AU). The maximum score is 10 when all five AUs are scored. The total FGS ratio score is the sum of the AU scores divided by the maximum possible score for the AUs that were scored. The total FGS ratio score associated with the need for rescue analgesia is 0.39.10
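The scoring rule described above can be illustrated with a minimal sketch. The function names and the use of `None` to mark an AU that could not be scored are assumptions made for illustration only; they are not part of the FGS itself.

```python
from typing import Optional, Sequence

# Total FGS ratio score associated with the need for rescue analgesia (from the text).
ANALGESIC_THRESHOLD = 0.39


def fgs_ratio(au_scores: Sequence[Optional[int]]) -> float:
    """Sum of the scored AUs divided by the maximum possible score for those AUs.

    au_scores holds the five AU scores (0, 1 or 2). None marks an AU that
    could not be scored (a hypothetical encoding chosen for this sketch).
    """
    if len(au_scores) != 5:
        raise ValueError("the FGS has five action units")
    scored = [s for s in au_scores if s is not None]
    if not scored:
        raise ValueError("at least one action unit must be scored")
    if any(s not in (0, 1, 2) for s in scored):
        raise ValueError("each action unit is scored 0, 1 or 2")
    # Each scored AU contributes at most 2 points to the maximum possible score.
    return sum(scored) / (2 * len(scored))


def exceeds_analgesic_threshold(ratio: float) -> bool:
    """True when the total FGS ratio score is above the 0.39 threshold."""
    return ratio > ANALGESIC_THRESHOLD
```

For example, scores of 1, 1, 0, 1 and 1 give 4/10 = 0.40, which is above the threshold, whereas scores of 2, 1, 0 and 1 with one AU not scored give 4/8 = 0.50.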
The FGS presented moderate to excellent reliability, construct and criterion validity, and responsiveness when used for pain assessment in cats with different types of pain.10,14,18 In addition, the FGS is reliable when applied by different raters (cat owners, veterinarians, veterinary students and nurses) and may be used for either real-time or image assessment.8,19,20 Although the need for training before using grimace scales is generally considered minimal, it has not been determined whether training improves FGS scoring and the ability to recognize pain in cats.10 Training has been shown to improve novice rater agreement when using the Rat Grimace Scale (RGS), and the positive effects of the training were shown to persist for years, regardless of the frequency with which the RGS was used in the interim.21
The objective of this study was to determine the effects of training on inter-rater reliability and agreement of FGS scoring by small animal practitioners. The hypothesis was that inter-rater reliability and agreement would be improved among raters using the FGS after 1 h of online training when evaluating the same cat images.
Materials and methods
Ethical approval
The study protocol received ethical approval from the Human and Artefacts Ethics Sub-Committee of City University of Hong Kong (approval number HU-STA-00000554). The research study was conducted according to the Tri-Council Policy Statement ‘Ethical Conduct for Research Involving Humans’. Electronic informed consent was acquired from each participant before the study began. There was no incentive offered and participants reserved the right to withdraw via verbal or email notification without the need for justification of their decision. This study is reported in accordance with the Checklist for Reporting Results of Internet E-Surveys (CHERRIES).22
Participant recruitment
Participants were recruited via email invitation within a convenience sample of 12 small animal veterinarians acquainted with the researchers. Participants had to meet the following criteria: (1) be a licensed veterinarian working in small animal practice without an advanced diploma or training in feline medicine or pain management, or residency or PhD training; (2) be able to communicate in English; (3) have access to a computer with a reliable internet connection; (4) be dedicated to evaluating and assigning FGS scores for 50 images of cats on two separate occasions; and (5) participate in an online training tutorial about the FGS provided by a board-certified individual of the American College of Veterinary Anesthesia and Analgesia (PVS). A total time commitment of approximately 3 h was necessary to complete the study.
Participation was voluntary and withdrawal from the study was possible at any time by verbal or email notification, with no justification required. Participants were warned that they might find the images of cats in pain disturbing and could skip them or withdraw from the study at any time; they were assured that all images had been acquired through previous research projects with ethical approval and that any cat documented as painful had received appropriate analgesia in a timely fashion (see supplementary material). All data collected remained anonymous and access to the data was limited to one researcher (ARR) who had password-protected access to the online survey platform. Participant recruitment occurred between 1 and 15 October 2023.
Online survey
Two online surveys that contained the same content but with a different order of cat image presentation were created using LimeSurvey (version 3.28.52, LimeSurvey Community Edition). Each survey contained an informed consent statement followed by a series of demographic questions related to the following: gender (male, female, prefer not to say, other); age (20–29, 30–39, 40–49, 50–59, 60–69, >70 years); graduation year (2020–2023, 2010–2019, 2000–2009, 1990–1999, 1980–1989, other); and how many years the participant had spent in small animal practice (<5, 5–9, 10–19, 20–29, other). Each demographic question was displayed on an individual page within the survey. Participants were then provided with a hyperlink to the online FGS training manual (www.felinegrimacescale.com), which they consulted as needed during image scoring. A total of 50 high-quality images of cats were then scored using the FGS. The two surveys contained the same images, but the images were randomized (randomizer.org) to reduce memory bias. Each image of a cat was displayed on a new survey page, as shown in Figure 1.

Figure 1. Example of a survey page showing a facial image of a cat with possible survey response options listed below for small animal veterinarians completing a closed, online survey containing 50 images of painful (n = 25) and pain-free (n = 25) cats. Images were scored using the Feline Grimace Scale (FGS), both before and after receiving training in the use of the scale. Participants had access to the FGS training manual to use as a reference during completion of both surveys.
Before launching, the surveys underwent beta testing by three individuals who were not involved with the study to ensure survey usability and functionality.13,22 Participants had access to the first survey via a unique URL sent by email from one of the investigators (ARR). The first survey was accessible for participants for 7 days (6–13 November 2023). One week after the closure of the first survey, the training session on the use of the FGS was provided (20 November 2023). After 48 h, access to the second survey was granted to participants. Participants had a 7-day period to complete the second round of image assessment (22–29 November 2023).
Image selection
The images of cat faces were obtained from an image databank maintained by the laboratory of the principal investigator (PVS). The databank contains images of cats and kittens in various degrees of pain, either before or after the administration of analgesia or surgery. Images were collected from previous studies, all of which were performed after ethical review and approval by the institutional animal care and use committee of the Université de Montréal.10,18,19,23–26 Based on the total FGS ratio score assigned by an expert rater (PVS), images of domestic shorthair or longhair cats were separated into two groups: non-painful cats (422 images with FGS <0.40) and painful cats (112 images with FGS ⩾0.40). Twenty-five images from each group (for a total of 50 images) were randomly selected using an online randomizer (www.randomization.com). Total FGS ratio scores assigned by the expert rater for the 50 images were used as the gold standard score for statistical analysis.
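The two-group selection described above could be reproduced programmatically along the following lines. This is only a sketch: the study used an online randomizer, and the function name, the dictionary of expert scores and the optional seed are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple


def select_images(
    expert_scores: Dict[str, float],
    per_group: int = 25,
    cutoff: float = 0.40,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    """Randomly pick per_group non-painful and per_group painful images.

    expert_scores maps a (hypothetical) image identifier to the total FGS
    ratio score assigned by the expert rater.
    """
    rng = random.Random(seed)
    # Split the databank by the expert score: <0.40 non-painful, >=0.40 painful.
    non_painful = [img for img, score in expert_scores.items() if score < cutoff]
    painful = [img for img, score in expert_scores.items() if score >= cutoff]
    # Sample without replacement within each group.
    return rng.sample(non_painful, per_group), rng.sample(painful, per_group)
```

Sampling within each group separately guarantees the 25/25 split regardless of how unbalanced the databank is (422 vs 112 images here).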
Feline Grimace Scale training session
All participants involved in the study met via an online platform. A PowerPoint (PowerPoint version 16.81; Microsoft) presentation on the clinical use of the FGS was provided by one of the investigators (PVS). The session included scoring 10 images and participants could optionally share their scores for each image via group chat. Neutral and real-time feedback regarding their interpretation of the facial AUs and FGS scoring was provided by the presenter during case discussions. Participants disclosed their country of practice during the training session.
Statistical analysis
The raw data from complete survey submissions were exported from LimeSurvey into an Excel file (Excel version 16.81; Microsoft). The intraclass correlation coefficient (ICC) was used to assess inter-rater reliability before and after FGS training. Inter-rater reliability was assessed for the scoring of each AU and for the final FGS ratio scores using a two-way mixed effects model with absolute agreement for a single rater. The following values were used for the interpretation of inter-rater reliability: ICC <0.50 = poor reliability; 0.50–0.75 = moderate reliability; 0.76–0.90 = good reliability; and >0.90 = excellent reliability.26,27
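For readers unfamiliar with this form of the ICC, the sketch below computes the single-rater, absolute-agreement coefficient from the two-way ANOVA mean squares and applies the interpretation bands listed above. It is an illustration only: the software routine used for the ICC in this study is not stated here, the function names are hypothetical and confidence intervals are not computed.

```python
from typing import List, Sequence


def icc_absolute_single(ratings: Sequence[Sequence[float]]) -> float:
    """Two-way, absolute-agreement, single-rater ICC.

    ratings[i][j] is the score given to subject (image) i by rater j.
    """
    n = len(ratings)      # number of subjects (rows)
    k = len(ratings[0])   # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means: List[float] = [sum(row) / k for row in ratings]
    col_means: List[float] = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def classify_icc(icc: float) -> str:
    """Interpretation bands from the text, applied to the two-decimal value."""
    value = round(icc, 2)
    if value < 0.50:
        return "poor"
    if value <= 0.75:
        return "moderate"
    if value <= 0.90:
        return "good"
    return "excellent"
```

Because the between-rater mean square appears in the denominator, systematic differences between raters lower this coefficient, which is what distinguishes absolute agreement from consistency.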
Agreement between participant scores, before and after training, and those of an expert rater (PVS) was assessed using the Bland–Altman method,28 with plots computed using R version 4.2.3. Based on a previous publication,19 the authors considered a bias <0.1 acceptable, indicating very good agreement. A bias >0.1 (>1 unit in the FGS score) was considered unacceptable, indicating poor agreement. A negative bias would suggest overestimation of the FGS score by the novice rater compared with the gold standard, whereas a positive bias would suggest underestimation of pain. The limits of agreement (LoAs) were interpreted in relation to the analgesic threshold predetermined for the FGS (0.39): the LoAs should not span this cut-off score.
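The quantities involved in this analysis can be sketched as follows. Two points are assumptions rather than details taken from the study: the differences are computed as gold standard minus participant (inferred from the statement that a positive bias suggests underestimation by the novice rater), and the LoAs are taken as bias ± 1.96 standard deviations of the differences, the conventional Bland–Altman definition. The function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(
    gold_standard: Sequence[float], participant: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (bias, lower LoA, upper LoA) for paired total FGS ratio scores."""
    # Assumed sign convention: positive difference = participant scored lower.
    diffs = [g - p for g, p in zip(gold_standard, participant)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def spans_threshold(lower: float, upper: float, threshold: float = 0.39) -> bool:
    """True when the analgesic threshold lies inside the limits of agreement."""
    return lower < threshold < upper
```

This simple form treats every participant-image pair as an independent observation; it does not model the repeated scoring of the same images by several raters.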
Results
The completion rate was 100%. A total of 12 recruitment emails were distributed; 12 responses were received and seven veterinarians ultimately enrolled in the study. All the enrolled participants completed both surveys and attended the FGS training session. Of the participants, five identified as women and two as men. The age range of participants was 20–49 years, and all graduated from veterinary school between 2010 and 2019. Five participants had 5–9 years of experience in small animal veterinary practice, whereas one each had <5 and >10 years of experience. Veterinarians were licensed and practicing small animal veterinary medicine in one of three countries (Hong Kong SAR, China; Canada; or Australia) at the time of this study.
Before training, inter-rater reliability for muzzle tension (ICC = 0.30; 95% confidence interval [CI] 0.19–0.43) and whiskers change (ICC = 0.48; 95% CI 0.36–0.61) was poor, while for ear position (ICC = 0.69; 95% CI 0.59–0.78), head position (ICC = 0.73; 95% CI 0.63–0.81) and orbital tightening (ICC = 0.75; 95% CI 0.66–0.83) it was moderate. After training, inter-rater reliability for whiskers change (ICC = 0.56; 95% CI 0.45–0.68) was moderate, while for ear position (ICC = 0.76; 95% CI 0.68–0.84), head position (ICC = 0.76; 95% CI 0.68–0.84), muzzle tension (ICC = 0.76; 95% CI 0.68–0.84) and orbital tightening (ICC = 0.80; 95% CI 0.73–0.87) it was good. Inter-rater reliability for the total FGS ratio scores before (ICC = 0.75; 95% CI 0.66–0.83) and after (ICC = 0.80; 95% CI 0.73–0.87) the FGS training session was moderate and good, respectively (Table 1).
Table 1. Inter-rater reliability of Feline Grimace Scale (FGS) scores among veterinarians (n = 7) who scored 50 images of cats (painful, n = 25; pain-free, n = 25) both before and after FGS training.
Data in parentheses are 95% confidence intervals. The following interpretation was used for inter-rater reliability: intraclass correlation coefficient (ICC) <0.50 = poor reliability; 0.50–0.75 = moderate reliability; 0.76–0.90 = good reliability; and >0.90 = excellent reliability.
Before training, the LoAs ranged from −0.277 to 0.310, with a bias of 0.016 (Figure 2a). The LoAs were narrower after training, ranging from −0.237 to 0.255, and the bias was reduced to 0.008 (Figure 2b). The bias was <0.1 before and after training, suggesting that there was good agreement between participant scores and the gold standard at both time points. The LoAs did not span the analgesic cut-off value of 0.39 at either time point (Figure 2).
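As a quick arithmetic check on these values, the sketch below derives the midpoint and width of each pair of LoAs. It assumes the LoAs were computed as bias ± 1.96 standard deviations (the conventional definition; the multiplier is not stated in the text), in which case the midpoint should equal the bias. The function name is hypothetical.

```python
from typing import Tuple


def loa_summary(lower: float, upper: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (midpoint, width, implied SD of the differences) for a pair of LoAs."""
    midpoint = (lower + upper) / 2
    width = upper - lower
    implied_sd = width / (2 * z)  # valid only under the bias +/- z*SD assumption
    return midpoint, width, implied_sd


before = loa_summary(-0.277, 0.310)  # LoAs reported before training
after = loa_summary(-0.237, 0.255)   # LoAs reported after training
```

The midpoints (0.0165 and 0.009) are close to the reported biases of 0.016 and 0.008, and the width falls from 0.587 to 0.492 after training, consistent with the narrowing described above.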

Figure 2. Bland–Altman plots showing the agreement of Feline Grimace Scale (FGS) scores between a group of small animal veterinarians (n = 7) and an expert rater, considered the gold standard (GS). (a) Agreement between participants and the GS before (PB) the FGS training session. (b) Agreement between participants and the GS after (PA) the FGS training session. Bias (central continuous line) and the limits of agreement (dotted lines) are indicated on each respective plot.
Discussion
The aim of this study was to determine whether training in pain assessment using the FGS would improve inter-rater reliability and agreement after image scoring by a sample of small animal veterinarians. The FGS training increased inter-rater reliability for all individual AUs, but particularly for muzzle tension, where there was poor reliability among raters before training. This is not surprising as muzzle tension has shown poor to moderate inter-rater reliability in most studies involving the FGS.8,18,20,26 Inter-rater reliability was improved for the total FGS score after training, suggesting that such training could be useful in standardizing feline pain management practices among small animal veterinarians.
A Bland–Altman analysis was used to assess the LoAs and bias before and after training. After FGS training, the LoAs and bias were reduced between participant scores and the gold standard. However, the LoAs were already narrow both before and after training, suggesting overall good agreement between the group of small animal veterinarians and the gold standard at both time points. Importantly, the LoAs did not span the established analgesic cut-off value of the FGS (0.39) at either time point. This suggests that when raters assigned a score >0.39, the patient was likely in pain and in need of analgesic administration. The bias, although small (<0.1) at both time points, was a positive value both before and after training. This suggests that novice raters tended to slightly underscore the images using the FGS when compared with the expert rater. Although the clinical relevance of this is unknown, it could ultimately result in a minority of cats not receiving analgesic therapy when necessary if scores are close to those suggestive of a need for rescue analgesia. However, use of the FGS in clinical practice will also incorporate the clinical judgment of the clinician, which might help to minimize this risk.
Reliability, the extent to which measurements are reproducible, is an important characteristic in assessment tools, such as pain scales. The inter- and intrarater reliability of the FGS has been studied in various settings and by a heterogeneous selection of raters (ie, cat caregivers, veterinary students, nurses, veterinarians).8,10,18,20,26 In general, inter-rater reliability for the FGS is consistently reported to be good or excellent. However, changes in inter-rater reliability among small animal veterinarians before and after FGS training have not been reported previously. Inter-rater reliability is indicative of the variation among raters who measure the same variable. For an appropriate interpretation of results, it is important to consider the methodology of inter-rater reliability used in the statistical analysis.27 For this study, the ICC involved a two-way mixed effects model, single measurement type for absolute agreement. The two-way mixed effects model was chosen to generalize the study results to a specific type of population (ie, small animal veterinarians). The ICCsingle was reported in this study as opposed to the ICCaverage. The ICCsingle assumes that the measurement reported is that from a single rater rather than the average of all raters, which is more relevant to clinical practice (ie, the use of the FGS would involve a single rater in clinical practice). Furthermore, results from ICCaverage are usually higher than ICCsingle, which can lead to inflation of study results.26 In addition, the classification of reliability may vary among studies and this should also be taken into consideration when interpreting the results of this study and comparing them with previous findings in the literature.26,27,29
This study has limitations. It evaluated the effects of FGS training using a small sample of small animal veterinarians without advanced training in feline pain assessment (ie, residency, PhD or board certification). In addition, the participant pool contained both male and female veterinarians with varying degrees of small animal clinical experience. It is not known whether the results could be extrapolated to a larger sample with equal numbers of male and female participants, or to small animal veterinarians with similar levels of experience, from countries other than Hong Kong SAR, China; Canada; and Australia, and/or with advanced training. In previous studies, canine and feline pain assessment was affected by gender and previous experience and training. For example, veterinary students scored pain differently from graduate students and veterinary anesthesiologists.30 In the same study, female participants were more empathetic than male participants and gave higher pain scores to videos of dogs and cats. Historically, female veterinarians have performed pain assessment and administered analgesics in the acute setting more often than male veterinarians.31 However, a recent study involving a large sample of cat caregivers did not find a gender effect on pain assessment when using the FGS.8 Our study was not designed to investigate the effects of gender, advanced training or type of training methods (eg, number of hours, webinar, face-to-face training, flipped classroom, formative assessments) on FGS pain assessment and these issues are subjects for future studies. Overall, it is known that didactic and clinical training improve the perceptions and ability of veterinarians to assess and treat pain and our results corroborate these previous findings.32
Conclusions
Among seven small animal veterinarians, training in pain assessment using the FGS improved inter-rater reliability and skills in pain assessment. Training also narrowed the LoAs between novice raters and a gold standard and reduced bias. When incorporating the use of the FGS into clinical practice, it is advisable to provide basic training to improve reliability and accuracy and to help standardize patient care in feline practice.
Supplemental Material
sj-docx-1-jfm-10.1177_1098612X241275284 – Supplemental material for Effects of training on Feline Grimace Scale scoring for acute pain assessment in cats
A sample of the first survey, including informed consent, demographic variables and an example of the facial image of a cat to be scored.
Footnotes
Acknowledgements
The authors wish to individually thank each study participant (Drs Ting Hey Denise Chan, Jason Chen, Vivian Chow Yuen Hei, Jason Makar, Danielle McEachern, Angela Pantangco and Erin Patterson) for their dedication and contribution to the growing collection of scientific literature on feline pain assessment and management. In addition, the authors would like to thank Mr Tristan Juette and Dr Saad Bukhari for their assistance with the statistical analysis.
Conflict of interest
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Ethical approval
This work did not involve the use of animals and therefore ethical approval was not specifically required for publication in JFMS.
Informed consent
This work did not involve the use of animals (including cadavers) and therefore informed consent was not required. No animals or people are identifiable within this publication, and therefore additional informed consent for publication was not required.
References
