Sage Journals: Discover world-class research

Abstract

Regular use of standardized observational tools to assess nonverbal pain behaviors results in improved pain care for older adults with severe dementia. While frequent monitoring of pain behaviors in long-term care (LTC) is constrained by resource limitations, computer vision technology has the potential to mitigate these challenges. A computerized algorithm designed to assess pain behavior in older adults with and without dementia was recently developed and validated using video recordings. This study was the first live, real-time evaluation of the algorithm incorporated in an automated system with community-dwelling older adults in a laboratory. Three safely administered thermal pain tasks were completed while the system automatically processed facial activity. Receiver Operating Characteristic curves were used to determine the sensitivity and specificity of the system in identifying facial pain expressions using gold standard manual coding. The relationship between scoring methods was analyzed and gender differences were explored. Results supported the potential viability of the system for use with older adults. System performance improved when more intense facial pain expressiveness was considered. While average pain scores remained homogeneous between genders, system performance was better for women. Findings will be used to further refine the system prior to future field testing in LTC.

Keywords

Pain, aging, technology, older adults, computer vision, dementia

Introduction

It is estimated that over half of older adults with dementia suffer from pain.^1,2 Pain in people with dementia, however, remains underassessed and undermanaged compared to pain in individuals without cognitive impairments.^3–7 One of the main contributors to underassessed and undertreated pain in people with dementia is a frequent reliance on traditional methods of pain assessment, specifically the self-report of pain, which requires a certain level of cognitive and linguistic ability.⁸ As many individuals with severe dementia have difficulty communicating verbally, self-report methods may not always be suitable for their pain assessment.^9–11 To address this problem, standardized observational approaches focusing on non-verbal expressive behaviors, such as facial expressions, have been developed and validated in the assessment of pain of older adults with dementia.⁹

Considering the high prevalence of pain among residents of long-term care (LTC) facilities,^12,13 standardized pain assessments completed regularly by health care professionals have been shown to be associated with beneficial outcomes for residents (e.g., improved pain management practices) and reduced stress for staff.^14,15 Limited staffing and resources as well as insufficient continuing staff education have often interfered with the implementation of effective pain assessment methods in LTC settings.^16,17

Technology has the potential to benefit the lives of older adults while also mitigating the impacts of staffing and resource limitations in health-care.¹⁸ Smart technologies that rely on machine learning and artificial intelligence designed for residents with dementia have shown considerable promise.^19–22 In particular, computer vision systems have been explored to address the limitations of continuous direct observation by staff, which in turn could guide interventions tailored to the specific needs of LTC residents.^23,24

In the area of pain, computer vision algorithms have been developed to detect and recognize specific pain behaviors including facial expressions of pain.²⁵ Facial analysis algorithms have been developed using datasets of young and middle-aged adults.^25–28 Consequently, such algorithms do not tend to perform well in investigations of older adults due to a lack of available training data that include older populations with facial wrinkling that may be misinterpreted as facial expressions.²⁹ Moreover, biases in algorithm performance have been found for a number of variables including age, gender, ethnicity, and cognitive ability.^30,31 These biases largely stem from the datasets that were used to develop the algorithm, as the video recordings used for training purposes are often largely homogeneous.³²

Recently, an automated pain behavior detection algorithm has been developed, using advanced machine learning and deep learning techniques, to detect and monitor facial expressions of pain in older adults with and without dementia.³³ In contrast to prior methods, a significant modification of the algorithm is that it uses pairwise pain detection, in which a target frame is compared to a reference frame from the same individual, thus reducing sensitivity to wrinkles and other idiosyncrasies.³³ This also increases the number of frames within a sample that can be used to train the algorithm and to account for changes in facial expression, and it provides more context for analysis. Other modifications included the introduction of a contrastive training method and allowing the algorithm to be trained on multiple datasets, resulting in a more generalizable and clinically useful system.^33,34 This modified algorithm is the first fully automated system to be validated using a large video dataset depicting faces of older adults with dementia.

The performance of the Rezaei et al. algorithm³³ has been evaluated based on annotations of video datasets depicting older adults displaying facial pain expressions. Based on Receiver Operating Characteristic (ROC) curve analyses, the algorithm achieved an area under the curve (AUC) of 0.83 (per frame) for faces of older adults with dementia and an AUC of 0.86 (per frame) for faces of older adults without dementia.³³ Performance of the algorithm increased when rolling windows were considered.

The primary objective of this study was to conduct the first real time evaluation of the automated pain behavior detection algorithm in a laboratory setting. The efficacy of the system in detecting and analyzing facial expressions of pain was examined using ROC curve analyses and specifically investigated for possible gender-based differences. The relationship between the analysis of facial expressions by the system, trained coder annotations, and self-report pain ratings was also examined. An additional objective was to determine the relative ability of specific pain-related facial movements to predict system performance and self-report pain intensity ratings. Finally, this study investigated the covariation of nonverbal pain cues with continuous self-report pain intensity ratings rather than single retrospective self-report ratings.

Method

Participant selection

Participants included 65 older adults aged at least 65 years who lived independently in the community (see Table 1). Exclusion criteria included a diagnosis of any conditions that could limit facial movements (e.g., hemiparesis, Parkinson’s disease). Sample size calculations determined that a sample size of 60 participants would be more than adequate to conduct ROC curve analyses in evaluating the system. Effort was made to achieve a gender balance in the sample. Participant recruitment took place through online advertisements, community postings, and word-of-mouth. Potential participants were given a brief description of the study design and provided informed consent for the study. Participants were offered $70 as compensation for taking part in the study. This study was approved by our institutional ethics review board.

Table 1.

Summary of demographic variables.

Demographic variable	n (%)
Gender (N = 65)
Male	28 (43.10)
Female	37 (56.90)
Race/Ethnicity
White	62 (64.60)
Other	3 (12.30)
Education
Grade 12 and Under	33 (50.80)
Diploma/Certificate	8 (12.30)
Post-Secondary Degree	24 (36.90)
Chronic Pain
Present	24 (36.90)
Absent	41 (63.10)
Pain Medication Taken within 24 h
Yes	12 (18.50)
No	53 (81.50)

	M (SD)
Age	71.80 (5.83)
Male	71.11 (6.06)
Female	72.32 (5.67)

Measures

Demographic questionnaire

Participants completed a demographic questionnaire which included questions regarding age, gender, ethnicity, education, any medical diagnoses, medications, and chronic pain.

Facial action coding system (FACS)

The FACS^35,36 is a fine-grained and objective indicator of anatomically-based facial muscle movements. These movements (e.g., cheek raising, brow lowering) are categorized into discrete action units (AUs) and are systematically coded by trained coders for their intensity and frequency using specified criteria. In total, the FACS evaluates 44 facial AUs. The FACS has been used extensively in research examining nonverbal pain behaviors and is a reliable and valid approach to quantify facial pain expressions.^37,38 It has been found to successfully differentiate between genuine and exaggerated facial pain expressions as well as between baseline facial activity and pain-related facial activity.^37,39 The FACS can be used to assess facial pain expressions in younger and older adults including older adults with dementia.^9,39–42

Certain AUs have been consistently found to be related to facial activity during pain including brow lowering (AU4), cheek raising/lid compression (AU6), lid tightening (AU7), nose wrinkling (AU9), upper lip raising (AU10), and eye closure (AU43).^43,44 Given the intensive and rigorous requirements of FACS coding, a scoring approach focusing on pain-related AUs has been used in previous research.^43,44 This FACS-based scoring approach has been validated and determined to be reliable.^25,42,44–48 Using this approach, trained coders watch video recordings and code the frequency and intensity of the pain-related AUs. Certain AUs are combined into a single action due to their co-occurrence.⁴⁴ AU6 and AU7 are combined as orbit tightening. AU9 and AU10 are combined as levator contraction. AU4 and AU43 are coded independently as intended through FACS. To score a facial expression, the intensity of brow lowering, orbit tightening, and levator contraction is coded from 0 (i.e., no action) to 5 (i.e., maximal action), while eye closure is coded as either 0 (i.e., absent) or 1 (i.e., present). A global pain score composite for each facial expression is then calculated by summing the scores of the four pain-related facial actions from 0 to 16.⁴⁴
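The composite described above can be illustrated with a minimal sketch. The function and argument names below are hypothetical and are not taken from the coding manual; the sketch assumes that each of the four combined actions has already been assigned a single score by the coder.

```python
def global_pain_score(brow_lowering, orbit_tightening, levator_contraction, eye_closure):
    """Sum the four pain-related facial actions into a 0-16 composite.

    Argument names are illustrative. The three intensity-coded actions
    range from 0 (no action) to 5 (maximal action); eye closure is 0 or 1.
    """
    intensities = {
        "brow_lowering": brow_lowering,               # AU4
        "orbit_tightening": orbit_tightening,         # AU6 and AU7 combined
        "levator_contraction": levator_contraction,   # AU9 and AU10 combined
    }
    for name, value in intensities.items():
        if not 0 <= value <= 5:
            raise ValueError(f"{name} must be between 0 and 5, got {value}")
    if eye_closure not in (0, 1):                     # AU43
        raise ValueError(f"eye_closure must be 0 or 1, got {eye_closure}")
    return sum(intensities.values()) + eye_closure


# Maximal expression: 5 + 5 + 5 + 1 = 16
print(global_pain_score(5, 5, 5, 1))
```

The upper bound of 16 follows directly from the coding ranges: three actions scored up to 5 each, plus 1 for eye closure.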

Computerized visual analogue scale (CoVAS)

The CoVAS is an electronic measurement device that uses a moveable handle that can slide along a horizontal bar measuring 100 mm in length and signifying a visual analogue scale. Visual analogue scales are single-item, continuous scales that measure an individual’s subjective experience, such as pain intensity, on a line anchored by two extreme poles (i.e., no pain to worst pain), conventionally presented on paper.⁴⁹ The CoVAS is offered by Medoc Advanced Medical Systems Ltd (Ramat Yishay, Israel) as an accessory assessment tool for the TSA-II NeuroSensory Analyzer. Differing from the conventional paper version, the CoVAS allows for the evaluation of pain intensity from “not intense at all” to “most intense possible” in real time and is measured continuously by handle placement on the bar. The CoVAS has been used in studies to measure pain intensity ratings.^50–53

Equipment and software

TSA-II neurosensory analyzer

The TSA-II NeuroSensory Analyzer is an advanced thermal stimulation device developed by Medoc Advanced Medical Systems Ltd (Ramat Yishay, Israel). This device can generate precise thermal stimulation designed to safely administer cold- and heat-induced pain. The TSA-II operates using a thermode with a 32 mm² metal plate which is placed on the skin. It can be programmed to deliver specific thermal pain at varying levels and across multiple trials. The thermode produces temperatures between 0°C and 55°C at a rate of up to 8°C per second. As a safety feature, a manual trigger allows for the immediate discontinuation of the thermal stimulation with a return to a baseline temperature. The TSA-II has been used in quantitative sensory testing examining pain^54–56 and has been safely used with older adult populations.^46,57,58 Participants underwent three thermal pain stimulation tasks including pain threshold, pain tolerance, and a 5-trial pain-induction task. Maximum temperatures were in accordance with the thermal stimulation cut-off limits used in other laboratory studies with older adults.^57,59

Automated pain behavior detection system

The automated pain behavior detection system uses an algorithm to detect and monitor pain behaviors of older adults with and without dementia.^33,42 The specific algorithm³³ that was used is a deep learning model and its training data included videos of older adults with and without dementia expressing movement-exacerbated pain. To the best of our knowledge, this model is the first fully automated system to be validated using faces of older adults with and without dementia.³³

Accuracy, based on recorded front view videos, was evaluated for single frame predictions and rolling window predictions in which a single aggregated maximum pain score was calculated. Based on previous ROC curve analyses, the model achieved an AUC of 0.83 (per frame) and 0.86 (rolling window of 20 s) for a dataset of older adults with dementia and an AUC of 0.86 (per frame) and 0.85 (rolling window of 20 s) for a dataset of older adults without dementia. The model achieved a correlation coefficient of 0.48 (per frame) and 0.82 (rolling window of 20 s) for older adults with dementia and a correlation coefficient of 0.58 (per frame) and 0.70 (rolling window of 20 s) for older adults without dementia in relation to “gold standard” manual annotations by trained FACS coders.³³

The algorithm operates by analyzing facial expressions in real-time from individual frames through pairwise pain detection such that a target frame is compared to a reference frame.³³ A target frame contains the facial expression to be analyzed while a reference frame contains a neutral facial expression of the same individual. Pain detection is achieved by identifying specific AUs associated with pain as indicated by a simplified FACS-based scoring approach.^43,44 The model is trained to directly estimate a global composite pain intensity score of 0 (i.e., no pain) to 16 (i.e., extreme pain) for each facial expression on a frame-by-frame basis.

This investigation evaluated the ability of the system to accurately detect and monitor facial expressions of pain in real time on individual frames. A high-definition video camera was used to record participants’ facial expressions using front view positioning. As this system is ultimately intended for use in clinical settings (after an additional field study), an optimal pain threshold score for staff notification was needed. This pain threshold score is derived from the FACS-based scoring approach^43,44 that the algorithm uses to calculate a pain intensity score ranging from 0 to 16. Based on prior analyses and preliminary testing, the system-generated pain threshold score that was initially used was 0.2.^33,44 ROC analyses were conducted and initial results suggested that the recommended corresponding FACS-based pain threshold for distinguishing a pain frame from a no-pain frame would be a score of 2 or 3 out of 16. Of note, while the system uses FACS-based coding in calculations, the system-generated scores do not correspond one-to-one with FACS-based scores in magnitude but trend in a similar manner.

The system was programmed to generate a notification that a pain expression was detected when the algorithm detected a facial pain expression over the system-based pain threshold score of 0.2 in five frames within a three-second timespan. This decision was based on the findings³³ that the algorithm achieved greater accuracy for rolling windows compared to single frames. Additionally, this approach to notification is a more ecologically valid method as this is how observers in clinical settings evaluate pain.³³
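The notification rule can be sketched as a sliding-window check. This is an illustration under stated assumptions rather than the system's actual implementation: the function name and the `(timestamp, score)` input format are hypothetical, the five frames are assumed not to need to be consecutive, and the window is assumed to reset after each notification so that a single episode does not trigger repeated alerts.

```python
from collections import deque


def notification_times(frames, threshold=0.2, min_frames=5, window_s=3.0):
    """Return timestamps at which a pain notification would be raised.

    frames: iterable of (timestamp_in_seconds, system_score) in time order.
    A notification is raised when at least `min_frames` frames score above
    `threshold` within a span of `window_s` seconds.
    """
    recent = deque()  # timestamps of recent over-threshold frames
    alerts = []
    for timestamp, score in frames:
        if score > threshold:
            recent.append(timestamp)
        # Discard over-threshold frames that fall outside the time window.
        while recent and timestamp - recent[0] > window_s:
            recent.popleft()
        if score > threshold and len(recent) >= min_frames:
            alerts.append(timestamp)
            recent.clear()  # assumption: start a fresh window after notifying
    return alerts


frames = [(0.0, 0.3), (0.5, 0.1), (1.0, 0.3), (1.5, 0.3), (2.0, 0.3), (2.5, 0.3)]
print(notification_times(frames))  # five over-threshold frames within 2.5 s
```

Requiring several over-threshold frames within a short span, rather than a single frame, reflects the rationale given above: rolling windows were more accurate than single-frame predictions.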

Procedure

Recruited primarily through advertisements, eligible participants were invited to the Health Psychology Laboratory to participate in the study. Upon arrival, the purpose and procedure of the study were described. After participants provided written and verbal consent to take part in the study, a demographic questionnaire was completed. They were then asked to sit in front of the video recording equipment and were given a demonstration of the TSA-II. They were provided with the instructions for the three thermal pain tasks and informed that they may immediately discontinue at any point during the pain tasks by verbally indicating to the experimenter. The thermode of the TSA-II was positioned on the participant’s non-dominant forearm at approximately the mid-forearm.

Once participants were positioned to begin testing, the automated pain behavior detection system was turned on and captured three reference images of the participant’s face displaying a neutral facial expression. Participants completed a baseline measurement for 120 s in which the thermode was heated to 32°C and then three thermal pain tasks. Participants completed self-report CoVAS pain intensity ratings during baseline and all thermal pain tasks and were filmed throughout.

For the pain threshold task, the thermode was set at 32°C and increased by 0.7°C per second up to the maximum temperature of 50°C and held for 5 s. Participants were asked to discontinue the task as soon as they felt any pain (i.e., the most minimal level of pain experienced). The temperature at which the task was discontinued was measured as the pain threshold temperature. If the task was not discontinued prior to task completion, the pain threshold was measured as 50°C. This task was repeated three times to calculate an average pain threshold temperature.⁶⁰ For the pain tolerance task, the thermode was set at 32°C and increased by 0.7°C per second up to the maximum temperature of 50°C and held for 5 s. Participants were asked to discontinue the task when they could no longer tolerate the pain. The temperature at which the task was discontinued was recorded as the pain tolerance temperature. If the task was not discontinued prior to task completion, the pain tolerance was measured as 50°C. This task was repeated three times to calculate an average pain tolerance temperature.⁶⁰ For the 5-trial pain-induction task, the thermode was set at 32°C and increased by 0.7°C per second up to the peak temperature of 49°C. Once the peak temperature was reached, it was held for 10 s before decreasing to the baseline temperature at a rate of 7°C per second where it was held constant for 10 s before increasing again for a total of five repetitions (see Figure 1).
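The timing of the 5-trial pain-induction task follows from the stated rates and hold durations. The sketch below computes the breakpoints of the piecewise-linear temperature profile; the function name is hypothetical, and it is assumed that the final repetition also ends with a 10 s hold at baseline.

```python
def five_trial_profile(baseline=32.0, peak=49.0, up_rate=0.7,
                       down_rate=7.0, hold_s=10.0, trials=5):
    """Return (time_s, temperature_C) breakpoints of the thermal profile.

    Each trial ramps up at `up_rate` deg C/s, holds at the peak, ramps down
    at `down_rate` deg C/s, then holds at baseline (assumed for every trial,
    including the last).
    """
    points = [(0.0, baseline)]
    t = 0.0
    for _ in range(trials):
        t += (peak - baseline) / up_rate    # ramp up: 17 / 0.7, about 24.3 s
        points.append((t, peak))
        t += hold_s                         # hold at peak for 10 s
        points.append((t, peak))
        t += (peak - baseline) / down_rate  # ramp down: 17 / 7, about 2.4 s
        points.append((t, baseline))
        t += hold_s                         # hold at baseline for 10 s
        points.append((t, baseline))
    return points


profile = five_trial_profile()
print(f"Total duration: {profile[-1][0]:.1f} s")
```

Under these assumptions each repetition lasts roughly 46.7 s, so the full task takes a little under four minutes.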

Figure 1.

Graphical representation of the thermal pain stimulus application.

During the video recording, the algorithm of the system automatically processed the facial expressions of the participants. As the algorithm employs the FACS-based protocol⁴⁴ for coding facial pain expressions, the algorithm processed scores ranging from 0 to 16. When the algorithm detected a facial pain expression over the predetermined threshold score of 0.2 (algorithm score) in five frames within a three-second timespan, a light was turned on and an email notification sent to the experimenter. After completing the thermal pain tasks, participants were debriefed on the study.

Following testing, one trained coder examined the video recordings of participants completing the baseline task and the 5-trial pain-induction task and completed frame-by-frame manual FACS-based coding. A second trained coder coded a randomly selected 20% of the video recordings to calculate inter-rater reliability for absolute agreement. For the 5-trial pain-induction task, results demonstrated a high degree of reliability, r_icc = 0.845 (95% CI = 0.841–0.849). Inter-rater reliability was assessed using percent agreement for the baseline task due to a lack of variability in the manual coding. Results demonstrated a high degree of reliability with 98.7% agreement.

Analyses

Descriptive statistics were calculated for all dependent variables. The effectiveness of the TSA-II in inducing pain was examined through paired samples t-tests (baseline vs painful stimulation). To estimate system performance, the pain intensity coding by the system was compared against the gold standard FACS-based manual coding. ROC curve analyses were conducted to evaluate the performance of the system. A binary classification was used for all facial expressions (pain or non-pain) dependent on a FACS-based cut-off pain threshold score. The video recordings were re-analyzed retroactively accounting for the adjusted pain threshold cut-off scores. Therefore, sensitivity and specificity indices were calculated using the full range of possible FACS-based pain threshold cut-off scores from 1 to 16. ROC curves were estimated for all possible cut-off pain threshold scores and the AUC was examined. The closer the AUC value is to 1, the better the performance of the system. Each AUC value was tested for significance (p < 0.01) against a null AUC value of 0.5, which indicates chance-level performance. Separate ROC curve analyses were conducted as a function of gender.
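The binary classification and AUC computation can be sketched in a few lines. This is not the authors' analysis code: the function names and example values are hypothetical, a frame is assumed to count as pain when its FACS-based score is at or above the cut-off, and the AUC is computed with the rank-based (Mann-Whitney) formulation, in which tied scores count as half.

```python
def binarize(facs_scores, cutoff):
    """Label a frame 1 (pain) if its FACS-based score is >= cutoff (assumed)."""
    return [1 if score >= cutoff else 0 for score in facs_scores]


def roc_auc(labels, scores):
    """AUC: probability that a random pain frame receives a higher system
    score than a random non-pain frame, with ties counted as one half."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied scores share their mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both pain and non-pain frames are required")
    rank_sum = sum(rank for rank, label in zip(ranks, labels) if label)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Illustrative values only, not study data.
facs = [0, 0, 1, 2, 4, 6]
system = [0.10, 0.15, 0.12, 0.30, 0.25, 0.90]
for cutoff in (1, 2):
    print(cutoff, roc_auc(binarize(facs, cutoff), system))
```

Repeating the calculation for each cut-off from 1 to 16 yields one AUC per cut-off, which is the structure of the analysis described above.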

To evaluate the relationship between the various pain assessment methods used in the study and stimulus temperatures, a series of correlations were calculated from the 5-trial pain-induction task. Pearson correlations were calculated between the gold standard FACS-based manual coding, system-generated scoring, continuous self-report CoVAS pain ratings, and TSA-II temperatures. To determine the relationship between specific pain-related facial AUs with system performance and average self-report CoVAS pain ratings, regression analyses were conducted.

Results

Baseline versus pain task comparisons

To confirm that pain was not experienced during the baseline task, the means and standard deviations of the measures were examined (see Tables 2 and 3). The scores shown in Table 2 represent the average CoVAS pain ratings (throughout the entire time period of the thermal task) and the average system-generated scores. Consistent with expectation, scores on the CoVAS were higher during thermal pain stimulation compared to baseline. Paired samples t-tests were conducted to compare system scoring during baseline with system scoring during the thermal pain tasks. Analyses confirmed significantly higher scores during each of the thermal pain tasks compared to baseline (see Table 3). System scoring was also compared between instances of pain expressions (i.e., scored as non-zero by manual coding) and non-pain expressions (i.e., scored as zero by manual coding). Results showed that the system scoring during instances of pain expressions generated a mean of 0.262 (SD = 0.189) and during instances of non-pain expressions generated a mean of 0.145 (SD = 0.123).

Table 2.

Summary of descriptive statistics for baseline and thermal pain tasks.

Measure	Task	Min	Max	M	SD
CoVAS (0–100)	Baseline	0.00	0.00	0.00	0.00
	Pain Threshold	0.00	32.14	2.42	5.22
	Pain Tolerance	0.00	52.87	16.10	12.72
	5-Trial Pain Induction	0.00	54.77	20.72	14.69
Automated Pain Behaviour Detection System (0–5.33)¹	Baseline	0.02	0.45	0.12	0.08
	Pain Threshold	0.03	0.58	0.17	0.10
	Pain Tolerance	0.02	0.43	0.17	0.12
	5-Trial Pain Induction	0.02	0.49	0.19	0.12
Temperature (32–55°C)	Baseline	32.00	32.00	32.00	0.00
	Pain Threshold	34.53	50.45	44.43	3.94
	Pain Tolerance	41.26	50.67	49.10	1.55
	5-Trial Pain Induction	46.67	50.06	49.60	0.45

Note. ¹Based on the current study, the maximum pain detection was scored at 5.33 by the algorithm; however, greater scores are possible with greater facial pain expressiveness. CoVAS: computerized visual analogue scale. N = 65.

Table 3.

Comparison of baseline pain scores with thermal task pain scores.

Pain score measure	Thermal pain task	Baseline task: M	Baseline task: SD	t(64)
Automated Pain Behaviour Detection System	Pain Threshold	0.05	0.08	5.06*
	Pain Tolerance	0.05	0.08	4.96*
	5-Trial Pain Induction	0.07	0.10	6.08*

*p < 0.001. N = 65.
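For reference, the paired samples t statistic is the mean of the paired differences divided by its standard error. The sketch below uses hypothetical function names. As a rough check, if the M and SD columns of Table 3 are read as the mean and standard deviation of the paired differences (an assumption, as the table does not state this), the pain threshold row gives 0.05 / (0.08 / √65) ≈ 5.04, close to the reported 5.06 given rounding of the tabled values.

```python
import math
import statistics


def paired_t(task_scores, baseline_scores):
    """Paired samples t statistic and degrees of freedom from raw scores."""
    differences = [task - base for task, base in zip(task_scores, baseline_scores)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1


def t_from_summary(mean_difference, sd_difference, n):
    """Same statistic computed from summary values of the paired differences."""
    return mean_difference / (sd_difference / math.sqrt(n))


# Rounded summary values from the pain threshold row, read as differences.
print(round(t_from_summary(0.05, 0.08, 65), 2))
```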

ROC curve analyses

Prior to conducting ROC curve analyses, the correspondence between gold standard manual coding and system-generated pain scoring was examined. See Table 4 for a summary of the comparison during the 5-trial pain-induction task. The correspondence between system-generated and FACS-based scores differs from participant to participant as a function of variability in the anatomy of their faces. As a result, the mean system-based pain score values do not increase linearly as a function of FACS-based pain scores, although they trend in a linear direction. In other words, the normalization of the system-generated scores and algorithmic scores follows a similar trend over time. As such, any given FACS-based pain score will be represented by a range of algorithmic values across participants rather than a single value.

Table 4.

Correspondence of manual FACS-based pain scoring and pain scoring by the automated pain behaviour detection system.

FACS-based scores	System-generated scores: M	System-generated scores: SD	Number of frames containing each score
0	0.501	0.660	211,014
1	0.443	0.519	20,966
2	0.540	0.576	5,277
3	0.654	0.548	2,500
4	0.704	0.545	1,825
5	0.708	0.575	956
6	0.776	0.616	554
7	0.942	0.660	392
8	0.927	0.639	425
9	0.869	0.507	248
10	0.932	0.583	319
11	1.636	0.721	71
12	1.927	0.557	52
13	1.862	0.847	20
14	2.370	0.469	65
15	1.638	0.401	29
16	1.404	0.114	20

Note. FACS: Facial action coding system.

Sensitivity and specificity indices were used to evaluate the performance of the system to discriminate between pain and non-pain facial expressions on a frame-by-frame basis during the 5-trial pain induction task. This involved scoring each frame according to the gold standard manual coding. A binary classification was used for all facial expressions to discriminate facial pain expressions from non-pain expressions as determined by a gold standard FACS-based cut-off pain threshold score. Sensitivity and specificity indices were calculated using the full range of possible FACS-based cut-off pain threshold scores from 1 to 16 as varying cut-off pain threshold scores may be used in clinical care dependent on setting and population.

See Table 5 for a full summary of the results of the ROC curve analyses. Sixteen ROC curves were created, one for each score of the FACS-based pain scoring approach (see Figures 2–5).⁴⁴ Results suggested that higher cut-off pain threshold scores yield AUC values that are closer to 1; however, there appears to be an inflection point beyond which the values increase only marginally (see Figure 6). For each of the ROC curves, the AUC was examined to determine the optimal system-associated criterion value to distinguish between pain and non-pain expressions that would maximize system performance in balancing the sensitivity and specificity values. Thus, sensitivity and specificity indices were calculated using the full range of possible system-associated criterion values as coordinates of the ROC curve. Bar plots and line graphs were used to visually compare manual coding and system scores. For an example of the sets of bar plots and line graphs generated for each participant, see Figure 7.

Table 5.

Range of sensitivity and specificity from optimal criterion values (system-generated scores) for ROC curves corresponding to each pain score of the FACS-based scoring approach.¹

Cut-off pain threshold score	AUC	Standard error	Optimal criterion value	Sensitivity	Specificity
1	0.641*	0.002	0.154	59.92	59.92
2	0.691*	0.003	0.172	64.59	64.60
3	0.750*	0.003	0.188	69.01	69.00
4	0.798*	0.004	0.201	72.40	72.40
5	0.867*	0.004	0.225	78.23	78.25
6	0.914*	0.004	0.252	83.37	83.40
7	0.941*	0.003	0.282	87.56	87.54
8	0.953*	0.003	0.302	89.31	89.42
9	0.949*	0.004	0.303	89.33	89.40
10	0.963*	0.004	0.342	92.22	92.36
11	0.980*	0.007	0.388	94.80	95.04
12	0.998*	0.000	0.538	98.34	98.12
13	0.998*	0.000	0.554	98.46	98.24
14	0.998*	0.000	0.600	98.18	98.28
15	0.998*	0.001	0.568	97.87	98.33
16	1.000*	0.000	3.230	95.00	99.96

Note. ¹Prkachin & Solomon, 2008. AUC: area under the curve. *p < 0.01. FACS: Facial action coding system.
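One way to select a criterion value that balances sensitivity and specificity is to scan the observed system scores and keep the one at which the two indices are closest. The sketch below is an illustration only: the function names and data are hypothetical, a frame is assumed to be classified as pain when its system score is at or above the criterion, and the selection rule is an assumption motivated by the near-equal sensitivity and specificity values in Table 5 rather than a documented procedure.

```python
def sensitivity_specificity(labels, scores, criterion):
    """Sensitivity and specificity when scores >= criterion are called pain."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= criterion)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < criterion)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < criterion)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= criterion)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_criterion(labels, scores):
    """Return (criterion, sensitivity, specificity) for the observed score
    at which sensitivity and specificity are closest to each other."""
    best = None
    for criterion in sorted(set(scores)):
        sens, spec = sensitivity_specificity(labels, scores, criterion)
        gap = abs(sens - spec)
        if best is None or gap < best[0]:
            best = (gap, criterion, sens, spec)
    return best[1], best[2], best[3]


# Illustrative values only, not study data.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.5, 0.7, 0.8]
print(balanced_criterion(labels, scores))
```

Other selection rules exist (for example, maximizing the sum of sensitivity and specificity) and may yield a different criterion value for the same data.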

Figure 2.

Receiver operating characteristic curves for facial action coding system-based cut-off pain scores 1–4.

Figure 3.

Receiver operating characteristic curves for facial action coding system-based cut-off pain scores 5–8.

Figure 4.

Receiver operating characteristic curves for facial action coding system-based cut-off pain scores 9–12.

Figure 5.

Receiver operating characteristic curves for facial action coding system-based cut-off pain scores 13–16.

Figure 6.

Area under the curve values for ROC curve analyses for each facial action coding system-based cut-off pain threshold value.

Figure 7.

Visual comparison of gold standard manual pain coding and pain intensity scores by the automated pain behavior detection system during the 5-trial pain-induction task. This figure shows an example participant. The top line graph depicts the gold standard manual pain coding scores (x-axis) over the entire duration of the thermal pain task (y-axis) in frames for one participant. The corresponding top bar graph simplifies the manual pain coding scores by providing a visual representation of pain (red bars) versus non-pain (green bars) scoring over the cut-off pain threshold score of 4. The bottom line graph depicts the pain scoring by the automated system (x-axis) over the entire duration of the thermal pain task (y-axis) in frames for the same participant. The corresponding bottom bar graph simplifies the system scores by providing a visual representation of pain (red bars) versus non-pain (green bars) scoring over the criterion pain score of 0.2. The larger the red bar, the longer pain was scored as being present during the task.

Gender-based analyses

The performance of the system was explored as a function of gender. The sample was largely balanced as approximately 57% of participants were female and 43% of participants were male. The sample was also homogeneous in terms of age. Two multivariate analyses of variance (MANOVAs) were conducted to compare the facial pain expressiveness of male and female participants during the 5-trial pain induction task.

Table 6 presents participants’ mean pain scores for the 5-trial pain induction task. A 2 (gender: male, female) x 2 (pain measure: manual coding, system-generated coding) mixed model MANOVA with repeated measures was conducted to examine gender differences in average pain scores during the entirety of the task. Results showed that there was no significant difference between genders. Additionally, mean pain scores were examined for only those portions of the 5-trial pain-induction task when pain was present (i.e., frames coded as non-zero by manual coding; see Table 6). A second 2 (gender: male, female) x 2 (pain measure: manual coding, system-generated coding) MANOVA was conducted to examine gender differences in average pain scoring during pain-related portions of the task. No significant difference between genders was identified.

Table 6.

Gender differences in pain scoring by the automated pain behaviour detection system and FACS-based manual coding.

Pain measure	Gender	M	SD	95% Confidence Interval
All Video Frames
Automated pain behaviour detection system	Male	0.19	0.02	0.14–0.23
	Female	0.20	0.02	0.16–0.23
FACS-based coding	Male	0.41	0.12	0.17–0.65
	Female	0.51	0.10	0.30–0.72
Video Frames with FACS Scores ≥ 1
Automated pain behaviour detection system	Male	0.29	0.07	0.15–0.44
	Female	0.36	0.07	0.22–0.49
FACS-based coding	Male	2.42	0.36	1.70–3.15
	Female	2.53	0.33	1.87–3.18

Note. FACS: Facial action coding system.

Secondary ROC curve analyses were conducted to explore gender differences. Sensitivity and specificity indices were used to evaluate system performance as a function of gender focusing on the 5-trial pain-induction task. Similar to the primary analyses, this involved classifying each frame according to gold standard manual coding such that a binary classification was used dependent on a FACS-based cut-off pain threshold score. Because system performance increases only marginally beyond the inflection point, sensitivity and specificity indices were calculated as a function of gender for only the lower end of the FACS-based cut-off pain threshold scores, ranging from 1 to 4 out of 16. Gender-based ROC curves were generated (see Table 7) and are shown in Figure 8 (male participants) and Figure 9 (female participants). Results suggested that system performance was consistently better for female participants compared to male participants.

Table 7.

ROC curve analysis results for gender-based exploration of the performance of the automated pain behaviour detection system using pain scores of the FACS-based scoring approach.¹

Cut-off pain threshold score	Gender	AUC	Standard error	Optimal criterion value	Sensitivity	Specificity
1	Male	0.612*	0.002	0.168	57.55	57.55
	Female	0.658*	0.003	0.131	61.90	61.92
2	Male	0.621*	0.004	0.178	60.32	60.30
	Female	0.757*	0.004	0.162	69.14	69.11
3	Male	0.669*	0.006	0.191	63.92	63.91
	Female	0.818*	0.004	0.181	73.55	73.58
4	Male	0.740*	0.007	0.200	67.08	67.05
	Female	0.844*	0.004	0.198	77.28	77.29

Note. ¹Prkachin & Solomon, 2008. AUC: area under the curve. *p < 0.001. FACS: Facial action coding system.
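The frame-level ROC procedure described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names and toy data are hypothetical, and the rule used to pick a criterion (the system score that best balances sensitivity and specificity) is an assumption made for this sketch, since the selection rule is not stated in the text.

```python
from typing import List, Sequence, Tuple


def binarize(facs_scores: Sequence[float], cutoff: float) -> List[int]:
    """Label a frame as pain (1) when its manual FACS-based score meets the cut-off."""
    return [1 if s >= cutoff else 0 for s in facs_scores]


def sens_spec(labels: Sequence[int], system_scores: Sequence[float],
              criterion: float) -> Tuple[float, float]:
    """Sensitivity and specificity when frames scoring >= criterion are called pain."""
    pairs = list(zip(labels, system_scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= criterion)
    fn = sum(1 for y, s in pairs if y == 1 and s < criterion)
    tn = sum(1 for y, s in pairs if y == 0 and s < criterion)
    fp = sum(1 for y, s in pairs if y == 0 and s >= criterion)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], system_scores: Sequence[float]) -> float:
    """AUC as the probability that a pain frame outscores a non-pain frame.

    Ties count as half. This pairwise form is quadratic in the number of
    frames, so it is only suitable for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, system_scores) if y == 1]
    neg = [s for y, s in zip(labels, system_scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_criterion(labels: Sequence[int], system_scores: Sequence[float]) -> float:
    """Assumed rule: the observed score minimising |sensitivity - specificity|."""
    def gap(c: float) -> float:
        se, sp = sens_spec(labels, system_scores, c)
        return abs(se - sp)
    return min(sorted(set(system_scores)), key=gap)


# Hypothetical toy data: manual FACS-based scores and system scores per frame.
facs = [0, 0, 0, 1, 2, 3, 0, 4]
system = [0.05, 0.10, 0.20, 0.15, 0.30, 0.40, 0.12, 0.50]

for cutoff in (1, 2):
    labels = binarize(facs, cutoff)
    criterion = balanced_criterion(labels, system)
    se, sp = sens_spec(labels, system, criterion)
    print(cutoff, auc(labels, system), criterion, se, sp)
```

Raising the FACS-based cut-off changes which frames count as pain, so each cut-off yields its own curve, AUC, and criterion value, which is the structure reported in Table 7.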

Figure 8.

Receiver operating curves for male participants for the FACS-based cut-off pain threshold scores of 1–4. Each graph indicates an example of a criterion value with the associated sensitivity and specificity values. AUC: area under the curve. FACS: Facial Action Coding System.

Figure 9.

Receiver operating curves for female participants for the FACS-based cut-off pain threshold scores of 1–4. Each graph indicates an example of a criterion value with the associated sensitivity and specificity values. AUC: area under the curve. FACS: Facial Action Coding System.

Correlational analyses

To evaluate the relationship between the various pain assessment measures (i.e., FACS-based manual coding, system-generated pain intensity scores, continuous CoVAS self-report pain ratings) and TSA-II temperatures, Pearson correlations were calculated (see Table 8).

Table 8.

Summary of Pearson correlations for pain assessment measures during 5-trial pain induction task.

Measure	FACS-based coding	CoVAS	FaceReader	Automated pain behaviour detection system
Video Frames with FACS Scores ≥ 1
FACS-based coding	1.00	0.06*	0.34*	0.60*
CoVAS		1.00	0.01	−0.07*
FaceReader			1.00	0.27*
Automated pain behaviour detection system				1.00
High Temperature Video Frames (46–50°C)¹
FACS-based coding	1.00	−0.03*	0.33*	0.55*
CoVAS		1.00	−0.02*	−0.04*
FaceReader			1.00	0.21*
Automated pain behaviour detection system				1.00
All Video Frames¹
FACS-based coding	1.00	0.11*	0.23*	0.48*
CoVAS		1.00	0.01*	0.05*
FaceReader			1.00	0.15*
Automated pain behaviour detection system				1.00

Note. ¹Correction for attenuation was used due to the frequency of frames coded as 0. FACS: facial action coding system. CoVAS: computerized visual analogue scale. *p < 0.01. N = 65.

The first series of correlations examined the relationship between the measures for all frames that were coded as depicting pain-related facial movements (i.e., frames that did not receive a manual coding pain intensity score of 0) during the 5-trial pain-induction task. Over 32,000 data frames were included across all participants. Results demonstrated a moderate positive correlation between system scoring and gold standard manual coding when participants displayed pain-related facial movements, r(32534) = 0.548, p < 0.01.

The second series of correlations examined the relationship between the measures for frames that were coded during peaks of the 5-trial pain-induction task. Peaks were defined as frames that occurred when the TSA-II temperature reached 46°C and above. Correction for attenuation⁶¹ was used when calculating the correlations due to the high frequency of frames coded as 0 by gold standard manual coding. Over 67,000 frames of data were included across all participants. Results demonstrated a moderate positive correlation between system scoring and gold standard manual coding during peaks when thermal pain was maximally induced, r(67201) = 0.550, p < 0.01.

The third series of correlations examined the relationship between the measures for all frames of the entire 5-trial pain-induction task. The primary purpose of this correlation was to determine the relationship between simultaneous pain estimation generated by the system and continuous CoVAS self-report ratings. Correction for attenuation⁶¹ was used when calculating the correlations due to the considerable frequency of frames coded as 0 by gold standard coding. Over 230,000 frames of data were included across all participants. Results demonstrated a weaker positive correlation between system scoring and gold standard manual coding across all frames, r(231844) = 0.480, p < 0.01.
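The three frame subsets compared above can be illustrated with a short Python sketch. The per-frame records and helper names are hypothetical, and the sketch computes plain Pearson coefficients only; it does not reproduce the correction for attenuation applied in the study. In the r(df) notation used above, the degrees of freedom equal the number of frames minus 2.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Dict[str, float]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlate(frames: List[Frame], keep: Callable[[Frame], bool]) -> Tuple[float, int]:
    """Pearson r between manual and system scores on the kept frames, with df = n - 2."""
    kept = [f for f in frames if keep(f)]
    r = pearson_r([f["facs"] for f in kept], [f["system"] for f in kept])
    return r, len(kept) - 2


# Hypothetical frames: manual FACS-based score, system score, stimulus temperature (deg C).
frames: List[Frame] = [
    {"facs": 0, "system": 0.10, "temp": 32.0},
    {"facs": 0, "system": 0.15, "temp": 40.0},
    {"facs": 1, "system": 0.20, "temp": 46.0},
    {"facs": 3, "system": 0.35, "temp": 48.0},
    {"facs": 0, "system": 0.12, "temp": 47.0},
    {"facs": 2, "system": 0.25, "temp": 50.0},
    {"facs": 4, "system": 0.45, "temp": 49.0},
    {"facs": 0, "system": 0.18, "temp": 36.0},
]

pain_frames = correlate(frames, lambda f: f["facs"] >= 1)    # frames with FACS scores >= 1
peak_frames = correlate(frames, lambda f: f["temp"] >= 46.0)  # high-temperature frames
all_frames = correlate(frames, lambda f: True)                # entire task
print(pain_frames, peak_frames, all_frames)
```

Filtering before correlating is what distinguishes the three series: the same pair of measures is compared each time, but over a different set of frames.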

Correspondence of facial AUs to system-generated scores and self-report ratings

The final set of analyses aimed to determine the relationship between specific pain-related facial AUs and a) system performance in identifying facial pain expressions; and b) the average self-report CoVAS pain ratings. These analyses were conducted for the 5-trial pain-induction task. Peak pain events, defined as the 2 s surrounding the frame scored highest by the system (i.e., maximal facial pain expression) during the trial, were examined for each participant. To test the correspondence between facial AUs and both system-generated scoring and self-report ratings, two multiple linear regression analyses were conducted. Each regression examined whether the pain-related facial AUs predicted either system-generated pain scores or self-report CoVAS pain ratings, controlling for gender and age.
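One way to extract a peak pain event of this kind is sketched below in Python. The frame rate of 30 frames per second, the symmetric split of the 2 s window around the peak, and the function name are all assumptions made for illustration; the text does not specify them.

```python
from typing import List, Tuple


def peak_window(scores: List[float], fps: int = 30, window_s: float = 2.0) -> Tuple[int, int]:
    """Return [start, end) frame indices of a window centred on the highest-scoring frame.

    Assumptions (not stated in the text): fps=30 and a symmetric window,
    i.e. 1 s before and 1 s after the peak, clipped to the trial bounds.
    """
    # Index of the first frame holding the maximum system score.
    peak = max(range(len(scores)), key=scores.__getitem__)
    half = int(round(window_s * fps / 2))
    start = max(0, peak - half)
    end = min(len(scores), peak + half + 1)
    return start, end


# Hypothetical trial of 100 frames with a single clear peak at frame 60.
trial = [0.1] * 100
trial[60] = 0.9
start, end = peak_window(trial)
peak_scores = trial[start:end]  # frames that would enter the peak-event analysis
```

Clipping matters when the peak falls near the start or end of a trial, where a full 2 s window is not available.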

Prior to conducting the multiple linear regression analyses, Pearson correlations between the specific facial AU predictors, participant gender, participant age, and system pain scores (or self-report CoVAS pain ratings) were calculated. Summaries of the intercorrelations for the two full regression models are presented in Tables 9 and 10. The results of the two regression models are presented in Tables 11 and 12.

Table 9.

Intercorrelations for regression analysis predicting pain scoring by the automated pain behaviour detection system.

	1	2	3	4	5	6	7
1. Pain scores by the automated pain behaviour detection system¹	1.00	0.58*	0.61*	0.71*	0.06	0.10	0.02
2. Brow lowering (AU4)		1.00	0.65*	0.50*	0.03	−0.07	0.08
3. Orbit tightening (AU6/AU7)			1.00	0.74*	0.24*	0.10	0.09
4. Levator contraction (AU9/AU10)				1.00	0.10	0.10	−0.02
5. Eye closure (AU43)					1.00	−0.07	0.09
6. Participant gender						1.00	0.10
7. Participant age							1.00

Note. ¹Dependent variable. AU: action unit. Gender: 1 = male, 2 = female. *p < 0.05.

Table 10.

Intercorrelations for regression analysis predicting self-report CoVAS pain ratings by participants.

	1	2	3	4	5	6	7
1. CoVAS pain ratings¹	1.00	0.07	0.01	−0.13	0.08	−0.13	−0.01
2. Brow lowering (AU4)		1.00	0.65*	0.50*	0.03	−0.07	0.08
3. Orbit tightening (AU6/AU7)			1.00	0.74*	0.24*	0.10	0.09
4. Levator contraction (AU9/AU10)				1.00	0.10	0.10	−0.02
5. Eye closure (AU43)					1.00	−0.07	0.09
6. Participant gender						1.00	0.10
7. Participant age							1.00

Note. ¹Dependent variable. CoVAS: computerized visual analogue scale. AU: action unit. Gender: 1 = male, 2 = female. *p < 0.05.

Table 11.

Regression analyses examining the unique variance accounted for by the predictors of pain scoring by the automated pain behaviour detection system.

Variable	Beta	F(5,54)	p	R² change
Brow lowering (AU4)*	0.30	6.14	0.02	0.05
Orbit tightening (AU6/AU7)	−0.00	0.00	0.99	0.00
Levator contraction (AU9/AU10)*	0.56	16.89	0.00	0.14
Eye closure (AU43)	0.00	0.00	0.97	0.00
Participant gender	0.07	0.53	0.47	0.00
Participant age	0.00	0.00	0.99	0.00

Note. AU: action unit. Gender: 1 = male, 2 = female. *significant predictor.

Table 12.

Regression analyses examining the unique variance accounted for by the predictors of self-report CoVAS pain ratings by participants.

Variable	Beta	F(5,54)	p	R² change
Brow lowering (AU4)	0.11	0.35	0.56	0.01
Orbit tightening (AU6/AU7)	0.14	0.35	0.56	0.01
Levator contraction (AU9/AU10)	−0.28	1.99	0.16	0.04
Eye closure (AU43)	0.07	0.24	0.63	0.00
Participant gender	−0.10	0.55	0.46	0.01
Participant age	−0.04	0.07	0.80	0.00

Note. CoVAS: computerized visual analogue scale. AU: action unit. Gender: 1 = male, 2 = female.

A conservative approach was adopted for the regressions: the full model was examined first and, if it was significant, each variable’s unique contribution to the prediction was examined after all other variables were entered into the equation. The full model was significant in the prediction of system-generated pain scores, F(6,53) = 11.99, p < 0.001, R² = 0.576 (see Table 11). Collinearity among predictor variables was evaluated using Variance Inflation Factor (VIF) values. Results suggested low collinearity for the two significant predictor variables, brow lowering and levator contraction, with VIF values of 1.87 and 2.28, respectively, indicating that these variables are unlikely to be excessively correlated with the other predictors. In further examining the model, brow lowering accounted for 5% of the unique variance and levator contraction for 14% of the unique variance in system-generated pain scoring. The full model for self-report CoVAS pain ratings was not significant.
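The reported model statistics can be cross-checked with two standard identities, sketched here in Python. The helper names are hypothetical, and the sample size of 60 is inferred from the reported degrees of freedom (6 predictors and 53 residual degrees of freedom) rather than stated in the text.

```python
def f_from_r2(r2: float, k: int, n: int) -> float:
    """Overall F statistic for a linear model with k predictors and n observations."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def r2_from_vif(vif: float) -> float:
    """Share of a predictor's variance explained by the other predictors.

    Inverts the definition VIF_j = 1 / (1 - R_j^2).
    """
    return 1.0 - 1.0 / vif


# Values taken from the text; n = 60 is inferred from F(6,53).
f_full = f_from_r2(0.576, k=6, n=60)
r2_brow = r2_from_vif(1.87)
r2_levator = r2_from_vif(2.28)
print(round(f_full, 2), round(r2_brow, 3), round(r2_levator, 3))
```

The F value recovered from the rounded R² is approximately 12.0, in line with the reported 11.99. The VIF values imply that roughly 47% and 56% of the variance in brow lowering and levator contraction, respectively, is shared with the other predictors, which is below commonly used collinearity thresholds.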

Discussion

This was the first study to evaluate, live and in real time in a laboratory environment, a novel computer vision algorithm³³ designed to detect and analyze facial pain behaviors in older adults. This initial evaluation of the automated pain behavior detection system supported its potential viability for detecting and analyzing facial pain expressions. As the ultimate aim of the system is to be used in LTC settings, where chronic pain is prevalent, an important first step was to test the performance of the system using live participants displaying genuine pain expressions in a laboratory environment.

Performance of the automated pain behavior detection system

Findings suggest that the system was able to differentiate, live and in real time, between pain and non-pain facial expressions using system-generated pain intensity scores. System performance improved at greater intensities of facial pain expressiveness. Similar results have been found in previous studies examining pain expression detection algorithms.⁶² However, some investigations have shown a trend of improved performance of facial pain recognition algorithms when detecting less intense pain levels.^63,64 These inconsistent findings are most likely a result of the training datasets used. Video recordings displaying facial pain expressions are difficult to obtain and tend not to include a variety of pain intensities, especially higher intensities. Such an imbalance can lead to an algorithm being more adept at recognizing lower pain scores simply due to training exposure. At the same time, lower pain scores often correspond to more subtle pain expressions, which can be difficult for an algorithm to detect and interpret accurately. In other words, characteristics of high-pain expressions are much clearer across individuals than characteristics of low-pain expressions, thus allowing for better detection performance by the algorithm of the automated system. Throughout the literature, it is apparent that there is no single cut-off that defines a facial pain expression as determined by FACS-based manual pain coding. Therefore, the ability of the system to detect pain expressions was examined across a range of classification thresholds (i.e., the sixteen possible FACS-based cut-off pain threshold scores).

A positive correlation was demonstrated between gold standard manual coding and pain behavior coding by the system. When all video frames were considered, per-frame predictions achieved a correlation coefficient of 0.48, comparable to that reported in prior investigations with video recordings.³³ Correlational analyses demonstrated that system performance improved when analysis was restricted to pain-relevant frames rather than all frames. It is possible that this restriction reduces noise, which is more likely to accompany frames showing non-pain-related facial expressions.

As the system is ultimately intended for use in LTC settings, it was important for it to be programmed to alert when pain expressions were identified. During testing, a light was turned on and email alerts were sent to the experimenter noting that a pain expression was detected. To the best of our knowledge, this is the only study to incorporate a fully automated algorithm with a notification alert system. This has practical implications, especially for clinical populations such as individuals with severe dementia who experience communicative difficulties.

Absence of one-to-one correspondence between system-generated scores and FACS-based scores

There does not appear to be a perfect one-to-one correspondence between scoring measures. As such, any given FACS-based pain score will correspond to a range of comparative system-generated pain scores which varies somewhat from participant to participant as a function of variability in the anatomy of their faces. Since there are individual differences in pain expression,⁶⁵ the degree and manner of deviation from their corresponding neutral baseline expression can vary significantly across individuals.

Facial expressions in response to pain do not always follow a linear pattern. For example, small increases in pain might not result in noticeable changes in facial expressions until a certain threshold is reached.^66,67 The algorithm may be more sensitive to certain types of facial movements (e.g., levator contraction and brow lowering), which could affect the consistency of pain expression scoring. Neural networks, such as the model used by the algorithm in this study, often have a degree of prediction uncertainty, especially in complex tasks such as interpreting facial expressions.⁶⁸ This is because they operate by estimating probabilities based on the input data, and these probabilities can vary significantly even for similar inputs. A neural network might therefore struggle to map non-linear relationships accurately, resulting in a broad range of predicted values across participants.

Pain behaviors and self-reported pain

This study aimed to elucidate the co-variation between facial pain expressions and continuous, real-time self-report pain ratings, because single retrospective self-report pain ratings are mainly utilized in the literature. The weak associations between self-reported pain ratings and non-verbal facial expressions of pain were unsurprising. In the literature, the relationship varies from no association to weak or moderate associations.^69,70 Kunz et al.⁶⁷ suggested that the activation threshold, defined as the minimum level of pain stimulation needed to elicit a nonverbal pain behavior, may not correspond perfectly to verbal pain reports. This explanation is supported by the description of facial pain expressions as part of a late signaling system: while pain is being experienced, and thus communicated verbally, facial cues might be delayed in their presentation until a certain pain intensity is reached.⁷¹ Kunz et al.⁷² further demonstrated that social motives could contribute to the discrepancy between subjectively experienced pain and facial expressions, such that atypical expressions such as smiling can occur during experimental pain.

Gender differences in pain expression detection

The system was better at correctly identifying facial pain expressions in female participants compared to male participants. While analyses showed no statistically significant differences in facial pain expressions, mean values for manual FACS-based coding and system-generated pain coding were slightly greater for female participants. Based on previous testing using video recordings, there also appeared to be a slight difference in algorithmic performance as a function of gender.³³ It is possible that the gender difference seen during live testing was a result of the algorithm being trained more on female faces.

Few research studies examining automated facial expression recognition by other algorithms have investigated the role of variables such as gender in evaluating model performance.²⁹ That said, some facial analysis models tend to perform better on male faces than female faces.⁷³ It has been posited that this gender bias may be due to certain datasets containing more male faces than female faces.⁷³ By the same reasoning, the better performance of the algorithm on female faces in this study may reflect an original training dataset that contained more female faces.

The role of pain-related facial AUs

Consistent with expectations, brow lowering and levator contraction made unique and independent contributions to the prediction of system-generated pain scores. These findings are consistent with prior investigations.^74–76 The way in which the algorithm discerns facial pain expressions appears similar to the way human observers do. It has been demonstrated that brow lowering and levator contraction are the most salient facial cues observers utilize in their mental representations of facial pain expressions in others.⁷⁷ While certain facial cues are strongly related to the broader facial pain expressions, there may be different configurations or characteristic pain faces that combine the pain-related facial cues differently.⁶⁵

Limitations and future directions

This study represents an essential live test of the performance of a pain behavior detection algorithm prior to further testing in clinical settings. Although pain was experimentally induced, it is likely that the system would be able to detect and analyze facial pain expressions in individuals experiencing pain due to pathology. It is recognized that the detection of pain expressions in community-dwelling older adults may be less challenging than detection in samples of LTC residents with dementia.²⁹ Future evaluations using a sample of older chronic pain patients in a non-laboratory setting would more closely reflect system performance in naturalistic situations.

Biases in algorithm performance have been found for several variables, including ethnicity/race, such that algorithms tend to perform more poorly in detecting facial expressions in individuals with darker skin tones.^73,78 Due to the homogeneity of the participant sample, it was not possible to explore algorithm performance as a function of race or ethnicity. Future studies should continue to develop datasets that include participants across a range of ethnicities and races, both to train facial analysis algorithms and to evaluate their performance in diverse samples.

Conclusion

This was the first investigation of a newly developed computer vision algorithm, designed from datasets of older adults with and without dementia, using live observations and in real time. Findings supported the efficacy of the automated pain behavior detection system in successfully detecting and monitoring facial pain expressions of varying intensities in community-dwelling older adults. Despite recognizing the necessity of implementing pain assessment procedures in LTC, human resource limitations impact the availability of regular, standardized pain assessments. Technology has the potential to provide a complementary method for effective and validated monitoring of pain behavior in vulnerable populations who often experience inadequate pain management. Continuous, automated assessment of pain behavior is expected to lead to better quality of life for older adults in LTC and a decreased burden for caregivers.

Footnotes

Acknowledgments

The authors thank Vivian Tran, Louise Castillo, and Laney Yarycky for their help with the data collection.

Author contributions

R.S. contributed to the conceptualization of the study, and played a primary role in data analysis, interpretation, conceptualization and manuscript write up. A.M. contributed substantially to data analysis and interpretation. B.T. and A.M. contributed substantially by providing the algorithm for the automated system. T.H. and B.T. prepared the grant that funded this project and made major contributions to interpretation of the results and write up of the manuscript. T.H. oversaw and contributed to all aspects of this project including study conceptualization.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part through a grant from the AGE-WELL Networks of Centres of Excellence (grant # AWCAT-2019-14). Publication of this article was supported by funding from the Canadian Institutes of Health Research (#BET-190800).

Ethical statement

ORCID iD

Thomas Hadjistavropoulos

References

1. Gagliese, Gauthier, Narain, et al. Pain, aging and dementia: towards a biopsychosocial model. Prog Neuropsychopharmacol Biol Psychiatry 2018; 87: 207–215.

2. Van Kooten, Binnekade, Van Der Wouden, et al. A review of pain prevalence in Alzheimer’s, vascular, frontotemporal and lewy body dementias. Dement Geriatr Cogn Disord 2016; 41: 220–232.

3. Achterberg, Lautenbacher, Husebo, et al. Pain in dementia. PAIN Rep 2020; 5: e803.

4. Corbett, Husebo, Malcangio, et al. Assessment and treatment of pain in people with dementia. Nat Rev Neurol 2012; 8: 264–274.

5. Husebo, Achterberg, Lobbezoo, et al. Pain in patients with dementia: a review of pain assessment and treatment challenges. Nor Epidemiol 2012; 22: 243–251. DOI: 10.5324/nje.v22i2.1572.

6. Malara, De Biase, Bettarini, et al. Pain assessment in elderly with behavioral and psychological symptoms of dementia. J Alzheimers Dis 2016; 50: 1217–1225.

7. Morrison, Siu. A comparison of pain and its treatment in advanced dementia and cognitively intact patients with hip fracture. J Pain Symptom Manage 2000; 19: 240–248.

8. Herr, Bjoro, Decker. Tools for assessment of pain in nonverbal older adults with dementia: a state-of-the-science review. J Pain Symptom Manage 2006; 31: 170–192.

9. Hadjistavropoulos, Herr, Prkachin, et al. Pain assessment in elderly adults with dementia. Lancet Neurol 2014; 13: 1216–1227.

10. Scherder EJA, Plooij. Assessment and management of pain, with particular emphasis on central neuropathic pain, in moderate to severe dementia. Drugs Aging 2012; 29: 701–706.

11. Weiner, Peterson, Ladd, et al. Pain in nursing home residents: an exploration of prevalence, staff perspectives, and practical aspects of measurement. Clin J Pain 1999; 15: 92–101.

12. Björk, Juthberg, Lindkvist, et al. Exploring the prevalence and variance of cognitive impairment, pain, neuropsychiatric symptoms and ADL dependency among persons living in nursing homes; a cross-sectional study. BMC Geriatr 2016; 16: 154.

13. Cheung ENM, Benjamin, Heckman, et al. Clinical characteristics associated with the onset of delirium among long-term nursing home residents. BMC Geriatr 2018; 18: 39.

14. Fuchs-Lacelle, Hadjistavropoulos, Lix. Pain assessment as intervention: a study of older adults with severe dementia. Clin J Pain 2008; 24: 697–707.

15. Hadjistavropoulos, Kaasalainen, Williams, et al. Improving pain assessment practices and outcomes in long-term care facilities: a mixed methods investigation. Pain Manag Nurs 2014; 15: 748–759.

16. Gallant, Peckham, Marchildon, et al. Provincial legislative and regulatory standards for pain assessment and management in long-term care homes: a scoping review and in-depth case analysis. BMC Geriatr 2020; 20: 458.

17. Hadjistavropoulos, Craig, Duck, et al. A biopsychosocial formulation of pain communication. Psychol Bull 2011; 137: 910–939.

18. Sixsmith. Technology and the challenge of aging. In: Sixsmith, Gutman (eds). Technologies for active aging. Boston, MA: Springer US, 2013, pp. 7–25.

19. Afifi, Collins, Rand, et al. Testing the feasibility of virtual reality with older adults with cognitive impairments and their family members who live at a distance. Innov Aging 2021; 5: igab014.

20. Moyle, Bramble, Jones, et al. “She had a smile on her face as wide as the great Australian bite”: a qualitative examination of family perceptions of a therapeutic robot and a plush toy. Gerontologist 2019; 59: 177–185.

21. Vandenberg, Van Beijnum B-J, Overdevest VGP, et al. US and Dutch nurse experiences with fall prevention technology within nursing home environment and workflow: a qualitative study. Geriatr Nur (Lond) 2017; 38: 276–282.

22. Zhao, Sazlina S-G, Rokhani, et al. The expectations and acceptability of a smart nursing home model among Chinese elderly people: a mixed methods study protocol. PLoS One 2021; 16: e0255865.

23. Dolatabadi, Zhi, Flint, et al. The feasibility of a vision-based sensor for longitudinal monitoring of mobility in older adults with dementia. Arch Gerontol Geriatr 2019; 82: 200–206.

24. Husebo, Heintz, Berge, et al. Sensing technology to monitor behavioral and psychological symptoms and to assess treatment response in people with dementia. A systematic review. Front Pharmacol 2020; 10: 1699.

25. Ashraf, Lucey, Cohn, et al. The painful face – pain expression recognition using active appearance models. Image Vis Comput 2009; 27: 1788–1796.

26. Chen, Ansari, Wilkie. Automated pain detection from facial expressions using FACS: a review. https://arxiv.org/abs/1811.07988 (2018, accessed 9 August 2024).

27. Lucey, Cohn, Prkachin, et al. Painful data: the UNBC-McMaster shoulder pain expression archive database. In: Face and gesture 2011, Santa Barbara, CA, USA, 21–25 March 2011. IEEE, pp. 57–64.

28. Werner, Lopez-Martinez, Walter, et al. Automatic recognition methods supporting pain assessment: a survey. IEEE Trans Affect Comput 2022; 13: 530–552.

29. Taati, Zhao, Ashraf, et al. Algorithmic bias in clinical populations – evaluating and improving facial analysis technology in older adults with dementia. IEEE Access 2019; 7: 25527–25534.

30. Drozdowski, Rathgeb, Dantcheva, et al. Demographic bias in biometrics: a survey on an emerging challenge. IEEE Trans Technol Soc 2020; 1: 89–103.

31. Asgarian, Zhao, Ashraf, et al. Limitations and biases in facial landmark detection -- an empirical study on older adults with dementia. Epub ahead of print 2019. DOI: 10.48550/ARXIV.1905.07446.

32. White, Kalkan, et al. Investigating bias and fairness in facial expression recognition. https://arxiv.org/abs/2007.10075 (2020, accessed 9 August 2024).

33. Rezaei, Moturu, Zhao, et al. Unobtrusive pain monitoring in older adults with dementia using pairwise and contrastive training. IEEE J Biomed Health Inform 2021; 25: 1450–1462.

34. Grathwohl, Wang K-C, Jacobsen J-H, et al. Your classifier is secretly an energy based model and you should treat it like one. https://arxiv.org/abs/1912.03263 (2020, accessed 9 August 2024).

35. Ekman, Friesen. Facial action coding system. Palo Alto, CA: Consulting Psychologist Press, 1978.

36. Ekman, Friesen, Hager. Facial action coding system. Network Information Research Corp, 2002.

37. Hill, Craig. Detecting deception in pain expressions: the structure of genuine and deceptive facial displays. Pain 2002; 98: 135–144.

38. Craig, Prkachin, Grunau. The facial expression of pain. In: Turk, Melzack (eds). Handbook of pain assessment. New York: The Guilford Press, 2011, pp. 117–133.

39. Lints-Martindale, Hadjistavropoulos, Barber, et al. A psychophysical investigation of the facial action coding system as an index of pain variability among older adults with and without Alzheimer’s disease. Pain Med 2007; 8: 678–689.

40. Beach, Huck, Miranda, et al. Effects of Alzheimer disease on the facial expression of pain. Clin J Pain 2016; 32: 478–487.

41. Bunk, Zuidema, Koch, et al. Pain processing in older adults with dementia-related cognitive impairment is associated with frontal neurodegeneration. Neurobiol Aging 2021; 106: 139–152.

42. Hadjistavropoulos, Browne, Prkachin, et al. Pain in severe dementia: a comparison of a fine‐grained assessment approach to an observational checklist designed for clinical settings. Eur J Pain 2018; 22: 915–925.

43. Prkachin. The consistency of facial expressions of pain: a comparison across modalities. Pain 1992; 51: 297–306.

44. Prkachin, Solomon. The structure, reliability and validity of pain expression: evidence from patients with shoulder pain. Pain 2008; 139: 267–274.

45. Coll M-P, Grégoire, Prkachin, et al. Repeated exposure to vicarious pain alters electrocortical processing of pain expressions. Exp Brain Res 2016; 234: 2677–2686.

46. Gallant, Hadjistavropoulos. Experiencing pain in the presence of others: a structured experimental investigation of older adults. J Pain 2017; 18: 456–467.

47. Hampton AJD, Hadjistavropoulos, Gagnon. Contextual influences in decoding pain expressions: effects of patient age, informational priming, and observer characteristics. Pain 2018; 159: 2363–2374.

48. Rash, Prkachin, Solomon, et al. Assessing the efficacy of a manual‐based intervention for improving the detection of facial pain expression. Eur J Pain 2019; 23: 1006–1019.

49. Huskisson. Measurement of pain. The Lancet 1974; 2: 1127–1131.

50. Bergeron-Vézina, Corriveau, Martel, et al. High- and low-frequency transcutaneous electrical nerve stimulation does not reduce experimental pain in elderly individuals. Pain 2015; 156: 2093–2099.

51. Daguet, Bergeron-Vezina, Harvey M-P, et al. Decreased initial peak pain sensation with aging: a psychophysical study. J Pain Res 2020; 13: 2333–2341.

52. Eisenach, Curry, Aschenbrenner, et al. Pupil responses and pain ratings to heat stimuli: reliability and effects of expectations and a conditioning pain stimulus. J Neurosci Methods 2017; 279: 52–59.

53. Moana‐Filho, Herrero Babiloni, Nisley. Endogenous pain modulation assessed with offset analgesia is not impaired in chronic temporomandibular disorder pain patients. J Oral Rehabil 2019; 46: 1009–1022.

54. Ezenwa, Molokie, Wang, et al. Safety and utility of quantitative sensory testing among adults with sickle cell disease: indicators of neuropathic pain? Pain Pract 2016; 16: 282–293.

55. Salame, Blinkhorn, Karami. Neurological assessment using a quantitative sensory test in patients with chronic unilateral orofacial pain. Open Dent J 2018; 12: 53–58.

56. Van Den Bosch, Van Dijk, Tibboel, et al. Thermal quantitative sensory testing in healthy Dutch children and adolescents standardized test paradigm and Dutch reference values. BMC Pediatr 2017; 17: 77.

57. De Kruijf, Peters, Jacobs C, Tiemeier, et al. Determinants for quantitative sensory testing and the association with chronic musculoskeletal pain in the general elderly population. Pain Pract 2016; 16: 831–841.

58. Naugle, Ohlman, Naugle, et al. Physical activity behavior predicts endogenous pain modulation in older adults. Pain 2017; 158: 383–390.

59. Neziri, Scaramozzino, Andersen, et al. Reference values of mechanical and thermal pain tests in a pain‐free population. Eur J Pain 2011; 15: 376–383.

60. Lue Y-J, Wang H-H, Cheng K-I, et al. Thermal pain tolerance and pain rating in normal subjects: gender and age effects. Eur J Pain 2018; 22: 1035–1042.

61.

Mendoza

Mumford

. Corrections for attenuation and range restriction on the predictor. J Educ Stat 1987; 12: 282.

62.

Bargshady

Zhou

Deo

, et al. Enhanced deep learning algorithm development to detect pain intensity from facial expression images. Expert Syst Appl 2020; 149: 113305.

63.

Egede

Valstar

Martinez

. Fusing deep learned and hand-crafted features of appearance, shape, and dynamics for automatic pain estimation. In: 2017 12th IEEE international conference on automatic face & gesture recognition (FG 2017), Washington, DC, USA, 30 May–3 June 2017. IEEE, pp. 689–696.

64. Xin, Lin, Yang, et al. Pain intensity estimation based on a spatial transformation and attention CNN. PLoS One 2020; 15: e0232412.

65. Kunz, Lautenbacher. The faces of pain: a cluster analysis of individual differences in facial activity patterns of pain. Eur J Pain 2014; 18: 813–823.

66. Craig, Patrick. Facial expression during induced pain. J Pers Soc Psychol 1985; 48: 1080–1091.

67. Kunz, Mylius, Schepelmann, et al. On the relationship between self-report and facial expression of pain. J Pain 2004; 5: 368–376.

68. Gawlikowski, Tassi CRN, Ali, et al. A survey of uncertainty in deep neural networks. Artif Intell Rev 2023; 56: 1513–1589.

69. Gomutbutra, Kittisares, Sanguansri, et al. Classification of elderly pain severity from automated video clip facial action unit analysis: a study from a Thai data repository. Front Artif Intell 2022; 5: 942248.

70. Labus, Keefe, Jensen. Self-reports of pain intensity and direct observations of pain behavior: when are they correlated? Pain 2003; 102: 109–124.

71. Prkachin, Craig. Expressing pain: the communication and interpretation of facial pain signals. J Nonverbal Behav 1995; 19: 191–205.

72. Kunz, Prkachin, Lautenbacher. Smiling in pain: explorations of its social motives. Pain Res Treat 2013; 2013: 1–8.

73. Buolamwini, Gebru. Gender shades: intersectional accuracy disparities in commercial gender classification. In: Proceedings of machine learning research, 2018, pp. 77–91.

74. Atee, Hoti, Chivers, et al. Faces of pain in dementia: learnings from a real-world study using a technology-enabled pain assessment tool. Front Pain Res 2022; 3: 827551.

75. Mieronkoski, Syrjälä, Jiang, et al. Developing a pain intensity prediction model using facial expression: a feasibility study with electromyography. PLoS One 2020; 15: e0235545.

76. Coppieters, Smalbrugge, et al. Associations between facial expressions and observational pain in residents with dementia and chronic pain. J Adv Nurs 2024; 80: 3846–3855.

77. Blais, Fiset, Furumoto-Deshaies, et al. Facial features underlying the decoding of pain expressions. J Pain 2019; 20: 728–738.

78. Deng. A deeper look at facial expression dataset bias. IEEE Trans Affect Comput 2022; 13: 881–893.

Real-time evaluation of an automated computer vision system to monitor pain behavior in older adults
