Abstract
The Group Observational Measurement of Engagement (GOME) was developed to capture the impact of group recreational activities on the engagement and general wellbeing of persons with dementia. The psychometric properties of the GOME were originally described in a study of group activities conducted at one large Canadian geriatric center. Continuing this work in Israel, this article reports on further psychometric properties of the GOME based on observations of 115 persons with dementia from 10 geriatric units, of which four were senior day center units (in three institutions) and six were nursing units (representing five other institutions). Very good inter-rater reliability between research observers was found. Factor analysis suggests that the GOME’s four individual-level outcomes can be combined into one indicator, the Wellbeing Index. Validity, examined via agreement between research observers and group activity leaders who were staff members in the facilities where the group activities were conducted, also showed high levels of positive correlations. The GOME provides a practical tool for assessing wellbeing in the context of group activities. It can be useful in clarifying the relative impact of process variables on participants’ general wellbeing.
Introduction
Group activities present persons with dementia with opportunities for social engagement and exposure to stimulating and enjoyable content. Such activities can ameliorate loneliness, and motivate participants to use their retained skills and abilities. These activities may also reduce behavioral challenges in this population (Cohen-Mansfield et al., 2010), enhance wellbeing, and slow decline (Cohen-Mansfield, 2018; DeVries et al., 2019).
Group activities for persons with dementia are rarely assessed for efficacy. Only with reliable assessment tools is it possible to assess their impact. Without monitoring, activity leaders may not conduct scheduled group activities, or may conduct activities that do not benefit participants (Buettner & Fitzsimmons, 2003).
Five assessment instruments of engagement of persons with dementia were identified via a literature search. Good reliability and validity were reported for all of them.
The Assessment Scale for Engagement in Activities (ASEA) (Tanaka et al., 2021, 2022) includes 10 items and has been used by occupational therapists to assess the response of persons with dementia to activities in an acute-phase psychiatric hospital and a recuperation hospital in Japan.
The Menorah Park Engagement Scale (MPES) (Skrajner & Camp, 2007) is an observational tool with 11 items. The main outcomes divide engagement into four types: Constructive Engagement, Passive Engagement, Non-Engagement, and Other Engagement. Several items examine affect, such as the presence of pleasure. It has been used by research observers in nursing homes, in a community dwelling sample (Chan et al., 2021), and in acute care settings (Cheong et al., 2016).
The modified version of the Greater Cincinnati Chapter Well-Being Observation Tool© (Sauer et al., 2016) includes 25 items, which target the domains of wellbeing (social interest, engagement, and pleasure) and of ill-being (disengagement, negative affect, sadness, and confusion). Research coders rated each 5-minute interval of videotapes of the activity sessions.
The Engagement of a Person with Dementia Scale (EPWDS) (Jones et al., 2018) includes 10 items and was validated using videos of persons with dementia in long term care facilities interacting with a companion seal robot (PARO). The 10 items capture five domains: affective engagement (positive affect and negative affect), visual engagement (visually engaged, visually avoidant), verbal engagement, behavioral engagement, and social engagement. Validity was established against an apathy rating scale.
The Group Observational Measurement of Engagement (GOME) was created in order to promote accountability and enable a science of optimizing group activities for persons with dementia using an instrument that was developed to be intuitively understood by both group leaders and research observers (Cohen-Mansfield et al., 2017). It is based on the study of engagement with activities of individuals with dementia, the OME (Observational Measurement of Engagement; Cohen-Mansfield et al., 2009). The psychometric properties (inter-rater reliability and validity) of the GOME, the group-based assessment, were found satisfactory in a report based on 10 group activity topics within a large Canadian geriatric facility (Cohen-Mansfield et al., 2017; Cohen-Mansfield, 2017). The OME has been used in multiple locations and settings (e.g., Cohen-Mansfield et al., 2009; D’Onofrio et al., 2019; Feng et al., 2020; Leone et al., 2012; Nordgren et al., 2022; Perugia et al., 2018).
Several characteristics of the GOME render it potentially useful for improving clinical practice: The scales do not require specialized training or recording of the activities; they have been used to capture the whole activity period. The GOME is the only assessment that includes not only an individual section, that is, rating the responses of each individual, but also ratings of the group as a whole. This is important because (1) it enables raters to rate via individual or group measures or both; and (2) some groups are too large for an observer to retain the information for each participant. Thus, group measures provide information which would not be captured in individual measures.
The GOME has been used to examine the relative impact of background characteristics of activity group participants on their response to diverse types of group activities (Cohen-Mansfield, 2017), the impact of environmental variables on their response (Cohen-Mansfield, 2020), and the impact of group activity content on their response (Cohen-Mansfield, 2018). In this article, we report further validation of the GOME within a more heterogeneous sample in Israel, rather than Canada, and examine whether the GOME’s individual outcome measures can be combined into one index.
Methods
This is an observational methodological study examining the reliability, validity, and utility of the GOME.
Participants
This study was approved by the Institutional Review Board of Tel Aviv University (Project 0000520-3). The study involved participants from six nursing home units and four senior day center units in the Tel Aviv and Jerusalem metropolitan areas. These units, all devoted to the care of persons with dementia, belonged to five institutions with nursing care units and three senior day centers. Staff members at the different facilities obtained informed consent for 126 individuals to be observed for the study. Of those, 11 never participated in any of the group activities. This article is therefore based on data concerning the remaining 115 participants. The main inclusion criterion was that the unit was designated for care of persons with dementia. Exclusion criteria were: (1) no dexterity movement in either hand, (2) inability to be comfortably seated in a chair or wheelchair, or (3) inability to be moved to the site of the group activity. Research staff did not have access to the units’ participant files or to information from family members.
Background Characteristics
All participants were Caucasian. Over three-quarters of the participants were female (77.39%, Table 1). Participant average age was 83.58 years (SD = 7.61, ranging from 55 to 99 years). Participant Activities of Daily Living (ADL) average rating [the average of the following items as measured on the Minimum Data Set (Morris et al., 1999): dressing, eating, bathroom, hygiene, and mobility] was 1.77 (SD = 1.3; range 0–4; Scale: 0 = “independent” to 4 = “total dependence”). Cognitive functioning, assessed via the Cognitive Performance Scale (CPS) (Morris et al., 1994), averaged 3.27 (SD = 1.68; range 0–6; Scale: 0 = “Intact” to 6 = “Very Severe impairment”). This average CPS score is equivalent to a Mini Mental State Examination rating of 14 according to Wellens et al. (2013), though it is closer to a 9 rating according to Hartmaier et al. (1995). The sample was highly heterogeneous, ranging from being analphabetic, that is, individuals who immigrated from countries where they received no education at all, to individuals who were highly educated, and from secular to ultra-orthodox in religious outlook and lifestyle (in different facilities). Participants attending nursing homes differed significantly from those attending senior day centers on several background variables. Nursing home residents were older (85.7 years vs. 81.4, t(112) = 3.10, p < .01), were more impaired in their ADLs (2.4 vs. 1.1, t(113) = 6.52, p < .001), and were more impaired in their cognitive function as measured by the CPS (3.7 vs. 2.8, t(113) = 2.88, p < .01). Differences were not significant for sex of participants, education, or marital status.
Table 1. Participant Demographics and Functional Background (n = 115).
Note. a n = 114.
n = 92.
CPS: Cognitive Performance Scale. Scale: 0—“Intact” to 6—“Very severe impairment.”
ADL: Activities of Daily Living Scale. Calculated as the mean of the scales for five tasks: dressing, eating, bathroom, hygiene, and mobility. Scale: 0—“Independent” to 4 —“Total Dependence.”
Procedure
Originally, we prepared 43 group activity kits on the basis of our experience in Canada (Cohen-Mansfield et al., 2016), with four additional kits developed during the study. Materials for specific group activities were tailored to meet the needs of the specific unit’s participants, and included items such as large-printed booklets for reading groups, large-printed song books for choral groups, USB memory sticks with singing performances for choral groups, art materials and supplies for creative arts projects, and PowerPoint presentations to demonstrate the content of the activity and enhance group participation.
Our preparatory work with two facilities began just prior to the emergence of COVID-19 in Israel. During the pandemic, research observers were prohibited from entering these facilities, and the study was halted for about a year. Given the time limitations on the study’s funding, we asked one nursing home to participate in the study without the onsite presence of our research observers, and provided guidance via telephone and delivered the group activity materials to the front door of the facility. The activity leaders in this facility completed the questionnaires we supplied. After demonstration of the viability of use of the group activity kits in this facility, we proceeded to conduct the study in other facilities using a stepped wedge design (Hemming, 2022). With the exception of the first facility, a research observer observed all the group activities. Due to financial constraints, we were not able to place a second research observer at facilities at all times, but a second research observer was present whenever possible. The data for the inter-rater reliability aspect of this study were derived only from group activities that were independently observed by two research observers. Data collection took place from 2021 to 2023.
One group activity was conducted per session, and the order of activities was randomized for each unit. Activity leaders were requested to conduct all the group activities for which we provided kits, but in one unit, activity leaders declined to conduct certain group activities on the grounds that they would be inappropriate for their ultra-orthodox audience. In another unit, the activity leader left the facility after about half the activities had been conducted. Although the facility set out to replace this activity leader, it was unable to do so. There were cases when an activity leader felt unable to complete the group activity within the allotted time (usually an hour), and was permitted to continue with the same activity during the next session; this occurred for 17 of the 47 activity kits. For the analysis, we used the average scores obtained from the two sessions. Each unit completed between 20 and 47 of the group activity kits (average = 43.3). During the course of the study, 11 participants left the unit and 14 died, in both instances for reasons unrelated to the study. At the end of each group activity, research observers and the activity leader who conducted the group activity independently completed the same assessment. In order to reduce the burden on activity leaders, who were not compensated for their participation in the study, some questions were deleted from their assessment forms.
Assessments
Assessment of Engagement
The Group Observational Measurement of Engagement (GOME). The assessment included the following items: Outcome measures on an individual level pertaining to each participant for whom we had an informed consent:
Attendance duration: rated on a 7-point scale from 0 = “none of the time” to 6 = “all of the time.”
Engagement: measured the amount of time the participant was attentive to the group activity on a 6-point scale from 0 = “none of the time” to 5 = “most or all of the time.”
Positive mood: measured the degree to which the participant manifested a positive mood via expressions of happiness, smiles, positive talk, etc., on a 5-point scale from 0 = “not at all” to 4 = “very much.”
Active participation: measured the extent to which the participant actively partook in the group activity on a 5-point scale from 0 = “not at all” to 4 = “very much.”
Attitude: measured the participant’s approach toward the group activity on a 7-point scale from 1 = “very negative” to 7 = “very positive.”
Group level assessments pertaining to the group as a whole, including, but not limited to, those for whom informed consent was obtained:
Number of participants in the group: How many people were present in the group.
Positive interaction (e.g., smiling, mutual encouragement among group members) was rated on a 6-point scale from 0 = “none” to 5 = “very high” (more than 10 interactions).
Negative interactions (e.g., angry comments) among group participants were rated on a 6-point scale from 0 = “none” to 5 = “very high” (more than 10 interactions).
Interest in the activity: percentage of participants that showed interest in the group activity.
Active participation in the activity: percentage of participants that actively participated in the group activity.
Activity enjoyment: percentage of participants that showed enjoyment or improvement in mood.
Whereas the research observers assigned ratings for all individual and group level assessments, as stated above, the activity leader who conducted the group activity completed only some of the questions, as specified in Table 3. Data for the GOME were recorded through direct observations on pre-prepared Excel spreadsheets.
Background Assessments
Background information about the participants was gathered by unit staff specifically for this study, as facilities did not have a consistent dataset. This information included demographic variables (date of birth, sex, marital status, and years of education), cognitive function data via the CPS (Morris et al., 1994), and ADL based on the mean of the following items as used in the MDS (Morris et al., 1999): dressing, eating, bathroom, hygiene, and mobility.
Analytic Approach
The study had three goals: (1) to examine inter-rater reliability between trained research observers; (2) to examine whether the four main outcome variables, engagement, active participation, positive mood, and attitude, can be subsumed under one index; and (3) to examine the validity of the GOME by examining the correlation between ratings among research observers and ratings completed by the activity leader who conducted the group activity.
Inter-Rater Reliability Assessments
After checking the number of sessions in which there were two research observers, we selected the largest numbers of joint observations. For each pair of research observers, we developed clusters of joint observations by choosing the group activities in which the largest number of persons participated. When there was a choice between group activities with the same number of participants, we selected one at random. If some of the potential participants did not participate in a group activity, other group activities were chosen at random. Each final cluster had a maximum number of different participants, that is, no participant was included more than once. For individual level assessments, we used nine clusters, and for group activities’ reliability assessments, we used 6 clusters. Since the unit of analysis for group variables is an institutional unit, the maximum possible number of observations within a cluster was 10. Since the first unit that participated in the study conducted the group activities during COVID-19, mostly without research observers, only two of the clusters reached 10 observations (i.e., 10 units). The other four clusters involved nine observations. For each cluster of observations, we examined the intra-class correlation coefficients (ICC) between the two research observers and the inter-rater agreement rate (with agreement defined as 0–1-point discrepancy). Means across the clusters were calculated for inter-rater agreement rates. For ICCs, each of the correlations was transformed into a z score using r to z transformation. The mean of these z scores was transformed back to a correlation coefficient.
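The two summary steps described above, the agreement rate with a 0 to 1-point tolerance and the averaging of correlation coefficients through the r to z transformation, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the SPSS procedure used in the study; the function names `agreement_rate` and `mean_correlation` are hypothetical, and the ratings in the usage lines are invented for illustration.

```python
import math


def agreement_rate(rater_a, rater_b, tolerance=1):
    """Share of paired ratings that differ by at most `tolerance` points."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    hits = sum(abs(a - b) <= tolerance for a, b in zip(rater_a, rater_b))
    return hits / len(rater_a)


def mean_correlation(coefficients):
    """Average correlation coefficients via Fisher's r-to-z transformation."""
    # atanh is the r-to-z transform; it is undefined for |r| = 1,
    # so a coefficient of exactly 1 or -1 would need separate handling.
    z_scores = [math.atanh(r) for r in coefficients]
    # tanh is the inverse transform (z back to r).
    return math.tanh(sum(z_scores) / len(z_scores))


# Invented ratings from two observers for four participants (not study data).
observer_1 = [3, 4, 5, 0]
observer_2 = [3, 5, 2, 1]
print(agreement_rate(observer_1, observer_2))  # 0.75

# Averaging hypothetical per-cluster coefficients on the z scale.
print(round(mean_correlation([0.88, 0.96]), 3))
```

Averaging on the z scale gives more weight to coefficients near 1 than a plain arithmetic mean would, which is the usual reason for applying the transformation before averaging.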
Factor Analysis of Outcome Variables
For the second goal of examining whether the four main outcome variables (engagement, active participation, positive mood, and attitude) can be subsumed under one index, we conducted a factor analysis. The extraction method was principal axis factoring with an oblique (direct oblimin with Kaiser normalization) rotation. Factor analyses were conducted nine times, once for one group activity kit of each of the nine group activity kit categories (physical activity, reading, creative art, art history, cognitive training, singing, travels around the world, Judaism, other). Through this design, all participants within each analysis were independent of each other.
Validation
For the third goal, validating the research observers’ ratings against those of the activity leader who conducted the group activity, we computed ICCs and agreement rates, considering a 0 to 1-point difference as agreement. For this analysis we prepared three clusters of participants with concurrent ratings by research observers and by the activity leader who conducted the group activity. The first cluster included 115 participants and the two others had 114 participants.
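For readers who want to see how an ICC between two raters is formed, the following is a minimal sketch. The article does not state which ICC model was used, so the choice of the two-way random effects, absolute agreement, single rater form, often written ICC(2,1), is an assumption made here for illustration only; the function name `icc_2_1` and the example ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per participant and one column per rater.
    The choice of this ICC form is an assumption, not the study's stated model.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 participants and 2 raters per row")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for participants (rows), raters (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        # Mirrors the situation where ratings show no variability at all.
        raise ValueError("ICC undefined: ratings show no variability")
    return (ms_rows - ms_error) / denominator


# Invented example: observer vs. activity leader ratings for five participants.
pairs = [[5, 4], [3, 3], [0, 1], [4, 4], [2, 3]]
print(round(icc_2_1(pairs), 2))
```

When nearly all ratings in a cluster are identical, the between-participant variance is close to zero and the coefficient becomes unstable or undefined, which is the kind of situation in which an ICC cannot be meaningfully computed.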
All analyses were conducted via IBM SPSS 29.
Results
Descriptive statistics for the measures of attendance and outcome variables are presented in Table 2. The variables are not directly comparable because they utilize different scales.
Table 2. Descriptive Statistics for Measures of Attendance and Engagement.
Note. Number of observations (n) is larger for activity leaders because during COVID-19, research observers could not enter units.
Attendance duration: rated only by observers on a 7-point scale from 0 = “none of the time” to 6 = “all of the time.”
Engagement: rated on a 6-point scale from 0 = “none of the time” to 5 = “most or all of the time.”
Active participation: rated on a 5-point scale from 0 = “not at all” to 4 = “very much.”
Positive mood: rated on a 5-point scale from 0 = “not at all” to 4 = “very much.”
Attitude: rated on a 7-point scale from 1 = “very negative” to 7 = “very positive.”
Number of people in the group: completed only by observers.
Positive interaction: rated on a 6-point scale from 0 = “none” to 5 = “very high (more than 10 interactions).”
Negative interactions among group participants: rated on a 6-point scale from 0 = “none” to 5 = “very high (more than 10 interactions).”
Interested in the activity: percentage of participants that showed interest in the group activity.
Active participation in the activity: percentage of participants that actively participated in the group activity.
Enjoyed the activity: percentage of participants that showed enjoyment or improved mood.
Inter-Rater Reliability Among Research Observers
The results, presented in Table 3, show very good inter-rater reliability, with intra-class correlations ranging between 0.88 and 0.96 for individual level variables and between 0.88 and 1 for group level variables. Agreement rates between research observers, where agreement was defined as a 0 to 1-point discrepancy, were also high, ranging from 81.54% to 96.57% for individual variables and from 82.05% to 98.15% for group variables.
Table 3. Inter-Rater Reliability Between Research Observers of the Group Observational Measurement of Engagement (GOME).a
Note. a Calculations of reliability indices based on individual level assessments represent an average over nine clusters, and on the group level, over six clusters.
Intraclass correlations were calculated within each cluster, transformed into z scores, and averaged; the average z score was then transformed back into a correlation coefficient.
Agreement rates were calculated with a difference of 0 to 1 considered to be agreement.
Attendance duration: for the research observers, only eight clusters were used because the ninth cluster did not manifest sufficient variability in responses. Activity leaders did not rate this variable.
ICC could not be calculated because there was insufficient variability in half the clusters.
These variables were calculated as a percentage of the number of participants; therefore, they were not appropriate for calculation of agreement rates. These variables were not rated by activity leaders.
Factor Analysis
In order to examine whether the different outcome variables (engagement, active participation, positive mood, and attitude towards the group activity) converged, we conducted factor analyses. Factor analyses were conducted nine times, once for each of the categories of group activities we used (e.g., reading, creative art, travel around the world, etc.). All nine factor analyses revealed one factor with an eigenvalue ranging between 3.26 and 3.59, which accounted for between 81.4% and 89.7% of the variance. These factor analyses were based on sample sizes of between 63 and 82 participants where data existed for all four variables during the group activity for which the factor analysis was conducted. The range of factor loadings for each of the four outcome variables is presented in Table 4.
Table 4. Results of Factor Analyses of the Four Outcome Variables.
Note. n = 63 to 82 participants in each activity. Ranges of factor loadings represent factor loadings of nine group activity kits (one for each category: physical activity, reading, creative art, art history, cognitive training, singing, travels around the world, Judaism, other).
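The link between the reported eigenvalues and the percentages of variance can be checked with simple arithmetic: for standardized variables, a factor's share of total variance equals its eigenvalue divided by the number of variables (four here). The sketch below assumes that the reported percentages were derived in this way; the function name `variance_share` is hypothetical.

```python
def variance_share(eigenvalue, n_variables):
    """Proportion of total variance attributed to one factor.

    Assumes standardized variables, so total variance equals n_variables.
    """
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return eigenvalue / n_variables


# Reported eigenvalue range across the nine factor analyses.
for eigenvalue in (3.26, 3.59):
    share = variance_share(eigenvalue, n_variables=4)
    print(f"eigenvalue {eigenvalue:.2f}: {100 * share:.2f}% of variance")
```

The resulting shares, roughly 81.5% and 89.75%, are within rounding distance of the reported 81.4% and 89.7%, as would be expected if the eigenvalues were rounded to two decimals before being reported.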
Validity: Construct Validity
ICCs and agreement rates between research observers and the activity leader who conducted the group activity were calculated, and these are presented in Table 3. For individual level assessments, agreement rates ranged from 76.38% to 83.01%, and ICCs ranged from 0.77 to 0.85, showing good agreement, albeit somewhat lower than that obtained between the pairs of research observers. For group level measures, only the measures of positive and negative interactions among group participants were available. For negative interactions, the agreement rate was found to be reasonable at 82.13%, but ICC could not be calculated for some of the clusters because of insufficient variation in the data: for 64.4% of research observations and 76.1% of activity leaders’ observations no negative interactions were reported. For positive interactions, the agreement rate was relatively low (57.97%), and the ICC was moderate at 0.65.
Discussion
This research corroborates and extends previous findings (Cohen-Mansfield, 2017; Cohen-Mansfield et al., 2017) regarding the GOME. Whereas the earlier sample included older persons from one large Canadian facility, this research includes a sample in multiple facilities in Israel that was very heterogeneous concerning functional status, education, and religious background. The results demonstrate high levels of inter-rater reliability among research observers for all outcome variables—both variables capturing individual participants and the overall group.
This validation of the GOME was achieved by comparing research observers’ ratings with the activity leader’s ratings. For individual level variables, agreement rates and ICCs were high. For group level variables, the activity leader completed only two questions, concerning negative and positive participant interactions. The agreement rate was high for negative interactions; however, since most observations did not report negative interactions, an ICC could not be calculated for this variable. For positive interactions, the agreement rate was low, and the ICC was moderate. For these two variables, agreement rates between research observers were good, but agreement with the activity leader was lower. Multiple potential reasons may account for this. It may be difficult for the activity leader to simultaneously conduct a group activity and count, and retain, the number of positive and negative interactions. The activity leader may be more sensitive to participants’ reactions due to longer acquaintance with the participants and with the population under study, and may be better at detecting nuances in participants’ behaviors. Activity leaders may be biased via demand characteristics, wanting to perceive their work as successful. Since there is no “objective measure” of the outcome variables, it is impossible to determine whose ratings better represent “reality.”
Limitations
This article reports the results of a “real life field study,” and may be limited by this fact. The research did not flow as planned because of COVID-19, as units closed and activities were subject to postponement when residents or staff became infected. Although COVID-19 has been reported to affect geriatric facilities, their staff, and their residents in Israel (Cohen-Mansfield, 2022; Cohen-Mansfield & Meschiany, 2022) and beyond (e.g., DeVries & Kemeny, 2023), the activity staff at the facilities that participated in this study were able to conduct group activities during the study period. Participants sometimes left or died during the study, resulting in missing data. Research observers did not have direct access to participants or families, and therefore, unit staff members (such as occupational therapists) collected background data from families, resulting in possible bias or misunderstanding by families. In addition, the research team’s dependence on facility staff and the need to not overburden them in the matter of data gathering resulted in an absence of data concerning chronic disease conditions and history of addictions.
Generalization
The similarity of findings to prior research (Cohen-Mansfield et al., 2017) in a different context (different country and more heterogeneous sample) suggests a favorable potential for generalization. Yet, the differences between the background characteristics of participants in nursing homes versus senior day centers raise a potential generalizability concern in that the sample size was insufficient for repeating all analyses within each type of setting. This remains a limitation that should be examined in future research. The differences in background characteristics could also be viewed as a strength in that the results represent a heterogeneous sample.
Conclusion
This study represents a useful advance in research on the impact of group activities for persons with dementia by demonstrating that the four individual-level outcome variables (engagement, active participation, positive mood, and attitude towards the group activity) converge into one construct, as found in the factor analysis. We have labeled the index derived from these variables the “Wellbeing Index.” It is calculated as the mean of the four outcome variables after rescaling them to a 7-point scale (each was originally rated on a different scale to render the scales intuitive for raters). A research tool that combines four outcome variables into one index facilitates the analysis of outcomes. Furthermore, an index that assesses wellbeing in persons with dementia can advance efforts to enhance their quality of life.
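The index calculation described above can be sketched as follows. The article states only that the four variables are rescaled to a 7-point scale and averaged; the linear mapping onto the range 1 to 7 used here is an assumption, as are the names `SCALES`, `rescale_to_seven`, and `wellbeing_index`. The original scale ranges are taken from the Assessments section.

```python
# Original rating ranges (low, high) for the four individual-level outcomes.
SCALES = {
    "engagement": (0, 5),
    "active_participation": (0, 4),
    "positive_mood": (0, 4),
    "attitude": (1, 7),
}


def rescale_to_seven(value, low, high):
    """Linearly map a rating from [low, high] onto 1..7 (assumed method)."""
    if not low <= value <= high:
        raise ValueError(f"rating {value} outside scale {low}..{high}")
    return 1 + 6 * (value - low) / (high - low)


def wellbeing_index(ratings):
    """Mean of the four outcome ratings after rescaling each to 1..7."""
    rescaled = [
        rescale_to_seven(ratings[name], low, high)
        for name, (low, high) in SCALES.items()
    ]
    return sum(rescaled) / len(rescaled)


# Invented ratings for one participant in one session (not study data).
example = {
    "engagement": 5,
    "active_participation": 2,
    "positive_mood": 2,
    "attitude": 4,
}
print(wellbeing_index(example))  # 4.75
```

Rescaling before averaging keeps any one variable from dominating the index merely because it was rated on a wider scale.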
The GOME enables study of impact of group activities according to the lived experience of persons with dementia. Such group activities may improve the wellbeing of this population (Cohen-Mansfield, 2018; Cohen-Mansfield, Marx, Thein, et al., 2011), address the social needs of participants (Cohen-Mansfield et al., 2015), and can also decrease challenging behaviors (Cohen-Mansfield et al., 2010). The GOME enables the study of parameters affecting the impact of group processes for persons with dementia, and the comparison of methods to optimize their impact.
The relative ease of use of the GOME, and especially of its outcome variables, suggests that it is well suited for routine use in ongoing monitoring of recreational activities for persons with dementia. Ongoing monitoring is crucial for evaluating current group activities, developing and evaluating new activities, and enabling systemic quality improvement of activities for persons with dementia. Thus, the findings support promoting ongoing assessment of the impact of activities for persons with dementia around the world, and instituting processes to maximize their positive impact on wellbeing. This suggests the need for change in monitoring routines by group activity leaders to assure continuous monitoring, and for increased investment in two areas: (1) research that compares the impact of different types of recreational activities for persons with dementia, covering all aspects of the comprehensive process model of engagement (Cohen-Mansfield, Marx, Freedman, et al., 2011), such as content of activities, personal characteristics of participants, environmental characteristics of the activities, including physical activity and characteristics of activity delivery, as well as interactions among these factors; and (2) research on the dissemination of those findings to assist group activity leaders in improving the contents and methods of the recreational group activities they lead, in order to optimize wellbeing in the lived experience of persons with dementia.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Minerva Foundation.
Data Availability Statement
Data used for this study are available from the corresponding author in anonymized form after institutional review board approval.
