Abstract
Background:
Advancing patient safety during handoffs remains a public health priority. The application of cognitive load theory offers promise, but is currently limited by the inability to measure cognitive load types.
Objective:
To develop and collect validity evidence for a revised self-report inventory that measures cognitive load types during a handoff.
Methods:
Based on prior published work, input from experts in cognitive load theory and handoffs, and a think-aloud exercise with residents, a revised Cognitive Load Inventory for Handoffs was developed, with items for intrinsic, extraneous, and germane load. Second- and sixth-year students recruited from a Dutch medical school participated in four simulated handoffs (two simple and two complex cases). At the end of each handoff, participants completed the Cognitive Load Inventory for Handoffs, Paas’ Cognitive Load Scale, and one global rating item each for intrinsic, extraneous, and germane load. Factor and correlational analyses were performed to collect validity evidence.
Results:
Exploratory factor analysis yielded a single factor that combined the intrinsic and germane load items. The extraneous load items performed poorly and were removed from the model. As predicted by cognitive load theory, the score from the combined intrinsic and germane load items was associated with a commonly used measure of overall cognitive load (Pearson’s r = 0.83, p < 0.001), case complexity (beta = 0.74, p < 0.001), level of experience (beta = −0.96, p < 0.001), and handoff accuracy (r = −0.34, p < 0.001).
Conclusion:
These results offer encouragement that intrinsic load during handoffs may be measured via a self-report measure. Additional work is required to develop an adequate measure of extraneous load.
Background
Patient handoffs are associated with medical errors and harm to patients.1,2 Considerable attention in the literature has been focused on interventions to improve patient safety during handoffs,3 many of which have been adapted from industries in which transition errors have high consequences.4 These best practices facilitate information transfer via communication protocols that include structured face-to-face and written sign-out, teamwork, interactive questioning, and distraction-free settings.3,5 Recent implementation of a handoff bundle in multiple pediatric hospitals yielded improvements in educational and clinical outcomes.6
Despite these advances, handoffs remain a significant patient safety challenge. Conceptual work has highlighted cognitive load theory (CLT) as a framework that may help researchers better understand the cognitive mechanisms of handoff errors.7 Originally developed by Sweller and colleagues8,9 in the context of studying how students solve problems, CLT focuses on the implications of limited working memory (WM) for learning. Unlike sensory and long-term memory, WM has a sharply limited capacity: according to the most recent work in the area, it can actively process (i.e. organize, compare, and contrast) no more than two to four elements at any given moment.10,11 Theoretically, when the cognitive load of a handoff exceeds the learner’s WM capacity, errors occur, often in the form of information loss (e.g. drug allergy, critical co-morbidity, relevant history, or current treatments) or distortion (e.g. wrong medication dose, wrong surgical site, or incorrect diagnosis).
CLT views learning as the construction and automation of schemata.12 Researchers have differentiated overall cognitive load into three types: intrinsic load (IL) (information processing essential to learning the skill), extraneous load (EL) (information processing induced by sub-optimal design of the task or the physical environment), and germane load (GL) (information processing imposed by the learner’s deliberate use of cognitive strategies to refine existing schemata and enhance storage in long-term memory).13 Recent work by Sweller and others has suggested that GL may be best understood as a component of IL rather than as a separate type of load.14,15 In this view, a two-factor model (IL and EL) is preferred. Regardless of whether IL and GL are considered separate constructs, CLT’s focus on WM as the bottleneck for learning leads to three instructional strategies: minimize EL, match IL to the developmental stage of the learner, and optimize GL.12
Researchers have developed a number of techniques to estimate cognitive load,16,17 including learner self-rating of effort,18 response time to a secondary task presented during the primary task (e.g. the participant’s response to a vibration),19,20 observations,21 and psychophysiological measures (e.g. heart rate variability, pupillary response, and electrical skin conductance).22 Secondary-task and physiological measures allow cognitive load to be measured objectively and continuously throughout the task, whereas self-ratings are more subjective and are collected only after the task. Nevertheless, researchers most commonly use learner self-rating because it is inexpensive and easy to administer.23 Paas’ single-item self-report measure23 has been used extensively.24 Although it was developed as a measure of overall cognitive load, some argue that Paas’ Scale may actually measure IL rather than overall load.20,25 The other most commonly used self-rating instrument is the National Aeronautics and Space Administration Task Load Index (NASA-TLX), a multi-item scale that measures overall mental workload.26,27
However, the application of CLT has been limited by the absence of measures that differentiate cognitive load types. Such a measure would help identify the cognitive mechanisms of handoff errors and guide the development of new educational strategies and protocols that modulate IL, EL, and GL in the desired directions. Outside of handoffs, the most promising efforts to measure load types have focused on classroom-based learning (e.g. college statistics)14,28 and on colonoscopy performed by gastroenterology fellows.29 Both groups have developed instruments with promising validity evidence, but these instruments are not directly applicable to handoffs. Only one published study has reported efforts to develop a handoff-specific measure, with mixed results.30 The IL items formed a single factor, but the EL items performed poorly. In addition, that inventory had only a single GL item, and the study did not use an adequate measure of performance.
To date, no measure of cognitive load types during handoffs has sufficient validity evidence to warrant its use. We therefore revised the prior inventory to create a new one, the Cognitive Load Inventory for Handoffs (CLIH). This study describes the psychometric assessment of the CLIH in the context of simulated handoffs performed by medical students. To provide evidence supporting the validity of the CLIH scores, we examined their factor structure and determined whether they varied, as predicted by CLT, with a measure of overall cognitive load, learner experience, case complexity, and performance.
Methods
Design
This is a psychometric study of the CLIH in which we used the unitary model of validity31,32 to obtain evidence from several sources: content of the items (input from experts), response process (cognitive think-aloud with residents), internal structure (factor analysis and internal consistency), and relationships with other variables. We did not collect evidence for consequential validity.
Development of the CLIH (content validity)
To guide item development, we drew on recently published conceptual work that identifies drivers of cognitive load types during handoffs7 and on emerging empirical work that has reported success in measuring cognitive load types with self-report instruments: two studies of college students learning classroom material14,28 and one study of medical trainees performing colonoscopy.29 The IL items from our initial study30 were modified, and several new items were added to capture the hypothesized drivers of IL: the volume, complexity, and interactivity of the handoff information. We wrote new EL and GL items. The EL items addressed not only task design (e.g. clarity of the protocol) but also recently proposed dimensions such as the physical33 and internal30 environments. Following the recommendations of several studies,12,30 concepts related to schema construction and metacognition (e.g. taking steps to clarify understanding) were adapted to further specify GL. Three CLT experts and two handoff experts iteratively reviewed the drafts. To examine the response process of trainees, five residents at the lead author’s institution explained in a group setting how they understood each item, which led to revisions of several items. These steps produced a substantially revised and expanded inventory that required the collection of new validity evidence. The resulting CLIH had 19 items in total: 7 for IL and 6 each for EL and GL (Table 1). The instructions for each item were as follows: “Please rate your level of agreement with each of the following statements regarding this handoff.” Participants indicated their level of agreement on a 5-point Likert scale (1 = strongly disagree to 5 = strongly agree).
Table 1. Factor loadingsᵃ for each handoff simulation.
IL: intrinsic load; EL: extraneous load; GL: germane load; SBAR: situation, background, assessment, and recommendation.
ᵃFactor loadings derived from principal axis factoring with promax rotation using robust weighted least squares estimation.
Item instructions: “Please rate your level of agreement with each of the following statements regarding this handoff.” Participants indicated their level of agreement on a 5-point Likert scale (1 = strongly disagree to 5 = strongly agree).
Participants and procedures
The data for this study were collected in the context of a separate study that examined predictors of information loss and distortion during simulated handoffs.34 Study participants were second- and sixth-year students recruited from a Dutch medical school. Risks and benefits were explained to each participant, and written informed consent was obtained. After providing information about their prior handoff experience, each participant performed four simulated handoffs: two simple cases and two complex cases. Simple cases had an established diagnosis with typical associated clinical findings, whereas complex cases contained unrelated findings partially consistent with multiple possible diagnoses. Case order was randomized for each participant. At the end of each handoff, participants completed the CLIH, Paas’ Cognitive Load Scale, and one global rating item each for IL, EL, and GL. This study was approved by the Ethical Review Board of the Netherlands Association for Medical Education.
Relationship with other variables
We assessed the relationship of the CLIH with several variables. First, the total cognitive load score from the CLIH should correlate positively with measures of overall cognitive load. To test this hypothesis, we adapted Paas’ Cognitive Load Scale (Paas’ Scale), a single item designed to measure overall cognitive load, to read: “During the handoff I just finished, I invested …” followed by a 9-point scale ranging from extremely low mental effort to extremely high mental effort. Second, the score for each load type should correlate with a global measure of that load type. We therefore included a single global item each for IL, EL, and GL, assuming that the overall perception of each type of load would correlate with the corresponding CLIH score.
Finally, the CLIH should relate to other variables as predicted by CLT. For example, as a learner’s knowledge increases, a given task becomes more routine, and the IL and total cognitive load for that task should decrease. Likewise, more complex cases should impose a higher load, and a higher load should be associated with poorer handoff performance. We therefore examined the relationship between the CLIH and experience (level of training), task complexity (simple vs complex cases), and performance (proportion of information successfully transferred during the simulated handoff).
For the performance outcome, we used a measure of the proportion of information successfully transferred during each simulated handoff, which had been calculated for a separate study conducted with the same simulations; details are described elsewhere.34 In short, an overall index of information accuracy was calculated for each case signed out by a participant. Because cases differed in their total number of information elements, the raw scores were standardized.
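The standardization itself is described in the companion study; the sketch below assumes one plausible version of it, in which each participant's count of correctly transferred elements is divided by the case's total number of information elements and the resulting proportions are then converted to z-scores within each case. The function names and the within-case z-scoring are illustrative assumptions, not the procedure reported in that study.

```python
from statistics import mean, stdev


def accuracy_proportion(transferred: int, total_elements: int) -> float:
    """Share of a case's information elements that were transferred correctly."""
    if total_elements <= 0:
        raise ValueError("a case must contain at least one information element")
    return transferred / total_elements


def standardize_within_case(raw_by_case: dict[str, list[float]]) -> dict[str, list[float]]:
    """Convert raw accuracy scores to z-scores separately for each case.

    Assumption (illustrative only): scores are centred on the case mean and
    scaled by the sample SD across participants who handed off that case,
    so that cases with different numbers of elements become comparable.
    """
    standardized = {}
    for case, scores in raw_by_case.items():
        m, s = mean(scores), stdev(scores)  # stdev needs at least two scores
        if s == 0:
            raise ValueError(f"case {case}: all scores identical, cannot standardize")
        standardized[case] = [(x - m) / s for x in scores]
    return standardized
```

For example, for a hypothetical 20-element case in which three participants transfer 15, 18, and 12 elements, the proportions 0.75, 0.90, and 0.60 become z-scores of 0, 1, and −1.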
Analysis
Each participant completed four different handoff cases; that is, four case results were nested within each participant. Given this nested structure, a multilevel factor analysis would have been preferable, but our limited sample size did not permit this approach. We therefore performed a separate categorical exploratory factor analysis (EFA) for each of the four cases, using principal axis factoring with promax rotation and robust weighted least squares estimation in Mplus (Muthén and Muthén35). All 19 items were included. To select the number of factors to extract, we required the following criteria for acceptable fit: each factor had an eigenvalue greater than 1 and at least two items with loadings greater than 0.40, and the standardized root-mean-square residual (SRMR) was less than 0.08 (Hu and Bentler36). To arrive at the final model, we examined the consistency of the factor-loading patterns across the four simulations.
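The three model-selection criteria can be made concrete in code. The sketch below is a hypothetical helper, not Mplus syntax or output: it takes the eigenvalues (assumed sorted in descending order), a rotated loading matrix (items × factors), and observed and model-implied correlation matrices, and checks the eigenvalue-greater-than-1 rule, the requirement that every factor has at least two items loading above 0.40, and SRMR < 0.08. SRMR is computed here as the root mean square of residual correlations over the p(p + 1)/2 lower-triangle elements; software packages differ in such details, so values may not match Mplus exactly.

```python
import math


def kaiser_count(eigenvalues: list[float]) -> int:
    """Number of eigenvalues greater than 1."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def salient_items_per_factor(loadings: list[list[float]], cutoff: float = 0.40) -> list[int]:
    """For each factor (column), count items whose absolute loading exceeds the cutoff."""
    n_factors = len(loadings[0])
    return [sum(1 for row in loadings if abs(row[f]) > cutoff) for f in range(n_factors)]


def srmr(observed: list[list[float]], implied: list[list[float]]) -> float:
    """Root mean square of residual correlations over the lower triangle (diagonal included)."""
    p = len(observed)
    residuals = [observed[i][j] - implied[i][j] for i in range(p) for j in range(i + 1)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def model_acceptable(eigenvalues, loadings, observed, implied) -> bool:
    """Apply all three criteria to a model with len(loadings[0]) extracted factors."""
    n_factors = len(loadings[0])
    return (
        n_factors <= kaiser_count(eigenvalues)  # each extracted factor has eigenvalue > 1
        and all(n >= 2 for n in salient_items_per_factor(loadings))  # >= 2 items load > 0.40
        and srmr(observed, implied) < 0.08  # residual fit below the Hu and Bentler cutoff
    )
```

As a toy illustration, a one-factor model with hypothetical loadings of 0.8, 0.7, and 0.3 implies correlations of 0.56, 0.24, and 0.21 (products of loadings); observed correlations of 0.60, 0.20, and 0.25 leave residuals of ±0.04 and an SRMR of about 0.028, so this model passes all three checks.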
We created scores for the resulting factors by summing the items that composed each factor. SPSS version 23.0 (IBM Corporation, Armonk, NY) was used for the additional analyses. Because each participant completed all four conditions (two simple and two complex cases), we used repeated-measures analysis of variance to assess the CLIH score by case complexity (within subjects) and level of training (between subjects).
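The scoring step amounts to a sum over the items that make up a factor. The sketch below assumes hypothetical item labels (IL1 to IL7, GL1 to GL6) and, for the IL/GL factor, keeps all seven IL items plus five GL items; GL6 is left out purely for illustration, since which GL item dropped out is not assumed here. It also averages each participant's two simple and two complex cases, which is one way to arrange the data for a complexity-by-training repeated-measures analysis; the analysis of variance itself was run in SPSS and is not reproduced.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical item labels: 7 IL items plus 5 of the 6 GL items (GL6 omitted
# only for illustration; this is not the study's actual item selection).
IL_GL_ITEMS = [f"IL{i}" for i in range(1, 8)] + [f"GL{i}" for i in range(1, 6)]


def factor_score(responses: dict[str, int], items: list[str]) -> int:
    """Sum of 5-point Likert responses (1-5) over the items composing a factor."""
    total = 0
    for item in items:
        value = responses[item]
        if not 1 <= value <= 5:
            raise ValueError(f"{item}: response {value} is outside the 1-5 scale")
        total += value
    return total


def cell_means(records):
    """Average factor scores per participant and case complexity.

    records: iterable of (participant_id, complexity, score) tuples, where
    complexity is "simple" or "complex". Returns {(participant_id, complexity): mean}.
    """
    cells = defaultdict(list)
    for pid, complexity, score in records:
        cells[(pid, complexity)].append(score)
    return {key: mean(scores) for key, scores in cells.items()}
```

With 12 items on a 1 to 5 scale, the summed IL/GL score in this sketch ranges from 12 to 60.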
Results
Participant characteristics
In total, 52 medical students participated. Most participants were female (n = 47, 85%), and just over half were sixth-year students (n = 29, 56%). Approximately half of the participants reported having performed fewer than five handoffs, both as a sender and as a receiver. Most (n = 40, 78%) had not received any prior training in handoffs. Because level of training and the reported number of prior handoffs covaried, we used only level of training as the experience variable in subsequent analyses.
Factor analyses
Each simulation generated a unique model (Table 1). The number of factors varied across the four models: two factors in cases A and B, one factor in case C, and three factors in case D. The differences between the models were largely due to inconsistencies in how the EL items were distributed across factors. Yet all four models shared an important feature: a single IL/GL factor comprising all of the IL items and five of the six GL items. Moreover, internal consistency for the IL/GL factor was high (Cronbach’s alpha = 0.92). Based on these results, we concluded that a single factor with the overlapping IL and GL indicators was stable and met our model-fit criteria.
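Cronbach’s alpha compares the sum of the item variances with the variance of the total score: alpha = k / (k − 1) × (1 − sum of item variances / variance of the total score), where k is the number of items. The sketch below is a minimal standard-library implementation using sample variances; it is a generic illustration of the coefficient, not the SPSS routine used in the study. The same function applies when the "items" are a participant's four per-case scores, as in the case-level consistency figures reported below.

```python
from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table.

    rows: one list per respondent, one value per item (at least two items
    and at least two respondents). Uses sample (n - 1) variances throughout.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of each respondent's total
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

For instance, for four hypothetical respondents answering three items as [1, 2, 2], [2, 3, 3], [3, 3, 4], and [4, 5, 5], the function returns 56/57, about 0.98.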
Additional evidence for validity
Given the consistency of the factor representing the IL items and five of the six GL items across the four simulations, we created an IL/GL score to explore additional validity evidence. Table 2 summarizes the relationship of this IL/GL factor with the other variables. For three of the variables (Paas’ Scale score, IL/GL score, and performance score), each participant had four scores (one per completed case), and these scores had high internal consistency (Cronbach’s alpha: 0.89 for the IL/GL score, 0.83 for Paas’ Scale score, and 0.95 for the performance score). Therefore, for each variable, we used each participant’s mean score across the four cases to assess correlations.
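The aggregation step can be sketched as follows: each participant's four per-case scores are collapsed into a mean, and Pearson’s r is then computed across participants. Participants are matched by a hypothetical ID so that a measure with missing participants (e.g. performance, N = 49) is correlated only over participants present in both measures. The function names are illustrative; the study's analyses were run in SPSS.

```python
import math
from statistics import mean


def participant_means(scores_by_case: dict[str, list[float]]) -> dict[str, float]:
    """Collapse each participant's per-case scores into a single mean."""
    return {pid: mean(scores) for pid, scores in scores_by_case.items()}


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlate_means(measure_a: dict[str, list[float]], measure_b: dict[str, list[float]]) -> float:
    """Correlate participant-level means, using only participants present in both measures."""
    a, b = participant_means(measure_a), participant_means(measure_b)
    shared = sorted(a.keys() & b.keys())
    return pearson_r([a[p] for p in shared], [b[p] for p in shared])
```

Averaging before correlating treats each participant as one observation, which sidesteps the nesting of four cases within each participant at the cost of discarding within-person variation.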
Table 2. Relationship of the IL/GL factor with other variables.
CI: confidence interval; IL: intrinsic load; EL: extraneous load; GL: germane load.
N = 52 unless indicated otherwise.
For three of the measures (Paas’ Scale score, IL/GL score, and performance score), each study participant had four scores (one for each completed case). For each of these three measures, the scores had high internal consistency (see Results). Therefore, only the correlations between participants’ mean scores are reported here.
Performance: proportion of information accurately transmitted at handoff, N = 49.
The mean IL/GL score correlated with the mean Paas’ Scale score (Pearson’s r = 0.83, p < 0.001). In addition, the IL/GL score correlated negatively with handoff accuracy (Pearson’s r = −0.34, p < 0.001). Finally, the IL/GL score correlated highly with both the global IL item (Pearson’s r = 0.81, p < 0.001) and the global GL item (Pearson’s r = 0.83, p < 0.001). No correlation was calculated for the global EL item because we did not identify an EL factor. The repeated-measures analysis showed that IL/GL scores were lower for sixth-year than for second-year students (between subjects; beta = −0.96, p < 0.001) and higher for complex than for simple cases (within subjects; beta = 0.74, p < 0.001). There was no interaction between the two independent variables.
Discussion
A measure that differentiates cognitive load types is necessary to identify the factors that most impact trainee learning and performance. The results of this study suggest that for handoffs, the CLIH measures a single dimension of cognitive load that is a combination of IL and GL. The score from this measure had high internal consistency and correlated as hypothesized with Paas’ Scale, the global IL and GL items, level of training, case complexity, and performance. At the same time, the EL items did not perform as expected in the factor analyses. This represents our second failed attempt to measure EL during handoffs via subjective self-report.
The finding that IL and GL form one factor adds to the current debate among CLT and medical education researchers. CLT originally proposed two types of cognitive load: IL and EL.8 In the 1990s, this framework was expanded to include a third type, GL.13 Yet, in recent years, some theorists have argued that GL does not constitute a separate type of load but rather falls within IL and represents the WM resources dedicated to processing IL.15,37 Our results support the view that GL and IL are best understood as part of a single process, at least as perceived by students completing the CLIH. The conflicting findings on how GL items relate to IL items may highlight the difficulty trainees face in assessing their own mental processes, or the difficulty of measuring a construct as complex as cognitive load during handoffs.
While this instrument can be used to measure IL/GL during a handoff, it cannot be used to measure EL: the EL items did not form a single factor. There are several potential explanations. First, the sample size may have been too small to detect the underlying relationship among the EL items. Second, although our six items focused on the main drivers of EL, namely task design and organization38 and the physical environment,33 the construct of EL may nevertheless have been under-represented in our items. Other groups have also reported difficulty measuring EL. In a recent mixed-methods study, Naismith et al.25 presented qualitative data suggesting that Paas’ Scale, the NASA-TLX, and their own Cognitive Load Component Measure do not adequately capture EL. Third, despite pre-testing with five residents, the items may not have been worded clearly enough; as a result, participants may have understood them inconsistently or in a way that does not match the construct. Finally, the context of the simulated handoff may have been a significant factor. The simulation was designed to minimize EL arising from both the physical environment (interruptions, noise, and space) and the design of the task (clear instructions and all needed information in a single place). In retrospect, this is likely the most important reason why the EL items did not form a factor. Therefore, instead of abandoning efforts to assess EL in handoffs, we advocate exploring this construct in settings where EL factors are not intentionally minimized.
Additional limitations include the size of the nested sample (208 observations nested within 52 participants): multilevel factor analysis would have been preferable, but the sample was too small for this approach. In addition, participants were recruited from a single institution. Strengths of the study were its inclusion of two levels of learners, varying case complexity, and a relatively robust measure of performance.
In summary, this study provides several sources of validity evidence for the IL/GL score generated by the CLIH. The EL items did not perform well. Yet differentiating EL from IL/GL is essential if we are to use CLT to improve instructional and clinical environments. The EL items therefore need to be redrafted, with a more systematic assessment of the response process. In addition, the next version should be tested either in a simulated environment that intentionally introduces and varies EL or, even better, in an authentic clinical setting. A measure that can differentiate between IL/GL and EL would allow handoff researchers to determine the relative contributions of EL and IL/GL to handoff errors, which would help prioritize efforts to improve patient safety during handoffs. Current handoff protocols focus on reducing EL rather than managing IL.39 However, without a way to measure EL, we cannot know how effective these practices are at reducing it, nor whether the emphasis on EL is warranted. Moreover, the ability to measure load types would help us better understand the cognitive mechanisms and effectiveness of current and future handoff interventions and develop a bundle that effectively manages all cognitive load types.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
Ethical approval for this study was obtained from the Ethical Review Board of the Netherlands Association for Medical Education (NERB dossier no. 450).
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Informed consent
Written informed consent was obtained from all subjects before the study.
