Abstract
Background
Surgical postgraduate training is often debated in an economically squeezed health care system, particularly regarding the effectiveness of massed training, when considering the impact on Cognitive Load (CL). However, the relationship between CL and massed training in surgical education has not been extensively studied.
Objective
This study aimed to explore how a five-day simulation-based orthopedic course affected cognitive load among orthopedic residents.
Methods
Surgical residents (N = 40) in their final phase of medical specialization were enrolled in a surgical skill simulations course from spring 2022 until fall 2023. Questionnaires were administered to course participants to measure CL using NASA-TLX. Fluctuations in CL during the five-day course were analyzed using ANCOVA and Spearman correlation.
Results
Forty out of the 41 course participants agreed to participate. The findings revealed fluctuations in CL among orthopedic residents during the five-day course. NASA-TLX scores increased significantly after day 1 (P = .017, P < .001) and decreased on day 4. ANCOVA showed a significant effect of day on total workload scores (P = .014), with no significant associations between workload and covariates. No specific factors were identified as key drivers of cognitive load variation.
Conclusion
The findings revealed variations in perceived workload over the five days, with no single factor identified as the primary driver of changes in CL. While the study did not find a detrimental impact from massed training, further research is needed to understand the connection between training approaches and their impact on CL and learning outcomes.
Introduction
Surgical postgraduate training is a much-debated topic in medical education, and the concept of massed versus distributed training in particular has been up for discussion in the past decade.1–4 Often, the argument for distributing training centers on learning prerequisites and on whether massed training loads cognitive capacity to a degree that could affect learning. However, not much is known about how cognitive load fluctuates during massed training. Therefore, the aim of this study was to explore day-to-day fluctuations in cognitive load during a five-day massed simulation-based orthopedic course and to examine whether the structure and content of training sessions differentially influenced the components of cognitive load.
Cognitive Load Theory
Cognitive load has been a subject of much research in psychology, education, and medicine. 5 The term has been of particular interest in aviation training and research, and in similar high-stakes professions. 6 Cognitive load refers to the mental capacity and related cost (ie, stress, fatigue, or attention) that impacts mental workload, which has been defined as “…the cost of accomplishing mission requirements for the human operator.” 7 Cognitive load theory rests on the concept of working memory, a momentary container for all sensory inputs where information is processed and, if learned, integrated into long-term memory, forming and building future knowledge and skills. The theory builds on the tenet that working memory is a limited cognitive space, which can be loaded by three different elements: intrinsic, extraneous, and germane load. If a learning task exceeds the learners’ cognitive capacity, they may become overloaded, which impacts their ability to integrate new knowledge, perform tasks, and manage sensory inputs.8,9
Intrinsic load refers to load specific to the task, like task complexity. Extraneous load refers to all factors that are non-essential to the task, like interruptions and noise. Germane load is the amount of space used to process information and produce long-lasting memories. Whether germane load is, in fact, a predictor of cognitive load, and what its true connection to working memory is, remains vigorously debated; recent work suggests that the effortful processes associated with learning may stem from the resources available in working memory rather than constituting a separate type of load. 10 The main argument for germane load is that if there is insufficient capacity for germane processing, learning becomes less effective, a condition known as cognitive overload. To lessen the chance of overloading, distributed training has been advocated in the simulation-based surgical literature, 2 as learning tasks place demands on cognitive load, limiting the capacity for germane load. 11
Massed Versus Distributed Training
Much of the literature posits that distributed training, where learners engage with learning material in shorter sessions over time, is superior to massed training. 12 The underlying mechanism is attributed to episodic memory, meaning that learners remember items more accurately not only when they repeat the material, but also when the repetitions are separated by adjacent material.12,13 In medical education, this notion translates to shorter teaching sessions interrupted by periods in which learners work in clinical or academic settings, engaging with similar, adjacent material, before reconvening to repeat and add to this learning. 14 Studies find that this promotes greater integration of knowledge 14 and retention of skills.4,15 In this way, the process leaves space for integration of learning. However, this notion has not been linked to cognitive load theory, which states that overloading might impair learning. Furthermore, it has been argued that overloading might not be an issue, and that loading, to some extent, is necessary for learners to actually integrate new knowledge. 16 However, this contradicts studies in simulation-based surgical education showing that cognitive load decreases in distributed practice and that this structure yields superior learning outcomes compared with massed practice.1,2 Therefore, in this study, we set out to explore how these two concepts are linked by investigating which items drive changes in cognitive load over a five-day orthopedic techniques course.
Methods
This study was designed as an exploratory cohort study based on data collected across three course iterations over 1.5 years, aiming to describe cognitive load progression.
Study Population
The study population consisted of course participants enrolled in the obligatory surgical skill simulations course under the Danish Orthopedic Society (DOS) and The Danish Health Authorities (SST). The attending participants were surgeons in their final year of specialization to become general orthopedic surgeons. The data were collected over three courses during a 1.5-year period, from spring 2022 until fall 2023.
All residents attending the course were invited to participate, and enrollment was voluntary. Although all but one participant consented, some did not complete all questionnaire segments, resulting in incomplete records that were excluded from analysis. Only participants who provided usable data for the cognitive-load measures were included in the final dataset.
Description of the Course
The surgical skill course under the SST and DOS is conducted in cooperation with the Department of Clinical Medicine at Aarhus University Hospital animal laboratory facilities. During the course, 18 participants are divided into groups of three, who together share their surgical experience and assist each other around an operating table with an anesthetized porcine model. There are six operating tables, each with three course participants supervised by a senior surgeon within their own subspecialty. The anesthetized porcine models are monitored and cared for by technical assistants employed in the animal lab, with specialized skills and authorization to handle experimental animals.
Prior to skills training, the procedures in the curriculum were presented in lectures by a subspecialized senior surgeon in each field. The course runs once every semester and is nationwide, meaning that all orthopedic residents in specialist training in Denmark need to be enrolled to receive their final authorized specialization.
Content of the Course
The faculty consists of surgeons subspecialized in the theme of the day and changes accordingly. Day 1 consists of hand surgery in the morning and traumatology in the afternoon (Hand & Trauma). Day 2 consists of spine surgery in the morning and plastic surgery in the afternoon with a focus on flap construction (Spine & Recon). Day 3 covers thoracic surgery, focusing on common salvage procedures that a medical doctor specialized in traumatology is expected to handle in the Emergency Department (Thoracic & A&E). Day 4 is vascular surgery, focusing on acute handling of vascular damage (Vascular). Day 5 is plastic surgery within the microsurgery field (Micro). On this day, the course participants operate alone on a rat under a microscope to create a functioning end-to-end anastomosis in the femoral artery and vein.
Data Collection
The questionnaires (see Appendix 1) were introduced and distributed during the welcome introduction on the first day, prior to the first lecture. Two of the authors (MLG and KH) introduced the questionnaire and the aim of the study to all the course participants. All participants were asked to fill in the questionnaires after each procedure and at the end of each day. At the end of the course, in conjunction with official approval of the course, the questionnaires were collected by the chair of the course (KH).
Outcome Measures
Cognitive load was measured using NASA-TLX, a validated subjective self-report tool.7,17,18 NASA-TLX consists of six subitems that together measure overall cognitive load:
Mental demand, asking how mentally demanding the task was.
Physical demand, asking how physically demanding the task was.
Temporal demand, asking how hurried or rushed the pace of the task was.
Performance, asking how successful the participant was in accomplishing what they were asked to do.
Effort, asking how hard the participant had to work to accomplish their level of performance.
Frustration, asking how frustrated the participant was during the task.
All items are scored on a scale from 0 to 20, ranging from very low to very high, except for performance, which is scored from 0 to 20 ranging from perfect to failure. Scores are transformed to range from 0 to 100. We selected the NASA-TLX because it has comparable construct validity to other cognitive-load instruments.19,20 Its short format is also far more feasible for repeated, multi-day administration, as its brevity reduces participant burden and minimizes repeated-measure fatigue, making it suitable for intensive simulation courses. 20
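As a rough illustration of the scoring described above, the following sketch rescales 0 to 20 ratings to the 0 to 100 range and averages the six subitems into an unweighted total. The averaging step (often called "raw TLX") and all names in the code are assumptions made for illustration; the text above does not state how the total score was derived.

```python
from statistics import mean

# Subitem names are illustrative labels, not taken from the study's dataset.
SUBITEMS = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def rescale(rating):
    """Map a single 0-20 rating onto the 0-100 range."""
    if not 0 <= rating <= 20:
        raise ValueError(f"rating must lie between 0 and 20, got {rating}")
    return rating * 5


def raw_tlx(ratings):
    """Return rescaled subitem scores plus an unweighted total.

    Assumption: the total is the plain mean of the six rescaled subitems.
    Performance needs no reversal here because its scale already runs from
    perfect (0) to failure (20), so higher always means more load.
    """
    scaled = {name: rescale(ratings[name]) for name in SUBITEMS}
    scaled["total"] = mean(scaled[name] for name in SUBITEMS)
    return scaled


if __name__ == "__main__":
    # Hypothetical ratings for one participant after one procedure.
    example = {"mental": 14, "physical": 8, "temporal": 10,
               "performance": 6, "effort": 12, "frustration": 4}
    print(raw_tlx(example))
```

An unweighted mean keeps the questionnaire short, which matches the feasibility argument above; the weighted variant of NASA-TLX would require an additional pairwise-comparison step per administration.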
Additional data on age, sex, employment, and surgical experience measured as years working in surgery were collected from participants.
Statistical Analysis
Statistical analyses were conducted using IBM SPSS Statistics (version 29). Stata 19.5 (SE) was used to assess normality.
Descriptive statistics were performed on demographic data with calculations of means and proportions where appropriate.
To assess distributional assumptions, we visually inspected the data using histograms. The histograms for days 1–5 pointed to consistent departures from normality: distributions were unimodal yet right-skewed, with heavier right tails that the kernel density made visible against the overlaid normal curve. The pooled histogram followed the same profile. Because these plots showed clear asymmetry in the shape of the data, not just in its central tendency, we summarized the data with medians and used median-based and appropriately adjusted statistical methods in the following analyses.
Differences in cognitive load scores between days were analyzed using one-way ANCOVA with repeated measures, with day as the independent variable, NASA-TLX as the dependent variable, and age, sex, department, and experience as covariates. Post-hoc pairwise comparison of means was performed with Bonferroni adjustment for multiple comparisons. Results are reported as significance levels. To determine whether any subitems were more strongly correlated with changes in total score, Spearman correlation was performed on the relationship between subitems and total score on each day. Results are reported as r-values. Missing data were imputed using the median of the variable.
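The two simpler steps in this paragraph, median imputation and Spearman correlation, can be sketched as follows. This is a minimal standard-library illustration, not the SPSS procedure used in the study; the function names and the example values are hypothetical.

```python
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based ranks in the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical day-1 scores for five participants; None marks a missing answer.
    effort = impute_median([40, 55, None, 70, 65])
    total = impute_median([35, 50, 45, None, 60])
    print(spearman(effort, total))
```

Because Spearman correlation works on ranks, it does not require normally distributed scores, which fits the right-skewed distributions described above.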
Ethics
According to the Danish Consolidation Act on Research Ethics Review of Health Research Projects, Consolidation Act number 1338 of 1 September 2020 section 14 (1), only health research studies have to be notified to the Committees. The Committees do not consider educational studies to be health research studies (section 2 (1)). 21 This study is therefore exempt from formal review. The study was conducted in accordance with the principles of the Declaration of Helsinki, and written informed consent was collected from all participants prior to data collection.
Results
Demographics
A total of 41 surgeons attended the courses during the study period, of whom 40 agreed to participate and answered the questionnaire; 25 participants completed all elements of the questionnaire (NASA-TLX and reflective questions). Demographic data are shown in Table 1.
Table 1. Demographic Data on Participants (n = 40).
Results are shown as proportions or medians (range)
* n = 37 for experience due to missing data from three participants
Outcome Results
Table 2 shows medians and ranges of the NASA-TLX total score and each subitem in the questionnaire. Both subitem scores and the total score increased throughout the course, except on day 4, where a decrease was seen in the subitems physical demand, temporal demand, performance, and effort and in the total score. The ANCOVA with repeated measures showed a significant effect of day on NASA-TLX total score (P = .014, Wilks lambda) when adjusting for experience, department, age, and sex. No significant association between NASA-TLX total score and the covariates was seen.
Table 2. NASA-TLX Total and Subitem Scores on Each Day.
Post-hoc pairwise comparison was conducted and showed a significant difference between day 1 and all other days (P = .017, P < .001, P < .001, P < .001) (Table 3). Additionally, there was a significant difference between days 2 and 5 (P < .001). Differences between all other pairs of days were non-significant.
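For readers unfamiliar with the Bonferroni adjustment used in these post-hoc comparisons, the sketch below enumerates the pairwise day comparisons and applies the adjustment in its common form, multiplying each unadjusted P value by the number of comparisons and capping the result at 1. The P values in the example are hypothetical and are not the study's data.

```python
from itertools import combinations


def bonferroni(p_values):
    """Bonferroni-adjust P values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# All unordered pairs of the five course days: (1, 2), (1, 3), ..., (4, 5).
DAY_PAIRS = list(combinations(range(1, 6), 2))


if __name__ == "__main__":
    # Hypothetical unadjusted P values, one per pair, for illustration only.
    unadjusted = [0.002, 0.0001, 0.0001, 0.0001, 0.2, 0.4, 0.0001, 0.6, 0.3, 0.5]
    for pair, p_adj in zip(DAY_PAIRS, bonferroni(unadjusted)):
        print(pair, round(p_adj, 4))
```

With five days there are ten pairwise comparisons, so an unadjusted P value must fall below .005 to remain significant at the .05 level after adjustment.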
Table 3. Significance of Pairwise Comparison.
Spearman correlation was performed to assess whether any single subitem was the primary driver of changes in total score. The analyses showed no clear pattern (Table 4): no single subitem was consistently more strongly correlated with total score throughout the course, and changes in all subitems affected the total score.
Table 4. Significance of Pairwise Comparison.
Discussion
Summary of Key Results
The results highlight the variations in NASA-TLX (Task Load Index) scores across the course, reflecting changes in perceived workload over time. Median scores for NASA-TLX and its subitems generally increased throughout the course, indicating a rise in perceived task load, although not all between-day differences were significant. Day 1 stood out as significantly lower than all other days, with perceived workload increasing significantly from this day. There was also a significant difference between day 2 and day 5, showing a substantial increase in workload toward the end of the course. On day 4, however, there was a notable decrease in several subitems (physical demand, temporal demand, performance, effort) and the total score, suggesting a temporary reduction in workload.
The results suggest that participants experienced an overall increase in cognitive load during the course, with a notable drop on day 4, potentially due to course design or a temporary decrease in task demands. This indicates that the content of the training plays a pivotal role in cognitive load. While research suggests that distributed practice is superior to massed practice with regard to learning outcomes,4,22 it has been suggested that massed practice is logistically superior. 23 The present study adds to this discussion by highlighting that the content of the training, and not only its structure, plays a pivotal role in the load on learners. This aligns with findings from Rölfing et al (2019), 24 showing that failing a simulated task is associated with higher cognitive load. Similarly, massed practice has been found to be superior for knowledge acquisition, 22 so it is important to nuance the discussion of massed versus distributed practice with content and type of skill. The significant differences between day 1 and subsequent days imply that participants found the workload much higher after the initial day, and the substantial difference between day 2 and day 5 suggests a heightened sense of workload by the end of the course. Since no single subitem was identified as the primary driver of these changes, all aspects of workload (eg, physical demand, effort) collectively contributed to the perceived increase in task load. This might indicate that no single item drives cognitive load, and that causes of perceived load might be more idiosyncratic than first theorized. Day 5 (independent microsurgery on live rats) is inherently more difficult than the other course subjects, which might explain the heightened workload at the end of the course.
Still, this indicates a trend that cognitive load is higher when participants are met with either increased task demands or exhaustion, which might explain the drop in cognitive load on day 4, although load on that day remained significantly higher than on day 1. Importantly, research on the impact of massed practice often investigates how one-day massed training compares to distributed practice in surgical education.4,25–27 This study indicates that multi-day massed training, such as a five-day course, has a more complicated impact on cognitive load. In contrast, recent evidence from health professions education shows strong overall benefits of distributed practice, 14 yet these findings are largely based on learning outcomes rather than day-to-day cognitive load. Integrating such perspectives underscores that the relationship between training schedule and cognitive load is multifaceted and influenced by both structure and task demands.
Hence, studies on massed versus distributed practice may in the future compare multiple-day courses to more distributed training and how that impacts more intricate patterns of cognitive load, psychological pressure and retention of learning, both short and long term.
Our results suggest that massed training, such as a five-day course, does not impact cognitive load linearly. This is in contrast to prior literature reviews, which found a significant negative impact on learning outcomes through massed training. 8 Our study did not target learning outcomes but showed that, in terms of cognitive load, massed training does not impact learners in detrimental ways. However, this should be interpreted cautiously, as our cognitive-load measure captures overall load only and does not allow detection of more nuanced psychological factors relevant to retention and transfer. Looking to learning theory, this can also mean that massed training does not impair learning through the processing capacity it consumes. Instead, massed training may impact learning due to the lack of time for integration of knowledge or due to fatigue.
Practically, this suggests that careful sequencing of tasks within multi-day courses is essential. Because cognitive load varies with task demands rather than rising uniformly, alternating high- and low-intensity activities may help keep learners within an optimal load range. This indicates that massed training can remain feasible when distributed practice is impractical, provided course design mitigates fatigue and supports the integration of new knowledge.
Limitations
While we did not see an impact on cognitive load, our interpretation as to the impact on learning is theoretical, as we did not administer a skills test before or after the intervention. This can be seen as a limitation of our conclusions. As an exploratory cohort study without a comparison group, we cannot determine whether the observed progression in cognitive load was attributable specifically to the massed training format or to characteristics of the tasks themselves. In continuation of this limitation, no a priori sample size calculation was performed. All course participants were invited to participate, resulting in convenience sampling with voluntary participation, which may have introduced selection bias and affected the generalizability of the findings. To reduce analytic bias, we assessed distributional assumptions and used non-parametric tests when normality was not met.
Additionally, NASA-TLX measures overall load and thus does not differentiate between intrinsic, extraneous, and germane load, 17 which may have limited our ability to target the nuanced psychological impact of massed training. Thus, we did not explore fatigue or other psychological factors that may limit germane load and therefore cannot say with certainty that this massed training course did not load participants’ cognitive capacity, posing further limitations to the conclusions drawn. Moreover, although subjective rating scales such as the NASA-TLX are widely used and suitable for repeated cognitive-load measurement, they show limited sensitivity to extraneous and germane load, 10 which further underscores this limitation. Furthermore, subjective instruments like the NASA-TLX are often criticized for limitations in objectivity, reliability, and validity, which should also be acknowledged as a constraint in our measurement approach. 20
As such, although some studies report negative effects of massed practice, 14 these typically examine single-day massed training; in contrast, our multi-day format likely benefits from repeated consolidation across days, which may mitigate such effects. Nonetheless, our conclusions remain cautious, as our cognitive load measure captures overall load only and may not detect more nuanced psychological mechanisms that influence retention and transfer.
Conclusion
In conclusion, this study aimed to investigate how a five-day simulation-based orthopedic course affected cognitive load among orthopedic residents. By exploring the concepts of cognitive load theory and massed training, we sought to understand the impact of different training approaches on cognitive load.
The results of the study showed variations in perceived workload over the five days, with a notable decrease in workload on day 4. While there was an overall increase in cognitive load during the course, no single subitem was identified as the primary driver of these changes, suggesting that all aspects of workload collectively contributed to the perceived increase in task load. Overall, while the study did not find a detrimental impact on cognitive load from massed training, our findings indicate that multi-day massed courses show a more complex cognitive-load pattern than typically assumed in studies focused on single-day massed training. The fluctuations observed across days underscore that course content and task demands play a decisive role in shaping perceived load, beyond scheduling alone.
At the same time, as our cognitive-load measure captures overall load only, we cannot speak to more nuanced psychological processes relevant to learning outcomes, retention, or transfer. With limited options for distributed practice, our study suggests that massed training may be an acceptable alternative. However, given that distributed practice is generally found to improve learning outcomes, and recent reviews continue to emphasize its superiority, future studies should examine how multi-day massed training compares to distributed formats not only in terms of cognitive load but also in retention and performance. Further research exploring the relationship between massed training, cognitive load, and learning outcomes is needed to provide a more comprehensive understanding of the topic.
Supplemental Material
sj-docx-1-mde-10.1177_23821205261443550 - Supplemental material for The More the Merrier? Appropriateness of Massed Simulation Training in Surgical Residency
Supplemental material, sj-docx-1-mde-10.1177_23821205261443550 for The More the Merrier? Appropriateness of Massed Simulation Training in Surgical Residency by Maria Louise Gamborg, Kirstine Bruun Viuf, Rune Dall Jensen, Jan Duedal Rölfing and Kristian Høy in Journal of Medical Education and Curricular Development
Footnotes
Ethics and Consent
According to the Danish Consolidation Act on Research Ethics Review of Health Research Projects, Consolidation Act number 1338 of 1 September 2020 section 14 (1), only health research studies have to be notified to the Committees. The Committees do not consider educational studies to be health research studies (section 2 (1)). 21 This study is therefore exempt from formal review. The study was conducted in accordance with the principles of the Declaration of Helsinki, and written informed consent was obtained from all participants prior to data collection.
Authors' Contributions
KH, MLG, RDJ, and JDR conceived the study idea and overall design. KH and MLG led data collection, with RDJ and JDR contributing to resolving practical and methodological issues as they arose. KBV conducted the majority of the statistical analyses in close consultation with KH, RDJ, and JDR. MLG drafted the initial manuscript, with KBV contributing most of the results section and all co-authors providing revisions. All subsequent revisions were led by MLG, and the final manuscript was reviewed, adjusted, and approved by all authors.
Funding
This study was not funded by any grant agency, and the authors are not affiliated with any organizations that may have competing interests.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Disclaimer
The authors have nothing to disclose.
Supplemental Material
Supplemental material for this article is available online.
