Abstract
In this paper we discuss the relationship between cognitive processing demands and virtual reality. Advancements in technology and cost efficiency are driving the implementation of immersive virtual reality (I-VR) in training and education, offering an experiential method for learning complex skills and knowledge that would otherwise be too costly or dangerous. However, technological innovations have overshadowed the need to thoughtfully integrate learning theories in the development of I-VR. We discuss the concept of
Despite the rich history of research on immersive virtual reality (I-VR), the need for theoretical development to inform its use in education and training has been overshadowed by technological advancements (McGowin et al., 2023; Slater & Sanchez-Vives, 2016). As a result, there is a notable disparity between implementations of I-VR and the integration of learning theories needed to ensure their efficacy. In this paper, we partially address this gap by examining the concept of “cognitive processing demands” (CPD) and its implications for the effective design and use of I-VR in learning contexts.
We use CPD as an umbrella term to encompass the collective multidimensional constructs and conceptualizations used in the literature for the demands placed on cognitive resources during task performance—including notions of cognitive load, mental workload, mental effort, and cognitive efficiency. We approach this topic agnostic as to whether these demands are concentrated solely on the working memory (WM) system, or distributed across the perception, action, long-term memory (LTM), and/or learner-environment systems. We adopt this perspective as I-VR affords experiences beyond the interactivity-limited and modality-restrictive instruction of traditional learning environments. As such, it is our goal to clarify the complex interplay between CPD and learning experiences in I-VR.
Our objectives for this paper are twofold. First, we establish a foundational understanding of CPD in I-VR for learning by analyzing CPD in learning contexts across interdisciplinary theories and synthesizing these varying conceptualizations of CPD to identify principles, components, and other similarities applicable to I-VR. Second, we help stakeholders advance CPD research and practice by identifying extant knowledge gaps and opportunities for further study, and by developing a set of propositions for stakeholders to optimize learning by balancing the capabilities of I-VR and human cognition. We offer practical considerations for the use of multiple sensory modalities, immersion, presence, attention, user experience, and instructional design (ID). Our evidence-based recommendations aim to align CPD with learner requirements, enhance training effectiveness, and motivate future research exploring the complex interplay between CPD and I-VR learning experiences.
Synthesis of Major Theories
Researchers have examined cognitive processing as well as its various components, capacities, and limitations for over a century. As such, we include a representative, albeit non-exhaustive, selection of major theories related to CPD. Due to space limitations, we do not review the major theories of learning and cognition. Rather, our overview includes reference to theories of task load, cognitive load, information processing, and multimedia learning. We add to this by exploring specific frameworks for learning in I-VR with the Limited Capacity Model of Motivated Mediated Message Processing (LC4MP) from the field of communications, and affordances within the framework of embedded, embodied, enactive, extended (4E) cognition.
Common Principle #1: Limited Capacity
Across theories of cognition, such as human information processing (HIP), information processing theory (IPT), and workload, a common thread is limited capacity (e.g., Hart, 1986; Wickens & Carswell, 2021). In the context of I-VR, a limited capacity of cognitive resources may constrain learners’ ability to perceive, acquire, or act upon all information available to them. The resulting cognitive overload leads to information loss and diminishes concurrent performance and learning at immediate and progressive scales (i.e., over multiple learning events). This can be detrimental for novices, who lack the repertoire of schemata needed to compensate for these gaps.
While conceptualizations of the sources of CPD (demands, loads, modalities) and of cognitive resources (e.g., attention, energy, WM capacity) differ among theories, they agree on the concepts of limited capacity, cognitive overload, and general effects on task performance. Where these theories most meaningfully differ, however, is with respect to the mechanisms underlying how learners may respond when task demands exceed this capacity. Workload-related literature accounts for the additional strain resulting from such interactions (e.g., effort, frustration, performance; Hart, 1986), while IPT/HIP specifies mechanisms of cognitive resource allocation via attention (e.g., selection, focus, division) which act to compensate for cognitive overload. Cognitive load theory (CLT), and, by extension, Cognitive Theory of Multimedia Learning (CTML), delineate how overload affects WM functions in support of schema development and focus on mitigating overload through ID. This has generated principles (e.g., coherence, signaling, redundancy) to mitigate or enhance observed effects of these mechanisms (e.g., seductive details, expertise reversal, generative processing).
These perspectives provide value for modeling learner behavior in I-VR but do not fully address a range of factors (e.g., dynamics, affect, motivation, multimodality). We supplement these more cognitive theories with volitional components. Specifically, LC4MP considers overload as motivationally-driven cognitive offload, and suggests that cognitive resources are diverted from processing tasks when perceived rewards fail to match required effort, or when messages evoke excessive negative affect (Fisher et al., 2018). LC4MP emphasizes motivational (i.e., appetitive, aversive) and adaptive response mechanisms, combined with a measurable operationalization of information introduced per second (ii/s), and acknowledges the differences in load (and overload) between perceptual and cognitive systems. Conversely, ecological psychology (Gibson, 1977) and related theories such as 4E cognition (cf. Richardson & Chemero, 2014) emphasize the brain-body-environment dynamics which function to inherently reduce and externally offload CPD.
Many of the mechanisms by which individuals respond to limited cognitive capacity and cognitive overload have been observed. We suggest that these are further complicated within I-VR learning contexts. In traditional computer-based instructional contexts, learners have a set frame of reference which delimits the amount of instructional information available to them. However, the affordances of I-VR (McGowin et al., 2023), including immersion (as facilitated by a broader Field of View and a vastly increased 360° Field of Regard) and interactivity (e.g., delivery, sensory, and interface modalities), escalate the scope (modal, spatial), rate (temporal), and specificity (spatial, temporal) by which information is presented, with varying instructional relevance. Familiar principles and concepts from the ID literature (e.g., seductive details) may no longer be sufficient to fully account for the mechanisms of CPD in immersive, interactive, and ecologically-valid learning contexts. Contemporary perspectives discussed here (e.g., Mayer, 2021; Wickens & Carswell, 2021) provide a useful starting point, but current findings and theory on how CPD influences learning in I-VR provide an imperfect account of these mechanisms. As such, additional research and a theoretical synthesis informed by perspectives from 4E cognition and other disciplines (e.g., McGowin et al., 2023) may be necessary to improve I-VR learning experiences.
Common Principle #2: Multimodality
The concurrent distribution of demand across multiple modalities may both hinder and benefit learner experiences in I-VR.
Theories developed out of the computer-based training literature (e.g., Cognitive Affective Theory of Learning with Media [CATLM], Moreno, 2006) address a range of modalities for CPD. More recent theorizing, including some developed for VR, explicitly addresses demand only for the visual and auditory modalities (e.g., Cognitive Affective Model of Immersive Learning [CAMIL], Makransky & Petersen, 2021; CATLM-VR; cf. CTML). This is important as I-VR
The main implication is that designers and practitioners should be cautious when applying ID principles derived from studies involving non-immersive and/or non-simulation media to the development of I-VR. For example, concurrently applying CTML principles such as the modality, multimedia, multiple representations, spatial contiguity, embodiment, and transient information principles in a single I-VR learning event, without also carefully applying the split attention principle, may result in cognitive overload for learners (see also Mayer, 2021). The problem is that these principles may be too general to be practically useful as guidance for the design of the multimodal and interactive experiences which I-VR affords, let alone the design of events for specific learning outcomes (e.g., cognitive, psychomotor, affective). There has been insufficient focus on how instruction mediated through modalities other than visual/auditory affects CPD in I-VR, highlighting the need for studies to develop nuanced principles for designing multimodal experiences.
Common Principle #3: Perceptual Versus Cognitive Demand
Most CPD frameworks recognize differences between potentiated demands imparted upon the perceptual system and actualized loads which affect learning and task performance. Theories specific to information processing, workload, LC4MP, as well as (implicitly) 4E cognition and I-VR all differentiate CPD resulting from factors and processes as those demands are being perceived (
In the context of I-VR, managing perceptual and cognitive loads in these vivid and encompassing environments is crucial. The reason is clear: proper management of perceptual load ensures that learners are not overwhelmed by the multitude of sensory inputs, whereas proper management of cognitive load ensures that WM and associated functions are not overwhelmed by concurrent demands for sensory integration, schemata refinement, and interaction during the learning task. To effectively manage these loads, it is essential to consider the dynamic nature of information presented in I-VR.
Common Principle #4: Dynamics
The temporal dynamics of CPD provide a rich source of information for understanding learning processes and predicting learning outcomes over time.
Many theoretical frameworks of cognition acknowledge a temporal dimension to CPD. In workload-related literature, there is evidence for this in the temporal demand item of the NASA-TLX and in neuroergonomic approaches to workload measurement (e.g., via analyses of physiological signals), while, in the IPT literature, this is inferred through action-perception feedback. CLT addresses this through its derivative indices of cognitive load (e.g., instantaneous load, peak load, accumulated load; Paas et al., 2016). Across all of these literatures, there is a shared implementation of dual- and secondary-task paradigms (e.g., Mayer, 2021; Paas et al., 2016; Wickens & Carswell, 2021). A philosophical schism exists, however, in how these temporal dynamics are treated. The IPT, CLT/CTML, and workload literatures take a component-dominant view of the underlying dynamics and apply a methodological approach which treats CPD as the
Understanding the temporal dynamics of CPD in I-VR design has several important implications. The traditional component-dominant approach can result in a loss of granularity and data, leading to a reduced understanding of the learning process itself. This granularity could provide practitioners with a more predictable and deterministic understanding of how and why learners might struggle or succeed. While learners are generally good at identifying when and where they face difficulties (i.e., single point loads), they often struggle to articulate why these difficulties occur (e.g., De Jong, 2010). This lack of self-regulation can be mitigated by providing more precise data about experienced CPD to stakeholders, enabling them to offer external support to help learners regulate their load more effectively.
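To make the loss of granularity concrete, the following minimal Python sketch contrasts a time-resolved load trace with summary indices of the kind named in this section (peak and accumulated load). It assumes, purely for illustration, that instantaneous load has been sampled on an arbitrary scale at known timestamps. The names `LoadSummary` and `summarize_load`, the use of trapezoidal integration for accumulated load, and the sample values are illustrative assumptions, not definitions taken from the cited frameworks.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LoadSummary:
    """Hypothetical container for summary indices of a load trace."""
    peak: float         # largest sampled instantaneous load
    accumulated: float  # time integral of load (trapezoidal rule)
    average: float      # accumulated load divided by total duration


def summarize_load(times_s: Sequence[float], loads: Sequence[float]) -> LoadSummary:
    """Collapse a sampled instantaneous-load trace into summary indices.

    Assumes timestamps are in seconds and strictly increasing, and that
    load values share one arbitrary scale.
    """
    if len(times_s) != len(loads) or len(times_s) < 2:
        raise ValueError("need at least two samples of equal length")
    accumulated = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Area of the trapezoid between two adjacent samples.
        accumulated += 0.5 * (loads[i] + loads[i - 1]) * dt
    duration = times_s[-1] - times_s[0]
    return LoadSummary(peak=max(loads),
                       accumulated=accumulated,
                       average=accumulated / duration)


# Two made-up traces: one steady, one with a burst in the middle.
times = [0.0, 1.0, 2.0, 3.0]
steady = summarize_load(times, [2.0, 2.0, 2.0, 2.0])
bursty = summarize_load(times, [0.0, 3.0, 3.0, 0.0])
```

In this made-up example the two traces have the same accumulated and average load but different peaks, which is the kind of information a single aggregate score discards and a time-resolved trace retains.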
Furthermore, as implied in the previous section, designers of I-VR must consider the dynamic nature of information presentation to ensure that both perceptual and cognitive demands are appropriate. Unlike CLT’s element interactivity, which arguably focuses on the inherent complexity of content, LC4MP offers guidance on how to achieve this balance by measuring the rate of information introduced per second (ii/s). From this, designers may adapt perceptual demands (e.g., visual, audio, haptic) and cognitive demands (e.g., attention, comprehension, recall, action), by controlling the complexity (e.g., type, pace, difficulty) of information and instruction within I-VR to optimize cognitive load. By providing real-time support and adaptivity to meet individual learner needs, this approach ensures that cognitive and perceptual demands are managed appropriately to enhance the effectiveness and efficiency of I-VR applications.
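As a rough illustration of how a rate such as ii/s might be monitored, the sketch below bins timestamped events into fixed windows and reports a per-window rate. It is a sketch under stated assumptions, not an implementation of LC4MP: each event is assumed to carry a pre-coded ii score (how that score is coded is outside this sketch), and the function names, window length, and threshold are hypothetical values chosen only for the example.

```python
import math
from typing import List, Sequence, Tuple


def ii_per_second(events: Sequence[Tuple[float, float]],
                  duration_s: float,
                  window_s: float = 1.0) -> List[float]:
    """Return information introduced per second for each time window.

    events: (timestamp in seconds, pre-coded ii score) pairs; hypothetical input.
    A trailing partial window is treated as if it were full length.
    """
    if duration_s <= 0 or window_s <= 0:
        raise ValueError("duration_s and window_s must be positive")
    # Round before ceil so float noise (e.g. 11.000000000000002) adds no window.
    n_windows = math.ceil(round(duration_s / window_s, 9))
    totals = [0.0] * n_windows
    for timestamp, score in events:
        if not 0.0 <= timestamp < duration_s:
            raise ValueError("event timestamp outside the session")
        index = min(int(timestamp // window_s), n_windows - 1)
        totals[index] += score
    return [total / window_s for total in totals]


def windows_over_threshold(rates: Sequence[float], threshold: float) -> List[int]:
    """Indices of windows whose rate exceeds a designer-chosen threshold.

    The threshold is an assumption; the source does not specify a value.
    """
    return [i for i, rate in enumerate(rates) if rate > threshold]


# Made-up session: three events over four seconds, two-second windows.
rates = ii_per_second([(0.2, 3.0), (0.7, 2.0), (2.5, 4.0)],
                      duration_s=4.0, window_s=2.0)
flagged = windows_over_threshold(rates, threshold=2.2)
```

An adaptive system could use flagged windows as one signal for slowing the pace or reducing the number of concurrent cues, though which threshold is appropriate for a given learner and task remains an empirical question.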
Common Principle #5: Affect & Motivation
Affect and motivation are involved in regulating the availability and allocation of cognitive resources (e.g., attention, energy, WM capacity), but stronger responses from these systems during instruction are not always desirable.
Both the frustration component of workload (Hart, 1986) and resource allocation component of IPT (Wickens & Carswell, 2021) point toward, but do not elucidate, the role of affect and motivation in CPD. The latter of these tends to adopt a rational view of information processing and decision making. It acknowledges, but does not fully explain, how demand is influenced by the relative value of stimuli and how concepts such as interest, engagement, and importance play a role in informing one’s resource allocation policy, potentially affecting performance in concurrent tasks (Wickens & Carswell, 2021). Similarly, CTML acknowledges the role of affect through, for example, the emotional design principle, but the only theoretical accounting of these features may be, ambiguously, through LTM. Conversely, LC4MP explicitly links motivation and affect to an individual’s LTM associations through flexible (i.e., contextually specific) and rigid (i.e., established over evolutionary time) connections via the reward/threat paradigm. This model demonstrates how affective content interacts with arousal to facilitate processing outcomes. More precisely, attention is directed to motivationally relevant content, which, in turn, increases resource allocation to the stimuli over time.
In I-VR, we suggest the interplay between affect and motivation is an important factor due to the immersive nature of the technology. I-VR can elicit strong emotional responses (Allcoat & von Mühlenen, 2018; cf. Botella et al., 2017) and heightened levels of engagement (Allcoat & von Mühlenen, 2018), making the careful design of affective and motivational elements essential. For instance, an immersive VR experience can amplify both positive and negative emotional reactions, thereby significantly influencing cognitive resource allocation and processing efficiency. Therefore, designers should consider how to harness affect and motivation to enhance learning and performance without overwhelming the user. This approach can lead to more personalized and effective learning, ensuring that cognitive processing demands are managed appropriately to maximize the benefits of I-VR.
Research Directions
Along with our brief synthesis and identification of component principles, we developed a set of questions to guide future research. These questions are focused on understanding the complex relationship between CPD and learning outcomes in VR simulations. Broadly, these center around technological factors, psychological factors, and individual differences. While we cannot provide an exhaustive treatment of each of these questions here, we delineate considerations for two of particular importance.
How Does CPD Differ in I-VR Versus Non-Immersive Learning Environments?
Theoretical, empirical, and practical differences in CPD have been observed in I-VR versus non-simulation and/or non-immersive learning environments. We outline a few of these differences across two themes: factors and mechanisms.
First, a number of technological and psychological factors appear to be introduced, accentuated, or compounded by experiences in I-VR. These technological factors—such as multimodality (including delivery, sensory, and control modalities), fidelity, immersion, and interactivity, as well as learner control—affect both the perceptual and cognitive demands of performing primary (e.g., instructional steps), secondary (e.g., navigation), and tertiary (e.g., familiarity, comfort) tasks in support of instructional objectives. Psychological factors, such as presence, agency, and engagement, affect cognitive demands for attention, comprehension, recall, and action in I-VR, and modulate the extent to which aspects of the experience invoke extraneous or generative processing to the benefit or detriment of learning outcomes. Nuanced effects of other factors, including individual differences (e.g., proficiency, familiarity, self-efficacy, absorption tendencies) and time (discussed previously), further complicate attempts to generalize findings and theory from non-immersive learning contexts to I-VR. Despite the rich history of I-VR research, we still do not have a clear understanding of the relationships between these factors, let alone their influence on CPD in all of its conceptualizations. As a result, we propose there is a need for theory which explicitly delineates CPD in I-VR and satisfies the propositions outlined herein (i.e., limited capacity, multimodality, perceptual vs. cognitive demand, dynamics).
Second, the mechanisms of learning in I-VR may differ from those in non-immersive learning contexts. The literature suggests that interactions in I-VR leverage a deeper range of cognitive systems through (a) the effects of a richer range of sensory stimuli (multimodality) on attentional, affective, and motivational regulation (e.g., Makransky & Petersen, 2021); (b) the encoding and retrieval of semantic, procedural, and episodic memories (e.g., Sonnenfeld et al., 2023); and (c) a greater capacity for reciprocation and coupling between learners and their environments through embodiment and interactivity (e.g., Hacques et al., 2021). Practically speaking, it is no longer beneficial to assess and predict learning with LTM as a “black box” representing semantic memory (cf. CTML, CAMIL), or the environment as a “black box” representing feedback (cf. HIP). There are gaps in our understanding of exactly how learner-system-environment coupling, or the acquisition and use of (colloquially) “muscle memory,” “event memory,” and “tacit knowledge,” may constrain or liberate cognitive resources. Further, it is unclear whether we should delimit our understanding of these resources to attention and WM function or whether broader concepts of energy and fatigue (cf. Pichora-Fuller et al., 2016) may have an influence on actual CPD in I-VR learning contexts.
What Factors Determine Whether Spatial Presence in I-VR Constrains or Liberates Limited Cognitive Resources?
As a specification of the previous section, there is a need to better explain the relationships among CPD, spatial presence, embodiment, and the allocation of cognitive resources.
Much of the theoretical literature on presence suggests a plausible connection or similitude with attentional mechanisms (e.g., Ijsselsteijn & Riva, 2003). What remains unexplored is how direct perception may modulate such resource allocation. A learner’s embodied and embedded affordances within I-VR, and their mental model thereof, may be substantially different from those derived from the real environment (e.g., McGowin et al., 2023). Simulated environments typically constrain the possible actions the learner may perform in the environment, change the psychomotor inputs associated with a particular action, and provide sensory information at lower magnitudes than the real world (Cuervo et al., 2018), thus changing the very nature of the ecologically-rich information available to them in the ambient array. However, I-VR may concurrently afford actions that would be otherwise impossible to enact within the real world (Abtahi et al., 2022; McGowin et al., 2023). For novices in particular, there is a period of time needed to familiarize oneself with the simulated environment. In the ecological perspective, this equates to the process of
Conclusions
Through a discussion of common principles, competing views, and implications of I-VR across disparate theoretical traditions, we attempted to disentangle considerations affecting CPD in I-VR. We additionally highlighted how current approaches, attempting to retrofit CPD-related theories derived from non-immersive learning contexts, may not be appropriate or sufficient for I-VR. CPD in I-VR appears to be a “double-edged sword” in that different aspects of I-VR may amplify or mitigate perceptual and cognitive demands on learners.
In sum, much remains unknown about the mechanisms of CPD in I-VR. This underscores a need to study the interplay between environments, sensory input, cognition, and action in I-VR, and a need for a theory of CPD specific to learning in I-VR. This paper sought to lay a foundation to develop empirical lines of research examining CPD in I-VR, and to contribute to the development of adaptive I-VR systems that monitor and regulate stimuli to optimize cognitive demand and resource allocation. Practical recommendations proposed within this paper may inform instructional strategies based on the precise measurement and thoughtful regulation of CPD in I-VR for education and training.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Writing of this paper was partially supported by funding from Lockheed Martin Corporation contract MRA20-001-RPP006 and by US Air Force Office of Scientific Research (AFOSR) grant FA9550-22-1-0151, both awarded to Stephen M. Fiore. Any opinions, findings, and conclusions expressed in this material are those of the authors and do not necessarily reflect the views of these organizations or the University of Central Florida.
