Abstract
This paper explores the concept of “immersion” in virtual reality (VR) systems, emphasizing the technological aspects that contribute to a comprehensive and multisensory user experience. Immersion refers to a system’s technological capability to provide extensive, multisensory displays that surround the user. While factors like display resolution, spatial audio, and cross-modal consistency characterize immersion, no single element achieves complete immersion on its own; systems therefore require a combination of factors. However, quantifying immersion is challenging because standardized, systematic measures of VR systems’ technical capabilities are lacking. Opportunities exist to establish operational frameworks and standards that objectively quantify immersive contributions. This paper reviews immersion concepts, measures, and technological factors. From this review, we outline a series of recommendations aimed at advancing the study of immersion in VR.
Introduction
In this paper we provide an overview of the concept of immersion as it is used in the virtual reality (VR) literature. Our objectives are complementary. First, we reflect on how immersion is defined conceptually and operationally. Second, we examine methods used to measure immersion. Third, we catalog representative technological factors that have been reported to influence immersion. Finally, we offer suggestions for advancing immersion research and practice. We start with an overview of presence and its relationship to immersion. This is followed by a more detailed discussion of immersion, its measures, and factors that may affect immersion. We conclude with a series of recommendations for stakeholders in the VR community to advance immersion research and practice.
Presence
It is difficult to talk about immersion without first discussing presence. Depending on context and scholar, these terms are often used interchangeably to describe the sensation of being situated in a virtual environment (VE), such as VR, where the VE is attended to more than the unmediated (i.e., real) environment. However, the interdisciplinary VR community supports separating the two into distinct categories, one psychological and the other technological. In this paper, presence refers to the psychological factors that lead a person to feel located within the VE (i.e., place illusion) and to find the VE plausible (i.e., plausibility illusion; Slater, 2009; Slater et al., 2022), whereas immersion comprises solely technological elements. We make this distinction because it allows us to focus more directly on the technological elements that potentially contribute to different psychological states. It also provides a common ground for discussing technological capabilities, allowing for precise variations that can then be used to test how they may alter the VR experience.
Immersion
Like presence, immersion has a rich history with multiple definitions developed over time (e.g., Brown & Cairns, 2004; Cairns et al., 2006; Ermi & Mäyrä, 2005; Jennett et al., 2008; Nilsson et al., 2016; Slater & Wilbur, 1997; Witmer & Singer, 1998). We subscribe to the notion that immersion is a property of a system’s technology, namely the system’s ability to replicate the human senses (Slater et al., 2022; Slater & Wilbur, 1997). As such, immersion is concerned with how information is displayed to human senses (Slater & Wilbur, 1997). In synthesizing the literature on VR, and drawing on Slater and Wilbur’s (1997) framework, we suggest that immersion can be understood as the objective, inclusive, extensive, surrounding, and vivid illusion of reality the system can reproduce (Bowman & McMahan, 2007; Cummings & Bailenson, 2016; Slater & Wilbur, 1997). In this paper, objective (O) constitutes the measurable aspects of immersion. Inclusive (I) captures the extent to which the VE becomes attentionally dominant. Extensive (E) represents the range of senses displayed. Surrounding (S) is the extent to which the system envelops the user, similar to field of regard (FOR) or a panoramic effect. Vivid (V) is the illusion of reality (e.g., the depth and richness of displays within each sensory modality) the system can reproduce. Finally, immersion can encompass self-representation (i.e., self-avatars) and cross-modal congruency (i.e., multisensory integration, proprioceptive matching).
Systems with “high” immersion can help focus attention on the mediated experience and may create dissociation from the physical environment, enhancing the sense of presence in the virtual world (Lee, 2004; Slater, 2003). Generally, systems that more accurately represent the senses across multiple channels (e.g., range of sensorimotor inputs, sensory-motor concurrency) provide greater immersion (Slater et al., 2022; see also McMahan et al., 2016).
To illustrate the importance of this conceptual separation, we next disentangle a set of constructs from the VE literature. First, while interactions within VEs are important for user experience, specific interaction techniques should be viewed as conceptually distinct from immersion, as realism of interaction and naturalness of controls can be viewed separately from sensory immersion (Bowman & McMahan, 2007; cf. Slater & Sanchez-Vives, 2022). Second, we suggest that content factors such as narrative-immersion, flow, challenge-immersion (Nilsson et al., 2016) and “plot” or “story-line” (Slater & Wilbur, 1997) be characterized as conceptually distinct from a system’s technological immersion. While we acknowledge the importance of these factors (e.g., flow, narrative) in VEs for learning and entertainment, we argue against integrating them into a “technology as immersion” model. We do this because their core components evoke psychological and affective responses that can be separated from objective technological elements (e.g., FOV, frame rate, pixel density). Third, while immersion has been widely operationalized using various measurement techniques, few academic works have systematically measured immersion strictly as a technological concept (e.g., Cummings & Bailenson, 2016; Bowman & McMahan, 2007; Slater & Wilbur, 1997; cf. Selzer & Castro, 2023). In other words, while immersion is conceptually well-defined as technological elements, operational definitions have lagged behind. For example, in several recent reviews, immersion is qualified by type of system (e.g., HMD-VR, CAVE, mixed reality) or by labels such as non-immersive (e.g., PowerPoint, small screen desktop), semi/low-immersive (e.g., large screen desktop, 3DOF HMD-VR headset), and high/full-immersion (e.g., 6DOF HMD-VR, CAVE). To some extent, the label “full immersion” is a misnomer, as even these systems are limited in the extensiveness of their sensory modalities (e.g., olfaction and gustation are typically absent).
While these definitions are useful, they are not entirely descriptive of the systems’ use or capabilities, which we argue are needed for defining the systems’ abilities and their level of immersion.
Models that do quantify immersion typically focus on psychological factors (e.g., Immersive Experience Model, SCI-model, IPQ, Model of Immersion in Games) possibly combined with some technological elements (e.g., Nilsson et al., 2016), or rely on subjective measures to judge the level of immersion (e.g., Selzer & Castro, 2023). When it comes to objectively measuring immersion based on technological capabilities, as proposed by Slater and Wilbur (1997), there is a lack of concrete quantifiable metrics. Thus, developing more objective measures could further the standardization and comparison of the technical factors that influence immersion.
In response to the limited number of operationalizations of immersion as a technology in the current literature, we offer foundational elements for a framework that defines immersion based on measurable system characteristics. Put simply, a system’s level of immersion can be operationally defined by the characteristics of the system. This can include factors such as visuals, audio, haptics, tracking, and multimodal congruence (i.e., inclusive, extensive, surrounding, vivid; Slater & Wilbur, 1997), as well as novel metrics derived from ideas such as tracking, stereoscopy, image quality, and FOV/FOR (Bowman & McMahan, 2007; Cummings & Bailenson, 2016). We next discuss a partial set of technological factors, based on the extant literature, that may influence immersion (see Table 1).
Table 1. Factors That May Influence Immersion.
Note. *Non-HMD/CAVE studies.
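To make the operational definition above more concrete, the following is a minimal sketch of how a system's technical characteristics might be recorded as a structured profile. The class name, field names, and example values are assumptions introduced here for illustration; they are not a proposed standard and do not describe any real device.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SystemProfile:
    """Technical characteristics of one VR system (all field names are illustrative)."""

    name: str
    field_of_view_deg: float     # horizontal FOV
    field_of_regard_deg: float   # FOR: how far the display surrounds the user
    pixels_per_degree: float     # angular resolution of the visual display
    refresh_rate_hz: float
    tracking_dof: int            # e.g., 3 or 6 degrees of freedom
    has_spatial_audio: bool
    has_haptics: bool
    has_self_avatar: bool

    def displayed_modalities(self) -> list:
        """Sensory channels the system displays to (the 'extensive' dimension)."""
        modalities = ["visual"]
        if self.has_spatial_audio:
            modalities.append("auditory")
        if self.has_haptics:
            modalities.append("haptic")
        return modalities


# Placeholder values for a made-up headset, not measurements of a real product.
example = SystemProfile(
    name="hypothetical 6DOF HMD",
    field_of_view_deg=100.0,
    field_of_regard_deg=360.0,
    pixels_per_degree=20.0,
    refresh_rate_hz=90.0,
    tracking_dof=6,
    has_spatial_audio=True,
    has_haptics=False,
    has_self_avatar=True,
)

print(example.displayed_modalities())  # ['visual', 'auditory']
print(asdict(example))
```

Reporting a profile of this kind alongside a label such as “high-immersive” would let readers see which characteristics a study actually varied.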
Immersion and Technology
The level of immersion in VR hinges on various technological and design factors. Realistic, high-fidelity visuals (V) enhance the detail and realism of the virtual environment, making it appear more lifelike and plausible. A broad FOV/FOR (S) can enhance immersion by providing a panoramic experience that envelops the user, making the virtual environment feel expansive and all-encompassing (Slater & Wilbur, 1997). Spatialized audio that responds to user actions (i.e., cross-modal congruency) also improves immersion through heightened sensory stimulation (Hendrix & Barfield, 1996a; Larsson et al., 2001; Serafin et al., 2018). Haptic feedback, such as force or vibration, enhances the sense of touch, potentially affecting immersion by simulating interactions with virtual objects (Gorlewicz, 2013). Embodying the user with a self-avatar provides an anchor within the virtual environment, effectively matching multiple sensory cues (Johnson-Glenberg, 2018) to enhance immersion (Kilteni et al., 2013). High degrees of freedom (DOF) for body movement (Barfield et al., 1999; Hendrix & Barfield, 1996b) and minimal latency, combined with a high frame rate and refresh rate (Claypool & Claypool, 2007), further strengthen this immersive experience. These properties support cross-modal congruence and help preserve presence by enabling seamless interactions without disruptive lag (Meehan et al., 2003), which can reduce breaks in presence (BIPs). Eliminating imperfections in binocular images, ensuring consistent image quality between eyes in stereoscopic vision, and customizing interpupillary distance (IPD) to match the user’s eyes can enhance immersion and may reduce fatigue and cyber/sim/VR sickness (Koo & Toet, 2004; Lambooij et al., 2009). Accommodation-supporting HMDs with advanced lens designs, such as varifocal optics, could further increase realism (Padmanaban et al., 2017).
While thoughtful optimization of various elements may enhance immersion, it is important to consider their interactions rather than maximizing each individually. By organizing findings on the technological elements that may enhance immersion, we aim to elucidate variables that can help test and specify these interactions. Finally, it is important to recognize that immersion is a multidimensional construct, that is, a composite of numerous contributing components, with each facet carrying the potential to enhance (or hinder) a given application (e.g., Skarbez et al., 2020). As noted by Bowman and McMahan (2007), “Immersion is not all or nothing, as the terms immersive and nonimmersive suggest, but rather a multidimensional continuum” (p. 39). In other words, multiple dimensions interact to produce a sense of immersion that can be intensified through the synergistic implementation of those elements. This nuanced perspective of immersion, as a complex, graded product of many variables, provides a more useful framework for analysis and design.
Up to this point we have explored immersion’s conceptual and operational definitions, discussed how it may (or may not) be measured, and begun to collate factors that may influence immersion. Next, we discuss a series of recommendations to help stakeholders in the VR community advance immersion research and practice.
Recommendations
Recommendation 1: One of the challenges in VR research is that there is no widely accepted taxonomy or framework for defining and measuring immersion as a technology. While several researchers (e.g., Bowman et al., Slater et al., Johnson-Glenberg et al., Skarbez et al.) have enumerated some technological factors that influence immersion (e.g., display characteristics, consistency of multimodal sensory input, precision of head and motion tracking, user embodiment representation, environmental fidelity and haptic/tactile feedback), there remains no unified theory. Without one, it is difficult to capture the combined effect of these elements and to assign a numerical rating, specified by the technology, that accounts for the various hardware/software parameters. Addressing this requires a robust taxonomy that provides a standard vocabulary and set of benchmarks for immersion. Such a taxonomy would help identify the most critical factors for achieving immersion, allow for the optimization of VR system configurations and content design for targeted applications, and improve the validity of immersion measurements. While some approaches have been proposed (e.g., Skarbez et al., 2020) to operationalize immersion as a multi-dimensional vector, these remain underdeveloped, with no comprehensive “immersion score” yet established. To address this gap, a standardized methodology for calculating and applying such a score is necessary. This approach would facilitate a nuanced evaluation and categorization of VR systems, moving from simplistic “levels” of immersion to a more precise ratio scale. For example, 0% immersion could represent a hypothetical scenario with no sensory input (i.e., a brain in a vat), while 100% immersion would equate to fully accommodating the range and richness of sensory information experienced in the real world.
An objective system specification could facilitate comparisons across different VR platforms and configurations and foster greater standardization in measurement and evaluation methods. This would provide an essential foundation for guiding further advancements in VR technology, experiences, and assessments.
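As one possible illustration of the 0% to 100% scale described in Recommendation 1, the sketch below normalizes each factor against a reference ceiling and aggregates the resulting vector into a single percentage. The factor names, the ceiling values, the equal weights, and the choice of a weighted mean are all assumptions made here for illustration; no such methodology has been established, and a real taxonomy would need to justify each of these choices empirically.

```python
# Hypothetical reference ceilings: the value at which a factor is treated as
# fully accommodating the corresponding human capability. Placeholder numbers
# for illustration only, not established standards.
REFERENCE_CEILING = {
    "field_of_regard_deg": 360.0,
    "pixels_per_degree": 60.0,
    "refresh_rate_hz": 120.0,
    "tracking_dof": 6.0,
}

# Hypothetical equal weights for every factor.
WEIGHTS = {factor: 1.0 for factor in REFERENCE_CEILING}


def immersion_vector(spec: dict) -> dict:
    """Scale each raw factor to [0, 1] relative to its ceiling; missing factors count as 0."""
    return {
        factor: min(max(spec.get(factor, 0.0) / ceiling, 0.0), 1.0)
        for factor, ceiling in REFERENCE_CEILING.items()
    }


def immersion_score(spec: dict) -> float:
    """Weighted mean of the normalized vector, expressed as a percentage."""
    vector = immersion_vector(spec)
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[factor] * value for factor, value in vector.items())
    return 100.0 * weighted_sum / total_weight


# Made-up specification, not a real device.
example_spec = {
    "field_of_regard_deg": 360.0,
    "pixels_per_degree": 30.0,
    "refresh_rate_hz": 90.0,
    "tracking_dof": 6,
}

print(immersion_vector(example_spec))
print(immersion_score(example_spec))  # 81.25
```

A weighted mean is purely additive, so it cannot represent interactions between factors; it is used here only because it is the simplest aggregate that maps onto a ratio scale with a true zero.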
Recommendation 2: Research indicates that providing learners with first-hand experiential activities in “high” immersive VR environments, where they feel spatially situated within the VE, may benefit the development of particular spatial cognitive abilities that are required for understanding and manipulating spatial relationships and perspectives (e.g., König et al., 2021). These spatial thinking skills, which can include mental rotation, navigation, and spatial visualization, appear to show greater gains through embodied learning in immersive VR versus less spatially enveloping desktop interfaces and traditional instruction. High levels of immersion afforded by HMD and body tracking in VR seem to offer advantages for acquiring the spatial knowledge needed to reason about and operate within 3D spaces compared to studying abstract 2D representations. To better understand this, research is needed to explore the nuances of how increased immersion impacts building spatial knowledge. Additionally, we recommend that studies exploring spatial cognition should also measure the technological immersive factors such as FOV/FOR, display resolution, and tracking capabilities. Examining the relation between immersion and spatial learning gains can clarify the degree to which a greater sense of immersion in VEs facilitates the development of spatial thinking skills.
Recommendation 3: Researchers should examine the effects of individual factors on maintaining immersion (and to a lesser extent, presence). For example, while it is generally believed that frame rates above 60 frames per second (FPS), and ideally 90 FPS or more, are needed to avoid disrupting immersion for most users, we have seen few studies that actively explore this in terms of immersion and presence. Additionally, while there is some evidence to suggest that customizing IPD can improve comfort and reduce motion/cyber/VR sickness, we have not seen enough research on these effects, and further research is needed. A more detailed understanding of these and other factors can help developers determine the performance requirements needed to optimize VR systems.
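The frame rates mentioned in Recommendation 3 translate directly into a per-frame time budget, which is often the quantity developers work with. The short sketch below shows that arithmetic; the function names and the sample frame-time log are hypothetical and serve only to illustrate the calculation.

```python
def frame_budget_ms(frames_per_second: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return 1000.0 / frames_per_second


def over_budget_fraction(frame_times_ms: list, frames_per_second: float) -> float:
    """Fraction of logged frames that took longer than the budget allows."""
    budget = frame_budget_ms(frames_per_second)
    late = sum(1 for t in frame_times_ms if t > budget)
    return late / len(frame_times_ms)


for fps in (60, 90):
    print(f"{fps} FPS -> {frame_budget_ms(fps):.2f} ms per frame")
# 60 FPS -> 16.67 ms per frame
# 90 FPS -> 11.11 ms per frame

# Made-up frame times (ms) for illustration, not measurements.
sample_log = [10.0, 11.0, 12.5, 10.5]
print(over_budget_fraction(sample_log, 90))  # 0.25
```

Logging a measure such as the fraction of late frames, rather than only the nominal refresh rate, would give studies a concrete variable to relate to immersion and presence.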
Recommendation 5: Studies should focus on the complex interactions between immersion factors rather than on maximizing each factor in isolation. Carefully balancing these elements may do more to optimize immersion than pushing any single factor to its limit. Providing a more nuanced view of immersion may help developers distinguish critical from non-critical features.
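One common way to study interactions between factors is a full factorial design, in which every level of each factor is crossed with every level of the others. The sketch below enumerates such a design; the factor names and levels are hypothetical, and a real study would select them based on its own hardware and research questions.

```python
from itertools import product

# Hypothetical factors and levels, chosen only to illustrate the crossing.
FACTOR_LEVELS = {
    "fov_deg": [60, 100],
    "refresh_rate_hz": [60, 90],
    "spatial_audio": [False, True],
}


def full_factorial(factor_levels: dict) -> list:
    """Return one condition (a dict of factor -> level) per combination of levels."""
    names = list(factor_levels)
    return [dict(zip(names, combo)) for combo in product(*factor_levels.values())]


conditions = full_factorial(FACTOR_LEVELS)
print(len(conditions))  # 8 conditions: 2 x 2 x 2
for condition in conditions:
    print(condition)
```

Because the number of conditions grows multiplicatively with each added factor, studies of this kind usually need to limit themselves to a few factors at a time.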
Recommendation 6: It is challenging to identify the specific thresholds at which increasing an immersive feature no longer enhances the experience. Rarely is there an exact plateau where the effect of a particular factor peaks and additional improvements make little perceptual difference. Delimiting the degree and location of these plateaus remains an open question and a difficult yet worthy research goal.
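To illustrate what locating such a plateau could look like in practice, the sketch below scans a series of measurements and reports the lowest tested level beyond which every further increase yields less than a chosen gain per unit. The function name, the threshold, and the data are assumptions for illustration; the numbers are synthetic and do not come from any study.

```python
from typing import Optional


def plateau_level(levels: list, outcomes: list, min_gain_per_unit: float) -> Optional[float]:
    """Lowest tested level after which all marginal gains fall below the threshold.

    `levels` must be strictly increasing. Returns None if the outcome is still
    improving by at least `min_gain_per_unit` at the highest tested level.
    """
    if len(levels) != len(outcomes) or len(levels) < 2:
        raise ValueError("need at least two paired observations")
    slopes = [
        (outcomes[i + 1] - outcomes[i]) / (levels[i + 1] - levels[i])
        for i in range(len(levels) - 1)
    ]
    plateau = None
    # Walk backwards so the plateau must persist through the highest tested level.
    for i in range(len(slopes) - 1, -1, -1):
        if slopes[i] >= min_gain_per_unit:
            break
        plateau = levels[i]
    return plateau


# Synthetic example: levels of some factor and a made-up outcome rating.
factor_levels = [30, 60, 90, 120, 144]
outcome = [2.0, 3.5, 4.4, 4.5, 4.5]

print(plateau_level(factor_levels, outcome, min_gain_per_unit=0.01))  # 90
```

The result depends entirely on the chosen threshold and on how finely the factor was sampled, which reflects the difficulty the recommendation describes.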
Conclusion
Virtual reality shows great potential as a tool for enhancing learning and training. As VR systems become more widely adopted, it is important that their development and implementation be guided by theories and evidence-based practices. In this paper we lay the groundwork for advancing research to study, evaluate, and optimize virtual environments. We began by discussing foundational assumptions and definitions for immersion in VR. We then discussed the multifaceted nature of immersion, collated representative research on elements that may influence immersion, and provided recommendations for advancing future research.
Robust frameworks detailing technological capabilities are essential for matching the complexity of human cognition that occurs within VR’s multisensory, embodied interactions. The unique affordances of VR must be thoughtfully designed based on theory and research and not based solely on the newest technologies. Further research is needed to clarify the nuanced boundary conditions and mechanisms that determine VR's effectiveness across domains. Isolating the effects of specific technological capabilities and interactions on cognitive, behavioral, and affective processes will provide clearer principles for VR design, implementation, and measurement. With continued empirical studies, VR holds the potential to transform the educational, training, and entertainment landscape by bringing abstract concepts to life in immersive environments and enabling safe interactions for experimentation, complex skill practice, or transportation into narrative worlds. Realizing this potential requires converging efforts across disciplines to refine theoretical frameworks, models, and taxonomies capable of linking the cognitive with the technological. Such empirical research can guide the creation of captivating and powerful VR environments and systems. The insights and recommendations compiled in this review aim to spur progress toward these goals.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
