Abstract
As Virtual Reality (VR) technology continues to evolve, it presents both significant opportunities and challenges for audio designers, particularly in the context of music and storytelling approaches. This study explores the experiential and perceptual dimensions of VR audio through semi-structured interviews with leading VR audio expert practitioners. Building upon a previous investigation into the production challenges and techniques of VR audio, this research analyzes a distinct segment of the interview data from the same sample of participants, focusing on immersive storytelling and music integration in VR audio practice. The aim of the research is to uncover strategies and insights to help inform a practice language for this nascent field. Key findings highlight the transformative potential of VR audio in creating intimate and emotionally engaging narratives, emphasizing the importance of spatial design considerations, interactive sound approaches, and dynamic musical composition. The findings also highlight the critical role of egocentric sound in enhancing immersion, the challenges of voiceover placement within spatial environments, and the potential of treating the sonic environment as an interactive character. This research contributes to the ongoing discourse on VR audio practice, offering valuable perspectives for audio professionals and researchers aiming to push the boundaries of storytelling and immersive experiences in VR.
Introduction
Virtual Reality (VR) has emerged as a transformative technology, offering unprecedented opportunities and challenges for audio designers, particularly in the realms of music and storytelling. As VR technology continues to evolve, it significantly reshapes the landscape of audio creative practice. Traditional audio practices adapted from established media such as film and gaming have laid the groundwork for VR audio design (Collins 2008; Schütze and Irwin-Schütze 2018). However, the unique capabilities of VR Head Mounted Display (HMD) technology necessitate a reevaluation of these design strategies. This reevaluation opens the door for novel audio techniques that are only beginning to be thoroughly explored within academic research (Paterson and Kadel, 2023; Turner et al., 2021).
A crucial aspect of VR audio is its potential to revolutionize storytelling and music integration. The immersive nature of VR provides a unique platform for spatial audio, which enhances the user’s sense of presence and engagement within virtual environments (Bosman et al., 2023; Hedges et al., 2023). Traditional media such as film (particularly formats such as IMAX), live theater performance, and video games all strive to create an immersive environment, and involve design considerations that aim for audience engagement. However, VR HMD presents the further challenge of the user’s physical body controlling the representation of the virtual environment, which offers new avenues for how narratives are experienced. Music in VR also plays a pivotal role in enhancing emotional resonance and immersion. The dynamic and interactive nature of VR allows for innovative approaches to musical composition and listening experiences, creating opportunities for deeper emotional engagement. The integration of music within VR narratives offers a powerful tool for storytelling, capable of influencing the user’s emotional journey and heightening the immersive experience.
This study aims to contribute to and extend the current discourse on VR audio practice through interviews with leading VR audio expert practitioners. By focusing on the perspective of practitioners (rather than audience/users), particularly in terms of storytelling and musical integration, this research seeks to uncover novel strategies and insights that can advance the field of VR audio. This study extends our previous work (Hedges et al., 2024), which analyzed a subset of interview data focusing on tools and workflows for VR audio, differences between audio production for conventional media and VR, and techniques for creating immersion. Here, we analyze a different subset of questions from the same interviews, examining audio approaches for storytelling in VR, challenges and opportunities for music in VR, participants’ professional backgrounds, and their advice for future practitioners. While both studies draw from the same interviews and participants, this study presents entirely new findings from a distinct data subset.
The findings emphasize the transformative potential of sound in creating compelling and immersive VR experiences, offering users unprecedented levels of intimacy and interaction within virtual narratives. This research also highlights critical implications for the use of audio in the future of training, education, cultural heritage, and professional practice in immersive storytelling (as outlined in Privitera et al., 2024; Bosman et al., 2023; Slater and Sanchez-Vives 2016). By identifying innovative approaches to sound design and music integration, the study provides insights that can inform the development of industry practices and pedagogical tools, equipping practitioners with the skills needed to harness VR’s unique capabilities in crafting emotionally engaging narratives.
Background and related works
The landscape of VR experiences spans a broad spectrum of complexity. The simplest form is 3 Degrees of Freedom (3DoF) content, which permits rotational navigation (i.e., movement across yaw, pitch, and roll degrees) but restricts translational movement (i.e., movement across front/back, left/right, up/down degrees). This type of content, often integrated with 3DoF linear media delivered in a sequential and predetermined order, is commonly referred to as Cinematic VR or 360 Videos (Swords and Willment, 2024). Conversely, 6 Degrees of Freedom (6DoF) VR allows users to navigate both rotationally and translationally within a virtual environment, thereby necessitating more sophisticated spatial audio design strategies (Candusso, 2015). VR uniquely integrates these spatial and interactive dimensions, emphasizing its potential for novel storytelling and design approaches, especially through audio (Deacon and Barthet, 2022; McArthur and Kalonaris, 2017).
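The distinction between 3DoF and 6DoF content can be illustrated with a minimal Python sketch. It is not drawn from any VR toolkit: the `Pose` class, the `apply_tracking` function, and the choice of degrees and metres as units are hypothetical and serve only to show which tracked axes each format responds to.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pose:
    """Hypothetical listener pose (illustrative names and units)."""
    # Rotational degrees of freedom, in degrees
    yaw: float = 0.0
    pitch: float = 0.0
    roll: float = 0.0
    # Translational degrees of freedom, in metres
    x: float = 0.0  # left/right
    y: float = 0.0  # up/down
    z: float = 0.0  # front/back


ROTATIONAL = ("yaw", "pitch", "roll")
TRANSLATIONAL = ("x", "y", "z")


def apply_tracking(pose, delta, dof):
    """Return a new pose after applying tracked movement.

    `delta` maps axis names to changes. With dof=3 only rotational
    axes are applied; with dof=6 translational axes are applied too.
    Axes outside the allowed set are ignored.
    """
    if dof not in (3, 6):
        raise ValueError("dof must be 3 or 6")
    allowed = ROTATIONAL if dof == 3 else ROTATIONAL + TRANSLATIONAL
    updates = {
        axis: getattr(pose, axis) + change
        for axis, change in delta.items()
        if axis in allowed
    }
    return replace(pose, **updates)


# The same physical movement: turn the head and step sideways.
movement = {"yaw": 30.0, "x": 0.5}
pose_3dof = apply_tracking(Pose(), movement, dof=3)
pose_6dof = apply_tracking(Pose(), movement, dof=6)
```

In this sketch the sideways step changes the listener position only in the 6DoF case, which is why distance-dependent audio cues become a design concern in 6DoF experiences but not in 3DoF ones.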
Our perception of VR is heavily influenced by physical orientation, particularly the orientation of the head and ears. This is known as the ‘egocentric’ perspective, defined as ‘the perceptual reference system for the acquisition of multi-sensory information in immersive VR technologies as well as the sense of subjectivity and perceptual/cognitive individuality that shape the self, identity, or consciousness’ (Geronazzo and Serafin, 2023, p. 6). This shift in perspective from traditional media makes VR uniquely suited to spatial audio design, as the audience’s physical movements directly interact with the virtual environment.
Taking these unique factors into account, VR demands the development of new narrative techniques, as traditional storytelling methods often fail to translate effectively in this medium. This demand for a novel vocabulary and understanding of best creative practices aligns with recent literature on the practice of screenwriting immersive narratives, termed ‘Immersography’ (Reyes 2024). Immersography emphasizes the importance of crafting narratives that are intrinsically linked to the user’s spatial and interactive experiences, which is a fundamental departure from conventional media narratives. This approach to creating a new language for immersive screenwriting is also reflected in new developments in audio practice. Recent studies such as those by Paterson and Kadel (2023) and Turner et al. (2021) provide early enquiries, through case studies and interviews, into how this language is beginning to develop in immersive audio creative practice.
Design decisions in VR audio prioritize enhancing user immersion by fostering a sense of presence and emotional engagement within virtual environments. Immersion in VR is a multifaceted concept, particularly concerning the role of audio in influencing immersive experiences. While there is ongoing debate about the precise definition of immersion, Lee (2022) offers a framework for understanding immersive experience in the context of extended reality (XR). This framework emphasizes the importance of physical presence, social presence, self-presence, and involvement, dividing immersion into technological attributes and psychological aspects, and considering subjective factors affecting immersion. Biocca (1997) further elaborates on the definition of presence to include: (1) Physical Presence: The perception of virtual objects and environments as real. (2) Social Presence: The perception of virtual social actors as real. (3) Self Presence: The perception of the virtual self as real. Involvement, a term linked to cognitive absorption from video game studies, incorporates elements of challenge-based and task-based immersion commonly referred to in digital game research (Calleja, 2007).
In terms of how audio relates to immersion, research highlights that spatial audio attributes and continuous background auditory information are crucial for replicating real-world scenes, while congruent sensory feedback and high-quality sound further bolster the sense of presence (Nordahl and Nilsson, 2014). Research on how audio impacts involvement in the context of virtual environments points to the importance of music’s emotional impact, musical literacy, and interactive responsiveness (Van Elferen 2016). The emotional impact of music is of particular importance in VR experiences. This aligns with the concept of affect, involving personal investment through memory, emotion, and identification, described by Massumi (2021) and further elaborated by Gregg and Seigworth (2020). They argue that affect is a performative dimension of intensity, deeply embedded in sensual perception and embodied subjectivity. Van Elferen (2016) suggests that music’s affective performativity often surpasses conscious perception in the context of digital game immersion, which extends to VR by enhancing the immersive experience through emotional commentary on visual or interactive events.
Recent research highlights the transformative potential of VR as a platform for reimagining music experiences, enabling new creative and interactive approaches to composition, performance, and listening. Unlike traditional mediums, VR offers unique opportunities for immersive sound diffusion, allowing users to experience music in a spatialized, embodied manner akin to being within a concert hall but without the logistical constraints of physical performance spaces (Buckley and Carlson, 2019). These interactive soundscapes are often informed by principles from acoustic ecology and digital instrument design, encouraging user agency in shaping auditory environments. Studies also explore the integration of dynamic composition techniques, such as algorithmic generation and spatial modulation, to create responsive and evolving musical experiences (McErlean, 2018). These innovations extend beyond performance to transform music listening itself, with VR environments enabling listeners to inhabit ‘virtual auditory realities’ that blur the lines between narrative, emotion, and space (Findlay-Walsh, 2021). While these advancements demonstrate the medium’s creative potential, they also highlight challenges, such as the need for intuitive interaction design, accessible tools for creators, and further exploration of how music in VR can emotionally and narratively engage users in ways distinct from traditional audiovisual media (Buckley and Carlson, 2019; Çamcı and Hamilton, 2020).
While this research area isn’t entirely new, the number of studies that have contributed to our understanding of VR audio practice remains limited. Turner et al. (2021) conducted interviews with audio professionals in immersive media experiences (IMEs), highlighting the significance of spatial audio, user interaction, narrative quality, and visual content in creating immersion. They identified key challenges such as the difficulty in crafting auditory distance and the need for high-quality, multi-sensory stimuli to enhance immersion. Similarly, Paterson and Kadel (2023) examined XR audio case studies, uncovering challenges and strategies related to technologies, tools, techniques, and perception (3TP). Their findings underscore the importance of Ambisonics, binaural synthesis, and reverberation in spatial audio production for XR, which is further supported by Candusso (2015). Issues such as the rapid obsolescence of tools and the need for custom software configurations persist, while advanced techniques like dynamic head-tracking and 360° audio integration offer significant experiential benefits. Both studies highlight the necessity for innovation in tools and workflows and a deeper understanding of perceptual factors to overcome current limitations and improve the immersive quality of VR audio experiences. These insights provide a detailed context for exploring the novel solutions and strategies employed by leading VR audio practitioners in addressing the challenges of immersive audio production.
While VR audio research has advanced significantly, several knowledge gaps remain, particularly in the areas of storytelling, music practices, and practitioner insights. First, there is a critical need to develop new narrative techniques for integrating sound into VR storytelling. Traditional sound design methods, often rooted in linear cinematic workflows, are insufficient for the dynamic, spatial, and interactive demands of VR, where sound plays a central role in guiding attention and shaping user experiences (Candusso, 2015; Dumlu and Demir, 2020; Eames, 2019). Additionally, while VR’s potential as a platform for music listening and composition is increasingly recognized, exploration in this area remains limited. Studies such as Buckley and Carlson (2019) highlight the unique affordances of VR for creating interactive soundscapes and dynamic musical experiences, yet practical frameworks for integrating these approaches into mainstream practices are still in their infancy. Moreover, the evolving role of music in VR as both a narrative and emotional driver, particularly through techniques like algorithmic composition and adaptive soundtracks, has not been sufficiently studied (McErlean, 2018; Çamcı and Hamilton, 2020). Lastly, there is a lack of consolidated insights from experienced practitioners on how to navigate the interdisciplinary and rapidly evolving nature of VR audio production. While some studies provide anecdotal insights into challenges and strategies (Turner et al., 2021), further synthesis of expert knowledge is essential to guide future creatives in the field. Addressing these gaps will require interdisciplinary collaboration and innovative methodologies that bridge narrative, musical, and technical dimensions in VR audio practice.
Given the identified knowledge gaps and challenges discussed in this section, the following research questions were formulated. These questions also guided the development of the semi-structured interview questions for this study, which are detailed in Appendix A. The aim is to uncover new strategies and insights while also validating those identified in previous studies. (1) What new opportunities does VR provide for storytelling through sound, and what techniques highlight this? (2) What new opportunities does VR provide for music listening and composition, and what techniques highlight this? (3) What advice do expert VR audio practitioners have for future creatives interested in the field?
Methodology
The aim of this research is to uncover strategies and insights specific to the use of sound and music for storytelling in VR audio practice. To achieve this, the study employs thematic analysis (TA), specifically drawing on the reflexive approach outlined by Braun and Clarke (2006, 2023). This approach was chosen for its emphasis on the researcher’s active role in the co-construction of meaning, aligning with the study’s interpretive and exploratory focus. Reflexive Thematic Analysis (RTA) was particularly suited to the nascent and evolving nature of VR audio practice, as it allows for rich, nuanced interpretations that integrate the researcher’s professional expertise as a lens for analysis (Braun and Clarke, 2023).
Although the terminology of RTA was not explicitly adopted during the research design stage, the analytic process reflected its principles in practice. Specifically, the analysis prioritized the generation of themes as interpretive outputs underpinned by patterns of shared meaning rather than mere topic summaries. Reflexivity was embedded throughout, acknowledging the researcher’s positionality as an early-career VR audio practitioner and the influence of this on both data interpretation and theme development.
This approach builds on and complements related studies, such as Turner et al. (2021), which explored spatial audio design challenges in immersive media experiences. Unlike Turner et al., this study focused exclusively on VR HMD audio practice and employed one-on-one semi-structured interviews as the sole method of data collection to ensure consistency and depth in the dataset.
Participant selection
Participants were chosen using a non-probabilistic, purposive sampling approach, targeting individuals recognized as leading experts in VR audio practice. The criteria for defining a ‘leading expert’ included a professional history of five or more years in VR audio, with contributions to prominent VR production studios (both indie and AAA), leading VR-related technology companies, or research institutions, as well as a substantial body of work in VR audio production or research, judged by the authors to demonstrate notable expertise. The recruitment process involved identifying experts through professional networks, publications, and online VR communities.
Of the 14 interviews conducted, two participants were excluded during the analysis phase as their output as VR audio practitioners did not meet the defined criteria. The remaining 12 participants represented a range of technical and creative expertise across cinematic and interactive VR formats (3DoF and 6DoF). Their professional experience ranged from five to 20 years, covering diverse roles in the VR industry. While demographic data (e.g., gender, ethnicity, or geographical location) was not explicitly collected, participants’ varied educational and career backgrounds are reflected in the findings. This omission aligns with the study’s focus on professional insights rather than personal demographics. Interviews were mostly conducted online via Zoom as the majority of participants were based overseas, except for one interview conducted face-to-face in Sydney, Australia.
Data analysis
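The analysis followed the six-phase reflexive thematic analysis process outlined by Braun and Clarke (2006, 2023), with reflexive engagement at its core: (1) familiarization with the data; (2) generating initial codes; (3) searching for themes; (4) reviewing themes; (5) defining and naming themes; and (6) producing the report.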
Themes were developed as outputs of the analytic process, demonstrating nuanced interpretations rather than descriptive topic summaries. This process ensured alignment with RTA principles, emphasizing meaning-making over data reduction. The researcher’s professional background as an early-career practitioner and researcher in VR audio informed the interpretation of themes. Reflexivity was central to the analytic process, ensuring transparency and critical engagement with the researcher’s influence on the data.
Extension of previous study
As mentioned in the introduction, a previous study conducted by the authors used the same methodology described here on a different subset of interview questions from the same set of interviews (Hedges et al., 2024). Themes from this analysis informed the initial thematic framework presented in a conference paper. Subsequently, a second thematic analysis was conducted on transcripts addressing answers to different questions from the same interviews. The themes derived from this second analysis were integrated into the overall framework, enhancing the breadth and depth of the study’s findings.
This two-phase approach was intentional and aligns with how RTA emphasizes the researcher’s active role in interpreting the data, allowing for flexibility in adapting the analytic process to the research’s evolving focus (Braun and Clarke, 2023). The analysis remained inductive throughout, with themes grounded in the data rather than being predetermined by theoretical frameworks, ensuring that the findings authentically reflected participants’ perspectives. By analyzing subsets of the data independently, we ensured detailed engagement with specific areas of inquiry, reducing the risk of superficial analysis that could arise from attempting to synthesize all themes in a single step. Integrating themes from the second phase into the overarching framework allowed us to construct a comprehensive narrative that better captures the nuanced experiences and strategies of VR audio practitioners. This approach also reflects the iterative and dynamic nature of RTA, where themes are not static outputs but are refined and enriched as the analysis progresses.
The decision to structure the analysis this way was guided by the research aim of exploring distinct yet interconnected facets of VR audio practice. The initial analysis focused on storytelling and music as foundational elements of the VR audio experience, while the subsequent analysis added further insights into technological and workflow challenges, spatial audio techniques, and professional advice for emerging practitioners. This layered analytic strategy aligns with the interpretive and exploratory goals of the study, leveraging the flexibility of RTA to uncover a richer, more holistic understanding of the field.
Results and discussion
During the thematic analysis, 1175 initial raw codes were identified through a line-by-line coding process, which was conducted as a way to immerse the researcher in the data and start thinking about patterns of meaning, rather than as a systematic reduction of the data. These were initially generated as descriptive codes, then subsequently refined into 10 focused codes, including a ‘Miscellaneous’ category for any lines deemed irrelevant to the study. The focused codes were then analyzed and developed into sub-themes within the final thematic framework, as illustrated in Figure 1 and detailed in Table 1 below. The following sections elaborate on these themes, presenting key findings and relevant excerpts from the interviews that align with the study’s objectives.
Figure 1. Illustration of extended thematic framework identified from VR audio expert insights. The upper sections are the themes and sub-themes explored in this study, and the lower sections are those explored in Hedges et al. (2024).
Table 1. Expanded thematic framework for VR audio practice (Themes with * adapted from Hedges et al., 2024).
Background and experience of VR audio experts
This section explores the educational and early career backgrounds of VR audio experts, illustrating the varied pathways that led them into the field. The participants’ professional experience with VR audio ranged from 5 to 20 years. They included practitioners at established VR and game production companies and industry-leading VR technology companies, as well as notable academics and researchers in the field of VR audio practice. Participants had broad educational backgrounds. Six had formal education in audio-related fields, such as audio technology, acoustics, and sound design. Four participants transitioned from non-audio fields like architecture, engineering, and business. Additionally, two participants did not mention formal education, instead highlighting their reliance on practical, hands-on experience in live sound, theater, and game development. This diversity underscores that VR audio expertise can arise from various academic and practical foundations, which contrasts slightly with findings in a recent survey of the game audio industry, where 83% of game audio professionals hold a bachelor’s degree or higher in music/audio-related fields (Schmidt, 2023). This finding suggests that novel skill sets and knowledge obtained from certain fields, such as architecture, can prove useful as this new spatial media industry develops.
The early careers of the participants were equally diverse. Many began in game audio, film production, or academia, acquiring skills that would later apply to VR audio. Others transitioned from non-audio industries, such as video post-production and advertising, or explored art through electronic music and visual projects. The transition to VR audio was typically influenced by exposure to emerging technologies, involvement in specific projects, and participation in meetups and game jams. Some participants encountered VR through industry work, while others discovered it through self-driven exploration. Early experiences with VR technologies like Oculus Rift and developer kits played a significant role in their transition, reflecting a proactive approach to integrating new technologies. In summary, the backgrounds and experiences of VR audio experts highlight the interdisciplinary nature of the field. Whether through formal education in audio, transitions from other fields, or practical industry experience, the journey into VR audio is marked by diverse pathways and a common thread of adaptability and enthusiasm for emerging technologies.
Unique narrative potential of VR
New languages for VR audio storytelling
This sub-theme focuses on how VR offers a unique perspective for audio-driven storytelling approaches. The transition to VR from traditional media necessitates a fundamental shift not only in design approach (Hedges et al., 2024), but also in the narrative language used. Traditional storytelling techniques often fail to translate effectively in this environment, prompting a need for new vocabulary and creative practices. ‘In VR, [most traditional practices are] not a language we can use. We have to change our language because that language doesn't work. And so we need a new language. We are still developing a vocab, and I don’t know that we’re going to be given the time to develop the vocab to its potential’ (P1).
The shift to VR necessitates the development of new narrative vocabularies and techniques, as emphasized by P1. This aligns with the concept of creating a ‘spatial and interactive literacy’, which has been termed ‘Immersography’, presenting the basis for a new vocabulary in writing immersive narratives (Reyes 2024). This new philosophy of creative practice is echoed in P6’s advice to ‘think of this as a fresh platform that is informed by existing media formats but ultimately offers something totally unique’. P2 supports this and points to the importance in the act of separating from traditional narrative approaches to reach the full potential of VR: ‘Once you start divorcing your practice from just the two-dimensional screen, you give yourself more opportunity to be more esoteric and creative’. This freedom is essential for pushing narrative boundaries and creating unique experiences, and audio plays a key role in creating this new language.
Importance of intimacy in VR audio
A particularly novel aspect of VR audio with strong narrative potential is the level of intimacy and emotional engagement allowed through the egocentric nature of the medium. P2 elaborated on how (for them) this is a critical new feature of audio storytelling for VR: ‘Especially now, when we talk about having more distance and more directions to play with, you can bring things closer and have the person experiencing the content have a more intimate relationship with those elements or those sound sources’ (P2).
As P2 notes, VR allows creators to ‘bring things closer’ to the listener, fostering a deeper connection with sound sources. This idea is closely related to the concept of egocentric audio, where sound plays a key role in utilizing perceptual and physiological factors for spatial anchoring between humans and immersive technology (Geronazzo and Serafin, 2023). P2 continues to emphasize the creative freedom that intimacy in VR provides: ‘You can use this to further your storytelling’. This sentiment suggests that understanding the capabilities of egocentric audio, as part of the shift towards interactive, spatial, and embodied audio experiences that leverage proximity and direction, can help create significant emotional impact (Geronazzo and Serafin, 2023). While this opportunity is promising to experiment with, understanding the psychological impact of sound is crucial in VR, as P2 points out: ‘What’s more important is understanding why people feel stuff because of sound’. This reflects the need for continuous knowledge exchange between psychophysical research and the technological development of virtual environments to create both meaningful and perceptually authentic experiences. This unique ability of VR audio to provide a level of narrative intimacy also aligns with the third paradigm shift in Human Computer Interaction (HCI) research, focusing on situated, embodied, and social experiences characterized by emotions and complex relations (Harrison et al., 2007).
Positioning the voiceover in VR
Another key audio-driven narrative element in traditional media is the use of voiceover for narration. However, when adapting this approach to VR, positioning the voice presents challenges that are not encountered in traditional media. The spatial freedom in VR means that the voice can come from any direction, requiring thoughtful consideration of its placement to maintain narrative coherence. ‘So where should I put the voice? Should I put the voice in front of you? Or maybe it's good enough if I put the voice behind you, and what happens about the fact that you could completely rotate your head?’ (P11).
As P11 points out, the freedom to position the voice in any direction requires careful consideration in terms of its context within the narrative. This is crucial since improper placement can disrupt the immersive experience by creating a dissonance between the audio and the visual elements (Vosmeer et al., 2017). P4 further elaborates on the intricacies: ‘And the voiceover is naturally something that was recorded in a studio very close to your face. But suddenly you have to distinguish where you are mixing those elements like the voiceover. You can’t just put it in the front; otherwise, you would have a ghost in your scene that’s not really there’.
This highlights the need to balance traditional recording methods with the spatial dynamics of VR to avoid breaking the illusion of self-presence (Barbara and Haahr, 2021). However, it is equally important not to overuse spatial design choices without a purpose that supports the narrative. In such cases, stereo head-locked audio can have its advantages depending on the context, especially considering how the range of possible viewer roles can change drastically throughout the experience, particularly in cinematic VR (Weaving, 2024). Further research indicates that second-person voiceovers can create a strong connection with the viewer, although their effectiveness can vary depending on the viewer’s prior VR experience (Vosmeer et al., 2017). Both the experts and the established literature reflect that voiceover techniques need to be carefully considered for each VR experience to maintain their impact on the narrative.
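The difference between a head-locked voiceover and one fixed in the virtual world, which underlies P11’s question about head rotation, can be sketched geometrically. The Python example below is a simplified illustration only: the function names, the horizontal-plane restriction, and the sign convention (0° straight ahead, positive angles to the listener’s right) are assumptions made for this sketch, and it does not model binaural rendering or any particular spatial audio engine.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def perceived_azimuth(source_azimuth, head_yaw, head_locked):
    """Direction of a sound relative to the listener's nose, in degrees.

    Assumed convention: 0 is straight ahead, positive is to the right.
    A head-locked source ignores head rotation; a world-locked source
    stays fixed in the scene, so turning the head shifts it the other way.
    """
    if head_locked:
        return wrap_degrees(source_azimuth)
    return wrap_degrees(source_azimuth - head_yaw)


# A voiceover placed in front of the listener, who then turns 90 degrees right.
head_locked_voice = perceived_azimuth(0.0, 90.0, head_locked=True)
world_locked_voice = perceived_azimuth(0.0, 90.0, head_locked=False)
```

Under these assumptions the head-locked voice remains in front of the listener after the turn, whereas the world-locked voice is now heard from the left, as if spoken by something positioned in the scene. This is one way to read P4’s concern about creating ‘a ghost in your scene’.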
The sonic environment as a character
This section explores the concept of treating the environmental sound design as a dynamic and interactive character in VR. By doing so, the setting becomes an integral part of the narrative, significantly enhancing the user’s perception and engagement. Utilizing spatial audio techniques and crafting detailed soundscapes guide the user’s attention and interaction, making the VR experience more compelling and memorable. ‘We treat our environments like characters unto themselves in some ways, where each one has its own characterization and voice, making it a living, breathing space’ (P3).
P3’s insight underscores the importance of giving each environment its unique characterization and voice, which can transform the setting into a ‘living, breathing space’. This approach is supported by P1, who likens the importance of environments in VR to significant narrative elements: ‘The environment must be treated as a living, breathing character. In both cases, the environments were as important as if they were Smaug the dragon’. This perspective aligns with findings from Haehn et al. (2024), which indicated that ambient sounds support a deeper involvement in video games. However, they also noted that this effect was only significant when character sounds were turned off, suggesting that ambient sounds play a more subtle role in dynamic virtual environments. Nevertheless, by treating environments as characters, designers can enhance these aspects in VR, making the experience more engaging and emotionally resonant.
P7 further elaborates on this concept by emphasizing the importance of creating beautiful and contrasting ambiences: ‘And what I've brought to that is to make the ambiences and the world really beautiful... so that when I do those heroic, big mega sounds, the contrast is really apparent’. This strategy not only heightens the impact of significant audio events but also ensures that the background environment enriches the narrative.
Narrative-driven mixing decisions in VR
Just as creating a balanced visual environment can assist in VR storytelling, effective audio mixing in VR is crucial for enhancing narrative immersion. Participants highlighted several techniques and considerations unique to VR audio production, such as managing the frequency spectrum, balancing 3D sound elements with head-locked music, and employing nuanced techniques to enhance the narrative through audio. P9 points to the need for a nuanced approach: ‘I have learned to value immersive dynamics and mix decisions that truly serve the end user by providing the most holistic and honest representation of the work. Since features vary in immersiveness, importance, and priority, the complexity of spatial audio adapts accordingly. There may be something enormous and exciting in the world but relatively unimportant on screen, so I’ll look for ways to design audio that feels naturally big but mixed relatively low to more important tracks’ (P9).
This need for nuanced audio mixing is supported by the findings of Bargum et al. (2023), who emphasize the sensory and interactive benefits of VR in enhancing the experience of visually positioning sound sources. Furthermore, P3 discusses the importance of managing the low-frequency spectrum: ‘I wouldn’t want a lot of low end eating up a lot of my ambient space, or because I think it’s unnerving in VR’. This highlights the need to consider how certain sound frequencies can overwhelm the auditory space, ensuring that each element contributes to the overall immersive experience without causing discomfort. Bass frequencies tend to have a strong impact. Lessiter et al. (2001) observed that their inclusion in both multichannel and stereo audio playback significantly enhanced the sense of presence, suggesting that they may play a unique role in the user experience of virtual environments.
P4 addresses the challenge of balancing 3D sound effects with non-diegetic music: ‘When you have a 3D sound effect and non-diegetic music, it will be so hard for the little sound effect to be localized because music, especially, fills out the whole spectrum and a little sound effect maybe just tests a small band, right?’ This further reinforces the importance of thoughtful integration of diverse audio elements to maintain clarity and focus within the soundscape. The creative freedom that VR’s spatial possibilities allow still requires care in order to create meaningful narrative elements and minimize cognitive load.
New opportunities for music in VR
Challenges of VR as a music platform
The interviews with VR audio experts revealed insights into current limitations in VR music production and its slow adoption within the music industry. However, these challenges are balanced by the potential for innovative approaches that could transform music experiences in VR. This is particularly interesting for music-driven narrative experiences, as P6 explains: ‘Where it’s going to be more exciting for narrative experiences in music is when we start adopting more disruptive models, like thinking in terms of a timeless musical piece that’s generative’ (P6).
The literature supports this potential by highlighting the shift towards context-based models in the music industry, where the focus is on the experience of music rather than mere consumption (Charron, 2017). Advances in digital technologies and immersive experiences often blur the boundaries between live and mediated performances, offering new ways for audiences to engage with music. Whilst this potential is promising, integrating music within VR faces significant hurdles, including industry resistance, financial viability, and the meaningful application of VR techniques. P2 highlights the slow pace of adoption: ‘It’s unlikely that all the studios are going to have to do a VR mix when people come in to record their album soon’. This hesitation within the industry is a major barrier to widespread VR integration, with P5 pointing out broader issues affecting the music industry: ‘The music industry is currently facing several significant challenges, which in turn creates an environment that is not conducive to the emergence of new experiential mediums like XR’. This context complicates the adoption of new technologies like VR, as the industry struggles with financial sustainability as a whole (Cohen, 2023). However, VR also offers new avenues for artists to share their creations outside of the traditional and problematic methods, especially in the context of virtual performances (Charron, 2017).
In terms of the potential for VR to become a standard medium for music consumption, P10 underscores that VR itself must first become a standard consumer medium: ‘It’s so dependent on the medium itself, like VR music, sure, might be a standard if VR becomes a standard, you know, like they’re not quite there personally’. This reflects the current state of VR technology and its acceptance within the broader industry. Even so, the potential for VR to create dynamic, personalized, and engaging musical experiences is compelling. By leveraging immersive technologies, VR can enhance the sense of presence and participation in virtual concerts, making them more appealing and meaningful to audiences (Onderdijk et al., 2023). The concept of co-creation in VR, where users actively participate in the creation and experience of music, aligns with the evolving nature of music consumption in the digital era (Vargo and Lusch, 2004; 2008).
Enhancing emotional engagement through music
Music is crucial in establishing emotional contexts and deeply influencing user engagement, making it an essential component of VR storytelling. By carefully selecting and integrating music, VR designers can evoke specific emotional responses, synchronizing visual and auditory elements to create a profoundly immersive and resonant experience. ‘I think the emotional context that music sets is even more important in creating the emotional context in a VR experience’ (P3).
As P3 emphasizes, the emotional context set by music is critical in VR experiences and their narratives. This aligns with the concept of affect, which involves personal investment through memory, emotion, and identification, described by Massumi (2021) and further elaborated by Gregg and Seigworth (2020). They argue that affect is a performative dimension of intensity, deeply embedded in sensual perception and embodied subjectivity. P3 also notes, ‘Music plays a huge role in really helping create that experience and set the tone and the emotional relativity to that stuff’. This sentiment reflects the findings of Van Elferen (2016), who suggests that, in the context of digital game immersion, music’s affective performativity operates at levels that often surpass conscious perception. Extended to VR, this can enhance the immersive experience by providing emotional commentary on visual or interactive events.
P7 highlights the interactive nature of music in VR: ‘But in terms of what you see, being in the space while you’re hearing that music, that interaction is what can make it really special’. This interaction between auditory and visual elements is crucial for creating a compelling VR experience. As music synchronizes with visual cues, it enhances the sense of presence and emotional depth, making the virtual environment more engaging (Nordhall and Nillson, 2014). This is reflected in research by Jørgensen (2008) that demonstrates that music is vital for emotional engagement in video games, which can be extrapolated to VR. Without music, users express a lack of involvement, even in dramatic scenarios, underscoring the importance of auditory elements in creating immersive experiences. Moreover, Massumi (2021) discusses the concept of cross-temporality in affect, where musical affect spans past, present, and future, creating a continuous loop of emotional engagement. This cross-temporality is particularly relevant in VR, where music can guide users through different temporal and emotional states, enhancing the overall narrative experience.
Spatial design considerations for VR music
This section explores the integration of new technologies and spatial audio techniques in VR music production, highlighting both the opportunities and challenges. The use of three-dimensional audio in 6DoF virtual environments allows for a richer and more immersive soundscape, but it requires careful consideration to avoid overwhelming or disorienting listeners. One example is how the inclusion of spatial design factors, such as architectural acoustic considerations, can provide new creative compositional avenues: ‘But now we have that opportunity in VR, because we can layer elements of the mix in not just a spatial orientation, but also by introducing room effects, occlusion effects’ (P6).
The potential of VR technology to revolutionize music production lies in its ability to layer audio elements spatially and introduce effects that enhance realism and plausibility. P6’s statement encapsulates this potential, emphasizing the use of spatial orientation, room effects, and occlusion effects to create immersive compositions. This aligns with the findings of Summers et al. (2021), who analyze a panel of experts discussing the development of tools for manipulating audio objects in space to enhance perceived realism within VR environments, and how this can be a creative avenue for music within the medium. Furthermore, P10 underscores that spatial design considerations need to start early in the compositional process to maximize their impact: ‘I feel like spatial music can be great if it’s being composed to be spatial’. This suggests that the artistic intention behind music composition needs to align with the spatial capabilities of VR to achieve a compelling auditory experience.
P11 highlights how current design challenges are not only based on technology, but on our way of thinking: ‘There’s a lot of things that we still need to solve in the listening experience. It’s not exactly a technology problem; it’s just a matter of finding new ways to use these powerful instruments’. This indicates that while the technology is available, its effective application requires innovation and refinement in artistic and technical approaches. P5 further illustrates the interactive potential of VR in music, recalling a specific example: ‘You can move around the musicians, focus on the first violin or the cellos, and even walk up to the conductor’s podium’. This ability to interact with and move around the music provides a unique and personalized experience, one that cannot be replicated in the real world. This interactive potential was further discussed by Summers et al. (2021), who highlight the use of VR to position music within virtual spaces, allowing users to interact with the sound sources and experience different mixes as they move around.
Listener orientation as a creative tool in VR
Participants often emphasized that the key change between standard music platforms and VR is the ability to offer the listener multiple perspectives from which to experience the music. These techniques offer dynamic and interactive listening experiences, for example, by enabling specific instruments to become more prominent based on the listener’s focus. This approach allows for a more personalized engagement with music in a virtual environment. In describing a specific approach to this in the context of a creative project where the listener is inside a sphere surrounded by orchestral instruments, P1 mentions: ‘So literally, you can kind of hear everything, you can hear the whole piece, but depending on where you’re looking, that particular instrument would actually raise up in volume like it was a solo instrument. So if you basically honed in, it’s like, “Oh, there’s a trumpet right there,” and I can hear the trumpet more clearly than everybody else, and then I move back, and it would sort of just fall back into the mix, etc.’ (P1).
Mixing based on listener orientation allows for a novel musical experience in VR by making specific instruments more prominent depending on where the listener is facing. As P1 describes, this technique allows listeners to isolate and emphasize different elements of a composition, creating a dynamic and interactive experience. P4 elaborates further on the potential of spatial audio to create unique interactive experiences: ‘Suddenly you can change it and move into the direction of game audio and to another room, and then there is another instrument playing’. This highlights the ability of spatial audio to transform the way music is experienced, making it possible for listeners to interact with and explore the music in three-dimensional spaces.
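The orientation-based emphasis that P1 describes can be illustrated with a minimal sketch. It is not drawn from any participant’s project: the function names, the 30-degree focus cone, the 6 dB maximum boost, and the raised-cosine falloff are all assumptions chosen for illustration. The idea is simply that an instrument receives extra gain as the angle between the listener’s gaze direction and the instrument shrinks, and falls back into the mix once the listener looks away.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / length for c in v)


def gaze_boost_db(forward, listener_pos, source_pos,
                  focus_angle_deg=30.0, max_boost_db=6.0):
    """Gain boost in dB for one instrument, based on gaze direction.

    focus_angle_deg and max_boost_db are hypothetical tuning values.
    """
    gaze = normalize(forward)
    to_source = normalize(tuple(s - l for s, l in zip(source_pos, listener_pos)))
    # Clamp the dot product so rounding error cannot break acos.
    cos_angle = max(-1.0, min(1.0, sum(a * b for a, b in zip(gaze, to_source))))
    angle_deg = math.degrees(math.acos(cos_angle))
    if angle_deg >= focus_angle_deg:
        return 0.0  # outside the focus cone: instrument stays in the mix
    # Raised-cosine falloff: full boost at the center of the gaze,
    # tapering smoothly to no boost at the edge of the focus cone.
    weight = 0.5 * (1.0 + math.cos(math.pi * angle_deg / focus_angle_deg))
    return max_boost_db * weight


def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)


# Listener at the origin looking along +z; trumpet ahead, cello to the side.
trumpet_boost = gaze_boost_db((0, 0, 1), (0, 0, 0), (0, 0, 2))
cello_boost = gaze_boost_db((0, 0, 1), (0, 0, 0), (2, 0, 0))
```

A smooth falloff is used here rather than a hard threshold so that an instrument would rise and recede gradually as the head turns, which seems closer to the ‘fall back into the mix’ behavior described above. In practice the resulting gain would likely also be smoothed over time to avoid audible jumps.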
P6 discusses the conceptual shift required for effective spatial audio design: ‘We won’t necessarily think in terms of “Oh, let me put this to the left, or let me put this above the listener’s head,” but what does it actually mean to be in a space that is the piece itself?’ This perspective emphasizes the need for a holistic approach to spatial audio design, where the space itself becomes an integral part of the musical experience. This concept of the virtual space as an instrument is aligned with the established field of research known as VR musical instruments (VRMIs). Frameworks and design principles surrounding VRMIs have been discussed by Turchet et al. (2021) and Serafin et al. (2016), who emphasize principles such as designing for feedback and mapping, considering both natural and ‘magical’ interaction, and creating a sense of presence. These principles align with the interview findings, suggesting that spatial audio techniques can offer new forms of artistic expression and deeper emotional connections with the audience that blur the line between interactive experience and musical performance.
Creative placement of music elements in VR
One of the most obvious advantages of composing music for VR is the creative placement of music elements around the listener. Participants emphasized the potential for innovative spatial arrangements that leverage the unique capabilities of VR, transforming how music is perceived and interacted with by listeners. ‘But we have an entire sphere. We could put bass under your feet. We could put voices in heaven over you. This is something that’s still not really used, and people don’t know exactly how to do that’ (P11).
The potential of VR to revolutionize music experiences lies in its ability to create intimate and dynamic interactions with music. As P11 highlights, VR allows for an entire sphere of sound, enabling bass to be placed underfoot and voices above, creating a truly immersive environment. This concept aligns with the findings of Barboza et al. (2021), who discuss how spatial sound can enhance harmonic clarity and rhythmic perception by avoiding spectral clashes and providing a larger sound space. P6 expands on this: ‘What if the singer could whisper into my ear and walk away from me?’ This illustrates how VR can make the listener feel as though the singer is physically present and moving within the same space, thereby enhancing the emotional connection and engagement.
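As a rough illustration of what placing elements on ‘an entire sphere’ involves, the sketch below converts azimuth and elevation angles into positions around the listener. The coordinate convention (x forward, y left, z up, azimuth measured counterclockwise from the front) and the layout itself are assumptions made for this example; audio engines and spatializers differ in their axis conventions, so the mapping would need adapting to the tool in use.

```python
import math


def spherical_to_cartesian(azimuth_deg, elevation_deg, distance=1.0):
    """Convert a direction on the listener's sphere to (x, y, z).

    Assumed convention: x = front, y = left, z = up;
    azimuth 0 is straight ahead, elevation +90 is directly overhead.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance * math.cos(el) * math.cos(az)
    y = distance * math.cos(el) * math.sin(az)
    z = distance * math.sin(el)
    return (x, y, z)


# Hypothetical layout echoing P11's description: bass below, voices above.
layout = {
    "bass": spherical_to_cartesian(0.0, -90.0),       # under the feet
    "voices": spherical_to_cartesian(0.0, 90.0),      # overhead
    "lead_vocal": spherical_to_cartesian(0.0, 0.0),   # straight ahead
    "strings": spherical_to_cartesian(90.0, 0.0),     # to the left
}
```

Each resulting position could then be handed to whatever spatializer is in use as the location of the corresponding audio object.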
P8 cautions against overuse of spatial techniques that detract from the musical experience: ‘Don’t do stupid things like putting flying background vocals around and take the listener’s experience away from that beautiful vocal that’s happening’. This advice underscores the importance of thoughtful and purposeful application of spatial audio to maintain the integrity of the musical piece. In a study of design principles for spatial audio instrument placement, Barboza et al. (2021) describe how approaches such as Ambisonics can provide greater fidelity to instruments’ natural timbre and avoid masking effects, which enhances the listener’s experience.
Diegesis considerations for VR music
This section examines the role of diegesis in music (where the music originates from in respect to the narrative environment) in enhancing the immersive quality of VR experiences. By embedding music through everyday objects within the VR scene, creators can anchor the music within the virtual environment, increasing the user’s sense of presence and making the experience more authentic and engaging. ‘And this kind of complex language that we are used to watching in a movie, this could be enhanced a lot using binaural sounds in games, or also in 360 videos because you can localize diegetic music while you are not localizing non-diegetic music’ (P11).
The incorporation of diegetic music in VR leverages spatial audio technologies with our understanding of the virtual world, extending our ability to influence the narrative through composition. This connection to the understanding of the virtual world aligns with findings by Jørgensen (2008) that highlight the effectiveness of auditory information in complex visual contexts. P11 further explains the challenge of maintaining an extra-diegetic point of view with spatialized music: ‘It’s very difficult to maintain an extra-diegetic point of view when we go into localized, spatialized music… it becomes diegetic’. This insight is crucial for VR designers aiming to use music as an immersive tool rather than a background element, as supported by Van Elferen (2016), who discusses the role of adaptive and non-linear game music in guiding player interaction and enhancing immersion.
P10 illustrates how diegetic music can be seamlessly integrated through a specific example: ‘There’s a scene where there’s like a little radio in a diorama, and it sounds like the music is coming from there’. This approach ensures that the music feels like a natural part of the scene, enhancing the user’s sense of presence. This is supported by P12, who provides a similar practical solution for integrating music within the VR scene: ‘But there are other ways to do it while staying in VR, like putting down their TV or a radio and letting that music play on the TV or the radio in that virtual environment’. This method uses familiar objects to anchor the music within the environment, making the virtual world feel more tangible and real. The literature supports these insights by emphasizing the role of auditory information in enhancing gameplay and immersion. Jørgensen (2008) notes that audio is particularly useful for providing information when the visual system is overloaded or restricted, making it an ideal medium for enhancing VR experiences where visual information alone may not suffice.
Expert advice for future VR audio practitioners
The final question of the semi-structured interviews asked the experts for advice for creative professionals interested in pursuing VR audio as a practice. The responses revealed that developing technical skills and mastering audio tools are crucial for new VR audio professionals. One participant noted the importance of understanding the creative nuances in VR: ‘You really need to think about upping your skill level as a creator’ (P2). The same participant emphasized the need for tool proficiency: ‘Learning the tools is important, because without them it’s like trying to hammer a nail into a wall with your fist’ (P2). Starting with fundamental technologies like game engines and Ambisonics, then expanding expertise, was also recommended: ‘Start small, maybe learn a game engine, middleware, or Ambisonics, and build on that’ (P4). Along with technical knowledge, an in-depth understanding of human auditory perception is key. One participant advised, ‘Think about why humans respond to the things that they hear the way that they do’ (P2). Listening to and connecting with one’s environment was also highlighted: ‘Just really listen. I’ll be outside, or anywhere, and just stop and listen’ (P5). Taking a mindful approach to sound enhances creativity: ‘Whether you pursue this professionally or casually, be grateful for the opportunity to create and express yourself. It’s a wonderful gift that not everyone has the chance to experience. Seek out knowledge and real-world experiences to open your mind and challenge the way you think’ (P9).
Understanding the historical context and evolution of audio technologies provides valuable insights. One expert emphasized, ‘Knowing where something came from so you don’t repeat the same mistakes’ (P3). The video game industry was identified as a leading field in VR audio: ‘The best thing to do is to go into the video game medium because video games are leading all the things around VR’ (P11). Whilst this advice to build on skills from traditional media is highly useful, embracing experimentation is crucial in VR audio. One participant encouraged a creative mindset: ‘Do not try to think in terms of, “I make music in stereo,” but think in terms of how you can break that mold’ (P6). The field offers ample opportunities for experimentation: ‘Just think free and try stuff out’ (P12). Staying ahead of technological advances, especially in AI, is essential: ‘Get ready for AI. It’s going to change the industry’ (P3). The connecting thread throughout all these points of advice is the need for VR audio professionals to be adaptable, as the medium is constantly evolving, which calls for a high level of both logical and creative problem-solving skills.
Conclusion
This study has illuminated key strategies and insights that contribute to a developing practice language for VR audio, with a specific focus on storytelling and musical integration. Through interviews with leading VR audio experts, we identified how these practitioners leverage the spatial and interactive nature of VR through the strategic use of sound and music, fostering deeper emotional engagement and a more compelling narrative experience. This research highlights the interdisciplinary nature of VR audio, showcasing how experts from diverse professional backgrounds contribute to its development. Whilst VR audio practice is informed by traditional approaches, the unique demands of VR necessitate new narrative techniques and vocabularies that traditional media cannot address, emphasizing the adaptability required in this field.
Analysis of the experts’ responses found that VR audio’s capacity for creating intimacy and emotional engagement through the egocentric perspective represents a significant advancement in storytelling approaches. Positioning audio sources to dynamically respond to the listener’s changing perspective can enhance the immersive experience, while the challenge of placing voiceovers underscores the complexity of maintaining narrative coherence in a spatial environment. Further findings reveal that treating the sonic environment as an interactive character enriches user engagement and narrative depth, aligning with evolving human-computer interaction practices that focus on embodied and social experiences. Integrating music in VR presents challenges due to slow industry adoption, but it offers promising avenues for new ways to experience music from an embodied, interactive perspective.
Furthermore, expert advice underscores the importance of technical proficiency, understanding auditory perception, and embracing experimentation for success in the field of VR audio. Staying ahead of technological advancements, particularly in AI, is crucial. Adaptability and creativity are key for thriving in this evolving field. As VR continues to evolve, the development of a comprehensive practice language will be crucial for unlocking its full potential. This research contributes to that goal, offering actionable strategies and fostering interdisciplinary collaboration to guide the next generation of VR audio practitioners. In doing so, it lays the groundwork for a future where sound not only supports but actively shapes the narratives and emotional landscapes of virtual realities.
Footnotes
Acknowledgments
The authors acknowledge that OpenAI’s LLM ‘Chat GPT 4o’ was used to assist in summarizing some of the authors’ original writing for this paper. It was used in line with Sage Publications’ guidelines on using LLMs, and was only ever used to improve the authors’ original writing, never to generate its own writing or research.
Funding
This research was supported and funded by Dolby Australia and the Australian Government Research Training Program (RTP) Scholarship. The authors would also like to thank all the participants for giving up their time and sharing their valuable insights for the study.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
