Abstract
Enhancing Audio Description is a research project that explores how sound design, first-person narration, and binaural audio could be utilised to provide accessible versions of films for visually impaired audiences, presenting an alternative to current audio description (AD) practices. This article explores such techniques in the context of the redesign of the short film ‘Pearl’, by discussing the creative process as well as evaluating the feedback supplied by visually impaired audiences. The research presented in this article demonstrates that the methods proposed were as successful as traditional AD in terms of providing information, enjoyment, and accessibility to audiences, demonstrating that both practices can coexist and, as a result, cater for the different stylistic preferences of end users.
Introduction
Audio description (AD) for film and television (TV) refers to a pre-recorded verbal commentary that describes visual elements, and sometimes sonic elements, for visually impaired audiences. Users of AD report an increase in engagement with the source material and they regard having access to information they would have otherwise missed as crucial to their audio–visual experiences (Lopez, Kearney & Hofstädter, 2018). It is such access to information that allows audiences to reconstruct the stories experienced (Remael, 2012). Furthermore, an increase in confidence and self-esteem has been associated with the use of AD services (Schmeidler & Kirchner, 2011). Moreover, research on the use of cinematic AD (i.e., AD that includes information related to camera angles and types of shot) demonstrated that the inclusion of such terminology increased the sense of presence in people with sight loss (Fryer & Freeman, 2013).
However, there are reported disadvantages of AD, including the fact that it masks elements in the soundtrack and the challenge of achieving a balance between AD and the original soundtrack that is satisfactory to audiences (Lopez et al., 2018; Remael, 2012). Furthermore, it is an accessibility measure that is outside the creative process involved in a film or TV production (Whitfield & Fels, 2013), which results in the describer providing their own interpretation of a particular piece rather than the artistic vision of the filmmaking team (Udo & Fels, 2010; Whitfield & Fels, 2013). AD users have expressed disappointment at finding out that the AD script does not have any input from filmmakers (Lopez et al., 2018). In addition to this, AD follows a ‘one-fits-all’ model as it is assumed that all visually impaired people will find the system useful and enjoyable and there is not an alternative accessibility method. Previous research by the authors has found that there is a need to personalise access services due to the diversity of needs and preferences among visually impaired people (Lopez et al., 2018). For example, the age of the AD user, the type of sight loss, whether it is congenital or acquired, and whether the audience member has background knowledge of the media have been shown to result in varying needs and expectations in relation to AD (Independent Television Commission [ITC], 2000). Such expectations may affect the amount of detail preferred in the descriptions, the type of content being described, and whether cinematic language is used (ITC, 2000). Similarly, the need for personalisation of access has been studied in relation to accessibility for audiences with hearing loss (Ward & Shirley, 2017).
This article explores the results of the Enhancing Audio Description project which was funded by the Arts and Humanities Research Council in the United Kingdom and ran from 2016 to 2018 (Lopez et al., 2016, 2018). The project focussed on exploring the creation of an alternative form of accessibility for visually impaired audiences, one which uses sound design techniques and binaural audio to reduce the number of verbal descriptions while also incorporating first-person narration to create an organic piece. Moreover, the project suggests the incorporation of accessibility to film and TV workflows with the aim of providing an audio track that is closer to the artist’s vision. This article starts by providing readers with further insights into a variety of projects that have sought to widen technical and aesthetic considerations in AD, as well as discussing other forms of accessibility for visually impaired audiences. The authors then proceed to describe and evaluate the design of an accessible soundtrack for the short film Pearl (Palumbo & Feng, 2015).
Varying perspectives on accessibility
Alternatives to traditional AD and integrated approaches
Traditional AD guidelines encourage objectivity and a neutral approach both in terms of word choice and delivery (Fryer, 2016, 2018; Rai et al., 2010; Snyder, 2007). However, as Fryer (2018) highlights, there has been a changing trend towards questioning objectivity and instead considering accessibility as part of the creative process, requiring the collaboration between creative teams and accessibility experts and bearing in mind that, ultimately, it is the user experience that matters. Similarly, Cavallo (2015) questions the so-called neutrality of AD by analysing how, throughout the whole process of description, aspects such as what information is considered relevant, how that information is delivered, and what audiences should focus on are tightly controlled. An interesting approach is the one analysed by Udo and Fels (2009) in relation to a theatrical production of Hamlet, in which maintaining the creative intentions of the director in the accessible version was considered of the utmost importance. In Udo and Fels’ (2009) case study, the director of the production and the audio describer worked closely to create an accessible version of the play for visually impaired audiences. The strategies included the description being presented in iambic pentameter, so as to better fit the original text; the use of descriptions to capture the meaning of the sets rather than just providing information on their physical characteristics; and introducing the describer as one of the characters – Horatio.
In the field of AD for film and TV, there have been a number of research projects and practical work exploring varying degrees of subjectivity. Szarkowska (2013) worked on a format called ‘auteur description’, which seeks to incorporate the unique marks of a director’s work to the description process using the script of the film, interviews with the director, and film reviews. Similarly, Walczak and Fryer (2017) investigated the effectiveness of Creative Audio Description (CRD) and demonstrated its impact on the emotional reception of a film and the sense of immersion. Furthermore, Naraine et al. (2018) studied the efficacy of an integrated approach to AD applied to eight episodes of a comedy show and report on its positive reception by AD users. The approach entailed incorporating accessibility from the start of the creative process as well as delivering it with emotion and intonation that fit the source material.
Examples of integrated accessibility sit within the field of universal design practices. The concept of universal design stems from the desire to create products and environments that are free from barriers, accessible to everyone (Story et al., 1998). It supports the consideration of accessibility as integral to a product or environment and, as a result, reflects on how the integration of accessibility to the design process results in more aesthetically pleasing outcomes. These outcomes are also cheaper and can be marketed to a significant percentage of the population (greater than the group that they might have been initially intended for).
In relation to AD, Udo and Fels (2010) have mentioned how, for example, AD can benefit sighted individuals who cannot concentrate on the visual aspects of audio–visual material due to their need to multitask. Similarly, Kleege (2018) discusses how the increase in availability of AD through its addition to on-demand video services, such as Netflix, could mean that, as more sighted people come across it and find a use for it, the more its social-cultural benefits will expand, resulting in a greater degree of acceptance and, hopefully, contributing to a more inclusive society.
Pablo Romero-Fresco’s (2013, 2019) work also champions notions of integration through the field of accessible filmmaking, which proposes a filmmaking workflow in which accessibility is integrated, instead of being an afterthought. Romero-Fresco (2019) explains that by introducing accessibility to the filmmaking process, the creative aims of the director and the filmmaking team can be maintained. An accessible filmmaking approach not only increases the audience of the film in question but also helps the creative team maintain control over their work (Romero-Fresco, 2019).
Accessibility and sound design
The fields of audio films and audio games consider sound design as an accessibility method that is integral to the creative process. In audio films, there are no visuals and all information is provided through sound, sound processing, and surround sound (Lopez & Pauletto, 2009a, 2009b, 2010; Lopez, 2015). Moreover, it is an experience that is designed to be accessible from the outset. Audio games may include visual elements, but sound is their main means of communicating the storyline and aiding gameplay (Drossos et al., 2015). The sonic elements in audio games often include narration, dialogue, real and abstract sound effects, diegetic and non-diegetic sounds, atmos tracks as well as sound processing, such as reverberation (Lopez, Kearney & Hofstädter, 2016, 2018). Moreover, developers of audio games have recognised the potential of 3D audio for accessibility purposes and they often use binaural audio to help the player locate objects and characters (Drossos et al., 2015). Binaural audio refers to the presentation of audio signals to the ears over headphones that carry the same spatial information as sound sources experienced in real life. The result is that the soundtrack is no longer localised inside the head, but instead is perceived to be in a 3D soundfield external to the headphone presentation. Recordings from binaural microphones or audio signal processing strategies can be used to generate the desired headphone signals (Kearney et al., 2009; Masterson et al., 2012).
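As a rough illustration of the signal processing route to binaural headphone signals, the sketch below applies only two of the cues involved: an interaural time difference (using the Woodworth spherical-head approximation) and a crude interaural level difference. It is not the method used in the project, which relies on full binaural filters; the head radius, the 6 dB maximum level difference, and the function names are assumptions made for illustration, and the sketch is only meaningful for azimuths between -90° and +90°.

```python
import math

HEAD_RADIUS_M = 0.0875    # assumed average head radius
SPEED_OF_SOUND = 343.0    # metres per second
SAMPLE_RATE = 48000       # samples per second

def itd_seconds(azimuth_deg):
    """Woodworth approximation of the interaural time difference."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

def crude_binaural(mono, azimuth_deg, max_ild_db=6.0):
    """Return (left, right) sample lists for a mono source.

    Positive azimuth places the source to the listener's right.
    The ear facing away from the source receives a delayed, quieter copy.
    The 6 dB maximum level difference is an illustrative assumption.
    """
    delay = round(itd_seconds(abs(azimuth_deg)) * SAMPLE_RATE)
    ild_db = max_ild_db * abs(math.sin(math.radians(azimuth_deg)))
    far_gain = 10 ** (-ild_db / 20)
    near = list(mono) + [0.0] * delay
    far = [0.0] * delay + [s * far_gain for s in mono]
    if azimuth_deg >= 0:
        return far, near   # right ear is the near ear
    return near, far       # left ear is the near ear

left, right = crude_binaural([1.0, 0.0], 90)
```

With a source at 90° to the right, the left channel is both delayed and attenuated relative to the right; at 0° the two channels are identical, which corresponds to the in-head, centre-panned presentation described above.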
Outside of the field of accessibility, audio professionals have expressed the importance of incorporating ideas on sound design from the pre-production stage of filmmaking so as to create a better integrated final product (Thom, 2011). Sharing the sound design concept for a film across different creative departments can help ensure cohesion and inform the choices made on production sound as well as the effects sought in post-production (Lee & Stringer, 2018). This concept can be easily extended to accessibility – the consideration of sound design techniques for visually impaired audiences (such as those proposed by the Enhancing Audio Description project, see section ‘Creating an accessible soundtrack’) from the pre-production stage can result in the creation of an alternative accessible soundtrack.
Personalisation
The consideration of different types of accessibility, ranging from objective to subjective as well as those based on sound design strategies, invites us to explore the notion of personalisation and how moving forward the accessibility sector might be able to make different forms of access available to audience members so as to allow them to choose what best fits their needs and preferences (Lopez et al., 2018). An example of the notion of personalisation can be found in the 2016 production Notes on Blindness by Peter Middleton and James Spinney, a documentary on the academic John Hull, who after losing his sight started keeping an audio diary to record his experiences. All dialogues in the film are either from those audio diaries or from interviews conducted with John and Marilyn (his wife) for the film. Notes on Blindness was released with three different accessible tracks: two different audio-described versions (one by Louise Fryer and the other by Stephen Mangan) and an enhanced option that uses additional interviews as well as supplementary sound design and music (Lopez et al., 2016; ‘Notes on Blindness’, n.d.). As visually impaired audiences have three choices of accessible soundtracks rather than just one, it is the provision of these alternatives that makes Notes on Blindness an example of the personalisation of accessibility.
Creating an accessible soundtrack
Three key methods were explored as part of the Enhancing Audio Description project. The first is the addition of sound effects to the original soundtrack to replace some verbal descriptions in AD. The second studied the potential of spatial audio to convey information in the visual channel, and the third was the use of first-person description. All these methods require the incorporation of accessibility to the creative process, transforming it into part of the creative workflow. Consultation on these methods with visually impaired film and TV audiences was key to the project, as being aware of the target audience’s needs and wishes is crucial to the success of accessibility methods (Fryer, 2016).
To brainstorm design ideas, apply them, and evaluate the results, the 20-min short film, Pearl, was chosen as a case study. The film was selected as its unconventional storyline presents a challenge for visually impaired audiences: Pearl follows Cecily, the secluded and controlling mother of Margaret, a young woman with a sickness caused by her strange ability to cough up pearls. When a doctor from the city finds out about them, their life in the quiet, isolated house by the sea changes forever. As the modern world invades their lonely and controlled environment, they can no longer remain in their fairy tale existence.
Eight focus groups with a total of 42 participants were organised. The age of the participants varied from 21 to 93 years. Furthermore, 48% of the participants had acquired sight loss and 52% had congenital sight loss. Moreover, 31% of the participants were blind, 48% blind with residual vision, and 21% partially sighted. Participants were recruited through charities, talking newspapers, and social media outlets. Participants were shown the film without any accessibility features and without being given any information on the plot, and the focus group leader encouraged them to discuss its content (see Table 1 for a scene-by-scene summary of Pearl). After this initial discussion, the participants were provided with information on the plot and invited to explore isolated extracts from the film and discuss how these could be made accessible. During the discussions, the focus group leader introduced notions to be investigated during the project and sought opinions on their potential interest. It is the findings of these focus groups together with the results from a previous survey (Lopez et al., 2018) that were used to initiate the creative process of accessibility applied to Pearl.
Table 1. Scenes from the film Pearl.
Additional sound effects
Focus group participants felt that the addition of sound effects could help make storylines clearer and that there is no need to describe sound effects if they can be inferred. The sections that follow (‘Conveying actions’ to ‘Auditory logo’) focus on the use of sound effects for accessibility in a number of scenes from Pearl and are divided in relation to their main aim.
Conveying actions
Pearl’s opening sequence (Scene 1, see Figure 1) features Margaret, her mother (Cecily), and her carer (June) in Margaret’s bedroom. Margaret is unwell, but the audience does not know why; it is only towards the end of the sequence that we learn that Margaret coughs up pearls and that this is making her ill. To clarify the scene for visually impaired audiences, Margaret’s breathing was emphasised together with the addition of Foley elements, such as footsteps, the sound of the oxygen mask, and the bed springs.

Figure 1. Screenshots of Scene 1.
The scene that follows (Scene 2) takes place in a narrow corridor. The sound of water drops falling into buckets was added to make it clearer that these were present in the scene and also to indicate the lack of maintenance in the house and the fact that it was being overtaken by water. Furthermore, the action of Cecily pushing open the kitchen door and as a result walking into a different space was also emphasised by adding the corresponding sound effects and changing the room reverberation accordingly (see Figure 2).

Figure 2. Screenshots of Scenes 2 and 3.
Scene 7 (see Figure 3) was singled out as challenging by focus group volunteers, as in the original version, the sound of scrubbing was not accompanied by any other sonic elements. To address this issue, the sound of running water was added, establishing both the action and the room of the house by association.

Figure 3. Screenshot from Scene 7.
Scene 8 takes us to Margaret’s bedroom as she expresses her fascination for the ocean and her frustration at her health (see Figure 4). The scene was enhanced by adding the effect of Cecily stroking Margaret’s hair as well as the sound of Margaret trying to wipe off the blood from her hands. A drone effect was also used to enhance the tension as Margaret starts coughing again and there is a risk of another episode. Bed spring sounds are used to make up for the fact that audiences would not be able to see the characters sitting on the bed. An electric buzz accompanied the power cut in the scene to make the meaning more evident.

Figure 4. Screenshots from Scene 8.
The arrival of the doctor is followed by a conversation between him and Cecily (Scene 13). Additional Foley effects were added as they sit at the table so that the action is clearly represented. Furthermore, the addition of the sound of a grandfather clock works as a soundmark for the space (Schafer, 1994), indicating the space the characters are in and from then on recurring whenever the characters visit that room (see Figure 5).

Figure 5. Screenshot of Scene 13.
Establishing shots
The original soundtrack used the sound of a thunderstorm and waves to accompany the establishing shot shown in Figure 6, but this was deemed insufficient by participants at the focus groups, who found it challenging to interpret that the house was by the sea but deemed this to be crucial to the understanding of the film: ‘Very relevant to the story the fact that they are in a rather isolated, very beautiful to look at, house by the beach near the sea but obviously isolated.’

Figure 6. Establishing shot in Pearl.
To make this clearer, the sound of seagulls was added to create a more direct association with the seashore. Although the use of seagulls for seaside scenes might seem unoriginal, auditory clichés help the audience members enter the world of the film more easily (Sonnenschein, 2001). Furthermore, as Kerins (2011) discusses, the use of spatialised soundscapes (in this case, through the use of binaural audio, see section ‘Spatial audio’) can remove the need for visual establishing shots. In the second establishing shot, at night time, a sound layer of crickets was added to indicate the difference in time of day. The transition from the establishing shot to the indoor scene (Scene 8) was smoothed by keeping the sound of rain audible through the open bedroom window and adding a filtering effect as the window is shut.
Hallucination scenes
Pearl deals on two separate occasions with Cecily’s hallucinations (Scenes 4 and 17). In the first instance, she imagines the kitchen being flooded (see Figure 7), while on the second occasion, she spies on her daughter and the doctor through the keyhole as the medical examination takes place but imagines that they are having an intimate encounter (see Figure 8). Focus group participants reported that they were unaware of these two scenes. With the aim of clarifying their meaning, it was decided to design a cacophony of voices repeating parts of the script in whispers, processed using reverb. The notion behind the design concept was that it would help audiences understand the action while creating the idea of confusion. Inspiration for the creation of the cacophony came from Kaufman’s 2015 film Anomalisa, which uses a similar technique in its opening sequence, whereas reverberation is often used as a technique to signify altered states of mind (Gabriel, 2013).

Figure 7. Screenshots of hallucination, Scene 4.

Figure 8. Screenshots of hallucination, Scene 17.
Establishing the presence of characters
The original film has a montage sequence (Scene 20, Figure 9) that shows Margaret running towards the beach as Cecily and the doctor chase after her. In its original version, this scene is dominated by music and atmos. Our focus groups indicated that this scene was very disconcerting as there were no aural cues on actions or characters. To address this, the scene was redesigned. Footsteps on the beach surface were added as suggested in the focus groups and the atmos was enriched with the sound of seagulls. The addition of Margaret’s breathing also reinforces her presence. In addition to this, the cries of her mother, which were originally absent and only implied by the actress moving her lips, were recorded and added to the mix.

Figure 9. Screenshot of Scene 20.
Auditory logo
The most challenging aspect of the sound design of Pearl was to convey the fact that Margaret coughs up pearls, which is crucial to the understanding of the film. If this is not made clear, as was evidenced in the focus groups, it just seems like the story of a sick young woman and the more mystical side of the plot is lost. To address this, a high-frequency sound accompanying the appearance of the pearls throughout the film was devised, which also accompanied the title of the film on screen. In the case of the title card, the sound is also accompanied by the whispered title. This followed the advice given by volunteers in focus groups: ‘Making the sound of the pearl more obvious and maybe describe it the first time and then people would know what it was and there is no need to describe it again. That would be excellent. . .because it would cut down the talking and, of course, it’s presuming, if we are always told that this is the sound of, you know, something falling, it kind of takes away, it just presumes that we don’t know what on earth is going on and, you know. . .we can’t think for ourselves. . .’
Spatial audio
The majority of volunteers in the focus groups commented that the use of spatial audio to replace or complement the verbal information given in AD would be advisable as it would add to the atmosphere of the film and as a result enhance the cinematic experience. Furthermore, one of the volunteers mentioned that the convention that determines that voices are panned to the centre in films is unsuitable for accessibility purposes, as it does not allow visually impaired audiences to determine the movement of the characters. However, one participant mentioned that too much spatial information could distract audiences from the film and those who gave more neutral feedback felt that sounds should only be spatialised when relevant to the storyline.
A major aspect of effective spatialisation of sound sources is an understanding of traditional broadcast and film mixing strategies for stereo and surround sound. As correctly discerned by the volunteers, in a standard 5.1 surround sound mix, it is typically the case that all dialogues and character-driven sound effects are relegated to the centre channel (Holman, 2008; Kerins, 2011; Rumsey, 2001). In a cinematic context, this ensures good auditory localisation at the screen and relies on the use of the ventriloquist effect for audiences to perceive the location of the sound at the position of the character. The ventriloquist effect occurs since the visual cue will dominate the perceived position of a synchronous audio–visual source (Hairston et al., 2003; Jack & Thurlow, 1973; Witkin et al., 1952). The spatial limit of this perception is dependent on the angle of presentation of the audio and visual sources (Hairston et al., 2003), but the aforementioned studies have shown it to be in the region of 20° to 38°. However, having all character-driven sounds coming from the same place can lead to confusion for visually impaired audiences, which is then compounded by the fact that, to hear the AD service, they must wear headphones and listen to a stereo downmix of the original soundtrack.
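To make the downmix problem concrete, the following sketch folds one sample frame of a 5.1 mix down to stereo. The article does not state which downmix coefficients apply to the AD service; the gains of roughly -3 dB for the centre and surround channels are a commonly used convention and are assumed here, as are the function and key names. The low-frequency effects channel is simply ignored.

```python
import math

# Assumed downmix gains (about -3 dB); not specified in the article.
CENTRE_GAIN = 1 / math.sqrt(2)
SURROUND_GAIN = 1 / math.sqrt(2)

def downmix_51_to_stereo(frame):
    """Fold one 5.1 sample frame down to a (left, right) pair.

    `frame` is a dict with keys "L", "R", "C", "Ls", "Rs".
    The LFE channel is omitted in this sketch.
    """
    left = frame["L"] + CENTRE_GAIN * frame["C"] + SURROUND_GAIN * frame["Ls"]
    right = frame["R"] + CENTRE_GAIN * frame["C"] + SURROUND_GAIN * frame["Rs"]
    return left, right

# Dialogue mixed only to the centre channel, as in a standard film mix.
dialogue_only = {"L": 0.0, "R": 0.0, "C": 1.0, "Ls": 0.0, "Rs": 0.0}
left, right = downmix_51_to_stereo(dialogue_only)
```

Because centre-channel dialogue reaches both ears at the same level and time, every character is heard at the same central position over headphones, regardless of where they appear on screen.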
To establish the extent to which this is problematic, as well as to investigate the effectiveness of alternative spatialisation strategies, 26 visually impaired volunteers were invited to undertake spatial audio listening tests at the AudioLab of the Department of Electronic Engineering at the University of York. Listening tests were undertaken within the 50-channel spherical loudspeaker array housed at the laboratory, which employs Ambisonic surround sound techniques (Gerzon, 1973). The first phase of the tests was a validation phase that ensured that binaural audio over headphones can give the participants an equivalent spatial audio experience to that of loudspeakers through the use of personalised binaural filters for each participant. This involved a series of tests where subjects had to identify the location of sound sources relative to their head in both a real loudspeaker array and a corresponding virtual version presented over headphones. Subjects were also asked to assess the perception of simulated room reverberation over both headphones and loudspeakers. In both cases, the difference between the overall subject error rates was not statistically significant, demonstrating that good spatialisation equivalent to loudspeaker listening was perceived over headphones under the experimental conditions utilised.
The second phase of the tests consisted of a series of listening trials over headphones to assess the perceptual limitations of binaural hearing in the context of standard and Enhanced Audio Description (EAD) soundtracks. Subjects first undertook a spatial discrimination task. Here, they were presented with four speech recordings from different talkers that were panned using different sound design strategies as shown in Figure 10: mono – centre front (as per standard film and TV mixing), narrow (two sources at ±7° and two sources at ±15°) and wide (two sources at ±42° and two sources at ±90°). Subjects were asked to identify how many people were talking in the presented scene. The highest error rate occurred with mono presentation (44%), compared with 32% for narrow and 22% for wide panned sources. This demonstrates that spatial panning of sources is critical to discriminating the number of characters in any given scene.

Figure 10. Panning strategies for talker discrimination test.
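The three panning strategies and their reported error rates can be set side by side in a short sketch. The azimuth lists below are one reading of the description of the talker discrimination test (each ± pair taken as one source on either side of centre), and the smallest angular gap between neighbouring talkers is a convenience measure introduced here for illustration; it is not a metric used in the study.

```python
# Talker azimuths in degrees (negative = left), as interpreted from the text.
LAYOUTS = {
    "mono": [0, 0, 0, 0],
    "narrow": [-15, -7, 7, 15],
    "wide": [-90, -42, 42, 90],
}

# Error rates reported in the article for the talker-counting task.
ERROR_RATE = {"mono": 0.44, "narrow": 0.32, "wide": 0.22}

def min_separation(azimuths):
    """Smallest angular gap between neighbouring talkers (illustrative measure)."""
    ordered = sorted(azimuths)
    return min(b - a for a, b in zip(ordered, ordered[1:]))

for name, azimuths in LAYOUTS.items():
    print(name, min_separation(azimuths), ERROR_RATE[name])
```

Read this way, the error rate falls as the smallest gap between talkers widens from 0° (mono) to 8° (narrow) to 48° (wide), which is consistent with the conclusion drawn from the test.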
Another aspect addressed in these binauralisation tests was the application of reverberation in soundtrack design, in particular, to aid sound source positioning within a scene. This is especially relevant to the earlier focus group comments where visually impaired participants did not get any indication from the soundtrack as to what type of environment was being created with the set or where the characters were placed in the mise-en-scène, for example, ‘I never got a feel for where the characters were. . .in the house. . .were they standing in corridors? Were they standing in a room? Were they moving from one room to the other? Were there any doors? I just didn’t get a sense of it. . .It seemed like if anything, it was like one of those big warehouse studio flats or something.’
In normal listening conditions, as a talking person moves within a room, the level of their voice varies depending on how near or far they are to the listener relative to a constant diffuse room reverberation level. This is an important cue in discerning distance as well as environment (e.g., hall, bathroom, church, etc.). However, the standard practice within film and TV is to keep the dialogue level constant for intelligibility reasons, and if a reverberation effect is required, it is mixed into the constant level dialogue. Consequently, listening tests were undertaken to assess whether typical film and TV mix strategies for reverb were sufficient for conveying a sense of place and space, or whether more spatially accurate reverberation was required. To this end, subjects were presented with a sample of speech (referred to here as the direct sound) over headphones where movement was simulated to five different positions within a room. Mono and surround (Ambisonic) reverberation were assessed. It was found that when spatially accurate reverberation is presented to the subject with correct direct-to-reverberant levels, subjects found it relatively easy to discern how many positions the character moved to in the scene. This became more difficult when the correct direct-to-reverberant speech level was presented in mono (error rate increased by 16%). Interestingly, standard film and TV practice that preserves intelligibility can still be used (i.e., constant speech level, with varying reverb level) but is most effective when the reverb is spatial (i.e., not mono).
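The distance cue described above can be sketched numerically. Assuming a simple free-field model in which the direct sound drops by about 6 dB for each doubling of distance while the diffuse reverberation level stays constant, the direct-to-reverberant ratio falls as the talker moves away. The reference distance, the reverberation level of -12 dB, and the five example distances are illustrative assumptions and are not values taken from the listening tests.

```python
import math

def direct_level_db(distance_m, ref_distance_m=1.0):
    """Direct sound level relative to its level at the reference distance
    (inverse distance law: about -6 dB per doubling of distance)."""
    return -20 * math.log10(distance_m / ref_distance_m)

def direct_to_reverberant_db(distance_m, reverb_level_db=-12.0):
    """Direct-to-reverberant ratio for a constant diffuse reverb level.

    The -12 dB reverb level is an assumed value for illustration only.
    """
    return direct_level_db(distance_m) - reverb_level_db

# Five hypothetical talker positions, in metres from the listener.
positions_m = [1.0, 2.0, 3.0, 4.0, 6.0]
ratios_db = [direct_to_reverberant_db(d) for d in positions_m]
```

Under these assumptions the ratio decreases steadily with distance. In the constant-dialogue-level practice described above, the same ratios would instead be produced by raising the reverberation level while leaving the speech level untouched.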
Good examples of these findings in practice can be found in the corridor scene of Pearl (Scene 2). Here, as the nurse walks towards the camera, a distinct change in her footsteps’ level relative to a constant reverberation level can be heard as she approaches the screen. Dialogue is panned to create a more theatrical-style presentation, with the character Cecily panned to a 45° position to the left and the nurse settling to a 45° position to the right (see Figure 11). Footsteps and other character-driven sound effects are no longer centre-panned but instead are spatially distributed to aid discrimination of sources. Atmospheric effects, such as wind noise and water dripping, are placed spatially throughout the scene and all sources are rendered with accurate reverberation, which consists of a relatively narrow spatial field to reflect the claustrophobic nature of the corridor. This reverberation changes significantly when the characters move to the kitchen through the right-panned door. Here, it becomes more spacious, open (larger time delay between direct sound and reverberation), and slightly longer, allowing the listener to discern that the characters have clearly moved to a new and larger space. Again, source panning of dialogue and Foley for each character are undertaken to ensure that their spatial locations can be readily discerned.

Figure 11. Screenshot of Pearl, Scene 2.
I-voice
Despite the potential that sound design and spatial audio have in terms of creating accessible experiences, there are circumstances in which verbal descriptions are crucial, for example, when referring to colours, facial expressions, gestures, and the physical appearance of characters. For such moments, we have studied the use of first-person narration or, using terminology by Chion (1999), the I-voice. Such a technique allows the incorporation of descriptions in the form of the inner thoughts of a character and the identification of audiences with the character in question. The I-voice is to be written by the scriptwriter with the help of an accessibility expert, allowing for the creative team to be fully involved in the process. First-person narration has been previously trialled by Fels et al. (2006), who investigated its use for an episode of an animated series and demonstrated that it was embraced both by the creative team and by visually impaired audiences. It is worth noting that it is Fels et al.’s work that served as initial inspiration for this approach.
In the focus groups, most participants were in favour of how the first-person perspective could enhance AD and curious about the future outcomes of the research: ‘. . .it would be like a poetic device. . .Like a radio play, inner monologues, I think that would be interesting. . .it might be more organic. . .a part of it, rather than a stranger’s voice cutting in.’
By contrast, some participants thought AD is best done in the conventional way, with an objective third-person narration. Some even worried that the first-person narration would add an extra level of complexity: ‘For me, it might get a bit confusing. . .you are already trying to keep track of the plot and then you are trying to listen to them speak and then the AD on top and then another. . .’
Furthermore, some expressed concern about how it might make the film seem ‘silly’: ‘. . .cause sometimes you want just basic things doing as well, like setting the scene. . .the sun shining, you know. . .simple things, like that, is that really relevant to the first-person, doing that sort of thing? It might sound a bit stupid. We are in a street, we are walking down the street. One wouldn’t say, I mean, you’ve got to remember that if it’s in the first-person, it has to sound like what a first-person would say. . .whereas you can have a kind of level of clinical-ness about a third-person.’
Expansions on the idea of the I-voice were also suggested: ‘I think. . .you almost have to have three versions because. . .you chop and change in whose point of view you are listening from. . .because otherwise you will have a very biased. . .like character-based viewpoint of the film. . .’
The I-voice for Pearl was written by professional screenwriter Lisa Holdsworth. Although the story is mostly told through Cecily’s perspective, the I-voice was written from Margaret’s perspective as she is the main character and the one the focus group members preferred following. Figure 12 demonstrates the use of the I-voice in the opening sequence.

I-voice extracts Scene 1.
Feedback analysis
To analyse the effectiveness of the design techniques described above, 50 volunteers were assigned to one of three groups. The first group watched the film with its original soundtrack, without any accessibility features. The second listened to it with a traditional AD track created by Sensor Media. The third listened to the Enhanced AD version (EAD). All evaluation sessions were conducted on a one-to-one basis using headphones. Regardless of the version assigned to them, all volunteers were asked the same questions and they were not told which version they would watch. Of the volunteers, 50% were blind, 28% were blind with residual vision, and 22% were partially sighted. Furthermore, 60% had congenital sight loss and 40% had acquired sight loss. The age spread of the participants can be seen in Table 2. All responses presented in this section were analysed using descriptive statistics together with binomial, chi-square, and one-way ANOVA tests, followed by post hoc Tukey tests. Results were considered significant at p < .05; wherever values or differences are described as significant, it is in relation to this threshold.
Age spread of the participants.
Plot comprehension
To assess the clarity of the plot, we asked participants to summarise it and then compared responses to the key plot elements in the story as described in Table 3. Statistical tests conducted were a combination of chi-square (df = 2) and binomial, with significance set at p < .05.
Key plot elements in Pearl as identified by the research and creative teams.
The elements in bold and with an asterisk (*) correspond to the only key plot elements that presented statistically significant differences in responses depending on the type of soundtrack listened to. The recognition of the remaining plot elements did not present significant differences depending on the soundtrack listened to.
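The chi-square comparisons described above (three soundtracks by recognised/not recognised, hence df = 2) can be sketched as follows. This is an illustrative sketch only: the counts are hypothetical and are not the study's data, and the function names are assumptions. For df = 2, the chi-square p-value has the closed form exp(-χ²/2), so the standard library suffices.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail p-value of a chi-square statistic; valid for df = 2 only."""
    return math.exp(-stat / 2.0)


# Hypothetical counts (NOT the study's data): rows are Original, AD, EAD;
# columns are "plot element recognised" and "not recognised".
counts = [[5, 10], [11, 4], [12, 3]]
stat, df = chi_square_independence(counts)
p = p_value_df2(stat)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.3f}, significant = {p < 0.05}")
```

A significant result on the full table only indicates that recognition depends on the soundtrack somewhere. Identifying which pair of versions differs requires follow-up pairwise comparisons.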
The analysis of Plot Elements 2 (Margaret produces pearls within her body), 9 (Margaret escapes towards the sea and has a seizure), and 10 (Margaret is taken away to the city/hospital) returned similar results. Both accessible versions allowed for a higher level of plot comprehension, when compared to the original version, and there were no significant differences between the accessible versions, both performing equally in terms of recognition. Furthermore, participants were asked about the ending of the film to corroborate the answers given regarding Plot Element 10. The analysis corroborated the result by indicating that a greater comprehension had been achieved for the accessible versions compared to the original one, and no significant differences were found between accessible versions.
The results for Plot Element 9 (Margaret escapes towards the sea and has a seizure) seem to indicate that the combination of design strategies applied to Scene 20 for the EAD version was particularly effective. This scene, as indicated during focus groups, required the most changes to become accessible. The addition of vocal sounds and footsteps helped establish the presence of characters, and the additional layering of environmental sounds established the space. In addition to this, the I-voice helped clarify elements of the narrative and the description of gestures was replaced by the conveyance of feelings through Margaret’s own voice. The lines of I-voice used in Scene 20 can be seen in Table 4.
Lines of I-voice present in Scene 20.
In relation to Plot Element 7 (The mother agrees to the doctor examining Margaret), significant differences between the original and AD version were found, but no significant differences were found between the original and EAD, and no significant differences between AD and EAD, meaning that it was the AD version that allowed for a greater recognition of this section of the plot (see Figure 13). However, it is worth noting that although the difference between the original and the EAD versions is not statistically significant, as Figure 13 shows, there is an improvement in recognition in the latter version when compared to the non-accessible one.

Comparison between the recognition (Yes) and lack (No) of Plot Element 7 depending on the type of soundtrack listened to.
Plot Element 8 (Cecily imagines something happening between Margaret and the doctor) was only recognised for the AD version of the film, with 18% of the participants mentioning it in the interviews. It was also interesting to note that Plot Element 4, which also pertains to Cecily’s hallucinations, presented no significant differences in recognition among the three different soundtracks. This is surprising as both scenes were redesigned and considerable changes were made, but it seems that this did not have an impact on the audiences, meaning that the approach was not as clear as the design team believed it would be and new approaches for such abstract scenes need to be considered. One of the participants who listened to the EAD version commented:
When at first the doctor was in the room with Margaret and the mother was peeping through the keyhole and she could hear the voices, it was confusing until Margaret says she was imagining the worse, the voices, you could understand some of the lines that were coming over and over, but it was a guy’s voice, so it was a bit confusing there.
It is of interest for the field of sound design to reflect on why, with the lack of access to visuals, a technique often used to indicate hallucinatory states (cacophony of voices, reverberation) failed to deliver the desired effect. Such reflections can be linked to theories on modes of listening, that is, how the same sound or combination of sounds can be listened to by different people with different focuses and, as a result, what each person gains from listening to that specific sound can differ from one listener to the other (Chion, 1994; Tuuri & Eerola, 2012; Tuuri et al., 2007). For the scenes in question, the sound designers utilised a combination of the concepts of semantic, connotative, and empathetic listening (Chion, 1994; Tuuri & Eerola, 2012; Tuuri et al., 2007). Semantic listening refers to the act of interpreting sounds in relation to a pre-existing socio-cultural code; in this case, the use of similar techniques in several pre-existing films. Connotative listening relates to the act of creating associations based on cultural experiences, whereas empathetic listening is linked to the act of listening with the aim of understanding someone’s state of mind. However, despite the sound designers’ intentions, it seems that most listeners favoured a critical mode of listening, in which they questioned the appropriateness of whispered voices that echoed past parts of the script, deeming the design of the sequence inappropriate and, as a result, unsuccessful. Such reflection invites designers to reconsider the approach to abstract sequences that are successful in audio–visual media when considering experiences for visually impaired audiences. It is also possible that the methods utilised were not part of the set of filmic conventions participants were familiar with; hence, the association the designers were expecting was not achieved.
In addition to the analysis of the individual plot points, an analysis of participants’ confidence in the understanding of the plot was also conducted across the three types of soundtracks (see Figure 14). A one-way ANOVA test demonstrated a significant difference among the versions (F = 3.860; p = .028), and a further post hoc Tukey test showed that the only significant difference was between the EAD version in comparison to the original version, demonstrating an increase in confidence levels with the former. It is worth pointing out that, although not statistically significant (p = .063), the AD version presents higher scores than the original version. Furthermore, there are no significant differences between the two accessible versions.

The ranking of the confidence on the understanding of the plot depending on the soundtrack listened to.
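The F value reported for the confidence scores comes from a one-way ANOVA, which compares the variance between group means with the variance within groups. The sketch below shows that computation under stated assumptions: the ratings are hypothetical, not the study's data, and the function name is illustrative. It returns only the F statistic and its degrees of freedom. Obtaining the p-value and the post hoc Tukey comparisons would normally be left to a statistics package, for example SciPy's `scipy.stats.f_oneway` for the ANOVA itself.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical confidence ratings (NOT the study's data) for three soundtracks.
original = [2, 3, 3, 4, 2]
ad = [3, 4, 4, 5, 3]
ead = [4, 4, 5, 5, 4]

f_stat, df_b, df_w = one_way_anova_f([original, ad, ead])
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

With three groups the between-groups degrees of freedom are always 2. If all 50 volunteers contributed a score, the within-groups degrees of freedom would be 47.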
Location and time setting
The film is set in an isolated house on the beach, a location that was recognised by 88% of the participants who watched the original, 94% of the participants who listened to the AD version, and 100% of those who listened to the EAD version, suggesting that the techniques used in the latter contribute to determining the location. The film is set in modern times, but it takes place in a Victorian-style house. Of the volunteers who watched the original version, 38% identified the period as fairly modern and 19% identified Victorian elements. A greater awareness of the time period was gained with the accessible versions. For the AD version, 47% of volunteers identified the period as fairly modern and 24% identified Victorian/period drama elements; whereas for the EAD version, 47% of volunteers identified the period as fairly modern and 29% identified Victorian/period drama elements.
Character recognition
Pearl has four characters: Margaret, Cecily, June, and the doctor. A chi-square test determined that there were no statistically significant differences among the versions; that is, they did not have an impact on character recognition, chi-square = .318; df = 2; p = .853 (see Table 5), and the ways in which characters were recognised were not different among the versions (see Table 6). Unsurprisingly, voice/dialogue was the most valuable cue as to the different characters. It is interesting to point out that although listening tests as part of the project indicated the relevance of spatial positioning in determining the number of characters (see section ‘Spatial audio’), only two participants mentioned this as an aiding factor, meaning that it is possible that when faced with more creative contexts (than that of a scientific listening test), the narrative elements, such as the dialogue, take over and become much more important than spatial positioning. It is also possible that audience members are not aware that spatialisation is aiding character distinction, as the technique is integrated into the aesthetics of the film and becomes a natural component instead of a gimmick that draws attention to itself.
Percentage of the participants who recognised all the characters.
AD: audio description; EAD: enhanced AD version.
Different strategies used by participants to recognise the different characters.
AD: audio description; EAD: enhanced AD version.
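The character recognition result reported above (chi-square = .318; df = 2; p = .853) can be checked by hand, because for df = 2 the upper-tail probability of the chi-square distribution reduces to exp(-χ²/2). The short sketch below performs only that check, using the statistic as reported in the text. The variable names are illustrative.

```python
import math

chi_square = 0.318  # statistic reported in the text, df = 2

# For df = 2 the chi-square survival function is exp(-x / 2).
p = math.exp(-chi_square / 2.0)
print(round(p, 3))  # 0.853
```

The value agrees with the reported p = .853, well above the .05 threshold, which is consistent with the conclusion that the version listened to did not affect character recognition.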
Recognition of physical spaces within the film world
The study of whether the recognition of spaces within the film was affected by the type of soundtrack was conducted through a combination of chi-square and binomial tests, with significance set at p < .05. Results demonstrated that the type of soundtrack impacted on the recognition of the corridor/hallway, the kitchen, the beach, and the car/city. The identification of the hallway/corridor was significantly higher for the AD version than for the original and EAD versions. This result is likely due to the greater emphasis on mentioning this space by the AD version, in which we hear, ‘Cecily walks down the corridor’. In the EAD version, the I-voice only mentions the space in a line that says, ‘In the hall, mother takes a moment to despair’. The latter places less emphasis on the space and focuses on the audience’s understanding of Cecily’s emotions.
The results on this point can be related to Chion’s (1999) concept of vococentrism, which refers to a hierarchy of sounds in film soundtracks in which the voice takes precedence over all other types of sounds. Thom (2011) also recognises this effect when analysing the work of the sound designer in film. The notion of vococentrism means that even though the EAD version had sound effects and reverberation cues that may have helped audiences recognise the space, the verbal commentary in AD was something they were more drawn to.
When it comes to the recognition of the kitchen, the number of times it is recognised is significantly higher for the AD version when compared to the original version. There are no significant differences between the original and the EAD versions or between the AD and the EAD versions, meaning that the two accessible versions did not differ significantly in conveying the kitchen space.
For both the beach and the car/city spaces, there is a significant increase in recognition in the AD and EAD versions when compared to the non-accessible one but no significant differences between the two accessible versions, indicating that they are both fulfilling their aim successfully even when using different methods to do so.
Volunteers indicated a number of factors as aiding the recognition of spaces within the film world (see Table 7). We can see that for the original and EAD versions, it was the sound effects that helped the most, whereas in the case of the AD version, it was the description that had the most impact. It is interesting to point out that in the absence of a traditional accessibility format, individuals seem to have turned to other elements to help identification; for example, sound placement, footsteps, and the sound of doors.
Cues used for the recognition of different spaces within the film world.
AD: audio description; EAD: enhanced AD version. Percentages in bold indicate the highest value for each type of soundtrack.
Recognition of sound effects
AD may sometimes describe sound effects that are deemed difficult to interpret without the accompanying visual elements, a trait that a number of focus group participants have noted as patronising and that the EAD version acknowledges.
When participants were asked whether they had difficulties recognising sound effects in Pearl, there were no significant differences between the different types of soundtrack, meaning that the addition of AD did not necessarily make the sound effects clearer. Furthermore, the number of people who found they had no difficulty interpreting the sound effects was equal to the number of people who said they had struggled, demonstrating that experiences and opinions were divided (see Figure 15).

Comparison between the difficulty or otherwise of recognising sound sources for sound effects depending on the soundtrack listened to.
When referring to the original version of the soundtrack, volunteers mentioned that at points it was difficult to tell what was music and what was atmos (atmospheric sound). The opening of the film as well as the fact that Margaret coughs up pearls were also considered to be unclear.
Comments on sound effects from those who listened to the AD version also included the confusion between music and atmos:
There was some roaring sound at the start of the film I wasn’t sure what it was, unless it was supposed to be the wind.
The lack of clarity in relation to the coughing up of pearls was also mentioned in relation to the AD version:
When Margaret was coughing it wasn’t clear I guess that she was coughing up blood or anything in particular, she just seemed to be coughing and struggling to breathe.
Such elements were not mentioned as challenging in the EAD version, but one of the participants did comment on sounds seeming ‘unrealistic’, which might have been due to the fact that sound effects were overemphasised to make up for the reduction of verbal descriptions.
Spatialisation
When asked whether they were able to recognise the physical position of the different sound elements, results showed that there is an evident increase in the recognition of the spatialisation with the EAD version, but the difference is only significant when comparing the AD and the EAD versions (see Figure 16). The increase in recognition when the EAD version is compared to the original one is not statistically significant.

Comparison of ease in identifying spatial position of sound sources depending on the soundtrack listened to.
Most participants were positive regarding the use of binaural audio, mentioning that it allowed them to know where the characters and objects in the rooms were as well as to keep track of movement. As a result, there was no need for AD to cover this information. It was also mentioned that the spatialisation made the plot more enjoyable:
I think that the whole kind of richness of the audio content just made it a much more enjoyable experience than I think it would have been if I had just been listening to a radio drama in mono or something or a film with an AD track. I got much more atmosphere out of the whole thing, more feeling of place.
However, not all comments on spatialisation were positive:
I didn’t always find that convincing. Sometimes I felt like I was directly in between two characters that were having a conversation and I didn’t feel like I should be. I heard voices coming from the left, like June, it was a bit strange, at some time I thought a real person to be to my left, so I looked to see if there was someone there. It didn’t bother me, I’d be a bit distracted if it was at home.
The latter comments might be related to the lack of familiarity with binaural experiences which, as Collins and Dockwray (2018) point out, is due to the fact that stereo is currently the predominant listening format and audiences may need time to adjust to a new auditory experience. The impact of lack of familiarisation on preferences was also found in experiments on stereo versus binaural in relation to popular music listening (Fontana et al., 2007), indicating that the creation of more binaural pieces and their wider availability might help break the barriers between technology and audiences.
Engagement and accessibility
An analysis of the engagement, accessibility, and adequacy of the accessibility levels among the different versions (Original, AD, and EAD) was also conducted (see Figures 17 to 19). A series of one-way ANOVA tests was carried out to compare the scores on engagement, accessibility, and adequacy of accessibility levels among the different types of soundtracks. We found that there were significant differences for all scores (engagement [F = 12.75; p < .01]; accessibility [F = 36.43; p < .01]; adequacy [F = 55.86; p < .01]). Post hoc Tukey tests were also carried out, which demonstrated in all cases that the statistically significant differences were between the original version and the accessible ones (p < .01 for the three tests) but not between the accessible versions (engagement p = .850; accessibility p = .565; adequacy p = .908). The lack of significant differences between the accessible versions indicates that the two perform comparably in those categories. The fact that the EAD version is performing as well as the AD version is considered a huge success within this project, as familiarity with a system is key to relaxation and enjoyment (Kallio et al., 2011). That the visually impaired audiences who experienced this new form of accessibility found it as engaging, accessible, and adequate in terms of accessibility as traditional AD suggests that further familiarity with the system through wide dissemination would result in even higher scores.

The ranking of the level of engagement depending on the soundtrack listened to.

The ranking of the accessibility depending on the soundtrack listened to.

The ranking of the level of adequacy of the accessibility measures depending on the soundtrack listened to.
Discussion and conclusion
This article explored the process of applying new accessibility techniques to a short film, Pearl, in consultation with visually impaired volunteers. Sound effects were added to the original soundtrack to convey actions, elicit the presence of establishing shots, convey abstract scenes as well as indicate the presence of characters, time, and place. Furthermore, binaural audio was used to establish the position of objects, characters, and their movement as well as rendering accurate representations of the physical spaces within the film diegesis. Both sound effects and spatial audio aided in the reduction of verbal descriptions. For conveying elements, such as feelings, gestures, colours as well as actions linked to the fantasy genre, the I-voice (first-person narration) was applied.
When presented with the EAD version of Pearl, visually impaired volunteers were asked to leave their comfort zone and explore a form of integrated accessibility based on sound design techniques, which was unfamiliar to them. Familiarity with accessibility techniques is crucial to their acceptance, and the interviews conducted demonstrate that EAD is performing as well as AD in terms of engagement, accessibility, and the adequacy of access methods. This is extremely positive and points towards the potential for this method to be rolled out in future productions.
Furthermore, the use of I-voice was welcomed by volunteers:
. . .the audio description was brilliant. . .it is the daughter doing the AD. . .At first, very briefly, I was a bit surprised but, almost immediately, I just accepted it as I would a narrator in a radio play. It was very clever how the sound and positioning of the voices made it easy to tell whether she was describing or simply talking. I settled in to it being a very natural way to do things – particularly for this film. I found like I was in the midst of the action too. Margaret telling the story worked for me. . .It’s a good idea to do it like that, it’s one way to do it. I enjoyed it, I didn’t think I would at the beginning, but I did. Liked that it was from her point of view, less detached than regular AD, felt more in tune with the character. In AD they describe the facial expressions, but the feelings of the character are more important than the expressions.
Spatialisation was overall well received, with volunteers commenting on its use to identify the positions of objects, characters, and movement, as well as mentioning that it increased enjoyment and enriched the filmic experience. However, lack of familiarity with binaural audio was noted as a barrier to the greater acceptance of the format, a problem that can be addressed through the wider availability of binaural examples.
In terms of sound design strategies, results in relation to the abstract scenes caution against taking for granted that design methods that succeed in audio–visual media for sighted audiences will be equally successful for visually impaired audiences. A further note of caution is in relation to the vococentrism experienced by participants, that is, the precedence of the voice over other sounds, which resulted in participants focusing more on what was being said than what was being denoted through other sounds and, as a result, favouring the traditional AD method. However, when the AD narration was not present, volunteers mentioned relying on other elements, such as sound effects and audio processing, to engage with the story.
A particularly telling account of the effectiveness of the enhanced version was given by one of the volunteers, who summarised his experience to the research team:
It validated the fact that my experience as a blind person was just as valid as the one of a sighted person. I’m not just a poor blind person that needs to be told what I was missing. Only if things were like that all the time. . .I have had sight so, I love AD, but sometimes it reminds you of what you are not seeing, what everybody else is seeing and you are not. I felt in this experience it didn’t matter what colour dress someone was wearing or other, I was getting a full sensory experience. Everything should be like this. . .I wasn’t expecting to be that much blown away. . .I actually felt that it was entertaining me as an integral part of the artistic experience and not just like this poor man, we better do something to give him something, he’s missing the lovely colours and all that. Someone has bothered to think that you don’t always need to see.
By demonstrating that the EAD version of Pearl was as successful as traditional AD in providing information, enjoyment, and accessibility to visually impaired audiences, we have shown that an equally successful alternative route to inclusion is possible.
The creation of alternatives to AD, which can be offered in parallel, would help acknowledge the diversity of needs and preferences among visually impaired film and TV audiences. Such diversity may be linked to aspects, such as age, type of sight loss, and whether it is congenital or acquired. However, personal preferences are also likely to play a major part in accessibility choices and previous research by the authors indicated that 78% of the visually impaired people surveyed believed that accessibility should acknowledge diversity; while it also demonstrated that there was scepticism as to whether this was even possible (Lopez et al., 2018).
This article lays the foundations for a new system for accessibility to film and TV that sits within the field of accessible filmmaking and paves the way towards new practices while also inviting the creative sector to embrace accessibility.
Footnotes
Acknowledgements
The authors especially thank all the volunteers who participated in the screenings, focus groups and interviews. This project would not have been possible without their input. The research team would also like to thank RNIB, Beacon, Cam Sight, MySight York and My Sight Nottingham for their help in disseminating the research and providing access to facilities. Special thanks to Rebecca King for contributing to the first-person narration and to Julie Neubert for recording additional lines for the project. The Enhancing Audio Description project has worked on establishing links with the television and film industries as well as the accessibility sectors thanks to an advisory panel including Howard Bargroff (Sonorous), Jerry Gilbert (Cam Sight), Lisa Holdsworth (Screenwriter), Elfed Howells (formerly at Dolby), Sonali Rai (RNIB), Jez Watts (Sensor Media), John Whiston (ITV), and Warren Wilson (RNIB). The data collection processes conducted as part of this project were approved by the Ethics Committees at Anglia Ruskin University and the Department of Theatre, Film, Television and Interactive Media, as well as the Department of Electronic Engineering at the University of York.
Funding
This project was funded by the Arts and Humanities Research Council (AH/N003713/1) and was a collaboration between the University of York and Anglia Ruskin University.
